Commit 7cb4d74c authored by Carson Wang, committed by Reynold Xin

[SPARK-13185][SQL] Reuse Calendar object in DateTimeUtils.StringToDate method to improve performance

The Java `Calendar` object is expensive to create. I have a subquery like this: `SELECT a, b, c FROM table UV WHERE (datediff(UV.visitDate, '1997-01-01') >= 0 AND datediff(UV.visitDate, '2015-01-01') <= 0)`

The table stores `visitDate` as a String and has 3 billion records. Before this change, `DateTimeUtils.stringToDate` created a new `Calendar` object on every call. By reusing the `Calendar` object, I saw this stage run about 20 seconds faster.
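A minimal sketch of the change in isolation, assuming a strict `yyyy-MM-dd` input format (much simpler than the `segments`-based parser in the real `stringToDate`). The names `StringToDateSketch`, `stringToDateFresh`, and `stringToDateReused` are hypothetical; only the `ThreadLocal` plus `clear()` pattern comes from the diff.

```scala
import java.util.{Calendar, TimeZone}

// Hypothetical, simplified stand-in for DateTimeUtils.stringToDate. It accepts
// only strict "yyyy-MM-dd" strings and returns days since 1970-01-01 (GMT).
object StringToDateSketch {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000
  private val Gmt: TimeZone = TimeZone.getTimeZone("GMT")
  private val DatePattern = """(\d{4})-(\d{1,2})-(\d{1,2})""".r

  // One Calendar per thread, created on the first get() in that thread.
  private val threadLocalGmtCalendar = new ThreadLocal[Calendar] {
    override protected def initialValue(): Calendar = Calendar.getInstance(Gmt)
  }

  // Returns (year, month, day) if the string matches and the ranges are plausible.
  private def parse(s: String): Option[(Int, Int, Int)] = s match {
    case DatePattern(y, m, d) =>
      val (year, month, day) = (y.toInt, m.toInt, d.toInt)
      if (month < 1 || month > 12 || day < 1 || day > 31) None
      else Some((year, month, day))
    case _ => None
  }

  // Same steps as the patched code: clear, set fields, read the epoch millis.
  private def toDays(c: Calendar, year: Int, month: Int, day: Int): Int = {
    c.clear()                            // drop every field left by a previous call
    c.set(year, month - 1, day, 0, 0, 0) // Calendar months are 0-based
    c.set(Calendar.MILLISECOND, 0)
    // Plain division, as in the diff; exact for dates on or after 1970-01-01.
    (c.getTimeInMillis / MillisPerDay).toInt
  }

  // Before the patch: a new Calendar for every call.
  def stringToDateFresh(s: String): Option[Int] =
    parse(s).map { case (y, m, d) => toDays(Calendar.getInstance(Gmt), y, m, d) }

  // After the patch: reuse the calling thread's Calendar.
  def stringToDateReused(s: String): Option[Int] =
    parse(s).map { case (y, m, d) => toDays(threadLocalGmtCalendar.get(), y, m, d) }
}
```

For the two literals in the query above, both methods return `Some(9862)` for `1997-01-01` and `Some(16436)` for `2015-01-01`; the only difference is whether a `Calendar` is allocated on each call.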

Author: Carson Wang <carson.wang@intel.com>

Closes #11090 from carsonwang/SPARK-13185.
parent 22e9723d
@@ -59,6 +59,13 @@ object DateTimeUtils {
@transient lazy val defaultTimeZone = TimeZone.getDefault
+// Reuse the Calendar object in each thread as it is expensive to create in each method call.
+private val threadLocalGmtCalendar = new ThreadLocal[Calendar] {
+override protected def initialValue: Calendar = {
+Calendar.getInstance(TimeZoneGMT)
+}
+}
// Java TimeZone has no mention of thread safety. Use thread local instance to be safe.
private val threadLocalLocalTimeZone = new ThreadLocal[TimeZone] {
override protected def initialValue: TimeZone = {
@@ -408,7 +415,8 @@ object DateTimeUtils {
segments(2) < 1 || segments(2) > 31) {
return None
}
-val c = Calendar.getInstance(TimeZoneGMT)
+val c = threadLocalGmtCalendar.get()
+c.clear()
c.set(segments(0), segments(1) - 1, segments(2), 0, 0, 0)
c.set(Calendar.MILLISECOND, 0)
Some((c.getTimeInMillis / MILLIS_PER_DAY).toInt)
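Two details in this hunk matter for correctness. `ThreadLocal` hands each thread its own `Calendar`, so concurrent threads never mutate a shared, mutable instance, and `c.clear()` resets the fields left over from the previous call on that thread before the new ones are set. The sketch below checks the first property; `ThreadLocalCalendarDemo` and `calendarFromNewThread` are hypothetical names, not Spark code.

```scala
import java.util.{Calendar, TimeZone}
import java.util.concurrent.atomic.AtomicReference

// Hypothetical demo object; the field mirrors threadLocalGmtCalendar in the diff.
object ThreadLocalCalendarDemo {
  val threadLocalGmtCalendar: ThreadLocal[Calendar] = new ThreadLocal[Calendar] {
    // Runs once per thread, on that thread's first get().
    override protected def initialValue(): Calendar =
      Calendar.getInstance(TimeZone.getTimeZone("GMT"))
  }

  // Fetches the Calendar that a brand-new thread receives from the ThreadLocal.
  def calendarFromNewThread(): Calendar = {
    val seen = new AtomicReference[Calendar]()
    val worker = new Thread(() => seen.set(threadLocalGmtCalendar.get()))
    worker.start()
    worker.join() // wait for the worker so its result is visible here
    seen.get()
  }
}
```

Repeated `get()` calls on one thread return the same instance, while a new thread gets its own; that per-thread isolation is what makes reuse safe where a single shared `Calendar` would not be.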