Skip to content
Snippets Groups Projects
Commit ff7cc45f authored by Rajesh Balamohan's avatar Rajesh Balamohan Committed by Josh Rosen
Browse files

[SPARK-14091][CORE] Improve performance of SparkContext.getCallSite()

Currently SparkContext.getCallSite() makes a call to Utils.getCallSite().

```
 private[spark] def getCallSite(): CallSite = {
    val callSite = Utils.getCallSite()
    CallSite(
      Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
    )
  }
```
However, in some places utils.withDummyCallSite(sc) is invoked to avoid expensive threaddumps within getCallSite(). But Utils.getCallSite() is evaluated earlier causing threaddumps to be computed.

This can have severe impact on smaller queries (that finish in 10-20 seconds) having large number of RDDs.

Creating this patch for lazy evaluation of  getCallSite.

No new test cases are added. Following standalone test was tried out manually. Also, built entire spark binary and tried with few SQL queries in TPC-DS  and TPC-H in multi node cluster
```
def run(): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext("local[1]", "test-context", conf)
    val start: Long = System.currentTimeMillis();
    val confBroadcast = sc.broadcast(new SerializableConfiguration(new Configuration()))
    Utils.withDummyCallSite(sc) {
      //Large tables end up creating 5500 RDDs
      for(i <- 1 to 5000) {
       //ignore nulls in RDD as its mainly for testing callSite
        val testRDD = new HadoopRDD(sc, confBroadcast, None, null,
          classOf[NullWritable], classOf[Writable], 10)
      }
    }
    val end: Long = System.currentTimeMillis();
    println("Time taken : " + (end - start))
  }

def main(args: Array[String]): Unit = {
    run
  }
```

Author: Rajesh Balamohan <rbalamohan@apache.org>

Closes #11911 from rajeshbalamohan/SPARK-14091.
parent b554b3c4
No related branches found
No related tags found
No related merge requests found
......@@ -1737,7 +1737,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
private[spark] def getCallSite(): CallSite = {
val callSite = Utils.getCallSite()
lazy val callSite = Utils.getCallSite()
CallSite(
Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment