  1. May 21, 2015
    • Hari Shreedharan's avatar
      [SPARK-7657] [YARN] Add driver logs links in application UI, in cluster mode. · 956c4c91
      Hari Shreedharan authored
      This PR adds the URLs to the driver logs to `SparkListenerApplicationStarted` event, which is later used by the `ExecutorsListener` to populate the URLs to the driver logs in its own state. This info is then used when the UI is rendered to display links to the logs.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6166 from harishreedharan/am-log-link and squashes the following commits:
      
      943fc4f [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into am-log-link
      9e5c04b [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into am-log-link
      b3f9b9d [Hari Shreedharan] Updated comment based on feedback.
      0840a95 [Hari Shreedharan] Move the result and sc.stop back to original location, minor import changes.
      537a2f7 [Hari Shreedharan] Add test to ensure the log urls are populated and valid.
      4033725 [Hari Shreedharan] Adding comments explaining how node reports are used to get the log urls.
      6c5c285 [Hari Shreedharan] Import order.
      346f4ea [Hari Shreedharan] Review feedback fixes.
      629c1dc [Hari Shreedharan] Cleanup.
      99fb1a3 [Hari Shreedharan] Send the log urls in App start event, to ensure that other listeners are not affected.
      c0de336 [Hari Shreedharan] Ensure new unit test cleans up after itself.
      50cdae3 [Hari Shreedharan] Added unit test, made the approach generic.
      402e8e4 [Hari Shreedharan] Use `NodeReport` to get the URL for the logs. Also, make the environment variables generic so other cluster managers can use them as well.
      1cf338f [Hari Shreedharan] [SPARK-7657][YARN] Add driver link in application UI, in cluster mode.
      956c4c91
    • Andrew Or's avatar
      [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning · 5287eec5
      Andrew Or authored
      According to yhuai we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide these closures in Spark we know for sure they are serializable, so we can bypass the cleaning.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6256 from andrewor14/sql-partition-speed-up and squashes the following commits:
      
      a82b451 [Andrew Or] Fix style
      10f7e3e [Andrew Or] Avoid getting call sites and cleaning closures
      17e2943 [Andrew Or] Merge branch 'master' of github.com:apache/spark into sql-partition-speed-up
      523f042 [Andrew Or] Skip unnecessary Utils.getCallSites too
      f7fe143 [Andrew Or] Avoid unnecessary closure cleaning
      5287eec5
    • Sean Owen's avatar
      [SPARK-6416] [DOCS] RDD.fold() requires the operator to be commutative · 6e534026
      Sean Owen authored
      Document current limitation of rdd.fold.
      
      This does not resolve SPARK-6416 but just documents the issue.
      CC JoshRosen
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6231 from srowen/SPARK-6416 and squashes the following commits:
      
      9fef39f [Sean Owen] Add comment to other languages; reword to highlight the difference from non-distributed collections and to not suggest it is a bug that is to be fixed
      da40d84 [Sean Owen] Document current limitation of rdd.fold.
      6e534026
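The limitation documented above can be sketched in plain Python (this is an illustrative simulation, not Spark code): a distributed fold applies the operator within each partition and then again across the per-partition results, so a non-commutative operator can give a different answer than a sequential fold.

```python
from functools import reduce

def distributed_fold(partitions, zero, op):
    """Simulate RDD.fold: fold each partition starting from `zero`,
    then fold the per-partition results, again starting from `zero`."""
    partials = [reduce(op, part, zero) for part in partitions]
    return reduce(op, partials, zero)

parts = [[1, 2, 3], [4, 5], [6]]

# A commutative, associative operator matches a local fold:
assert distributed_fold(parts, 0, lambda a, b: a + b) == sum(range(1, 7))

# Subtraction is neither commutative nor associative, so the
# distributed result differs from folding the flattened list:
flat = [x for p in parts for x in p]
local = reduce(lambda a, b: a - b, flat, 0)          # -21
dist = distributed_fold(parts, 0, lambda a, b: a - b)
assert local != dist
```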
    • Mingfei's avatar
      [SPARK-7389] [CORE] Tachyon integration improvement · 04940c49
      Mingfei authored
      Two main changes:
      
Add two functions, putValues and getValues, to ExternalBlockManager, because the implementation may not rely on putBytes and getBytes.
      
Improve the Tachyon integration.
Currently, when putting data into Tachyon, Spark first serializes all of a partition's data into a ByteBuffer and then writes it to Tachyon; this uses a lot of memory and increases GC overhead.

When getting data from Tachyon, getValues depends on getBytes, which likewise reads all of the data into an on-heap byte array, again causing heavy memory usage.
This PR changes the approach of the two functions, making them read and write data through streams to reduce memory usage.

In our testing, when the data size is huge, this patch reduces GC time by about 30% and full GC time by about 70%, and reduces total execution time by about 10%.
      
      Author: Mingfei <mingfei.shi@intel.com>
      
      Closes #5908 from shimingfei/Tachyon-integration-rebase and squashes the following commits:
      
      033bc57 [Mingfei] modify accroding to comments
      747c69a [Mingfei] modify according to comments - format changes
      ce52c67 [Mingfei] put close() in a finally block
      d2c60bb [Mingfei] modify according to comments, some code style change
      4c11591 [Mingfei] modify according to comments split putIntoExternalBlockStore into two functions add default implementation for getValues and putValues
      cc0a32e [Mingfei] Make getValues read data from Tachyon by stream Make putValues write data to Tachyon by stream
      017593d [Mingfei] add getValues and putValues in ExternalBlockManager's Interface
      04940c49
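The stream-based approach described above can be sketched in Python (an illustrative analog using pickle, not the actual Scala/Tachyon implementation; the function names mirror putValues/getValues but are hypothetical): records are serialized to the stream one at a time instead of materializing the whole partition in a single buffer.

```python
import io
import pickle

def put_values_streaming(values, out_stream):
    """Write an iterator of values to a byte stream one record at a
    time, instead of serializing the whole partition into one buffer."""
    for v in values:
        pickle.dump(v, out_stream)

def get_values_streaming(in_stream):
    """Lazily read records back from the stream, one at a time."""
    while True:
        try:
            yield pickle.load(in_stream)
        except EOFError:
            return

buf = io.BytesIO()
put_values_streaming(iter(range(5)), buf)
buf.seek(0)
assert list(get_values_streaming(buf)) == [0, 1, 2, 3, 4]
```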
  2. May 20, 2015
    • Hari Shreedharan's avatar
      [SPARK-7750] [WEBUI] Rename endpoints from `json` to `api` to allow fu… · a70bf06b
      Hari Shreedharan authored
      …rther extension to non-json outputs too.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #6273 from harishreedharan/json-to-api and squashes the following commits:
      
      e14b73b [Hari Shreedharan] Rename `getJsonServlet` to `getServletHandler` i
      42f8acb [Hari Shreedharan] Import order fixes.
      2ef852f [Hari Shreedharan] [SPARK-7750][WebUI] Rename endpoints from `json` to `api` to allow further extension to non-json outputs too.
      a70bf06b
    • Josh Rosen's avatar
      [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat · 5196efff
      Josh Rosen authored
      This patch re-adds a test which was removed in 9ebb44f8 due to a Java 6 compatibility issue.  We now use Guava's `Iterators.emptyIterator()` in place of `Collections.emptyIterator()`, which isn't present in all Java 6 versions.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6298 from JoshRosen/SPARK-7719-fix-java-6-test-code and squashes the following commits:
      
      5c9bd85 [Josh Rosen] Re-add UnsafeShuffleWriterSuite.emptyIterator() test which was removed due to Java 6 issue
      5196efff
    • Tathagata Das's avatar
      [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start() · 3c434cbf
      Tathagata Das authored
Currently, the background checkpointing thread fails silently if the checkpoint is not serializable. That is hard to debug, so it's best to fail fast at `start()` when checkpointing is enabled and the checkpoint is not serializable.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #6292 from tdas/SPARK-7767 and squashes the following commits:
      
      51304e6 [Tathagata Das] Addressed comments.
      c35237b [Tathagata Das] Added test for checkpoint serialization in StreamingContext.start()
      3c434cbf
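The fail-fast idea can be sketched in Python (the real change is in Scala's StreamingContext; this class and its attributes are hypothetical stand-ins): verify serializability up front in start() instead of letting a background thread fail silently later.

```python
import pickle

class StreamingContext:
    """Illustrative stand-in for the real StreamingContext."""
    def __init__(self, checkpoint=None):
        self.checkpoint = checkpoint

    def start(self):
        # Fail fast: verify the checkpoint serializes *before* starting,
        # rather than letting the background thread fail silently.
        if self.checkpoint is not None:
            try:
                pickle.dumps(self.checkpoint)
            except Exception as e:
                raise RuntimeError("checkpoint is not serializable") from e
        # ... start the streaming computation ...

StreamingContext({"dir": "/tmp/ckpt"}).start()   # serializable: OK
try:
    StreamingContext(lambda: None).start()       # lambdas don't pickle
    raised = False
except RuntimeError:
    raised = True
assert raised
```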
    • Andrew Or's avatar
      [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning · 9b84443d
      Andrew Or authored
      SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6269 from andrewor14/clean-moar and squashes the following commits:
      
      c51c9ab [Andrew Or] Add periods (trivial)
      6c686ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
      79a435b [Andrew Or] Fix tests
      d18c9f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
      65ef07b [Andrew Or] Fix tests?
      4b487a3 [Andrew Or] Add tests for closures passed to DStream operations
      328139b [Andrew Or] Do not forget foreachRDD
      5431f61 [Andrew Or] Clean streaming closures
      72b7b73 [Andrew Or] Clean core closures
      9b84443d
  3. May 19, 2015
    • Davies Liu's avatar
      [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python · 4de74d26
      Davies Liu authored
      cc rxin, please take a quick look, I'm working on tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6238 from davies/readwrite and squashes the following commits:
      
      c7200eb [Davies Liu] update tests
      9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      f0c5a04 [Davies Liu] use sqlContext.read.load
      5f68bc8 [Davies Liu] update tests
      6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      bcc6668 [Davies Liu] add reader amd writer API in Python
      4de74d26
    • Patrick Wendell's avatar
      [HOTFIX]: Java 6 Build Breaks · 9ebb44f8
      Patrick Wendell authored
      These were blocking RC1 so I fixed them manually.
      9ebb44f8
  4. May 18, 2015
    • Daoyuan Wang's avatar
      [SPARK-7150] SparkContext.range() and SQLContext.range() · c2437de1
      Daoyuan Wang authored
      This PR is based on #6081, thanks adrian-wang.
      
      Closes #6081
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6230 from davies/range and squashes the following commits:
      
      d3ce5fe [Davies Liu] add tests
      789eda5 [Davies Liu] add range() in Python
      4590208 [Davies Liu] Merge commit 'refs/pull/6081/head' of github.com:apache/spark into range
      cbf5200 [Daoyuan Wang] let's add python support in a separate PR
      f45e3b2 [Daoyuan Wang] remove redundant toLong
      617da76 [Daoyuan Wang] fix safe marge for corner cases
      867c417 [Daoyuan Wang] fix
      13dbe84 [Daoyuan Wang] update
      bd998ba [Daoyuan Wang] update comments
      d3a0c1b [Daoyuan Wang] add range api()
      c2437de1
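A range() API like the one added above has to compute the element count and split it into slices without off-by-one errors at the corners (the "safe margin for corner cases" mentioned in the shortlog). A minimal Python sketch, assuming a simplified slicing scheme rather than the actual Spark implementation:

```python
def range_partitions(start, stop, step, num_slices):
    """Split the sequence start, start+step, ... (up to but excluding
    stop) into num_slices contiguous chunks, range()-style."""
    if step == 0:
        raise ValueError("step cannot be 0")
    # Element count, rounded up; works for negative steps too.
    sign = 1 if step > 0 else -1
    n = max(0, (stop - start + step - sign) // step)
    slices = []
    for i in range(num_slices):
        lo = i * n // num_slices
        hi = (i + 1) * n // num_slices
        slices.append([start + j * step for j in range(lo, hi)])
    return slices

parts = range_partitions(0, 10, 2, 3)
assert [x for p in parts for x in p] == [0, 2, 4, 6, 8]
```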
    • Davies Liu's avatar
      [SPARK-7624] Revert #4147 · 4fb52f95
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6172 from davies/revert_4147 and squashes the following commits:
      
      3bfbbde [Davies Liu] Revert #4147
      4fb52f95
    • Andrew Or's avatar
      [SPARK-7501] [STREAMING] DAG visualization: show DStream operations · b93c97d7
      Andrew Or authored
      This is similar to #5999, but for streaming. Roughly 200 lines are tests.
      
      One thing to note here is that we already do some kind of scoping thing for call sites, so this patch adds the new RDD operation scoping logic in the same place. Also, this patch adds a `try finally` block to set the relevant variables in a safer way.
      
      tdas zsxwing
      
      ------------------------
      **Before**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7625996/d88211b8-f9b4-11e4-90b9-e11baa52d6d7.png" width="450px"/>
      
      --------------------------
      **After**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7625997/e0878f8c-f9b4-11e4-8df3-7dd611b13c87.png" width="650px"/>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6034 from andrewor14/dag-viz-streaming and squashes the following commits:
      
      932a64a [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      e685df9 [Andrew Or] Rename createRDDWith
      84d0656 [Andrew Or] Review feedback
      697c086 [Andrew Or] Fix tests
      53b9936 [Andrew Or] Set scopes for foreachRDD properly
      1881802 [Andrew Or] Refactor DStream scope names again
      af4ba8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      fd07d22 [Andrew Or] Make MQTT lower case
      f6de871 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      0ca1801 [Andrew Or] Remove a few unnecessary withScopes on aliases
      fa4e5fb [Andrew Or] Pass in input stream name rather than defining it from within
      1af0b0e [Andrew Or] Fix style
      074c00b [Andrew Or] Review comments
      d25a324 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      e4a93ac [Andrew Or] Fix tests?
      25416dc [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      9113183 [Andrew Or] Add tests for DStream scopes
      b3806ab [Andrew Or] Fix test
      bb80bbb [Andrew Or] Fix MIMA?
      5c30360 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      5703939 [Andrew Or] Rename operations that create InputDStreams
      7c4513d [Andrew Or] Group RDDs by DStream operations and batches
      bf0ab6e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      05c2676 [Andrew Or] Wrap many more methods in withScope
      c121047 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-streaming
      65ef3e9 [Andrew Or] Fix NPE
      a0d3263 [Andrew Or] Scope streaming operations instead of RDD operations
      b93c97d7
    • Davies Liu's avatar
      [SPARK-6216] [PYSPARK] check python version of worker with driver · 32fbd297
      Davies Liu authored
This PR reverts #5404 and instead passes the driver's Python version into the JVM, then checks it in the worker before deserializing any closure, so mismatches across different major versions of Python are caught cleanly.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #6203 from davies/py_version and squashes the following commits:
      
      b8fb76e [Davies Liu] fix test
      6ce5096 [Davies Liu] use string for version
      47c6278 [Davies Liu] check python version of worker with driver
      32fbd297
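The version check described above can be sketched in pure Python (the function names are hypothetical; the real check passes the version string through the JVM to each worker): the worker refuses to deserialize closures if its major.minor version differs from the driver's.

```python
import sys

def driver_version():
    """The version string the driver would ship to workers."""
    return "%d.%d" % sys.version_info[:2]

def check_worker_version(expected):
    """Run on the worker before deserializing any closure: refuse to
    proceed if major.minor differs from the driver's version."""
    actual = "%d.%d" % sys.version_info[:2]
    if actual != expected:
        raise RuntimeError(
            "Python in worker has different version %s than that in "
            "driver %s" % (actual, expected))

check_worker_version(driver_version())   # same interpreter: passes
try:
    check_worker_version("2.6")          # simulated mismatch
    mismatch = False
except RuntimeError:
    mismatch = True
assert mismatch
```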
    • Andrew Or's avatar
      [SPARK-7627] [SPARK-7472] DAG visualization: style skipped stages · 563bfcc1
      Andrew Or authored
      This patch fixes two things:
      
      **SPARK-7627.** Cached RDDs no longer light up on the job page. This is a simple fix.
      **SPARK-7472.** Display skipped stages differently from normal stages.
      
      The latter is a major UX issue. Because we link the job viz to the stage viz even for skipped stages, the user may inadvertently click into the stage page of a skipped stage, which is empty.
      
      -------------------
      <img src="https://cloud.githubusercontent.com/assets/2133137/7675241/de1a3da6-fcea-11e4-8101-88055cef78c5.png" width="300px" />
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6171 from andrewor14/dag-viz-skipped and squashes the following commits:
      
      f261797 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-skipped
      0eda358 [Andrew Or] Tweak skipped stage border color
      c604150 [Andrew Or] Tweak grayscale colors
      7010676 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-skipped
      762b541 [Andrew Or] Use special prefix for stage clusters to avoid collisions
      51c95b9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-skipped
      b928cd4 [Andrew Or] Fix potential leak + write tests for it
      7c4c364 [Andrew Or] Show skipped stages differently
      7cc34ce [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-skipped
      c121fa2 [Andrew Or] Fix cache color
      563bfcc1
  5. May 17, 2015
    • zsxwing's avatar
      [SPARK-7693][Core] Remove "import scala.concurrent.ExecutionContext.Implicits.global" · ff71d34e
      zsxwing authored
Learnt a lesson from SPARK-7655: Spark should avoid using `scala.concurrent.ExecutionContext.Implicits.global`, because the user may submit blocking actions to it and exhaust all of its threads, which could crash Spark. So Spark should always use its own thread pools for safety.
      
      This PR removes all usages of `scala.concurrent.ExecutionContext.Implicits.global` and uses proper thread pools to replace them.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6223 from zsxwing/SPARK-7693 and squashes the following commits:
      
      a33ff06 [zsxwing] Decrease the max thread number from 1024 to 128
      cf4b3fc [zsxwing] Remove "import scala.concurrent.ExecutionContext.Implicits.global"
      ff71d34e
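The same principle carries over to any runtime: give each subsystem its own bounded, named pool rather than sharing one global pool that blocking tasks can exhaust. A Python analog of Scala's dedicated daemon thread pools (the helper name is invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def new_bounded_pool(prefix, max_threads=128):
    """A dedicated, bounded, named pool per subsystem -- the Python
    analog of ThreadUtils.newDaemonCachedThreadPool in Spark."""
    return ThreadPoolExecutor(max_workers=max_threads,
                              thread_name_prefix=prefix)

pool = new_bounded_pool("block-manager-future")
fut = pool.submit(lambda: 1 + 1)
assert fut.result() == 2
pool.shutdown()
```

With per-subsystem pools, a subsystem that blocks all of its own threads cannot starve unrelated work, which is exactly the failure mode SPARK-7655 exposed.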
    • Josh Rosen's avatar
      [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug · f2cc6b5b
      Josh Rosen authored
      This patch wraps `SnappyOutputStream` to ensure that `close()` is idempotent and to guard against write-after-`close()` bugs. This is a workaround for https://github.com/xerial/snappy-java/issues/107, a bug where a non-idempotent `close()` method can lead to stream corruption. We can remove this workaround if we upgrade to a snappy-java version that contains my fix for this bug, but in the meantime this patch offers a backportable Spark fix.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits:
      
      8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660
      f2cc6b5b
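The wrapper technique described above, made idempotent-on-close and guarded against write-after-close, can be sketched generically in Python (illustrative class, not the actual Scala wrapper around SnappyOutputStream):

```python
import io

class ClosedGuardStream:
    """Wrap an output stream so close() is idempotent and any write
    after close() fails loudly instead of corrupting the stream."""
    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._closed = False

    def write(self, data):
        if self._closed:
            raise IOError("stream is already closed")
        return self._wrapped.write(data)

    def close(self):
        if not self._closed:      # a second close() is a no-op
            self._closed = True
            self._wrapped.close()

s = ClosedGuardStream(io.BytesIO())
s.write(b"ok")
s.close()
s.close()                         # idempotent: no error
try:
    s.write(b"late")
    failed = False
except IOError:
    failed = True
assert failed
```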
  6. May 16, 2015
    • zsxwing's avatar
      [SPARK-7655][Core] Deserializing value should not hold the TaskSchedulerImpl lock · 3b6ef2c5
      zsxwing authored
      We should not call `DirectTaskResult.value` when holding the `TaskSchedulerImpl` lock. It may cost dozens of seconds to deserialize a large object.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6195 from zsxwing/SPARK-7655 and squashes the following commits:
      
      21f502e [zsxwing] Add more comments
      e25fa88 [zsxwing] Add comments
      15010b5 [zsxwing] Deserialize value should not hold the TaskSchedulerImpl lock
      3b6ef2c5
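The fix pattern is general: do expensive deserialization before taking the lock, then hold the lock only for the cheap bookkeeping. A Python sketch of the two orderings (the lock stands in for the TaskSchedulerImpl lock; function names are illustrative):

```python
import pickle
import threading

lock = threading.Lock()   # stands in for the TaskSchedulerImpl lock

def handle_result_bad(serialized):
    with lock:
        # Deserializing a large object here can hold the scheduler
        # lock for many seconds -- the SPARK-7655 bug.
        return pickle.loads(serialized)

def handle_result_good(serialized):
    value = pickle.loads(serialized)   # slow work, done lock-free
    with lock:                         # lock held only briefly
        return value

payload = pickle.dumps(list(range(10)))
assert handle_result_good(payload) == list(range(10))
```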
    • zsxwing's avatar
      [SPARK-7655][Core][SQL] Remove... · 47e7ffe3
      zsxwing authored
      [SPARK-7655][Core][SQL] Remove 'scala.concurrent.ExecutionContext.Implicits.global' in 'ask' and 'BroadcastHashJoin'
      
Both `AkkaRpcEndpointRef.ask` and `BroadcastHashJoin` use `scala.concurrent.ExecutionContext.Implicits.global`. Because the tasks in `BroadcastHashJoin` are usually long-running, they can occupy all threads in `global`, so `ask` never gets a chance to process its replies.

For `ask`, the tasks are actually very simple, so we can use `MoreExecutors.sameThreadExecutor()`. For `BroadcastHashJoin`, it's better to use `ThreadUtils.newDaemonCachedThreadPool`.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6200 from zsxwing/SPARK-7655-2 and squashes the following commits:
      
      cfdc605 [zsxwing] Remove redundant imort and minor doc fix
      cf83153 [zsxwing] Add "sameThread" and "newDaemonCachedThreadPool with maxThreadNumber" to ThreadUtils
      08ad0ee [zsxwing] Remove 'scala.concurrent.ExecutionContext.Implicits.global' in 'ask' and 'BroadcastHashJoin'
      47e7ffe3
    • Nishkam Ravi's avatar
      [SPARK-7672] [CORE] Use int conversion in translating kryoserializer.buffer.mb... · 0ac8b01a
      Nishkam Ravi authored
      [SPARK-7672] [CORE] Use int conversion in translating kryoserializer.buffer.mb to kryoserializer.buffer
      
In translating spark.kryoserializer.buffer.mb to spark.kryoserializer.buffer, using toDouble leads to a "Fractional values not supported" error even when spark.kryoserializer.buffer.mb is an integer.
      ilganeli, andrewor14
      
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      
      Closes #6198 from nishkamravi2/master_nravi and squashes the following commits:
      
      171a53c [nishkamravi2] Update SparkConfSuite.scala
      5261bf6 [Nishkam Ravi] Add a test for deprecated config spark.kryoserializer.buffer.mb
      5190f79 [Nishkam Ravi] In translating from deprecated spark.kryoserializer.buffer.mb to spark.kryoserializer.buffer use int conversion since fractions are not permissible
      059ce82 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      eaa13b5 [nishkamravi2] Update Client.scala
      981afd2 [Nishkam Ravi] Check for read permission before initiating copy
      1b81383 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      0f1abd0 [nishkamravi2] Update Utils.scala
      474e3bf [nishkamravi2] Update DiskBlockManager.scala
      97c383e [nishkamravi2] Update Utils.scala
      8691e0c [Nishkam Ravi] Add a try/catch block around Utils.removeShutdownHook
      2be1e76 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      1c13b79 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      bad4349 [nishkamravi2] Update Main.java
      36a6f87 [Nishkam Ravi] Minor changes and bug fixes
      b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
      d9658d6 [Nishkam Ravi] Changes for SPARK-6406
      ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
      345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ac58975 [Nishkam Ravi] spark-class changes
      06bfeb0 [nishkamravi2] Update spark-class
      35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
      4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
      746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
      bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      d453197 [nishkamravi2] Update NewHadoopRDD.scala
      6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
      0ce2c32 [nishkamravi2] Update HadoopRDD.scala
      f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
      71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      494d8c0 [nishkamravi2] Update DiskBlockManager.scala
      3c5ddba [nishkamravi2] Update DiskBlockManager.scala
      f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
      79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
      535295a [nishkamravi2] Update TaskSetManager.scala
      3e1b616 [Nishkam Ravi] Modify test for maxResultSize
      9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
      5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
      0ac8b01a
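The fix above amounts to keeping the translated value integral. A Python sketch of the idea, under the assumption that the deprecated megabyte setting is rewritten into a size string with an "m" suffix (the helper name is invented for illustration):

```python
def translate_buffer_mb(mb_value):
    """Translate a deprecated integer-megabyte config value into a
    size string. int conversion keeps the value whole ("64m"); a
    float conversion would produce "64.0m", which a size parser
    that rejects fractional values cannot accept."""
    return "%dm" % int(mb_value)

assert translate_buffer_mb("64") == "64m"
assert translate_buffer_mb(8) == "8m"
```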
  7. May 15, 2015
    • Josh Rosen's avatar
      [SPARK-7563] OutputCommitCoordinator.stop() should only run on the driver · 2c04c8a1
      Josh Rosen authored
      This fixes a bug where an executor that exits can cause the driver's OutputCommitCoordinator to stop. To fix this, we use an `isDriver` flag and check it in `stop()`.
      
      See https://issues.apache.org/jira/browse/SPARK-7563 for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6197 from JoshRosen/SPARK-7563 and squashes the following commits:
      
      04b2cc5 [Josh Rosen] [SPARK-7563] OutputCommitCoordinator.stop() should only be executed on the driver
      2c04c8a1
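The `isDriver` guard can be sketched in Python (an illustrative stand-in for the Scala class, with a simplified stop()): an exiting executor calling stop() must not shut down the driver's coordinator.

```python
class OutputCommitCoordinator:
    """Sketch: only the driver's coordinator actually shuts down."""
    def __init__(self, is_driver):
        self.is_driver = is_driver
        self.stopped = False

    def stop(self):
        if self.is_driver:        # executors' stop() is a no-op
            self.stopped = True

driver = OutputCommitCoordinator(is_driver=True)
executor = OutputCommitCoordinator(is_driver=False)
executor.stop()                   # an executor exits...
assert not executor.stopped       # ...without stopping anything
driver.stop()
assert driver.stopped
```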
    • Kay Ousterhout's avatar
      [SPARK-7676] Bug fix and cleanup of stage timeline view · e7454564
      Kay Ousterhout authored
      cc pwendell sarutak
      
This commit cleans up some unnecessary code and removes the feature where mousing over a box in the timeline highlights the corresponding task in the table (that feature is only useful in the rare case of a very small number of tasks, where the mapping is easy to see anyway). It also fixes a bug where nothing shows up if you try to visualize a stage with only 1 task.
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6202 from kayousterhout/SPARK-7676 and squashes the following commits:
      
      dfd29d4 [Kay Ousterhout] [SPARK-7676] Bug fix and cleanup of stage timeline view
      e7454564
    • Kousuke Saruta's avatar
      [SPARK-7296] Add timeline visualization for stages in the UI. · 9b6cf285
      Kousuke Saruta authored
      This PR builds on #2342 by adding a timeline view for the Stage page,
      showing how tasks spend their time.
      
With this timeline, we can understand the following things about a stage:

* When and where each task ran
* The total duration of each task
* How each task's time was spent

The timeline view is also scrollable and zoomable.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #5843 from sarutak/stage-page-timeline and squashes the following commits:
      
      4ba9604 [Kousuke Saruta] Fixed the order of legends
      16bb552 [Kousuke Saruta] Removed border of legend area
      2e5d605 [Kousuke Saruta] Modified warning message
      16cb2e6 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
      7ae328f [Kousuke Saruta] Modified code style
      d5f794a [Kousuke Saruta] Fixed performance issues more
      64e6642 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into stage-page-timeline
      e4a3354 [Kousuke Saruta] minor code style change
      878e3b8 [Kousuke Saruta] Fixed a bug that tooltip remains
      b9d8f1b [Kousuke Saruta] Fixed performance issue
      ac8842b [Kousuke Saruta] Fixed layout
      2319739 [Kousuke Saruta] Modified appearances more
      81903ab [Kousuke Saruta] Modified appearances
      a79dcc3 [Kousuke Saruta] Modified appearance
      55a390c [Kousuke Saruta] Ignored scalastyle for a line-comment
      29eae3e [Kousuke Saruta] limited to longest 1000 tasks
      2a9e376 [Kousuke Saruta] Minor cleanup
      385b6d2 [Kousuke Saruta] Added link feature
      ba1ac3e [Kousuke Saruta] Fixed style
      2ae8520 [Kousuke Saruta] Updated bootstrap-tooltip.js from 2.2.2 to 2.3.2
      af430f1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into stage-page-timeline
      e694b8e [Kousuke Saruta] Added timeline view to StagePage
      8f6610c [Kousuke Saruta] Fixed conflict
      b587cf2 [Kousuke Saruta] initial commit
      11fe67d [Kousuke Saruta] Fixed conflict
      79ac03d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      a91abd3 [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      ef34a5b [Kousuke Saruta] Implement tooltip using bootstrap
      b09d0c5 [Kousuke Saruta] Move `stroke` and `fill` attribute of rect elements to css
      d3c63c8 [Kousuke Saruta] Fixed a little bit bugs
      a36291b [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into timeline-viewer-feature
      28714b6 [Kousuke Saruta] Fixed highlight issue
      0dc4278 [Kousuke Saruta] Addressed most of Patrics's feedbacks
      8110acf [Kousuke Saruta] Added scroll limit to Job timeline
      974a64a [Kousuke Saruta] Removed unused function
      ee7a7f0 [Kousuke Saruta] Refactored
      6a91872 [Kousuke Saruta] Temporary commit
      6693f34 [Kousuke Saruta] Added link to job/stage box in the timeline in order to move to corresponding row when we click
      8f88222 [Kousuke Saruta] Added job/stage description
      aeed4b1 [Kousuke Saruta] Removed stage timeline
      fc1696c [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      999ccd4 [Kousuke Saruta] Improved scalability
      0fc6a31 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      19815ae [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      68b7540 [Kousuke Saruta] Merge branch 'timeline-viewer-feature' of github.com:sarutak/spark into timeline-viewer-feature
      52b5f0b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dec85db [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      fcdab7d [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      dab7cc1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      09cce97 [Kousuke Saruta] Cleanuped
      16f82cf [Kousuke Saruta] Cleanuped
      9fb522e [Kousuke Saruta] Cleanuped
      d05f2c2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into timeline-viewer-feature
      e85e9aa [Kousuke Saruta] Cleanup: Added TimelineViewUtils.scala
      a76e569 [Kousuke Saruta] Removed unused setting in timeline-view.css
      5ce1b21 [Kousuke Saruta] Added vis.min.js, vis.min.css and vis.map to .rat-exclude
      082f709 [Kousuke Saruta] Added Timeline-View feature for Applications, Jobs and Stages
      9b6cf285
    • ehnalis's avatar
      [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode · 8e3822a0
      ehnalis authored
Added a simple check for SparkContext.
Also added two rational null checks at the AM object.
      
      Author: ehnalis <zoltan.zvara@gmail.com>
      
      Closes #6083 from ehnalis/cluster and squashes the following commits:
      
      926bd96 [ehnalis] Moved check to SparkContext.
      7c89b6e [ehnalis] Remove false line.
      ea2a5fe [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      4924e01 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      39e4fa3 [ehnalis] SPARK-7504 [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      9f287c5 [ehnalis] [SPARK-7504] [YARN] NullPointerException when initializing SparkContext in YARN-cluster mode
      8e3822a0
    • Kousuke Saruta's avatar
      [SPARK-7664] [WEBUI] DAG visualization: Fix incorrect link paths of DAG. · ad92af9d
      Kousuke Saruta authored
In JobPage, we can jump to a StagePage when we click the corresponding box of the DAG viz, but the link path is incorrect.
      
      When we click a box like as follows ...
      ![screenshot_from_2015-05-15 19 24 25](https://cloud.githubusercontent.com/assets/4736016/7651528/5f7ef824-fb3c-11e4-9518-8c9ade2dff7a.png)
      
We jump to the index page instead.
      ![screenshot_from_2015-05-15 19 24 45](https://cloud.githubusercontent.com/assets/4736016/7651534/6d666274-fb3c-11e4-971c-c3f2dc2b1da2.png)
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6184 from sarutak/fix-link-path-of-dag-viz and squashes the following commits:
      
      faba3ba [Kousuke Saruta] Fix a incorrect link
      ad92af9d
    • Tim Ellison's avatar
      [CORE] Protect additional test vars from early GC · 270d4b51
      Tim Ellison authored
      Fix more places in which some test variables could be collected early by aggressive JVM optimization.
      Added a couple of comments to note where existing references are sufficient in the same test pattern.
      
      Author: Tim Ellison <t.p.ellison@gmail.com>
      
      Closes #6187 from tellison/DefeatEarlyGC and squashes the following commits:
      
      27329d9 [Tim Ellison] [CORE] Protect additional test vars from early GC
      270d4b51
    • Oleksii Kostyliev's avatar
      [SPARK-7233] [CORE] Detect REPL mode once · b1b9d580
      Oleksii Kostyliev authored
      <h3>Description</h3>
      Detect REPL mode once per JVM lifespan.
Previous behavior was to check for the presence of interpreter mode every time a job was submitted. In the case of multiple short-lived jobs, this caused massive mutual blocking between submission threads.
      
      For more details please refer to https://issues.apache.org/jira/browse/SPARK-7233.
      
      <h3>Notes</h3>
* I inverted the return value from `true` to `false` when an exception is caught. It seems more logical to assume that if the REPL class is not found, we aren't in interpreter mode.
* I'd personally call `classForName` with just the Spark classloader (`org.apache.spark.util.Utils#getSparkClassLoader`), but `org.apache.spark.util.Utils#getContextOrSparkClassLoader` is said to be preferable.
* I struggled to come up with a concise, readable and clear unit test. Suggestions are welcome if you feel one is necessary.
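The once-per-JVM detection described above can be sketched as a lazily initialized static flag, computed a single time under the JVM's class-initialization lock instead of on every job submission. This is an illustrative sketch only: the class and method names below are invented, not Spark's actual code.

```java
// Illustrative sketch: probe for a marker class once per JVM lifespan
// instead of on every job submission. Names are hypothetical.
public final class ReplDetector {
    // Computed exactly once, on first access, guarded by class initialization.
    private static final boolean IN_INTERPRETER = detect();

    private static boolean detect() {
        try {
            // If the REPL's entry class is loadable, assume interpreter mode.
            Class.forName("org.apache.spark.repl.Main");
            return true;
        } catch (ClassNotFoundException e) {
            // REPL class not found: assume we are NOT in interpreter mode
            // (the inverted default discussed in the notes above).
            return false;
        }
    }

    public static boolean inInterpreter() {
        return IN_INTERPRETER; // no per-call classloading or locking
    }
}
```

Because the probe runs once, later callers pay only a field read, avoiding the contention between submission threads that motivated this change.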
      
      Author: Oleksii Kostyliev <etander@gmail.com>
      Author: Oleksii Kostyliev <okostyliev@thunderhead.com>
      
      Closes #5835 from preeze/SPARK-7233 and squashes the following commits:
      
      69bb9e4 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
      26dcc24 [Oleksii Kostyliev] SPARK-7527: fixed explanatory comment to meet style-checker requirements
      c6f9685 [Oleksii Kostyliev] Merge remote-tracking branch 'remotes/upstream/master' into SPARK-7233
      b78a983 [Oleksii Kostyliev] SPARK-7527: revert the fix and let it be addressed separately at a later stage
      b64d441 [Oleksii Kostyliev] SPARK-7233: inline inInterpreter parameter into instantiateClass
      86e2606 [Oleksii Kostyliev] SPARK-7233, SPARK-7527: Handle interpreter mode properly.
      c7ee69c [Oleksii Kostyliev] Merge remote-tracking branch 'upstream/master' into SPARK-7233
      d6c07fc [Oleksii Kostyliev] SPARK-7233: properly handle the inverted meaning of isInInterpreter
      c319039 [Oleksii Kostyliev] SPARK-7233: move inInterpreter to Utils and make it lazy
      b1b9d580
    • zsxwing's avatar
      [SPARK-7650] [STREAMING] [WEBUI] Move streaming css and js files to the streaming project · cf842d42
      zsxwing authored
      cc tdas
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6160 from zsxwing/SPARK-7650 and squashes the following commits:
      
      fe6ae15 [zsxwing] Fix the import order
      a4ffd99 [zsxwing] Merge branch 'master' into SPARK-7650
      dc402b6 [zsxwing] Move streaming css and js files to the streaming project
      cf842d42
    • Kan Zhang's avatar
      [CORE] Remove unreachable Heartbeat message from Worker · daf4ae72
      Kan Zhang authored
It doesn't look to me like Heartbeat is sent to the Worker by anyone.
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #6163 from kanzhang/deadwood and squashes the following commits:
      
      56be118 [Kan Zhang] [core] Remove unreachable Heartbeat message from Worker
      daf4ae72
    • Josh Rosen's avatar
  8. May 14, 2015
  9. May 13, 2015
    • Andrew Or's avatar
      [HOT FIX #6125] Do not wait for all stages to start rendering · 3113da9c
      Andrew Or authored
      zsxwing
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6138 from andrewor14/dag-viz-clean-properly and squashes the following commits:
      
      19d4e98 [Andrew Or] Add synchronize
      02542d6 [Andrew Or] Rename overloaded variable
      d11bee1 [Andrew Or] Don't wait until all stages have started before rendering
      3113da9c
    • Josh Rosen's avatar
      [SPARK-7081] Faster sort-based shuffle path using binary processing cache-aware sort · 73bed408
      Josh Rosen authored
      This patch introduces a new shuffle manager that enhances the existing sort-based shuffle with a new cache-friendly sort algorithm that operates directly on binary data. The goals of this patch are to lower memory usage and Java object overheads during shuffle and to speed up sorting. It also lays groundwork for follow-up patches that will enable end-to-end processing of serialized records.
      
      The new shuffle manager, `UnsafeShuffleManager`, can be enabled by setting `spark.shuffle.manager=tungsten-sort` in SparkConf.
      
      The new shuffle manager uses directly-managed memory to implement several performance optimizations for certain types of shuffles. In cases where the new performance optimizations cannot be applied, the new shuffle manager delegates to SortShuffleManager to handle those shuffles.
      
      UnsafeShuffleManager's optimizations will apply when _all_ of the following conditions hold:
      
       - The shuffle dependency specifies no aggregation or output ordering.
       - The shuffle serializer supports relocation of serialized values (this is currently supported
         by KryoSerializer and Spark SQL's custom serializers).
 - The shuffle produces fewer than 16777216 (2^24) output partitions.
       - No individual record is larger than 128 MB when serialized.
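The eligibility conditions above can be sketched as a simple predicate. This is a hedged illustration; the real check lives inside Spark's shuffle manager and inspects the `ShuffleDependency`, and the parameter names here are invented:

```java
// Sketch of the eligibility check for the optimized shuffle path.
// Parameter names are illustrative, not Spark's actual API.
public final class UnsafeShuffleCheck {
    // 2^24: partition ids must fit in 24 bits of an 8-byte packed pointer.
    static final int MAX_SHUFFLE_OUTPUT_PARTITIONS = 1 << 24; // 16777216

    public static boolean canUseUnsafeShuffle(boolean hasAggregation,
                                              boolean hasKeyOrdering,
                                              boolean serializerSupportsRelocation,
                                              int numPartitions) {
        return !hasAggregation                       // no map-side aggregation
            && !hasKeyOrdering                       // no output ordering
            && serializerSupportsRelocation          // e.g. Kryo, SQL serializers
            && numPartitions < MAX_SHUFFLE_OUTPUT_PARTITIONS;
    }
}
```

When the predicate is false, the manager simply delegates to the existing sort-based path, so correctness never depends on the optimization applying.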
      
In addition, extra spill-merging optimizations are automatically applied when the shuffle compression codec supports concatenation of serialized streams. This is currently supported by Spark's LZF compression codec.
      
At a high level, UnsafeShuffleManager's design is similar to Spark's existing SortShuffleManager. In sort-based shuffle, incoming records are sorted according to their target partition ids, then written to a single map output file. Reducers fetch contiguous regions of this file in order to read their portion of the map output. In cases where the map output data is too large to fit in memory, sorted subsets of the output are spilled to disk and those on-disk files are merged to produce the final output file.
      
      UnsafeShuffleManager optimizes this process in several ways:
      
       - Its sort operates on serialized binary data rather than Java objects, which reduces memory consumption and GC overheads. This optimization requires the record serializer to have certain properties to allow serialized records to be re-ordered without requiring deserialization.  See SPARK-4550, where this optimization was first proposed and implemented, for more details.
      
       - It uses a specialized cache-efficient sorter (UnsafeShuffleExternalSorter) that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, this fits more of the array into cache.
      
       - The spill merging procedure operates on blocks of serialized records that belong to the same partition and does not need to deserialize records during the merge.
      
       - When the spill compression codec supports concatenation of compressed data, the spill merge simply concatenates the serialized and compressed spill partitions to produce the final output partition.  This allows efficient data copying methods, like NIO's `transferTo`, to be used and avoids the need to allocate decompression or copying buffers during the merge.
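The 8-bytes-per-record idea can be illustrated with a packed pointer that keeps a 24-bit partition id in the high bits and a record address in the low 40 bits, so sorting the raw longs also groups records by partition. Note this is a simplified sketch: the bit layout of Spark's actual `PackedRecordPointer` differs (it further splits the address into a page number and an in-page offset); the sketch only shows why the partition count is capped at 2^24.

```java
// Simplified 8-byte packed record pointer:
//   [ 24-bit partition id | 40-bit record address ]
// Sorting packed values numerically orders records by partition id first,
// which is exactly what the cache-efficient shuffle sort needs.
public final class PackedPointer {
    static final long ADDRESS_MASK = (1L << 40) - 1;

    public static long pack(int partitionId, long address) {
        return ((long) partitionId << 40) | (address & ADDRESS_MASK);
    }

    public static int partitionId(long packed) {
        return (int) (packed >>> 40); // high 24 bits
    }

    public static long address(long packed) {
        return packed & ADDRESS_MASK; // low 40 bits
    }
}
```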
      
      The shuffle read path is unchanged.
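The concatenation-based merge described above can be sketched with NIO's `FileChannel.transferTo`, which lets the kernel copy bytes between files without intermediate user-space buffers. This is a minimal sketch, not Spark's merge code: error handling, metrics, and the per-partition slicing of each spill file are all elided.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Minimal sketch: concatenate spill files into one output file using
// transferTo, as the merge can do when the compression codec supports
// concatenation of compressed streams.
public final class SpillConcat {
    public static void concatenate(File[] spills, File output) throws IOException {
        try (FileChannel out = new FileOutputStream(output).getChannel()) {
            for (File spill : spills) {
                try (FileChannel in = new FileInputStream(spill).getChannel()) {
                    long pos = 0;
                    long size = in.size();
                    // transferTo may copy fewer bytes than requested, so loop.
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, out);
                    }
                }
            }
        }
    }
}
```

The loop around `transferTo` matters: the call is allowed to transfer fewer bytes than asked for, and (as the commit list below notes) the merge code also works around a JDK `transferTo` bug in practice.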
      
      This patch is similar to [SPARK-4550](http://issues.apache.org/jira/browse/SPARK-4550) / #4450 but uses a slightly different implementation. The `unsafe`-based implementation featured in this patch lays the groundwork for followup patches that will enable sorting to operate on serialized data pages that will be prepared by Spark SQL's new `unsafe` operators (such as the new aggregation operator introduced in #5725).
      
      ### Future work
      
      There are several tasks that build upon this patch, which will be left to future work:
      
      - [SPARK-7271](https://issues.apache.org/jira/browse/SPARK-7271) Redesign / extend the shuffle interfaces to accept binary data as input. The goal here is to let us bypass serialization steps in cases where the sort input is produced by an operator that operates directly on binary data.
      - Extension / redesign of the `Serializer` API. We can add new methods which allow serializers to determine the size requirements for serializing objects and for serializing objects directly to a specified memory address (similar to how `UnsafeRowConverter` works in Spark SQL).
      
      <!-- Reviewable:start -->
      [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5868)
      <!-- Reviewable:end -->
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #5868 from JoshRosen/unsafe-sort and squashes the following commits:
      
      ef0a86e [Josh Rosen] Fix scalastyle errors
      7610f2f [Josh Rosen] Add tests for proper cleanup of shuffle data.
      d494ffe [Josh Rosen] Fix deserialization of JavaSerializer instances.
      52a9981 [Josh Rosen] Fix some bugs in the address packing code.
      51812a7 [Josh Rosen] Change shuffle manager sort name to tungsten-sort
      4023fa4 [Josh Rosen] Add @Private annotation to some Java classes.
      de40b9d [Josh Rosen] More comments to try to explain metrics code
      df07699 [Josh Rosen] Attempt to clarify confusing metrics update code
      5e189c6 [Josh Rosen] Track time spend closing / flushing files; split TimeTrackingOutputStream into separate file.
      d5779c6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      c2ce78e [Josh Rosen] Fix a missed usage of MAX_PARTITION_ID
      e3b8855 [Josh Rosen] Cleanup in UnsafeShuffleWriter
      4a2c785 [Josh Rosen] rename 'sort buffer' to 'pointer array'
      6276168 [Josh Rosen] Remove ability to disable spilling in UnsafeShuffleExternalSorter.
      57312c9 [Josh Rosen] Clarify fileBufferSize units
      2d4e4f4 [Josh Rosen] Address some minor comments in UnsafeShuffleExternalSorter.
      fdcac08 [Josh Rosen] Guard against overflow when expanding sort buffer.
      85da63f [Josh Rosen] Cleanup in UnsafeShuffleSorterIterator.
      0ad34da [Josh Rosen] Fix off-by-one in nextInt() call
      56781a1 [Josh Rosen] Rename UnsafeShuffleSorter to UnsafeShuffleInMemorySorter
      e995d1a [Josh Rosen] Introduce MAX_SHUFFLE_OUTPUT_PARTITIONS.
      e58a6b4 [Josh Rosen] Add more tests for PackedRecordPointer encoding.
      4f0b770 [Josh Rosen] Attempt to implement proper shuffle write metrics.
      d4e6d89 [Josh Rosen] Update to bit shifting constants
      69d5899 [Josh Rosen] Remove some unnecessary override vals
      8531286 [Josh Rosen] Add tests that automatically trigger spills.
      7c953f9 [Josh Rosen] Add test that covers UnsafeShuffleSortDataFormat.swap().
      e1855e5 [Josh Rosen] Fix a handful of misc. IntelliJ inspections
      39434f9 [Josh Rosen] Avoid integer multiplication overflow in getMemoryUsage (thanks FindBugs!)
      1e3ad52 [Josh Rosen] Delete unused ByteBufferOutputStream class.
      ea4f85f [Josh Rosen] Roll back an unnecessary change in Spillable.
      ae538dc [Josh Rosen] Document UnsafeShuffleManager.
      ec6d626 [Josh Rosen] Add notes on maximum # of supported shuffle partitions.
      0d4d199 [Josh Rosen] Bump up shuffle.memoryFraction to make tests pass.
      b3b1924 [Josh Rosen] Properly implement close() and flush() in DummySerializerInstance.
      1ef56c7 [Josh Rosen] Revise compression codec support in merger; test cross product of configurations.
      b57c17f [Josh Rosen] Disable some overly-verbose logs that rendered DEBUG useless.
      f780fb1 [Josh Rosen] Add test demonstrating which compression codecs support concatenation.
      4a01c45 [Josh Rosen] Remove unnecessary log message
      27b18b0 [Josh Rosen] That for inserting records AT the max record size.
      fcd9a3c [Josh Rosen] Add notes + tests for maximum record / page sizes.
      9d1ee7c [Josh Rosen] Fix MiMa excludes for ShuffleWriter change
      fd4bb9e [Josh Rosen] Use own ByteBufferOutputStream rather than Kryo's
      67d25ba [Josh Rosen] Update Exchange operator's copying logic to account for new shuffle manager
      8f5061a [Josh Rosen] Strengthen assertion to check partitioning
      01afc74 [Josh Rosen] Actually read data in UnsafeShuffleWriterSuite
      1929a74 [Josh Rosen] Update to reflect upstream ShuffleBlockManager -> ShuffleBlockResolver rename.
      e8718dd [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      9b7ebed [Josh Rosen] More defensive programming RE: cleaning up spill files and memory after errors
      7cd013b [Josh Rosen] Begin refactoring to enable proper tests for spilling.
      722849b [Josh Rosen] Add workaround for transferTo() bug in merging code; refactor tests.
      9883e30 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      b95e642 [Josh Rosen] Refactor and document logic that decides when to spill.
      1ce1300 [Josh Rosen] More minor cleanup
      5e8cf75 [Josh Rosen] More minor cleanup
      e67f1ea [Josh Rosen] Remove upper type bound in ShuffleWriter interface.
      cfe0ec4 [Josh Rosen] Address a number of minor review comments:
      8a6fe52 [Josh Rosen] Rename UnsafeShuffleSpillWriter to UnsafeShuffleExternalSorter
      11feeb6 [Josh Rosen] Update TODOs related to shuffle write metrics.
      b674412 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort
      aaea17b [Josh Rosen] Add comments to UnsafeShuffleSpillWriter.
      4f70141 [Josh Rosen] Fix merging; now passes UnsafeShuffleSuite tests.
      133c8c9 [Josh Rosen] WIP towards testing UnsafeShuffleWriter.
      f480fb2 [Josh Rosen] WIP in mega-refactoring towards shuffle-specific sort.
      57f1ec0 [Josh Rosen] WIP towards packed record pointers for use in optimized shuffle sort.
      69232fd [Josh Rosen] Enable compressible address encoding for off-heap mode.
      7ee918e [Josh Rosen] Re-order imports in tests
      3aeaff7 [Josh Rosen] More refactoring and cleanup; begin cleaning iterator interfaces
      3490512 [Josh Rosen] Misc. cleanup
      f156a8f [Josh Rosen] Hacky metrics integration; refactor some interfaces.
      2776aca [Josh Rosen] First passing test for ExternalSorter.
      5e100b2 [Josh Rosen] Super-messy WIP on external sort
      595923a [Josh Rosen] Remove some unused variables.
      8958584 [Josh Rosen] Fix bug in calculating free space in current page.
      f17fa8f [Josh Rosen] Add missing newline
      c2fca17 [Josh Rosen] Small refactoring of SerializerPropertiesSuite to enable test re-use:
      b8a09fe [Josh Rosen] Back out accidental log4j.properties change
      bfc12d3 [Josh Rosen] Add tests for serializer relocation property.
      240864c [Josh Rosen] Remove PrefixComputer and require prefix to be specified as part of insert()
      1433b42 [Josh Rosen] Store record length as int instead of long.
      026b497 [Josh Rosen] Re-use a buffer in UnsafeShuffleWriter
      0748458 [Josh Rosen] Port UnsafeShuffleWriter to Java.
      87e721b [Josh Rosen] Renaming and comments
      d3cc310 [Josh Rosen] Flag that SparkSqlSerializer2 supports relocation
      e2d96ca [Josh Rosen] Expand serializer API and use new function to help control when new UnsafeShuffle path is used.
      e267cee [Josh Rosen] Fix compilation of UnsafeSorterSuite
      9c6cf58 [Josh Rosen] Refactor to use DiskBlockObjectWriter.
      253f13e [Josh Rosen] More cleanup
      8e3ec20 [Josh Rosen] Begin code cleanup.
      4d2f5e1 [Josh Rosen] WIP
      3db12de [Josh Rosen] Minor simplification and sanity checks in UnsafeSorter
      767d3ca [Josh Rosen] Fix invalid range in UnsafeSorter.
      e900152 [Josh Rosen] Add test for empty iterator in UnsafeSorter
      57a4ea0 [Josh Rosen] Make initialSize configurable in UnsafeSorter
      abf7bfe [Josh Rosen] Add basic test case.
      81d52c5 [Josh Rosen] WIP on UnsafeSorter
      73bed408
    • Andrew Or's avatar
      [SPARK-7502] DAG visualization: gracefully handle removed stages · aa183787
      Andrew Or authored
      Old stages are removed without much feedback to the user. This happens very often in streaming. See screenshots below for more detail. zsxwing
      
      **Before**
      
      <img src="https://cloud.githubusercontent.com/assets/2133137/7621031/643cc1e0-f978-11e4-8f42-09decaac44a7.png" width="500px"/>
      
      -------------------------
      **After**
      <img src="https://cloud.githubusercontent.com/assets/2133137/7621037/6e37348c-f978-11e4-84a5-e44e154f9b13.png" width="400px"/>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6132 from andrewor14/dag-viz-remove-gracefully and squashes the following commits:
      
      43175cd [Andrew Or] Handle removed jobs and stages gracefully
      aa183787
    • Andrew Or's avatar
      [SPARK-7464] DAG visualization: highlight the same RDDs on hover · 44403414
      Andrew Or authored
      This is pretty useful for MLlib.
      
      <img src="https://cloud.githubusercontent.com/assets/2133137/7599650/c7d03dd8-f8b8-11e4-8c0a-0a89e786c90f.png" width="400px"/>
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6100 from andrewor14/dag-viz-hover and squashes the following commits:
      
      fefe2af [Andrew Or] Link tooltips for nodes that belong to the same RDD
      90c6a7e [Andrew Or] Assign classes to clusters and nodes, not IDs
      44403414
    • Andrew Or's avatar
      [SPARK-7399] Spark compilation error for scala 2.11 · f88ac701
      Andrew Or authored
      Subsequent fix following #5966. I tried this out locally.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6129 from andrewor14/211-compilation and squashes the following commits:
      
      713868f [Andrew Or] Fix compilation issue for scala 2.11
      f88ac701
    • Andrew Or's avatar
      [SPARK-7608] Clean up old state in RDDOperationGraphListener · f6e18388
      Andrew Or authored
      This is necessary for streaming and long-running Spark applications. zsxwing tdas
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6125 from andrewor14/viz-listener-leak and squashes the following commits:
      
      8660949 [Andrew Or] Fix thing + add tests
      33c0843 [Andrew Or] Clean up old job state
      f6e18388