  1. Aug 27, 2016
  2. Aug 26, 2016
  3. Aug 25, 2016
    • Marcelo Vanzin's avatar
      [SPARK-17240][CORE] Make SparkConf serializable again. · 9b5a1d1d
      Marcelo Vanzin authored
      Make the config reader transient, and initialize it lazily so that
      serialization works with both java and kryo (and hopefully any other
      custom serializer).
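
      A minimal sketch of the pattern, with illustrative names rather than the actual SparkConf internals: the non-serializable helper is marked `@transient` so it is skipped during serialization, and a `lazy val` rebuilds it on first use in each deserialized copy.
      ```
      // Illustrative sketch only: a serializable holder whose non-serializable helper
      // is excluded from serialization and rebuilt lazily after deserialization.
      class SettingsHolder(settings: Map[String, String]) extends Serializable {
        // Not serialized; recreated on first access in each deserialized copy.
        @transient private lazy val reader: String => Option[String] = settings.get _

        def get(key: String): Option[String] = reader(key)
      }
      ```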
      
      Added unit test to make sure SparkConf remains serializable and the
      reader works with both built-in serializers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14813 from vanzin/SPARK-17240.
      9b5a1d1d
    • Sean Owen's avatar
      [SPARK-17193][CORE] HadoopRDD NPE at DEBUG log level when getLocationInfo == null · 2bcd5d5c
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Handle null from Hadoop getLocationInfo directly instead of catching (and logging) exception
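
      A minimal sketch of the pattern (hypothetical helper, not the actual HadoopRDD code):
      ```
      // Sketch only: check for a null location result up front instead of catching
      // (and logging) the exception it would eventually cause.
      def hostsFrom(locationInfo: Array[String]): Seq[String] =
        if (locationInfo == null) Seq.empty else locationInfo.toSeq
      ```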
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14760 from srowen/SPARK-17193.
      2bcd5d5c
  4. Aug 24, 2016
    • Alex Bozarth's avatar
      [SPARK-15083][WEB UI] History Server can OOM due to unlimited TaskUIData · 891ac2b9
      Alex Bozarth authored
      ## What changes were proposed in this pull request?
      
      Based on #12990 by tankkyo
      
      Since the History Server currently loads all applications' data, it can OOM if too many applications have a significant task count. `spark.ui.trimTasks` (default: false) can be set to true to trim tasks down to `spark.ui.retainedTasks` (default: 10000).
      
      (This is a "quick fix" to help those running into the problem until a update of how the history server loads app data can be done)
      
      ## How was this patch tested?
      
      Manual testing and dev/run-tests
      
      ![spark-15083](https://cloud.githubusercontent.com/assets/13952758/17713694/fe82d246-63b0-11e6-9697-b87ea75ff4ef.png)
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      
      Closes #14673 from ajbozarth/spark15083.
      891ac2b9
    • Sean Owen's avatar
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same... · 0b3a4be9
      Sean Owen authored
      [SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the same java used in the spark environment
      
      ## What changes were proposed in this pull request?
      
      Update to py4j 0.10.3 to enable JAVA_HOME support
      
      ## How was this patch tested?
      
      Pyspark tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14748 from srowen/SPARK-16781.
      0b3a4be9
    • Weiqing Yang's avatar
      [MINOR][BUILD] Fix Java CheckStyle Error · 673a80d2
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      As Spark 2.0.1 will be released soon (mentioned on the Spark dev mailing list), it's better to fix the code style errors before the release, in addition to the critical bugs.
      
      Before:
      ```
      ./dev/lint-java
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
      [ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      ```
      After:
      ```
      ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      ## How was this patch tested?
      Manual.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14768 from Sherry302/fixjavastyle.
      673a80d2
  5. Aug 23, 2016
    • Tejas Patil's avatar
      [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` · c1937dd1
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
      Jira: https://issues.apache.org/jira/browse/SPARK-16862
      
      The `BufferedInputStream` used in `UnsafeSorterSpillReader` uses the default 8 KB buffer to read data off disk. This PR makes the buffer size configurable to improve disk reads. I have made the default value 1 MB, as with that value I observed improved performance.
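
      A minimal sketch of the idea (illustrative helper, not the actual UnsafeSorterSpillReader code):
      ```
      import java.io.{BufferedInputStream, FileInputStream}

      // Sketch only: open a spill file with a configurable buffer size instead of the
      // BufferedInputStream default of 8 KB; 1 MB matches the default described above.
      def openSpillFile(path: String, bufferSizeBytes: Int = 1024 * 1024): BufferedInputStream =
        new BufferedInputStream(new FileInputStream(path), bufferSizeBytes)
      ```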
      
      ## How was this patch tested?
      
      I am relying on the existing unit tests.
      
      ## Performance
      
      After deploying this change to prod and setting the config to 1 mb, there was a 12% reduction in the CPU time and 19.5% reduction in CPU reservation time.
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #14726 from tejasapatil/spill_buffer_2.
      c1937dd1
  6. Aug 22, 2016
    • Eric Liang's avatar
      [SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication · 8e223ea6
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
      This is a straightforward clone of JoshRosen's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to be causing flaky tests, so I'm going to leave that for SPARK-17042.
      
      ## How was this patch tested?
      
      End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch).
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #14311 from ericl/spark-16550.
      8e223ea6
  7. Aug 21, 2016
    • wm624@hotmail.com's avatar
      [SPARK-17002][CORE] Document that spark.ssl.protocol. is required for SSL · e328f577
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      When `spark.ssl.enabled` is true but `spark.ssl.protocol` is not set, startup fails with a meaningless exception. `spark.ssl.protocol` is required whenever `spark.ssl.enabled` is true.
      
      Improvement: require `spark.ssl.protocol` when initializing the SSLContext, and otherwise throw an exception that states the requirement.
      
      Remove the OrElse("default").
      
      Document this requirement in configure.md
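
      A minimal sketch of the check (not the actual SecurityManager code), using the message shown in the test output below:
      ```
      // Sketch only: fail fast with a clear message when SSL is enabled but no
      // protocol has been configured, instead of silently falling back to a default.
      def validateSsl(sslEnabled: Boolean, sslProtocol: Option[String]): Unit =
        if (sslEnabled) {
          require(sslProtocol.isDefined,
            "spark.ssl.protocol is required when enabling SSL connections.")
        }
      ```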
      
      ## How was this patch tested?
      
      
      Manual tests:
      Build document and check document
      
      Configuring `spark.ssl.enabled` only throws the exception below:
      6/08/16 16:04:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mwang); groups with view permissions: Set(); users  with modify permissions: Set(mwang); groups with modify permissions: Set()
      Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: spark.ssl.protocol is required when enabling SSL connections.
      	at scala.Predef$.require(Predef.scala:224)
      	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:285)
      	at org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1026)
      	at org.apache.spark.deploy.master.Master$.main(Master.scala:1011)
      	at org.apache.spark.deploy.master.Master.main(Master.scala)
      
      Configure both `spark.ssl.enabled` and `spark.ssl.protocol`:
      It works fine.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14674 from wangmiao1981/ssl.
      e328f577
  8. Aug 20, 2016
    • Bryan Cutler's avatar
      [SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't... · 9f37d4ea
      Bryan Cutler authored
      [SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't exist in dependent module
      
      ## What changes were proposed in this pull request?
      
      Adding a "(runtime)" to the dependency configuration will set a fallback configuration to be used if the requested one is not found.  E.g. with the setting "default(runtime)", Ivy will look for the conf "default" in the module ivy file and if not found will look for the conf "runtime".  This can help with the case when using "sbt publishLocal" which does not write a "default" conf in the published ivy.xml file.
      
      ## How was this patch tested?
      Used spark-submit with the --packages option for a package published locally with no default conf, and a package resolved from Maven Central.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13428 from BryanCutler/fallback-package-conf-SPARK-12666.
      9f37d4ea
  9. Aug 19, 2016
    • Sital Kedia's avatar
      [SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode · cf0cce90
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
      This PR fixes an executor OOM in off-heap mode caused by a bug in cooperative memory management for UnsafeExternalSorter. UnsafeExternalSorter checked whether a memory page is being used by upstream by comparing the base object address of the current page with the base object address of upstream. However, with off-heap memory allocation the base object addresses are always null, so no spilling ever happened and eventually the operator would OOM.
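
      An illustrative sketch of the problem, with hypothetical names rather than the actual Spark code; one possible fix, sketched here, is to compare page numbers instead of base objects:
      ```
      // Off-heap pages have a null base object, so a base-object comparison always
      // "matches" and the current page is never treated as spillable.
      case class Page(pageNumber: Int, baseObject: AnyRef)

      def usedByUpstreamBuggy(current: Page, upstream: Page): Boolean =
        current.baseObject eq upstream.baseObject // always true off-heap: null eq null

      def usedByUpstreamFixed(current: Page, upstream: Page): Boolean =
        current.pageNumber == upstream.pageNumber // works on-heap and off-heap
      ```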
      
      Following is the stack trace this issue addresses -
      java.lang.OutOfMemoryError: Unable to acquire 1220 bytes of memory, got 0
      	at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:341)
      	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:362)
      	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:93)
      	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:170)
      
      ## How was this patch tested?
      
      Tested by running the failing job.
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #14693 from sitalkedia/fix_offheap_oom.
      cf0cce90
    • Kousuke Saruta's avatar
      [SPARK-11227][CORE] UnknownHostException can be thrown when NameNode HA is enabled. · 071eaaf9
      Kousuke Saruta authored
      ## What changes were proposed in this pull request?
      
      If the following conditions are satisfied, executors don't load properties in `hdfs-site.xml` and UnknownHostException can be thrown.
      
      (1) NameNode HA is enabled
      (2) spark.eventLogging is disabled or logging path is NOT on HDFS
      (3) Using Standalone or Mesos for the cluster manager
      (4) There is no code that loads the `HdfsConfiguration` class in the driver, either directly or indirectly.
      (5) The tasks access HDFS
      
      (There might be some more conditions...)
      
      For example, the following code causes an UnknownHostException when the conditions above are satisfied.
      ```
      sc.textFile("<path on HDFS>").collect
      
      ```
      
      ```
      java.lang.IllegalArgumentException: java.net.UnknownHostException: hacluster
      	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
      	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
      	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
      	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
      	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
      	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
      	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
      	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
      	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
      	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
      	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
      	at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
      	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
      	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
      	at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:986)
      	at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:986)
      	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177)
      	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177)
      	at scala.Option.map(Option.scala:146)
      	at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:177)
      	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:213)
      	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
      	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      	at org.apache.spark.scheduler.Task.run(Task.scala:85)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.net.UnknownHostException: hacluster
      ```
      
      But the following code doesn't cause the exception, because the `textFile` method loads `HdfsConfiguration` indirectly.
      
      ```
      sc.textFile("<path on HDFS>").collect
      ```
      
      When a job includes operations that access HDFS, the `org.apache.hadoop.conf.Configuration` object is wrapped in a `SerializableConfiguration`, serialized, and broadcast from the driver to the executors. Each executor deserializes the object with `loadDefaults` set to false, so HDFS-related properties must be set before the configuration is broadcast.
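
      As a hedged illustration (not necessarily the approach taken in this patch), one way to ensure `hdfs-site.xml` properties are picked up before the configuration is broadcast is to force `HdfsConfiguration` to load in the driver:
      ```
      import org.apache.hadoop.hdfs.HdfsConfiguration

      // Illustration only: loading HdfsConfiguration registers hdfs-default.xml and
      // hdfs-site.xml as default resources, so NameNode HA properties are present in
      // the Configuration that later gets wrapped, serialized and broadcast.
      new HdfsConfiguration()
      ```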
      
      ## How was this patch tested?
      Tested manually on my standalone cluster.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #13738 from sarutak/SPARK-11227.
      071eaaf9
    • Alex Bozarth's avatar
      [SPARK-16673][WEB UI] New Executor Page removed conditional for Logs and Thread Dump columns · e98eb214
      Alex Bozarth authored
      ## What changes were proposed in this pull request?
      
      When #13670 switched `ExecutorsPage` to use jQuery DataTables, it inadvertently removed the conditional for the Logs and Thread Dump columns. I reimplemented the conditional display of the Logs and Thread Dump columns as it was before the switch.
      
      ## How was this patch tested?
      
      Manually tested and dev/run-tests
      
      ![both](https://cloud.githubusercontent.com/assets/13952758/17186879/da8dd1a8-53eb-11e6-8b0c-d0ff0156a9a7.png)
      ![dump](https://cloud.githubusercontent.com/assets/13952758/17186881/dab08a04-53eb-11e6-8b1c-50ffd0bf2ae8.png)
      ![logs](https://cloud.githubusercontent.com/assets/13952758/17186880/dab04d00-53eb-11e6-8754-68dd64d6d9f4.png)
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      
      Closes #14382 from ajbozarth/spark16673.
      e98eb214
    • Nick Lavers's avatar
      [SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace · 5377fc62
      Nick Lavers authored
      JIRA issue link:
      https://issues.apache.org/jira/browse/SPARK-16961
      
      Changed one line of Utils.randomizeInPlace to allow elements to stay in place.
      
      Created a unit test that runs a Pearson's chi squared test to determine whether the output diverges significantly from a uniform distribution.
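
      For context, a correct Fisher-Yates shuffle draws the swap index from 0 to i inclusive, so an element may stay in place; drawing it from 0 until i is the off-by-one that biases the result. A sketch (not the Spark source):
      ```
      import java.util.Random

      // Unbiased in-place shuffle. The key detail is nextInt(i + 1): the swap index
      // may equal i, so an element can remain where it is.
      def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
        for (i <- (arr.length - 1) to 1 by -1) {
          val j = rand.nextInt(i + 1)
          val tmp = arr(j)
          arr(j) = arr(i)
          arr(i) = tmp
        }
        arr
      }
      ```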
      
      Author: Nick Lavers <nick.lavers@videoamp.com>
      
      Closes #14551 from nicklavers/SPARK-16961-randomizeInPlace.
      5377fc62
  10. Aug 17, 2016
    • Steve Loughran's avatar
      [SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18
      Steve Loughran authored
      A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` code, removing uses of those calls where they are superfluous.
      
      1. delete is harmless if called on a nonexistent path, so don't do any checks before deletes
      2. any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous, as the operation itself does the check. Instead the `FileNotFoundException` is caught and triggers the downgraded path; see the sketch below. Where a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the error message, but now nests the inner exception for easier diagnostics.
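
      A hedged sketch of the second point against the Hadoop FileSystem API:
      ```
      import java.io.FileNotFoundException
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Rather than fs.exists(path) followed by fs.getFileStatus(path) -- two
      // namenode round trips -- call getFileStatus directly and treat the
      // FileNotFoundException as the "missing" case.
      def fileSizeIfPresent(fs: FileSystem, path: Path): Option[Long] =
        try {
          Some(fs.getFileStatus(path).getLen)
        } catch {
          case _: FileNotFoundException => None
        }
      ```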
      
      Initially, relying on Jenkins test runs.
      
      One trouble spot here is that some of the code paths are clearly error situations; it's not clear that they have coverage anyway. Creating the failure conditions in tests would be ideal, but it will also be hard.
      
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.
      cc97ea18
  11. Aug 15, 2016
    • Marcelo Vanzin's avatar
      [SPARK-16671][CORE][SQL] Consolidate code to do variable substitution. · 5da6c4b2
      Marcelo Vanzin authored
      Both core and sql have slightly different code that does variable substitution
      of config values. This change refactors that code and encapsulates the logic
      of reading config values and expanding variables in a new helper class, which
      can be configured so that both core and sql can use it without losing existing
      functionality. It also allows for easier testing and makes it easier to add
      more features in the future.
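
      A minimal sketch of the kind of variable expansion being consolidated (illustrative only, not the new helper's API):
      ```
      object ConfigSubstitution {
        private val Ref = """\$\{([^}]+)\}""".r

        // Expand ${name} references in a config value via a lookup function,
        // leaving unresolved references untouched.
        def substitute(value: String, lookup: String => Option[String]): String =
          Ref.replaceAllIn(value, m =>
            java.util.regex.Matcher.quoteReplacement(lookup(m.group(1)).getOrElse(m.matched)))
      }
      ```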
      
      Tested with existing and new unit tests, and by running spark-shell with
      some configs referencing variables and making sure it behaved as expected.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #14468 from vanzin/SPARK-16671.
      5da6c4b2
    • Stavros Kontopoulos's avatar
      [SPARK-11714][MESOS] Make Spark on Mesos honor port restrictions on coarse grain mode · 1a028bde
      Stavros Kontopoulos authored
      - Make the Mesos coarse-grained scheduler accept port offers and pre-assign ports
      
      Previous attempt was for fine grained: https://github.com/apache/spark/pull/10808
      
      Author: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
      Author: Stavros Kontopoulos <stavros.kontopoulos@typesafe.com>
      
      Closes #11157 from skonto/honour_ports_coarse.
      1a028bde
  12. Aug 14, 2016
    • Zhenglai Zhang's avatar
      [WIP][MINOR][TYPO] Fix several trivial typos · 2a3d286f
      Zhenglai Zhang authored
      ## What changes were proposed in this pull request?
      
      * Fixed one typo `"overriden"` as `"overridden"`, also make sure no other same typo.
      * Fixed one typo `"lowcase"` as `"lowercase"`, also make sure no other same typo.
      
      ## How was this patch tested?
      
      Since the change is very tiny, I just made sure compilation was successful.
      I am new to the Spark community, so please feel free to let me know of any other necessary steps.
      
      Thanks in advance!
      
      ----
      Updated: found another `lowcase` typo later and fixed it in the same patch
      
      Author: Zhenglai Zhang <zhenglaizhang@hotmail.com>
      
      Closes #14622 from zhenglaizhang/fixtypo.
      2a3d286f
  13. Aug 13, 2016
    • Xin Ren's avatar
      [MINOR][CORE] fix warnings on deprecated methods in... · 7f7133bd
      Xin Ren authored
      [MINOR][CORE] fix warnings on deprecated methods in MesosClusterSchedulerSuite and DiskBlockObjectWriterSuite
      
      ## What changes were proposed in this pull request?
      
      Fixed warnings below after scanning through warnings during build:
      
      ```
      [warn] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala:34: imported `Utils' is permanently hidden by definition of object Utils in package mesos
      [warn] import org.apache.spark.scheduler.cluster.mesos.Utils
      [warn]                                                 ^
      ```
      
      and
      ```
      [warn] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala:113: method shuffleBytesWritten in class ShuffleWriteMetrics is deprecated: use bytesWritten instead
      [warn]     assert(writeMetrics.shuffleBytesWritten === file.length())
      [warn]                         ^
      [warn] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala:119: method shuffleBytesWritten in class ShuffleWriteMetrics is deprecated: use bytesWritten instead
      [warn]     assert(writeMetrics.shuffleBytesWritten === file.length())
      [warn]                         ^
      [warn] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala:131: method shuffleBytesWritten in class ShuffleWriteMetrics is deprecated: use bytesWritten instead
      [warn]     assert(writeMetrics.shuffleBytesWritten === file.length())
      [warn]                         ^
      [warn] /home/jenkins/workspace/SparkPullRequestBuilder/core/src/test/scala/org/apache/spark/storage/DiskBlockObjectWriterSuite.scala:135: method shuffleBytesWritten in class ShuffleWriteMetrics is deprecated: use bytesWritten instead
      [warn]     assert(writeMetrics.shuffleBytesWritten === file.length())
      [warn]                         ^
      ```
      
      ## How was this patch tested?
      
      Tested manually on local laptop.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #14609 from keypointt/suiteWarnings.
      7f7133bd
  14. Aug 12, 2016
  15. Aug 11, 2016
    • Jeff Zhang's avatar
      [SPARK-13081][PYSPARK][SPARK_SUBMIT] Allow set pythonExec of driver and executor through conf… · 7a9e25c3
      Jeff Zhang authored
      Before this PR, users had to export environment variables to specify the Python executable for the driver and executors, which is not very convenient. This PR allows users to specify the Python executable through the configurations "--pyspark-driver-python" and "--pyspark-executor-python".
      
      Manually tested in local and YARN mode, for both pyspark-shell and PySpark batch mode.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #13146 from zjffdu/SPARK-13081.
      7a9e25c3
    • huangzhaowei's avatar
      [SPARK-16868][WEB UI] Fix executor be both dead and alive on executor ui. · 4ec5c360
      huangzhaowei authored
      ## What changes were proposed in this pull request?
      When the Spark application is under heavy pressure, the executor will register itself with the driver's block manager twice (because of heartbeats), so the executor shows up as both dead and alive, as the picture shows:
      ![image](https://cloud.githubusercontent.com/assets/7404824/17467245/c1359094-5d4e-11e6-843a-f6d6347e1bf6.png)
      
      ## How was this patch tested?
      NA
      
      Details in: [SPARK-16868](https://issues.apache.org/jira/browse/SPARK-16868)
      
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #14530 from SaintBacchus/SPARK-16868.
      4ec5c360
    • Bryan Cutler's avatar
      [SPARK-13602][CORE] Add shutdown hook to DriverRunner to prevent driver process leak · 1c9a386c
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Added a shutdown hook to DriverRunner to kill the driver process in case the Worker JVM exits suddenly and the `WorkerWatcher` is unable to properly catch this. Did some cleanup to consolidate driver state management and the setting of finalized vars within the running thread.
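
      A minimal sketch of the mechanism, using the plain JVM API rather than the DriverRunner code:
      ```
      // Register a JVM shutdown hook that destroys the child process so it cannot
      // outlive a Worker that exits suddenly.
      def killOnJvmExit(process: Process): Unit = {
        Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
          override def run(): Unit = process.destroy()
        }))
      }
      ```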
      
      ## How was this patch tested?
      
      Added unit tests to verify that the final state and exception variables are set accordingly for successful, failed, and errored driver processes. Retrofitted an existing test to verify that killing the mocked process ends with the correct state and stops properly.
      
      Manually tested (with deploy-mode=cluster) that the shutdown hook is called by forcibly exiting the `Worker` at various points in the code, with the `WorkerWatcher` both disabled and enabled. Also manually killed the driver through the UI and verified that the `DriverRunner` interrupted, killed the process, and exited properly.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #11746 from BryanCutler/DriverRunner-shutdown-hook-SPARK-13602.
      1c9a386c
    • Michael Gummelt's avatar
      [SPARK-16952] don't lookup spark home directory when executor uri is set · 4d496802
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      remove requirement to set spark.mesos.executor.home when spark.executor.uri is used
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      
      Closes #14552 from mgummelt/fix-spark-home.
      4d496802
  16. Aug 10, 2016
    • jerryshao's avatar
      [SPARK-14743][YARN] Add a configurable credential manager for Spark running on YARN · ab648c00
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      Add a configurable token manager for Spark running on YARN.
      
      ### Current Problems ###
      
      1. The supported token providers are hard-coded; currently only HDFS, HBase and Hive are supported, and it is impossible for users to add a new token provider without code changes.
      2. The same problem exists in the timely token renewer and updater.
      
      ### Changes In This Proposal ###
      
      To address the problems mentioned above and make the current code cleaner and easier to understand, this proposal mainly has 3 changes:
      
      1. Abstract a `ServiceTokenProvider` interface, as well as a `ServiceTokenRenewable` interface, for token providers. Each service that wants to communicate with Spark through tokens needs to implement this interface.
      2. Provide a `ConfigurableTokenManager` to manage all the registered token providers, as well as the token renewer and updater. This class also offers an API for other modules to obtain tokens, get renewal intervals, and so on.
      3. Implement 3 built-in token providers, `HDFSTokenProvider`, `HiveTokenProvider` and `HBaseTokenProvider`, to keep the same semantics as supported today. Whether to load these built-in token providers is controlled by the configuration "spark.yarn.security.tokens.${service}.enabled"; by default all the built-in token providers are loaded.
      
      ### Behavior Changes ###
      
      For the end user there's no behavior change; we still use the same configuration `spark.yarn.security.tokens.${service}.enabled` to decide which token providers are enabled (e.g. hbase or hive).
      
      A user-implemented token provider (assume the provider's name is "test") needs two configurations to be added:
      
      1. `spark.yarn.security.tokens.test.enabled` set to true
      2. `spark.yarn.security.tokens.test.class` set to the fully qualified class name.
      
      So we still keep the same semantics as the current code while adding one new configuration.
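
      A hedged sketch of how those two keys could be used to discover a user-defined provider; the reflective instantiation is illustrative, not the actual ConfigurableTokenManager implementation:
      ```
      import org.apache.spark.SparkConf

      // Sketch only: discover a user-defined provider named `service` from the two
      // configuration keys described above.
      def loadUserTokenProvider(conf: SparkConf, service: String): Option[AnyRef] = {
        if (!conf.getBoolean(s"spark.yarn.security.tokens.$service.enabled", false)) {
          None
        } else {
          val providerClass = conf.get(s"spark.yarn.security.tokens.$service.class")
          Some(Class.forName(providerClass).newInstance().asInstanceOf[AnyRef])
        }
      }
      ```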
      
      ### Current Status ###
      
      - [x] token provider interface and management framework.
      - [x] implement built-in token providers (hdfs, hbase, hive).
      - [x] Coverage of unit test.
      - [x] Integrated test with security cluster.
      
      ## How was this patch tested?
      
      Unit test and integrated test.
      
      Please suggest and review, any comment is greatly appreciated.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #14065 from jerryshao/SPARK-16342.
      ab648c00
    • Rajesh Balamohan's avatar
      [SPARK-12920][CORE] Honor "spark.ui.retainedStages" to reduce mem-pressure · bd2c12fb
      Rajesh Balamohan authored
      When a large number of jobs are run concurrently with the Spark thrift server, the thrift server starts running at high CPU due to GC pressure. Job UI retention causes memory pressure with large jobs. https://issues.apache.org/jira/secure/attachment/12783302/SPARK-12920.profiler_job_progress_listner.png has the profiler snapshot. This PR honors `spark.ui.retainedStages` strictly to reduce memory pressure.
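
      A minimal sketch of the retention policy being enforced (illustrative, not the listener code): once the buffer grows past `spark.ui.retainedStages`, the oldest entries are dropped.
      ```
      import scala.collection.mutable.ListBuffer

      // Keep at most `retainedStages` entries, dropping the oldest ones.
      def trimIfNecessary[T](completedStages: ListBuffer[T], retainedStages: Int): Unit =
        if (completedStages.size > retainedStages) {
          completedStages.trimStart(completedStages.size - retainedStages)
        }
      ```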
      
      Manual and unit tests
      
      Author: Rajesh Balamohan <rbalamohan@apache.org>
      
      Closes #10846 from rajeshbalamohan/SPARK-12920.
      bd2c12fb
    • Liang-Chi Hsieh's avatar
      [SPARK-15639] [SPARK-16321] [SQL] Push down filter at RowGroups level for parquet reader · 19af298b
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      The base class `SpecificParquetRecordReaderBase`, used for the vectorized Parquet reader, tries to get pushed-down filters from the given configuration. These pushed-down filters are used for RowGroup-level filtering. However, we don't set the filters to push down into the configuration, so the filters are not actually pushed down to do RowGroup-level filtering. This patch fixes this by setting up the filters in the configuration for the reader.
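
      A hedged sketch of what "setting up the filters in the configuration" means in terms of the public Parquet API (not necessarily the exact calls used in this patch):
      ```
      import org.apache.hadoop.conf.Configuration
      import org.apache.parquet.filter2.predicate.FilterApi
      import org.apache.parquet.hadoop.ParquetInputFormat

      // Registering a filter predicate in the Hadoop configuration is what enables
      // RowGroup-level skipping based on column statistics.
      val hadoopConf = new Configuration()
      val predicate = FilterApi.lt(FilterApi.intColumn("_1"), java.lang.Integer.valueOf(100))
      ParquetInputFormat.setFilterPredicate(hadoopConf, predicate)
      ```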
      
      The benchmark that excludes the time of writing Parquet file:
      
          test("Benchmark for Parquet") {
            val N = 500 << 12
              withParquetTable((0 until N).map(i => (101, i)), "t") {
                val benchmark = new Benchmark("Parquet reader", N)
                benchmark.addCase("reading Parquet file", 10) { iter =>
                  sql("SELECT _1 FROM t where t._1 < 100").collect()
                }
                benchmark.run()
            }
          }
      
      `withParquetTable` by default runs tests with both the vectorized and non-vectorized readers. I only let it run the vectorized reader.
      
      When we set the Parquet block size to 1024 so that there are multiple row groups, the benchmark is:
      
      Before this patch:
      
      The retrieved row groups: 8063
      
          Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
          Intel(R) Core(TM) i7-5557U CPU  3.10GHz
          Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
          ------------------------------------------------------------------------------------------------
          reading Parquet file                           825 / 1233          2.5         402.6       1.0X
      
      After this patch:
      
      The retrieved row groups: 0
      
          Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
          Intel(R) Core(TM) i7-5557U CPU  3.10GHz
          Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
          ------------------------------------------------------------------------------------------------
          reading Parquet file                           306 /  503          6.7         149.6       1.0X
      
      Next, I ran the benchmark for the non-pushdown case using the same benchmark code but with the pushdown configuration disabled. This time the Parquet block size is the default value.
      
      Before this patch:
      
          Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
          Intel(R) Core(TM) i7-5557U CPU  3.10GHz
          Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
          ------------------------------------------------------------------------------------------------
          reading Parquet file                           136 /  238         15.0          66.5       1.0X
      
      After this patch:
      
          Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
          Intel(R) Core(TM) i7-5557U CPU  3.10GHz
          Parquet reader:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
          ------------------------------------------------------------------------------------------------
          reading Parquet file                           124 /  193         16.5          60.7       1.0X
      
      For the non-pushdown case, the results suggest this patch doesn't affect the normal code path.
      
      I manually output the `totalRowCount` in `SpecificParquetRecordReaderBase` to see whether this patch actually filters the row groups. When running the above benchmark:
      
      After this patch:
          `totalRowCount = 0`
      
      Before this patch:
          `totalRowCount = 1024000`
      
      ## How was this patch tested?
      Existing tests should be passed.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #13701 from viirya/vectorized-reader-push-down-filter2.
      19af298b
    • Timothy Chen's avatar
      [SPARK-16927][SPARK-16923] Override task properties at dispatcher. · eca58755
      Timothy Chen authored
      ## What changes were proposed in this pull request?
      
      - enable setting default properties for all jobs submitted through the dispatcher [SPARK-16927]
      - remove duplication of conf vars on cluster-submitted jobs [SPARK-16923] (this is a small fix, so I'm including it in the same PR)
      
      ## How was this patch tested?
      
      mesos/spark integration test suite
      manual testing
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #14511 from mgummelt/override-props.
      eca58755
  17. Aug 09, 2016
    • Andrew Ash's avatar
      Make logDir easily copy/paste-able · 121643bc
      Andrew Ash authored
      In many terminals double-clicking and dragging also includes the trailing period.  Simply remove this to make the value more easily copy/pasteable.
      
      Example value:
      `hdfs://mybox-123.net.example.com:8020/spark-events.`
      
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #14566 from ash211/patch-9.
      121643bc
    • Josh Rosen's avatar
      [SPARK-16956] Make ApplicationState.MAX_NUM_RETRY configurable · b89b3a5c
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      This patch introduces a new configuration, `spark.deploy.maxExecutorRetries`, to let users configure an obscure behavior in the standalone master where the master will kill Spark applications which have experienced too many back-to-back executor failures. The current setting is a hardcoded constant (10); this patch replaces that with a new cluster-wide configuration.
      
      **Background:** This application-killing was added in 6b5980da (from September 2012) and I believe that it was designed to prevent a faulty application whose executors could never launch from DOS'ing the Spark cluster via an infinite series of executor launch attempts. In a subsequent patch (#1360), this feature was refined to prevent applications which have running executors from being killed by this code path.
      
      **Motivation for making this configurable:** Previously, if a Spark Standalone application experienced more than `ApplicationState.MAX_NUM_RETRY` executor failures and was left with no executors running then the Spark master would kill that application, but this behavior is problematic in environments where the Spark executors run on unstable infrastructure and can all simultaneously die. For instance, if your Spark driver runs on an on-demand EC2 instance while all workers run on ephemeral spot instances then it's possible for all executors to die at the same time while the driver stays alive. In this case, it may be desirable to keep the Spark application alive so that it can recover once new workers and executors are available. In order to accommodate this use-case, this patch modifies the Master to never kill faulty applications if `spark.deploy.maxExecutorRetries` is negative.
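
      A minimal sketch (not the actual Master code) of the rule described above: an application is killed only when the retry limit is non-negative, the limit has been exceeded, and no executors are currently running.
      ```
      // Sketch of the decision rule; a negative limit disables the kill behavior entirely.
      def shouldKillApplication(
          maxExecutorRetries: Int,
          consecutiveFailures: Int,
          hasRunningExecutors: Boolean): Boolean = {
        maxExecutorRetries >= 0 &&
          consecutiveFailures > maxExecutorRetries &&
          !hasRunningExecutors
      }
      ```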
      
      I'd like to merge this patch into master, branch-2.0, and branch-1.6.
      
      ## How was this patch tested?
      
      I tested this manually using `spark-shell` and `local-cluster` mode. This is a tricky feature to unit test and historically this code has not changed very often, so I'd prefer to skip the additional effort of adding a testing framework and would rather rely on manual tests and review for now.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14544 from JoshRosen/add-setting-for-max-executor-failures.
      b89b3a5c
    • Michael Gummelt's avatar
      [SPARK-16809] enable history server links in dispatcher UI · 62e62124
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
      Links the Spark Mesos Dispatcher UI to the history server UI
      
      - adds spark.mesos.dispatcher.historyServer.url
      - explicitly generates frameworkIDs for the launched drivers, so the dispatcher knows how to correlate drivers and frameworkIDs
      
      ## How was this patch tested?
      
      manual testing
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      Author: Sergiusz Urbaniak <sur@mesosphere.io>
      
      Closes #14414 from mgummelt/history-server.
      62e62124
    • Sun Rui's avatar
      [SPARK-16522][MESOS] Spark application throws exception on exit. · af710e5b
      Sun Rui authored
      ## What changes were proposed in this pull request?
      Spark applications running on Mesos throw an exception upon exit. For details, refer to https://issues.apache.org/jira/browse/SPARK-16522.
      
      I am not sure if there is any better fix, so wait for review comments.
      
      ## How was this patch tested?
      Manual test. Observed that the exception is gone upon application exit.
      
      Author: Sun Rui <sunrui2016@gmail.com>
      
      Closes #14175 from sun-rui/SPARK-16522.
      af710e5b
    • Sean Owen's avatar
      [SPARK-16606][CORE] Misleading warning for SparkContext.getOrCreate "WARN... · 801e4d09
      Sean Owen authored
      [SPARK-16606][CORE] Misleading warning for SparkContext.getOrCreate "WARN SparkContext: Use an existing SparkContext, some configuration may not take effect."
      
      ## What changes were proposed in this pull request?
      
      SparkContext.getOrCreate shouldn't warn about ignored config if
      
      - it wasn't ignored because a new context is created with it or
      - no config was actually provided
      
      ## How was this patch tested?
      
      Jenkins + existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14533 from srowen/SPARK-16606.
      801e4d09
  18. Aug 08, 2016
    • Holden Karau's avatar
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
      Avoid using postfix operators for command execution in SQLQuerySuite, where they weren't whitelisted, and audit existing whitelistings, removing postfix operators from most places. Notable places where postfix operators remain are XML parsing and time units (seconds, millis, etc.), where they arguably improve readability.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
    • Tathagata Das's avatar
      [SPARK-16953] Make requestTotalExecutors public Developer API to be consistent... · 86502390
      Tathagata Das authored
      [SPARK-16953] Make requestTotalExecutors public Developer API to be consistent with requestExecutors/killExecutors
      
      ## What changes were proposed in this pull request?
      
      requestExecutors and killExecutors are public developer APIs for managing the number of executors allocated to the SparkContext. For consistency, requestTotalExecutors should also be a public Developer API, as it provides similar functionality. In fact, using requestTotalExecutors is more convenient than requestExecutors, as the former is idempotent and the latter is not.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #14541 from tdas/SPARK-16953.
      86502390
    • Tejas Patil's avatar
      [SPARK-16919] Configurable update interval for console progress bar · e076fb05
      Tejas Patil authored
      ## What changes were proposed in this pull request?
      
      Currently the update interval for the console progress bar is hardcoded. This PR makes it configurable for users.
      
      ## How was this patch tested?
      
      Ran a long-running job; with a high value of the update interval, the updates were shown less frequently.
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #14507 from tejasapatil/SPARK-16919.
      e076fb05
  19. Aug 07, 2016
  20. Aug 06, 2016
    • Josh Rosen's avatar
      [SPARK-16925] Master should call schedule() after all executor exit events, not only failures · 4f5f9b67
      Josh Rosen authored
      ## What changes were proposed in this pull request?
      
      This patch fixes a bug in Spark's standalone Master which could cause applications to hang if tasks cause executors to exit with zero exit codes.
      
      As an example of the bug, run
      
      ```
      sc.parallelize(1 to 1, 1).foreachPartition { _ => System.exit(0) }
      ```
      
      on a standalone cluster which has a single Spark application. This will cause all executors to die but those executors won't be replaced unless another Spark application or worker joins or leaves the cluster (or if an executor exits with a non-zero exit code). This behavior is caused by a bug in how the Master handles the `ExecutorStateChanged` event: the current implementation calls `schedule()` only if the executor exited with a non-zero exit code, so a task which causes a JVM to unexpectedly exit "cleanly" will skip the `schedule()` call.
      
      This patch addresses this by modifying the `ExecutorStateChanged` to always unconditionally call `schedule()`. This should be safe because it should always be safe to call `schedule()`; adding extra `schedule()` calls can only affect performance and should not introduce correctness bugs.
      
      ## How was this patch tested?
      
      I added a regression test in `DistributedSuite`.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14510 from JoshRosen/SPARK-16925.
      4f5f9b67