  1. Feb 09, 2017
      [SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt · 303f00a4
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      `Signaling.cancelOnInterrupt` leaks a SparkContext per call and it makes ReplSuite unstable.
      
      This PR adds `SparkContext.getActive` to allow `Signaling.cancelOnInterrupt` to get the active `SparkContext` to avoid the leak.
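      A minimal sketch of the idea (the helper and its body are illustrative, and `SparkContext.getActive` may not be public API): reuse the already-running context instead of constructing a new one on every registration.

      ```scala
      import org.apache.spark.SparkContext

      // Reuse the active SparkContext rather than creating (and leaking) a new
      // one each time the interrupt handler is registered.
      def cancelAllJobsOnInterrupt(): Unit = {
        SparkContext.getActive match {
          case Some(sc) => sc.cancelAllJobs() // cancel running jobs on Ctrl+C
          case None     => ()                 // no active context, nothing to do
        }
      }
      ```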
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16825 from zsxwing/SPARK-19481.
      303f00a4
  2. Dec 03, 2016
      [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at... · d1312fb7
      hyukjinkwon authored
      [SPARK-18685][TESTS] Fix URI and release resources after opening in tests at ExecutorClassLoaderSuite
      
      ## What changes were proposed in this pull request?
      
      This PR fixes two problems as below:
      
      - Close `BufferedSource` after `Source.fromInputStream(...)` to release the resource and make the tests in `ExecutorClassLoaderSuite` pass on Windows (see the sketch after this list)
      
        ```
        [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.repl.ExecutorClassLoaderSuite *** ABORTED *** (7 seconds, 333 milliseconds)
        [info]   java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-77b2f37b-6405-47c4-af1c-4a6a206511f2
        [info]   at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
        [info]   at org.apache.spark.repl.ExecutorClassLoaderSuite.afterAll(ExecutorClassLoaderSuite.scala:76)
        [info]   at org.scalatest.BeforeAndAfterAll$class.afterAll(BeforeAndAfterAll.scala:213)
        ...
        ```
      
      - Fix the URI correctly so that the related tests pass on Windows (see the sketch after this list).
      
        ```
        [info] - child first *** FAILED *** (78 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - parent first *** FAILED *** (15 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - child first can fall back *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - child first can fail *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - resource from parent *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ...
        [info] - resources from parent *** FAILED *** (0 milliseconds)
        [info]   java.net.URISyntaxException: Illegal character in authority at index 7: file://C:\projects\spark\target\tmp\spark-00b66070-0548-463c-b6f3-8965d173da9b
        [info]   at java.net.URI$Parser.fail(URI.java:2848)
        [info]   at java.net.URI$Parser.parseAuthority(URI.java:3186)
        ```
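      A minimal sketch of the two fixes, assuming a temp directory `tempDir: java.io.File`; the helper names are illustrative, not the suite's actual code.

      ```scala
      import java.io.{File, InputStream}
      import scala.io.Source

      // 1. Close the BufferedSource so the underlying stream is released; Windows
      //    cannot delete a temp directory while a file inside it is still open.
      def readAndClose(in: InputStream): String = {
        val source = Source.fromInputStream(in)
        try source.mkString finally source.close()
      }

      // 2. Derive the URI from File.toURI instead of prepending "file://" to a
      //    Windows path like C:\..., whose drive letter is not a valid URI authority.
      def classUri(tempDir: File): String = tempDir.toURI.toString
      ```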
      
      ## How was this patch tested?
      
      Manually tested via AppVeyor.
      
      **Before**
      https://ci.appveyor.com/project/spark-test/spark/build/102-rpel-ExecutorClassLoaderSuite
      
      **After**
      https://ci.appveyor.com/project/spark-test/spark/build/108-rpel-ExecutorClassLoaderSuite
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16116 from HyukjinKwon/close-after-open.
      d1312fb7
  3. Aug 17, 2016
      [SPARK-16736][CORE][SQL] purge superfluous fs calls · cc97ea18
      Steve Loughran authored
      A review of the code, working back from Hadoop's `FileSystem.exists()` and `FileSystem.isDirectory()` implementations, removing uses of those calls where they are superfluous.
      
      1. delete is harmless if called on a nonexistent path, so don't do any checks before deletes
      1. any `FileSystem.exists()` check before `getFileStatus()` or `open()` is superfluous, as the operation itself does the check; instead, the `FileNotFoundException` is caught and triggers the downgraded path. Where a `FileNotFoundException` was thrown before, the code still creates a new FNFE with the same error message, but the inner exception is now nested for easier diagnostics.
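      A sketch of the pattern with an illustrative helper (not one of the actual changed call sites):

      ```scala
      import java.io.FileNotFoundException
      import org.apache.hadoop.fs.{FileSystem, Path}

      // No FileSystem.exists() round trip: getFileStatus() performs the check itself,
      // and a missing path surfaces as FileNotFoundException, which triggers the
      // downgraded code path (here, a default value).
      def modificationTimeOrZero(fs: FileSystem, path: Path): Long =
        try {
          fs.getFileStatus(path).getModificationTime
        } catch {
          case _: FileNotFoundException => 0L
        }

      // delete() is harmless on a nonexistent path, so no exists() guard is needed.
      def cleanup(fs: FileSystem, path: Path): Unit = fs.delete(path, true)
      ```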
      
      Initially, relying on Jenkins test runs.
      
      One trouble spot here is that some of the code paths are clearly error situations; it's not clear that they have coverage anyway. Trying to create the failure conditions in tests would be ideal, but it will also be hard.
      
      Author: Steve Loughran <stevel@apache.org>
      
      Closes #14371 from steveloughran/cloud/SPARK-16736-superfluous-fs-calls.
      cc97ea18
  4. Aug 08, 2016
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add... · 9216901d
      Holden Karau authored
      [SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add much and remove whitelisting
      
      ## What changes were proposed in this pull request?
      
      Avoid using postfix notation for command execution in SQLQuerySuite, where it wasn't whitelisted, and audit the existing whitelistings, removing postfix operators from most places. Notable places where postfix notation remains are XML parsing and time units (seconds, millis, etc.), where it arguably improves readability.
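      A hedged before/after illustration of the style (not code taken from SQLQuerySuite):

      ```scala
      object PostfixStyle {
        import scala.concurrent.duration._
        import scala.sys.process._

        // Postfix forms need `import scala.language.postfixOps` (or a whitelist entry):
        //   val timeout = 10 seconds
        //   val output  = "echo hello" !!
        // The equivalent dotted forms compile without it:
        val timeout = 10.seconds
        val output: String = Process("echo hello").!!
      }
      ```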
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #14407 from holdenk/SPARK-16779.
      9216901d
  5. May 31, 2016
  6. Apr 22, 2016
      [SPARK-10001] Consolidate Signaling and SignalLogger. · c089c6f4
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This is a follow-up to #12557, with the following changes:
      
      1. Fixes some of the style issues.
      2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and also confusing to have two similarly named classes with one calling the other.
      3. Made logging registration idempotent.
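      A sketch of what the idempotent registration can look like (an assumed shape, not the actual SignalUtils code):

      ```scala
      import java.util.concurrent.atomic.AtomicBoolean

      object SignalUtilsSketch {
        private val loggerRegistered = new AtomicBoolean(false)

        // Safe to call any number of times; the handler is installed at most once.
        def registerLogger(install: () => Unit): Unit = {
          if (loggerRegistered.compareAndSet(false, true)) {
            install()
          }
        }
      }
      ```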
      
      ## How was this patch tested?
      N/A.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12605 from rxin/SPARK-10001.
      c089c6f4
      [SPARK-10001] [CORE] Interrupt tasks in repl with Ctrl+C · 80127935
      Jakob Odersky authored
      ## What changes were proposed in this pull request?
      
      Improve signal handling to allow interrupting running tasks from the REPL (with Ctrl+C).
      If no tasks are running or Ctrl+C is pressed twice, the signal is forwarded to the default handler resulting in the usual termination of the application.
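      A best-effort sketch of the mechanism, assuming the JDK's `sun.misc.Signal` API (the real implementation may differ):

      ```scala
      import sun.misc.{Signal, SignalHandler}

      // cancelRunningJobs() returns true if it actually cancelled something.
      def interceptInterrupt(cancelRunningJobs: () => Boolean): Unit =
        try {
          var previous: SignalHandler = null
          previous = Signal.handle(new Signal("INT"), new SignalHandler {
            override def handle(sig: Signal): Unit = {
              if (!cancelRunningJobs()) {
                // Nothing to cancel: restore the previous handler and re-raise,
                // so the default behaviour (terminating the application) applies.
                if (previous != null) Signal.handle(sig, previous)
                Signal.raise(sig)
              }
            }
          })
        } catch {
          case _: Throwable => // best effort: signals may be unavailable on this OS
        }
      ```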
      
      This PR is a rewrite of #8216 (which it therefore closes), as per piaozhexiu's request
      
      ## How was this patch tested?
      Signal handling is not easily testable, so no unit tests were added. Nevertheless, the new functionality is implemented in a best-effort manner, soft-failing when signals aren't available on a specific OS.
      
      Author: Jakob Odersky <jakob@odersky.com>
      
      Closes #12557 from jodersky/SPARK-10001-sigint.
      80127935
  7. Apr 20, 2016
      [SPARK-14725][CORE] Remove HttpServer class · 90cbc82f
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      This proposal removes the `HttpServer` class. With internal file/jar/class transmission moved to the RPC layer, there is no longer any code using `HttpServer`, so this PR removes it.
      
      ## How was this patch tested?
      
      Unit tests verified locally.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #12526 from jerryshao/SPARK-14725.
      90cbc82f
  8. Apr 12, 2016
  9. Apr 06, 2016
      [SPARK-14134][CORE] Change the package name used for shading classes. · 21d5ca12
      Marcelo Vanzin authored
      The current package name uses a dash, which is a little weird but seemed
      to work. That is, until a new test tried to mock a class that references
      one of those shaded types, and then things started failing.
      
      Most changes are just noise to fix the logging configs.
      
      For reference, SPARK-8815 also raised this issue, although at the time it
      did not cause any issues in Spark, so it was not addressed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11941 from vanzin/SPARK-14134.
      21d5ca12
  10. Mar 17, 2016
  11. Jan 05, 2016
  12. Dec 31, 2015
  13. Dec 24, 2015
      [SPARK-12311][CORE] Restore previous value of "os.arch" property in test... · 39204661
      Kazuaki Ishizaki authored
      [SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
      
      Restore the original value of the os.arch property after each test

      Since some tests force a specific value for the os.arch property, we need to restore the original value afterwards.
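      A minimal sketch of the save/restore pattern, assuming a plain ScalaTest suite (the suite name and forced value are illustrative):

      ```scala
      import org.scalatest.BeforeAndAfterEach
      import org.scalatest.funsuite.AnyFunSuite

      class ArchSensitiveSuiteSketch extends AnyFunSuite with BeforeAndAfterEach {
        private var originalArch: Option[String] = None

        override def beforeEach(): Unit = {
          originalArch = Option(System.getProperty("os.arch"))
          System.setProperty("os.arch", "amd64") // force the value this suite needs
        }

        override def afterEach(): Unit = {
          // Restore the real value so later suites are not affected.
          originalArch match {
            case Some(arch) => System.setProperty("os.arch", arch)
            case None       => System.clearProperty("os.arch")
          }
        }

        test("arch-dependent behaviour") {
          assert(System.getProperty("os.arch") == "amd64")
        }
      }
      ```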
      
      Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
      
      Closes #10289 from kiszk/SPARK-12311.
      39204661
  14. Dec 18, 2015
      [SPARK-12350][CORE] Don't log errors when requested stream is not found. · 27828182
      Marcelo Vanzin authored
      If a client requests a non-existent stream, just send a failure message
      back, without logging any error on the server side (since it's not a
      server error).
      
      On the executor side, avoid error logs by translating any errors during
      transfer to a `ClassNotFoundException`, so that loading the class is
      retried on a the parent class loader. This can mask IO errors during
      transmission, but the most common cause is that the class is not
      served by the remote end.
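      An illustrative sketch of the executor-side translation (the helper is hypothetical, not the actual ExecutorClassLoader method):

      ```scala
      import java.io.InputStream

      // Any failure while fetching class bytes is rethrown as ClassNotFoundException
      // so that loading falls through to the parent class loader.
      def fetchClassStream(name: String, open: () => InputStream): InputStream =
        try {
          open()
        } catch {
          case e: Exception =>
            throw new ClassNotFoundException(s"Class file for $name not found on the remote end", e)
        }
      ```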
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10337 from vanzin/SPARK-12350.
      27828182
  15. Dec 10, 2015
      [SPARK-11563][CORE][REPL] Use RpcEnv to transfer REPL-generated classes. · 4a46b885
      Marcelo Vanzin authored
      This avoids bringing up yet another HTTP server on the driver, and
      instead reuses the file server already managed by the driver's
      RpcEnv. As a bonus, the repl now inherits the security features of
      the network library.
      
      There's also a small change to create the directory for storing classes
      under the root temp dir for the application (instead of directly
      under java.io.tmpdir).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9923 from vanzin/SPARK-11563.
      4a46b885
  16. Nov 24, 2015
  17. Nov 20, 2015
  18. Nov 11, 2015
  19. May 29, 2015
      [SPARK-7558] Demarcate tests in unit-tests.log · 9eb222c1
      Andrew Or authored
      Right now `unit-tests.log` is not of much value because we can't easily tell where the test boundaries are. This patch adds log statements before and after each test to outline the test boundaries, e.g.:
      
      ```
      ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' =====
      
      15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230)
      15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List()
      15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents
      
      ...
      
      15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s
      15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite:
      
      ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' *****
      
      ...
      ```
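      A sketch of the hook behind this (the PR's actual base class is a shared abstract suite; names and logging details here are illustrative):

      ```scala
      import org.apache.log4j.Logger
      import org.scalatest.Outcome
      import org.scalatest.funsuite.AnyFunSuite

      abstract class DemarcatedFunSuite extends AnyFunSuite {
        private val log = Logger.getLogger(getClass.getName)

        // Wrap every test with banner log lines so unit-tests.log shows boundaries.
        override protected def withFixture(test: NoArgTest): Outcome = {
          val suiteName = getClass.getSimpleName
          log.info(s"\n\n===== TEST OUTPUT FOR $suiteName: '${test.name}' =====\n")
          try super.withFixture(test)
          finally log.info(s"\n\n***** FINISHED $suiteName: '${test.name}' *****\n")
        }
      }
      ```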
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6441 from andrewor14/demarcate-tests and squashes the following commits:
      
      879b060 [Andrew Or] Fix compile after rebase
      d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      7790b6c [Andrew Or] Fix tests after logical merge conflict
      c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      c43ffc4 [Andrew Or] Fix tests?
      8882581 [Andrew Or] Fix tests
      ee22cda [Andrew Or] Fix log message
      fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests
      12d1e1b [Andrew Or] Various whitespace changes (minor)
      69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
      bbce12e [Andrew Or] Fix manual things that cannot be covered through automation
      da0b12f [Andrew Or] Add core tests as dependencies in all modules
      f7d29ce [Andrew Or] Introduce base abstract class for all test suites
      9eb222c1
  20. Apr 09, 2015
  21. Mar 24, 2015
      [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load... · 7215aa74
      Josh Rosen authored
      [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)
      
      ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.
      
      This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.
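      An illustrative cleanup pattern (the helper below is hypothetical; the actual fix wraps the fetch paths inside ExecutorClassLoader):

      ```scala
      import java.io.InputStream
      import org.apache.commons.io.IOUtils

      // Always close the stream, even when reading or class loading fails, so the
      // connection is not left half-consumed on the class server.
      def readClassBytes(open: () => InputStream): Array[Byte] = {
        var in: InputStream = null
        try {
          in = open()
          IOUtils.toByteArray(in)
        } finally {
          if (in != null) in.close()
        }
      }
      ```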
      
      This PR is an extended version of #4935 and adds a regression test.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4944 from JoshRosen/executorclassloader-leak-master-branch and squashes the following commits:
      
      e0e3c25 [Josh Rosen] Wrap try block around getReponseCode; re-enable keep-alive by closing error stream
      961c284 [Josh Rosen] Roll back changes that were added to get the regression test to fail
      7ee2261 [Josh Rosen] Add a failing regression test
      e2d70a3 [Josh Rosen] Properly clean up after errors in ExecutorClassLoader
      7215aa74
  22. Feb 02, 2015
      Spark 3883: SSL support for HttpServer and Akka · cfea3003
      Jacek Lewandowski authored
      SPARK-3883: SSL support for Akka connections and Jetty based file servers.
      
      This story introduced the following changes:
      - Introduced SSLOptions object which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries at a specified namespace.
      - SSLOptions is created and kept by SecurityManager
      - All Akka actor address creation snippets based on interpolated strings were replaced by dedicated methods from AkkaUtils. Those methods select the proper Akka protocol, whether akka.tcp or akka.ssl.tcp
      - Added test cases for AkkaUtils, FileServer, SSLOptions and SecurityManager
      - Added a way for executors and the driver to use a node-local SSL configuration in standalone mode. It can be done by specifying spark.ssl.useNodeLocalConf in SparkConf (see the sketch after this list).
      - Made CoarseGrainedExecutorBackend not overwrite settings that are part of the executor startup configuration, since they are passed from the Worker anyway
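      A sketch of enabling the node-local SSL configuration via SparkConf; only spark.ssl.useNodeLocalConf is named in this description, the remaining keys are the usual spark.ssl.* options and may differ by version.

      ```scala
      import org.apache.spark.SparkConf

      val conf = new SparkConf()
        .set("spark.ssl.enabled", "true")
        .set("spark.ssl.keyStore", "/path/to/keystore") // illustrative path
        .set("spark.ssl.keyStorePassword", "changeit")  // illustrative secret
        .set("spark.ssl.useNodeLocalConf", "true")
      ```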
      
      Refer to https://github.com/apache/spark/pull/3571 for discussion and details
      
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      Author: Jacek Lewandowski <jacek.lewandowski@datastax.com>
      
      Closes #3571 from jacek-lewandowski/SPARK-3883-master and squashes the following commits:
      
      9ef4ed1 [Jacek Lewandowski] Merge pull request #2 from jacek-lewandowski/SPARK-3883-docs2
      fb31b49 [Jacek Lewandowski] SPARK-3883: Added SSL setup documentation
      2532668 [Jacek Lewandowski] SPARK-3883: Refactored AkkaUtils.protocol method to not use Try
      90a8762 [Jacek Lewandowski] SPARK-3883: Refactored methods to resolve Akka address and made it possible to easily configure multiple communication layers for SSL
      72b2541 [Jacek Lewandowski] SPARK-3883: A reference to the fallback SSLOptions can be provided when constructing SSLOptions
      93050f4 [Jacek Lewandowski] SPARK-3883: SSL support for HttpServer and Akka
      cfea3003
  23. Feb 01, 2015
  24. Jan 06, 2015
      SPARK-4159 [CORE] Maven build doesn't run JUnit test suites · 4cba6eb4
      Sean Owen authored
      This PR:
      
      - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
      - Tells `surefire` to test only Java tests
      - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.
      
      For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3651 from srowen/SPARK-4159 and squashes the following commits:
      
      2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
      12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
      e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
      4cba6eb4
  25. Nov 11, 2014
      Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
      daaca14c
  26. Oct 09, 2014
      SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Utils.createTempDir · 363baaca
      Sean Owen authored
      I noticed a few issues with how temp directories are created and deleted:
      
      *Minor*
      
      * Guava's `Files.createTempDir()` plus `File.deleteOnExit()` is used in many tests to make a temp dir, but `Utils.createTempDir()` seems to be the standard Spark mechanism
      * Call to `File.deleteOnExit()` could be pushed into `Utils.createTempDir()` as well, along with this replacement
      * _I messed up the message in an exception in `Utils` in SPARK-3794; fixed here_
      
      *Bit Less Minor*
      
      * `Utils.deleteRecursively()` fails immediately if any `IOException` occurs, instead of trying to delete the remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and throw one of the (possibly several) exceptions that occurred at the end (see the sketch after this list).
      * `Utils.createTempDir()` adds a JVM shutdown hook every time the method is called, even if the new subdir is under another already-registered parent dir, since that check happens inside the hook. However, `Utils` already manages a set of all dirs to delete on shutdown, called `shutdownDeletePaths`; a single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in `TachyonBlockManager`.
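      A sketch of the suggested deletion behaviour (not the actual Utils.deleteRecursively code): keep deleting the rest of the tree when one entry fails, and throw one of the collected exceptions at the end.

      ```scala
      import java.io.{File, IOException}

      def deleteRecursivelySketch(file: File): Unit = {
        var savedError: IOException = null
        if (file.isDirectory) {
          val children = Option(file.listFiles()).getOrElse(Array.empty[File])
          for (child <- children) {
            // Remember the failure but keep going with the remaining entries.
            try deleteRecursivelySketch(child)
            catch { case e: IOException => savedError = e }
          }
        }
        if (!file.delete() && file.exists()) {
          savedError = new IOException(s"Failed to delete: ${file.getAbsolutePath}")
        }
        if (savedError != null) throw savedError
      }
      ```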
      
      I noticed a few other things that might be changed but wanted to ask first:
      
      * Shouldn't the set of dirs to delete be `File`, not just `String` paths?
      * `Utils` manages the set of `TachyonFile` that have been registered for deletion, but the shutdown hook is managed in `TachyonBlockManager`. Should this logic not live together, and not in `Utils`? It's more specific to Tachyon, and looks slightly odd to import in such a generic place.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2670 from srowen/SPARK-3811 and squashes the following commits:
      
      071ae60 [Sean Owen] Update per @vanzin's review
      da0146d [Sean Owen] Make Utils.deleteRecursively try to delete all paths even when an exception occurs; use one shutdown hook instead of one per method call to delete temp dirs
      3a0faa4 [Sean Owen] Standardize on Utils.createTempDir instead of Files.createTempDir
      363baaca
  27. Oct 08, 2014
      [SPARK-3836] [REPL] Spark REPL optionally propagate internal exceptions · c7818434
      Ahir Reddy authored
      Optionally have the repl throw exceptions generated by interpreted code, instead of swallowing the exception and returning it as text output. This is useful when embedding the repl; otherwise it's not possible to know when user code threw an exception.
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #2695 from ahirreddy/repl-throw-exceptions and squashes the following commits:
      
      bad25ee [Ahir Reddy] Style Fixes
      f0e5b44 [Ahir Reddy] Fixed style
      0d4413d [Ahir Reddy] propogate excetions from repl
      c7818434
  28. Oct 01, 2014
      [SPARK-3748] Log thread name in unit test logs · 3888ee2f
      Reynold Xin authored
      Thread names are useful for correlating failures.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #2600 from rxin/log4j and squashes the following commits:
      
      83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
      3888ee2f
  29. Sep 11, 2014
      SPARK-2482: Resolve sbt warnings during build · 33c7a738
      witgo authored
      At the same time, importing `scala.language.postfixOps` together with `org.scalatest.time.SpanSugar._` causes `scala.language.postfixOps` not to take effect
      
      Author: witgo <witgo@qq.com>
      
      Closes #1330 from witgo/sbt_warnings3 and squashes the following commits:
      
      179ba61 [witgo] Resolve sbt warnings during build
      33c7a738
  30. Sep 06, 2014
  31. Sep 02, 2014
      [SPARK-1919] Fix Windows spark-shell --jars · 8f1f9aaf
      Andrew Or authored
      We were trying to add `file:/C:/path/to/my.jar` to the class path. We should add `C:/path/to/my.jar` instead. Tested on Windows 8.1.
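      A hypothetical helper showing the intent (not the actual shell code): turn the `file:` URI back into a plain local path before adding it to the classpath.

      ```scala
      import java.net.URI
      import java.nio.file.Paths

      def toClasspathEntry(jar: String): String = {
        val uri = new URI(jar)
        // file:/C:/path/to/my.jar becomes C:\path\to\my.jar on Windows
        if (uri.getScheme == "file") Paths.get(uri).toString else jar
      }
      ```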
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2211 from andrewor14/windows-shell-jars and squashes the following commits:
      
      262c6a2 [Andrew Or] Oops... Add the new code to the correct place
      0d5a0c1 [Andrew Or] Format jar path only for adding to shell classpath
      42bd626 [Andrew Or] Remove unnecessary code
      0049f1b [Andrew Or] Remove embarrassing log messages
      b1755a0 [Andrew Or] Format jar paths properly before adding them to the classpath
      8f1f9aaf
  32. Aug 30, 2014
      [SPARK-2889] Create Hadoop config objects consistently. · b6cf1348
      Marcelo Vanzin authored
      Different places in the code were instantiating Configuration / YarnConfiguration objects in different ways. This could lead to confusion for people who actually expected "spark.hadoop.*" options to end up in the configs used by Spark code, since that would only happen for the SparkContext's config.
      
      This change modifies most places to use SparkHadoopUtil to initialize configs, and makes that method do the translation that was previously done only inside SparkContext.
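      A sketch of the central translation described above (the real entry point lives in SparkHadoopUtil; this only shows the spark.hadoop.* prefix stripping):

      ```scala
      import org.apache.hadoop.conf.Configuration
      import org.apache.spark.SparkConf

      def newHadoopConf(sparkConf: SparkConf): Configuration = {
        val hadoopConf = new Configuration()
        for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
          // spark.hadoop.foo.bar becomes the Hadoop key foo.bar
          hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
        }
        hadoopConf
      }
      ```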
      
      The places that were not changed fall in one of the following categories:
      - Test code where this doesn't really matter
      - Places deep in the code where plumbing SparkConf would be too difficult for very little gain
      - Default values for arguments - since the caller can provide their own config in that case
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #1843 from vanzin/SPARK-2889 and squashes the following commits:
      
      52daf35 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      f179013 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      51e71cf [Marcelo Vanzin] Add test to ensure that overriding Yarn configs works.
      53f9506 [Marcelo Vanzin] Add DeveloperApi annotation.
      3d345cb [Marcelo Vanzin] Restore old method for backwards compat.
      fc45067 [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      0ac3fdf [Marcelo Vanzin] Merge branch 'master' into SPARK-2889
      3f26760 [Marcelo Vanzin] Compilation fix.
      f16cadd [Marcelo Vanzin] Initialize config in SparkHadoopUtil.
      b8ab173 [Marcelo Vanzin] Update Utils API to take a Configuration argument.
      1e7003f [Marcelo Vanzin] Replace explicit Configuration instantiation with SparkHadoopUtil.
      b6cf1348
  33. Aug 27, 2014
      [SPARK-3256] Added support for :cp <jar> that was broken in Scala 2.10.x for REPL · 191d7cf2
      Chip Senkbeil authored
      As seen with [SI-6502](https://issues.scala-lang.org/browse/SI-6502) of Scala, the _:cp_ command was broken in Scala 2.10.x. As the Spark shell is a friendly wrapper on top of the Scala REPL, it is also affected by this problem.
      
      My solution was to alter the internal classpath and invalidate any new entries. I also had to add the ability to add new entries to the parent classloader of the interpreter (SparkIMain's global).
      
      The advantage of this versus wiping the interpreter and replaying all of the commands is that you don't have to worry about rerunning heavy Spark-related commands (going to the cluster) or potentially reloading data that might have changed. Instead, you get to work from where you left off.
      
      Until this is fixed upstream for 2.10.x, I had to use reflection to alter the internal compiler classpath.
      
      The solution now looks like this:
      ![screen shot 2014-08-13 at 3 46 02 pm](https://cloud.githubusercontent.com/assets/2481802/3912625/f02b1440-232c-11e4-9bf6-bafb3e352d14.png)
      
      Author: Chip Senkbeil <rcsenkbe@us.ibm.com>
      
      Closes #1929 from rcsenkbeil/FixReplClasspathSupport and squashes the following commits:
      
      f420cbf [Chip Senkbeil] Added SparkContext.addJar calls to support executing code on remote clusters
      a826795 [Chip Senkbeil] Updated AddUrlsToClasspath to use 'new Run' suggestion over hackish compiler error
      2ff1d86 [Chip Senkbeil] Added compilation failure on symbols hack to get Scala classes to load correctly
      a220639 [Chip Senkbeil] Added support for :cp <jar> that was broken in Scala 2.10.x for REPL
      191d7cf2
  34. Aug 06, 2014
      [SPARK-2157] Enable tight firewall rules for Spark · 09f7e458
      Andrew Or authored
      The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107.
      
      The list covered here may or may not be the complete set of ports needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs.
      
      My spark-env.sh looks like this:
      ```
      export SPARK_MASTER_PORT=6060
      export SPARK_WORKER_PORT=7070
      export SPARK_MASTER_WEBUI_PORT=9090
      export SPARK_WORKER_WEBUI_PORT=9091
      ```
      and my spark-defaults.conf looks like this:
      ```
      spark.master spark://andrews-mbp:6060
      spark.driver.port 5001
      spark.fileserver.port 5011
      spark.broadcast.port 5021
      spark.replClassServer.port 5031
      spark.blockManager.port 5041
      spark.executor.port 5051
      ```
      
      Author: Andrew Or <andrewor14@gmail.com>
      Author: Andrew Ash <andrew@andrewash.com>
      
      Closes #1777 from andrewor14/configure-ports and squashes the following commits:
      
      621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      8a6b820 [Andrew Or] Use a random UI port during tests
      7da0493 [Andrew Or] Fix tests
      523c30e [Andrew Or] Add test for isBindCollision
      b97b02a [Andrew Or] Minor fixes
      c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      93d359f [Andrew Or] Executors connect to wrong port when collision occurs
      d502e5f [Andrew Or] Handle port collisions when creating Akka systems
      a2dd05c [Andrew Or] Patrick's comment nit
      86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port
      1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode
      cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.)
      e837cde [Andrew Or] Remove outdated TODOs
      bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      de1b207 [Andrew Or] Update docs to reflect new ports
      b565079 [Andrew Or] Add spark.ports.maxRetries
      2551eb2 [Andrew Or] Remove spark.worker.watcher.port
      151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
      9868358 [Andrew Or] Add a few miscellaneous ports
      6016e77 [Andrew Or] Add spark.executor.port
      8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT
      4d9e6f3 [Andrew Or] Fix super subtle bug
      3f8e51b [Andrew Or] Correct erroneous docs...
      e111d08 [Andrew Or] Add names for UI services
      470f38c [Andrew Or] Special case non-"Address already in use" exceptions
      1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port
      ba32280 [Andrew Or] Minor fixes
      6b550b0 [Andrew Or] Assorted fixes
      73fbe89 [Andrew Or] Move start service logic to Utils
      ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports
      038a579 [Andrew Ash] Trust the server start function to report the port the service started on
      7c5bdc4 [Andrew Ash] Fix style issue
      0347aef [Andrew Ash] Unify port fallback logic to a single place
      24a4c32 [Andrew Ash] Remove type on val to match surrounding style
      9e4ad96 [Andrew Ash] Reformat for style checker
      5d84e0e [Andrew Ash] Document new port configuration options
      066dc7a [Andrew Ash] Fix up HttpServer port increments
      cad16da [Andrew Ash] Add fallover increment logic for HttpServer
      c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment
      b80d2fd [Andrew Ash] Make Spark's block manager port configurable
      17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server
      f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast
      49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer
      1c0981a [Andrew Ash] Make port in HttpServer configurable
      09f7e458
  35. Aug 02, 2014
      [SPARK-2454] Do not ship spark home to Workers · 148af608
      Andrew Or authored
      When standalone Workers launch executors, they inherit the Spark home set by the driver. This means if the worker machines do not share the same directory structure as the driver node, the Workers will attempt to run scripts (e.g. bin/compute-classpath.sh) that do not exist locally and fail. This is a common scenario if the driver is launched from outside of the cluster.
      
      The solution is to simply not pass the driver's Spark home to the Workers. This PR further makes an attempt to avoid overloading the usages of `spark.home`, which is now only used for setting executor Spark home on Mesos and in python.
      
      This is based on top of #1392 and originally reported by YanTangZhai. Tested on standalone cluster.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1734 from andrewor14/spark-home-reprise and squashes the following commits:
      
      f71f391 [Andrew Or] Revert changes in python
      1c2532c [Andrew Or] Merge branch 'master' of github.com:apache/spark into spark-home-reprise
      188fc5d [Andrew Or] Avoid using spark.home where possible
      09272b7 [Andrew Or] Always use Worker's working directory as spark home
      148af608
  36. Aug 01, 2014
      SPARK-2632, SPARK-2576. Fixed by only importing what is necessary during class definition. · 14991011
      Prashant Sharma authored
      Without this patch, it imports everything available in the scope.
      
      ```scala
      
      scala> val a = 10l
      val a = 10l
      a: Long = 10
      
      scala> import a._
      import a._
      import a._
      
      scala> case class A(a: Int) // show
      case class A(a: Int) // show
      class $read extends Serializable {
        def <init>() = {
          super.<init>;
          ()
        };
        class $iwC extends Serializable {
          def <init>() = {
            super.<init>;
            ()
          };
          class $iwC extends Serializable {
            def <init>() = {
              super.<init>;
              ()
            };
            import org.apache.spark.SparkContext._;
            class $iwC extends Serializable {
              def <init>() = {
                super.<init>;
                ()
              };
              val $VAL5 = $line5.$read.INSTANCE;
              import $VAL5.$iw.$iw.$iw.$iw.a;
              class $iwC extends Serializable {
                def <init>() = {
                  super.<init>;
                  ()
                };
                import a._;
                class $iwC extends Serializable {
                  def <init>() = {
                    super.<init>;
                    ()
                  };
                  class $iwC extends Serializable {
                    def <init>() = {
                      super.<init>;
                      ()
                    };
                    case class A extends scala.Product with scala.Serializable {
                      <caseaccessor> <paramaccessor> val a: Int = _;
                      def <init>(a: Int) = {
                        super.<init>;
                        ()
                      }
                    }
                  };
                  val $iw = new $iwC.<init>
                };
                val $iw = new $iwC.<init>
              };
              val $iw = new $iwC.<init>
            };
            val $iw = new $iwC.<init>
          };
          val $iw = new $iwC.<init>
        };
        val $iw = new $iwC.<init>
      }
      object $read extends scala.AnyRef {
        def <init>() = {
          super.<init>;
          ()
        };
        val INSTANCE = new $read.<init>
      }
      defined class A
      ```
      
      With this patch, it imports only what is necessary.
      
      ```scala
      
      scala> val a = 10l
      val a = 10l
      a: Long = 10
      
      scala> import a._
      import a._
      import a._
      
      scala> case class A(a: Int) // show
      case class A(a: Int) // show
      class $read extends Serializable {
        def <init>() = {
          super.<init>;
          ()
        };
        class $iwC extends Serializable {
          def <init>() = {
            super.<init>;
            ()
          };
          class $iwC extends Serializable {
            def <init>() = {
              super.<init>;
              ()
            };
            case class A extends scala.Product with scala.Serializable {
              <caseaccessor> <paramaccessor> val a: Int = _;
              def <init>(a: Int) = {
                super.<init>;
                ()
              }
            }
          };
          val $iw = new $iwC.<init>
        };
        val $iw = new $iwC.<init>
      }
      object $read extends scala.AnyRef {
        def <init>() = {
          super.<init>;
          ()
        };
        val INSTANCE = new $read.<init>
      }
      defined class A
      
      scala>
      
      ```
      
      This patch also adds a `:fallback` mode; when enabled, it restores the spark-shell's 1.0.0 behaviour.
      
      Author: Prashant Sharma <scrapcodes@gmail.com>
      Author: Yin Huai <huai@cse.ohio-state.edu>
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1635 from ScrapCodes/repl-fix-necessary-imports and squashes the following commits:
      
      b1968d2 [Prashant Sharma] Added toschemaRDD to test case.
      0b712bb [Yin Huai] Add a REPL test to test importing a method.
      02ad8ff [Yin Huai] Add a REPL test for importing SQLContext.createSchemaRDD.
      ed6d0c7 [Prashant Sharma] Added a fallback mode, incase users run into issues while using repl.
      b63d3b2 [Prashant Sharma] SPARK-2632, SPARK-2576. Fixed by only importing what is necessary during class definition.
      14991011
  37. Jul 31, 2014
      [SPARK-2762] SparkILoop leaks memory in multi-repl configurations · 92ca910e
      Timothy Hunter authored
      This pull request is a small refactor so that a partial function (hence a closure) is not created. Instead, a regular function is used. The behavior of the code is not changed.
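      A minimal illustration of the refactor (names are not SparkILoop's): a partial-function literal allocates a new function object, while a plain method does not.

      ```scala
      object ClosureVsMethod {
        // Before: each evaluation of this expression creates a PartialFunction instance.
        def asPartialFunction: PartialFunction[Any, Unit] = {
          case s: String => println(s)
        }

        // After: an ordinary method; no extra closure object is allocated.
        def handle(msg: Any): Unit = msg match {
          case s: String => println(s)
          case _         =>
        }
      }
      ```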
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #1674 from thunterdb/closure_issue and squashes the following commits:
      
      e1e664d [Timothy Hunter] simplify closure
      92ca910e
  38. Jul 22, 2014
      [SPARK-2452] Create a new valid for each instead of using lineId. · 81fec992
      Prashant Sharma authored
      Author: Prashant Sharma <prashant@apache.org>
      
      Closes #1441 from ScrapCodes/SPARK-2452/multi-statement and squashes the following commits:
      
      26c5c72 [Prashant Sharma] Added a test case.
      7e8d28d [Prashant Sharma] SPARK-2452, create a new valid for each  instead of using lineId, because Line ids can be same sometimes.
      81fec992
  39. Jul 04, 2014