  1. Sep 13, 2017
    • [SPARK-21970][CORE] Fix Redundant Throws Declarations in Java Codebase · b6ef1f57
      Armin authored
      ## What changes were proposed in this pull request?
      
      1. Removing all redundant throws declarations from Java codebase.
      2. Removing dead code made visible by this from `ShuffleExternalSorter#closeAndGetSpills`
      
      ## How was this patch tested?
      
      Build still passes.
      
      Author: Armin <me@obrown.io>
      
      Closes #19182 from original-brownbear/SPARK-21970.
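      For illustration, a minimal sketch (hypothetical code, not from the Spark codebase) of what a redundant throws declaration is and why removing one helps callers:

```java
public class RedundantThrowsDemo {

    // Before: the method declared `throws java.io.IOException` even though
    // nothing in its body can throw it:
    //   static int parsePort(String s) throws java.io.IOException { ... }

    // After: the redundant declaration is dropped, so callers no longer need
    // a try/catch block or a throws clause of their own.
    static int parsePort(String s) {
        return Integer.parseInt(s.trim());
    }

    public static void main(String[] args) {
        // No checked-exception handling required at the call site.
        System.out.println(parsePort(" 8080 "));
    }
}
```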
    • [SPARK-21963][CORE][TEST] Created temp file should be deleted after use · ca00cc70
      caoxuewen authored
      ## What changes were proposed in this pull request?
      
      After creating a temporary file, you need to delete it; otherwise it leaves behind a file with a name like ‘SPARK194465907929586320484966temp’.
      
      ## How was this patch tested?
      
      N/A
      
      Author: caoxuewen <cao.xuewen@zte.com.cn>
      
      Closes #19174 from heary-cao/DeleteTempFile.
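      A minimal sketch of the pattern this fixes, with hypothetical names: create the temp file, use it, and delete it on every exit path so nothing like ‘SPARK…temp’ lingers in java.io.tmpdir:

```java
import java.io.File;
import java.io.IOException;

public class TempFileDemo {
    // Create a temp file, use it, and always delete it afterwards.
    static File useAndCleanUp() throws IOException {
        File tmp = File.createTempFile("SPARK", "temp");
        try {
            // ... write to / read from tmp here ...
            return tmp;
        } finally {
            tmp.delete(); // the fix: delete after use, on every path
        }
    }

    public static void main(String[] args) throws IOException {
        File f = useAndCleanUp();
        System.out.println(f.exists()); // false: the file was cleaned up
    }
}
```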
  2. Sep 01, 2017
    • [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala... · 12ab7f7e
      Sean Owen authored
      [SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation
      
      …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
      
      ## What changes were proposed in this pull request?
      
      This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
      
      In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
      
      It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
      
      - Scalatest 2.x -> 3.0.3
      - Chill 0.8.0 -> 0.8.4
      - Clapper 1.0.x -> 1.1.2
      - json4s 3.2.x -> 3.4.2
      - Jackson 2.6.x -> 2.7.9 (required by json4s)
      
      This change does _not_ fully enable a Scala 2.12 build:
      
      - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
      - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
      
      What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
      
      ## How was this patch tested?
      
      Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #18645 from srowen/SPARK-14280.
  3. Aug 28, 2017
    • [SPARK-21798] No config to replace deprecated SPARK_CLASSPATH config for... · 24e6c187
      pgandhi authored
      [SPARK-21798] No config to replace deprecated SPARK_CLASSPATH config for launching daemons like History Server
      
      History Server launch uses SparkClassCommandBuilder for launching the server. SPARK_CLASSPATH has been removed and deprecated. For spark-submit this takes a different route: spark.driver.extraClassPath takes care of specifying additional jars on the classpath that were previously specified in SPARK_CLASSPATH. Right now the only way to specify additional jars for launching daemons such as the history server is SPARK_DIST_CLASSPATH (https://spark.apache.org/docs/latest/hadoop-provided.html), but that is a distribution classpath. It would be nice to have a config similar to spark.driver.extraClassPath for launching daemons such as the history server.
      
      Added new environment variable SPARK_DAEMON_CLASSPATH to set classpath for launching daemons. Tested and verified for History Server and Standalone Mode.
      
      ## How was this patch tested?
      Initially, the history server start script would fail because it could not find the required jars for launching the server on the Java classpath. The same was true for running the Master and Worker in standalone mode. After adding the environment variable SPARK_DAEMON_CLASSPATH to the Java classpath, both kinds of daemons (History Server and the standalone daemons) start up and run.
      
      Author: pgandhi <pgandhi@yahoo-inc.com>
      Author: pgandhi999 <parthkgandhi9@gmail.com>
      
      Closes #19047 from pgandhi999/master.
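      A hypothetical usage sketch of the new variable; the jar path below is illustrative, not from the patch:

```shell
# Put extra jars on the daemon's classpath before starting it.
export SPARK_DAEMON_CLASSPATH="/opt/extra-jars/*"
./sbin/start-history-server.sh
```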
  4. Aug 25, 2017
    • [SPARK-17742][CORE] Fail launcher app handle if child process exits with error. · 628bdeab
      Marcelo Vanzin authored
      This is a follow up to cba826d0; that commit set the app handle state
      to "LOST" when the child process exited, but that can be ambiguous. This
      change sets the state to "FAILED" if the exit code was non-zero and
      the handle state wasn't a failure state, or "LOST" if the exit status
      was zero.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #19012 from vanzin/SPARK-17742.
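      The rule described above can be condensed into a small sketch; this is an illustration of the state mapping, not the launcher's actual code:

```java
public class ExitStateDemo {
    // Map the child's exit code and the handle's current condition to a
    // final state, per the rule above (hypothetical condensation).
    static String finalState(int exitCode, boolean handleInFailureState) {
        if (handleInFailureState) {
            return "UNCHANGED"; // the handle already reported a failure
        }
        return exitCode != 0 ? "FAILED" : "LOST";
    }

    public static void main(String[] args) {
        System.out.println(finalState(1, false)); // FAILED
        System.out.println(finalState(0, false)); // LOST
    }
}
```

      Keeping existing failure states "sticky" avoids overwriting a more specific failure reason with a generic one.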
    • [MINOR][BUILD] Fix build warnings and Java lint errors · de7af295
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Fix build warnings and Java lint errors. This just helps a bit in evaluating (new) warnings in another PR I have open.
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #19051 from srowen/JavaWarnings.
  5. Aug 22, 2017
  6. Aug 15, 2017
    • [SPARK-17742][CORE] Handle child process exit in SparkLauncher. · cba826d0
      Marcelo Vanzin authored
      Currently the launcher handle does not monitor the child spark-submit
      process it launches; this means that if the child exits with an error,
      the handle's state will never change, and an application will not know
      that the application has failed.
      
      This change adds code to monitor the child process, and changes the
      handle state appropriately when the child process exits.
      
      Tested with added unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18877 from vanzin/SPARK-17742.
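      A minimal sketch of the monitoring approach, with illustrative names (the real change lives inside the launcher's handle code): a daemon thread waits on the child process and reports its exit code.

```java
import java.util.function.IntConsumer;

public class ChildMonitorDemo {
    // Start a daemon thread that blocks on the child and invokes the
    // callback with its exit code once it terminates.
    static Thread monitor(Process child, IntConsumer onExit) {
        Thread t = new Thread(() -> {
            try {
                onExit.accept(child.waitFor()); // blocks until the child exits
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // `true` exits immediately with status 0 on Unix-like systems.
        Process child = new ProcessBuilder("true").start();
        int[] exit = {-1};
        monitor(child, code -> exit[0] = code).join();
        System.out.println(exit[0]);
    }
}
```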
  7. Aug 09, 2017
    • [SPARK-21665][CORE] Need to close resources after use · 83fe3b5e
      vinodkc authored
      ## What changes were proposed in this pull request?
      Resources in core (SparkSubmitArguments.scala), the Spark launcher (AbstractCommandBuilder.java), and resource-managers/yarn (Client.scala) are now released after use.
      
      ## How was this patch tested?
      No new test cases added; existing unit tests pass.
      
      Author: vinodkc <vinod.kc.in@gmail.com>
      
      Closes #18880 from vinodkc/br_fixresouceleak.
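      The general fix pattern, sketched generically rather than with the actual Spark classes: try-with-resources guarantees close() on every path, including when reading throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CloseResourceDemo {
    // The reader is declared in the try header, so it is closed
    // automatically whether readLine() returns or throws.
    static String firstLine(String text) throws IOException {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return in.readLine();
        } // in.close() runs here on every path
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("line1\nline2")); // line1
    }
}
```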
  8. Aug 02, 2017
    • [SPARK-21490][CORE] Make sure SparkLauncher redirects needed streams. · 9456176d
      Marcelo Vanzin authored
      The code was failing to account for some cases when setting up log
      redirection. For example, if a user redirected only stdout to a file,
      the launcher code would leave stderr without redirection, which could
      lead to child processes getting stuck because stderr wasn't being
      read.
      
      So detect cases where only one of the streams is redirected, and
      redirect the other stream to the log as appropriate.
      
      For the old "launch()" API, redirection of the unconfigured stream
      only happens if the user has explicitly requested for log redirection.
      Log redirection is on by default with "startApplication()".
      
      Most of the change is actually adding new unit tests to make sure the
      different cases work as expected. As part of that, I moved some tests
      that were in the core/ module to the launcher/ module instead, since
      they don't depend on spark-submit.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18696 from vanzin/SPARK-21490.
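      The stdout-only case described above can be sketched with the plain ProcessBuilder API (illustrative, not the launcher's actual code): if only stdout is redirected, also deal with stderr so the child cannot block on a full, unread stderr pipe.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RedirectDemo {
    // Redirect stdout to a file and merge stderr into it, so neither
    // stream is left unread.
    static ProcessBuilder withSafeRedirects(ProcessBuilder pb, File stdoutFile) {
        pb.redirectOutput(stdoutFile);
        pb.redirectErrorStream(true); // stderr follows the redirected stdout
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File log = File.createTempFile("redirect-demo", ".log");
        Process p = withSafeRedirects(
            new ProcessBuilder("sh", "-c", "echo out; echo err 1>&2"), log).start();
        p.waitFor();
        System.out.println(new String(Files.readAllBytes(log.toPath())));
        log.delete();
    }
}
```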
  9. Aug 01, 2017
  10. Jul 18, 2017
  11. Jul 13, 2017
    • [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 · 425c4ada
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove Scala 2.10 build profiles and support
      - Replace some 2.10 support in scripts with commented placeholders for 2.12 later
      - Remove deprecated API calls from 2.10 support
      - Remove usages of deprecated context bounds where possible
      - Remove Scala 2.10 workarounds like ScalaReflectionLock
      - Other minor Scala warning fixes
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17150 from srowen/SPARK-19810.
  12. Jun 01, 2017
    • [SPARK-20922][CORE] Add whitelist of classes that can be deserialized by the launcher. · 8efc6e98
      Marcelo Vanzin authored
      Blindly deserializing classes using Java serialization opens the code up to
      issues in other libraries, since just deserializing data from a stream may
      end up executing code (think readObject()).
      
      Since the launcher protocol is pretty self-contained, there's just a handful
      of classes it legitimately needs to deserialize, and they're in just two
      packages, so add a filter that throws errors if classes from any other
      package show up in the stream.
      
      This also maintains backwards compatibility (the updated launcher code can
      still communicate with the backend code in older Spark releases).
      
      Tested with new and existing unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18166 from vanzin/SPARK-20922.
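      The filtering idea can be sketched with a custom ObjectInputStream; the allowed package names below are examples, not the ones the launcher protocol actually uses:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// An ObjectInputStream that refuses to resolve classes outside a small
// whitelist of package prefixes.
public class FilteredObjectInputStream extends ObjectInputStream {
    private static final String[] ALLOWED = {"java.lang.", "org.example.launcher."};

    public FilteredObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        String name = desc.getName();
        for (String prefix : ALLOWED) {
            if (name.startsWith(prefix)) {
                return super.resolveClass(desc);
            }
        }
        throw new IllegalArgumentException("Unexpected class in stream: " + name);
    }

    public static void main(String[] args) throws Exception {
        java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
        new java.io.ObjectOutputStream(bos).writeObject("hello");
        Object o = new FilteredObjectInputStream(
            new java.io.ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(o); // hello: strings are allowed through
    }
}
```

      Since Java 9, the standard library offers the same capability via java.io.ObjectInputFilter (JEP 290), but at the time of this commit Spark still supported Java 8.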
  13. May 22, 2017
    • [SPARK-20814][MESOS] Restore support for spark.executor.extraClassPath. · df64fa79
      Marcelo Vanzin authored
      Restore code that was removed as part of SPARK-17979, but instead of
      using the deprecated env variable name to propagate the class path, use
      a new one.
      
      Verified by running "./bin/spark-class o.a.s.executor.CoarseGrainedExecutorBackend"
      manually.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18037 from vanzin/SPARK-20814.
  14. May 20, 2017
  15. Apr 24, 2017
  16. Mar 10, 2017
  17. Feb 19, 2017
  18. Feb 16, 2017
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
  19. Jan 18, 2017
  20. Dec 21, 2016
    • [SPARK-17807][CORE] split test-tags into test-JAR · afd9bc1d
      Ryan Williams authored
      Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
  21. Dec 14, 2016
  22. Dec 08, 2016
  23. Dec 02, 2016
  24. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manual tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manual tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and or testing ( SPARK-18136 )
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  25. Aug 31, 2016
    • [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf · fa634793
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      Allow users to set the sparkr shell command through --conf spark.r.shell.command
      
      ## How was this patch tested?
      
      Unit test is added and also verify it manually through
      ```
      bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14744 from zjffdu/SPARK-17178.
  26. Aug 11, 2016
    • [SPARK-13081][PYSPARK][SPARK_SUBMIT] Allow set pythonExec of driver and executor through conf… · 7a9e25c3
      Jeff Zhang authored
      Before this PR, users had to export an environment variable to specify the Python executable for the driver & executor, which is not convenient. This PR allows users to specify Python through configuration: "--pyspark-driver-python" & "--pyspark-executor-python"
      
      Manually test in local & yarn mode for pyspark-shell and pyspark batch mode.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #13146 from zjffdu/SPARK-13081.
  27. Jul 19, 2016
  28. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  29. Jun 08, 2016
    • [MINOR] Fix Java Lint errors introduced by #13286 and #13280 · f958c1c3
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      
      revived #13464
      
      Fix Java Lint errors introduced by #13286 and #13280
      Before:
      ```
      Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
      Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
      ```
      
      ## How was this patch tested?
      ran `dev/lint-java` locally
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #13559 from techaddict/minor-3.
  30. Jun 06, 2016
    • [SPARK-15652][LAUNCHER] Added a new State (LOST) for the listeners of SparkLauncher · c409e23a
      Subroto Sanyal authored
      ## What changes were proposed in this pull request?
      This situation can happen when the LauncherConnection gets an exception while reading from the socket and terminates silently without notifying anyone, making the client/listener think that the job is still in its previous state.
      The fix force-sends a notification to the client that the job finished with an unknown status and lets the client handle it accordingly.
      
      ## How was this patch tested?
      Added a unit test.
      
      Author: Subroto Sanyal <ssanyal@datameer.com>
      
      Closes #13497 from subrotosanyal/SPARK-15652-handle-spark-submit-jvm-crash.
  31. Jun 03, 2016
    • [SPARK-15665][CORE] spark-submit --kill and --status are not working · efd3b11a
      Devaraj K authored
      ## What changes were proposed in this pull request?
      --kill and --status were not considered while handling in OptionParser, and because of that they failed. The --kill and --status options are now handled as part of OptionParser.handle.
      
      ## How was this patch tested?
      Added a test org.apache.spark.launcher.SparkSubmitCommandBuilderSuite.testCliKillAndStatus() and also I have verified these manually by running --kill and --status commands.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #13407 from devaraj-kavali/SPARK-15665.
  32. May 22, 2016
  33. May 20, 2016
    • [SPARK-15360][SPARK-SUBMIT] Should print spark-submit usage when no arguments are specified · fe2fcb48
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      In 2.0, ./bin/spark-submit doesn't print out usage; instead it raises an exception.
      In this PR, exception handling is added in Main.java for when the exception is thrown. In the handling code, if there is no additional argument, it prints out usage.
      
      ## How was this patch tested?
      Manually tested.
      ./bin/spark-submit
      Usage: spark-submit [options] <app jar | python file> [app arguments]
      Usage: spark-submit --kill [submission ID] --master [spark://...]
      Usage: spark-submit --status [submission ID] --master [spark://...]
      Usage: spark-submit run-example [options] example-class [example args]
      
      Options:
        --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
        --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                                    on one of the worker machines inside the cluster ("cluster")
                                    (Default: client).
        --class CLASS_NAME          Your application's main class (for Java / Scala apps).
        --name NAME                 A name of your application.
        --jars JARS                 Comma-separated list of local jars to include on the driver
                                    and executor classpaths.
        --packages                  Comma-separated list of maven coordinates of jars to include
                                    on the driver and executor classpaths. Will search the local
                                    maven repo, then maven central and any additional remote
                                    repositories given by --repositories. The format for the
                                    coordinates should be groupId:artifactId:version.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #13163 from wangmiao1981/submit.
  34. May 17, 2016
  35. May 10, 2016
    • [SPARK-11249][LAUNCHER] Throw error if app resource is not provided. · 0b9cae42
      Marcelo Vanzin authored
      Without this, the code would build an invalid spark-submit command line,
      and a more cryptic error would be presented to the user. Also, expose
      a constant that allows users to set a dummy resource in cases where
      they don't need an actual resource file; for backwards compatibility,
      that uses the same "spark-internal" resource that Spark itself uses.
      
      Tested via unit tests, run-example, spark-shell, and running the
      thrift server with mixed spark and hive command line arguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12909 from vanzin/SPARK-11249.
  36. May 09, 2016
  37. Apr 30, 2016
    • [SPARK-14391][LAUNCHER] Fix launcher communication test, take 2. · 73c20bf3
      Marcelo Vanzin authored
      There's actually a race here: the state of the handler was changed before
      the connection was set, so the test code could be notified of the state
      change, wake up, and still see the connection as null, triggering the assert.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12785 from vanzin/SPARK-14391.
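      The fix is an ordering guarantee, sketched here with hypothetical names (not the launcher's actual fields): publish the connection before signalling the state change, so a waiter woken by the change can never observe a null connection.

```java
public class HandshakeOrderDemo {
    private Object connection;
    private String state = "UNKNOWN";
    private final Object lock = new Object();

    void connect(Object conn) {
        synchronized (lock) {
            this.connection = conn;   // 1) set the connection first
            this.state = "CONNECTED"; // 2) only then expose the new state
            lock.notifyAll();
        }
    }

    Object awaitConnected() throws InterruptedException {
        synchronized (lock) {
            while (!"CONNECTED".equals(state)) {
                lock.wait();
            }
            return connection; // non-null, guaranteed by the write order above
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HandshakeOrderDemo handle = new HandshakeOrderDemo();
        new Thread(() -> handle.connect(new Object())).start();
        System.out.println(handle.awaitConnected() != null); // true
    }
}
```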