  1. Apr 25, 2017
  2. Apr 14, 2017
  3. Mar 28, 2017
  4. Mar 21, 2017
  5. Dec 22, 2016
  6. Dec 15, 2016
  7. Dec 08, 2016
  8. Nov 28, 2016
  9. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · 6a3cbbc0
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR provides a pip-installable PySpark package. It does a bunch of work to copy the jars over and package them with the Python code, to avoid the problems that come from mixing one version of the Python code with a different version of the jars. It does not yet publish to PyPI, but that is the natural follow-up (SPARK-18129).
      
      Done:
      - pip installable under conda [manually tested]
      - installed via setup.py on a non-pip-managed system (RHEL) with YARN [manually tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on production PyPI (is it someone within the project, should we ask PyPI, or should we publish under a different name such as ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate the details of wheel caching and see if we can avoid cleaning the wheel cache during our tests
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally, but as a non-committer I've only been able to do local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system-wide install, and YARN integration; a sketch of the local install flow is shown below.
      
      release-build changes were tested locally as a non-committer (no testing of uploading artifacts to Apache staging websites).
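      
      As a rough sketch of the local install flow (illustrative only; the exact sdist-and-virtualenv steps are assumptions, not commands quoted from the PR):
      ```
      # Build a source distribution from the python/ directory and install it
      # into a clean virtualenv; setup.py is assumed to bundle the needed jars.
      cd python
      python setup.py sdist
      virtualenv /tmp/pyspark-env
      source /tmp/pyspark-env/bin/activate
      pip install dist/pyspark-*.tar.gz
      python -c "import pyspark; print(pyspark.__version__)"
      ```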
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  10. Aug 31, 2016
    • [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf · fa634793
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      Allow users to set the sparkr shell command through --conf spark.r.shell.command.
      
      ## How was this patch tested?
      
      A unit test is added; the change was also verified manually with:
      ```
      bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14744 from zjffdu/SPARK-17178.
  11. Aug 11, 2016
    • [SPARK-13081][PYSPARK][SPARK_SUBMIT] Allow set pythonExec of driver and executor through conf… · 7a9e25c3
      Jeff Zhang authored
      Before this PR, users had to export environment variables to specify the Python executable for the driver and executors, which is not very convenient. This PR allows users to specify the Python executable through configuration ("--pyspark-driver-python" & "--pyspark-executor-python").
      
      Manually tested in local and YARN mode, for both the pyspark shell and pyspark batch mode; a hypothetical invocation is sketched below.
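      
      For illustration only (the spark.pyspark.* configuration keys are an assumption based on how this option is documented in later Spark releases, not on this commit message):
      ```
      # Hypothetical invocation: point the driver and executors at a specific
      # Python binary via --conf instead of environment variables.
      bin/pyspark --master yarn \
        --conf spark.pyspark.driver.python=/usr/local/bin/python3 \
        --conf spark.pyspark.python=/usr/local/bin/python3
      ```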
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #13146 from zjffdu/SPARK-13081.
  12. Jul 19, 2016
  13. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  14. Jun 08, 2016
    • [MINOR] Fix Java Lint errors introduced by #13286 and #13280 · f958c1c3
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      
      Revives #13464: fix the Java lint errors introduced by #13286 and #13280.
      Before:
      ```
      Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
      Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
      ```
      
      ## How was this patch tested?
      Ran `dev/lint-java` locally.
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #13559 from techaddict/minor-3.
  15. Jun 06, 2016
    • [SPARK-15652][LAUNCHER] Added a new State (LOST) for the listeners of SparkLauncher · c409e23a
      Subroto Sanyal authored
      ## What changes were proposed in this pull request?
      This situation can happen when the LauncherConnection hits an exception while reading from the socket and terminates silently, without any notification, leaving the client/listener thinking the job is still in its previous state.
      The fix force-sends a notification to the client that the job finished with an unknown status (the new LOST state) and lets the client handle it accordingly.
      
      ## How was this patch tested?
      Added a unit test.
      
      Author: Subroto Sanyal <ssanyal@datameer.com>
      
      Closes #13497 from subrotosanyal/SPARK-15652-handle-spark-submit-jvm-crash.
  16. Jun 03, 2016
    • [SPARK-15665][CORE] spark-submit --kill and --status are not working · efd3b11a
      Devaraj K authored
      ## What changes were proposed in this pull request?
      --kill and --status were not handled in OptionParser, so these commands were failing. They are now handled as part of OptionParser.handle.
      
      ## How was this patch tested?
      Added a test, org.apache.spark.launcher.SparkSubmitCommandBuilderSuite.testCliKillAndStatus(), and also verified manually by running the --kill and --status commands, along the lines of the sketch below.
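      
      For illustration (the submission ID and master URL are placeholders, not values from the PR):
      ```
      # Check the status of, then kill, a submission; the ID shown is made up,
      # and port 6066 is the standalone cluster's REST submission endpoint.
      bin/spark-submit --status driver-20160603123456-0001 --master spark://host:6066
      bin/spark-submit --kill driver-20160603123456-0001 --master spark://host:6066
      ```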
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #13407 from devaraj-kavali/SPARK-15665.
  17. May 22, 2016
  18. May 20, 2016
    • [SPARK-15360][SPARK-SUBMIT] Should print spark-submit usage when no arguments is specified · fe2fcb48
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      In 2.0, ./bin/spark-submit with no arguments raises an exception instead of printing the usage.
      This PR adds exception handling to Main.java; if no additional arguments were given, the handler prints the usage.
      
      ## How was this patch tested?
      
      Manually tested:
      ./bin/spark-submit
      Usage: spark-submit [options] <app jar | python file> [app arguments]
      Usage: spark-submit --kill [submission ID] --master [spark://...]
      Usage: spark-submit --status [submission ID] --master [spark://...]
      Usage: spark-submit run-example [options] example-class [example args]
      
      Options:
        --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
        --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                                    on one of the worker machines inside the cluster ("cluster")
                                    (Default: client).
        --class CLASS_NAME          Your application's main class (for Java / Scala apps).
        --name NAME                 A name of your application.
        --jars JARS                 Comma-separated list of local jars to include on the driver
                                    and executor classpaths.
        --packages                  Comma-separated list of maven coordinates of jars to include
                                    on the driver and executor classpaths. Will search the local
                                    maven repo, then maven central and any additional remote
                                    repositories given by --repositories. The format for the
                                    coordinates should be groupId:artifactId:version.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #13163 from wangmiao1981/submit.
  19. May 17, 2016
  20. May 10, 2016
    • [SPARK-11249][LAUNCHER] Throw error if app resource is not provided. · 0b9cae42
      Marcelo Vanzin authored
      Without this, the code would build an invalid spark-submit command line,
      and a more cryptic error would be presented to the user. Also, expose
      a constant that allows users to set a dummy resource in cases where
      they don't need an actual resource file; for backwards compatibility,
      that uses the same "spark-internal" resource that Spark itself uses.
      
      Tested via unit tests, run-example, spark-shell, and running the
      thrift server with mixed spark and hive command line arguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12909 from vanzin/SPARK-11249.
  21. May 09, 2016
  22. Apr 30, 2016
    • [SPARK-14391][LAUNCHER] Fix launcher communication test, take 2. · 73c20bf3
      Marcelo Vanzin authored
      There's actually a race here: the state of the handler was changed before
      the connection was set, so the test code could be notified of the state
      change, wake up, and still see the connection as null, triggering the assert.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12785 from vanzin/SPARK-14391.
  23. Apr 29, 2016
  24. Apr 28, 2016
  25. Apr 07, 2016
    • [SPARK-12384] Enables spark-clients to set the min(-Xms) and max(*.memory config) j… · 033d8081
      Dhruve Ashar authored
      ## What changes were proposed in this pull request?
      
      Currently Spark clients are started with the same memory setting for -Xms and -Xmx, reserving unnecessarily large amounts of memory up front.
      This behavior is changed so that clients can specify an initial heap size via extraJavaOptions in the config for the driver, executor, and AM individually.
      Note that only -Xms can be provided through this config option; if the client wants to set the max size (-Xmx), that must still be done via the *.memory configuration knobs that are currently supported.
      
      ## How was this patch tested?
      
      Monitored executor and YARN logs in debug mode to verify the commands with which the JVMs are launched in client and cluster mode. The driver memory was verified locally using jps -v. Setting the -Xmx parameter in extraJavaOptions raises an exception with an informative message. A hypothetical invocation is sketched below.
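      
      For illustration only (the class name, jar, and memory values are made-up placeholders):
      ```
      # Request a 2g initial heap (-Xms) while the 4g maximum still comes from
      # the *.memory settings; putting -Xmx here instead would be rejected.
      bin/spark-submit --class MyApp \
        --conf spark.driver.memory=4g \
        --conf spark.driver.extraJavaOptions=-Xms2g \
        --conf spark.executor.memory=4g \
        --conf spark.executor.extraJavaOptions=-Xms2g \
        myapp.jar
      ```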
      
      Author: Dhruve Ashar <dhruveashar@gmail.com>
      
      Closes #12115 from dhruve/impr/SPARK-12384.
  26. Apr 06, 2016
    • [SPARK-14134][CORE] Change the package name used for shading classes. · 21d5ca12
      Marcelo Vanzin authored
      The current package name uses a dash, which is a little weird but seemed
      to work. That is, until a new test tried to mock a class that references
      one of those shaded types, and then things started failing.
      
      Most changes are just noise to fix the logging configs.
      
      For reference, SPARK-8815 also raised this issue, although at the time it
      did not cause any issues in Spark, so it was not addressed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11941 from vanzin/SPARK-14134.
    • [SPARK-14391][LAUNCHER] Increase test timeouts. · de479260
      Marcelo Vanzin authored
      Most of the time the tests should still pass really quickly; it's only when
      machines are overloaded that they may take a little longer, but that's still
      preferable to failing the test.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12210 from vanzin/SPARK-14391.
  27. Apr 04, 2016
    • [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and to remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.