  1. Jul 18, 2017
  2. Jul 13, 2017
    • [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 · 425c4ada
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove Scala 2.10 build profiles and support
      - Replace some 2.10 support in scripts with commented placeholders for 2.12 later
      - Remove deprecated API calls from 2.10 support
      - Remove usages of deprecated context bounds where possible
      - Remove Scala 2.10 workarounds like ScalaReflectionLock
      - Other minor Scala warning fixes
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17150 from srowen/SPARK-19810.
  3. Jun 01, 2017
    • [SPARK-20922][CORE] Add whitelist of classes that can be deserialized by the launcher. · 8efc6e98
      Marcelo Vanzin authored
      Blindly deserializing classes using Java serialization opens the code up to
      issues in other libraries, since just deserializing data from a stream may
      end up executing code (think readObject()).
      
      Since the launcher protocol is pretty self-contained, there's just a handful
      of classes it legitimately needs to deserialize, and they're in just two
      packages, so add a filter that throws errors if classes from any other
      package show up in the stream.
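      
      As an editorial illustration, here is a minimal sketch of that filtering
      idea; the class name and the allowed package list are illustrative, not
      the actual Spark launcher code:
      
      ```java
      import java.io.IOException;
      import java.io.InputStream;
      import java.io.ObjectInputStream;
      import java.io.ObjectStreamClass;
      
      // Sketch: an ObjectInputStream that refuses to resolve classes outside
      // an allow-listed set of packages, failing before any readObject() runs.
      class AllowListObjectInputStream extends ObjectInputStream {
      
        private static final String[] ALLOWED_PACKAGES = {
          "org.apache.spark.launcher.", "java.lang."  // hypothetical allow-list
        };
      
        AllowListObjectInputStream(InputStream in) throws IOException {
          super(in);
        }
      
        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
          String name = desc.getName();
          for (String pkg : ALLOWED_PACKAGES) {
            if (name.startsWith(pkg)) {
              return super.resolveClass(desc);
            }
          }
          // Fail fast instead of letting an unexpected class deserialize.
          throw new IllegalArgumentException("Unexpected class in stream: " + name);
        }
      }
      ```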
      
      This also maintains backwards compatibility (the updated launcher code can
      still communicate with the backend code in older Spark releases).
      
      Tested with new and existing unit tests.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18166 from vanzin/SPARK-20922.
  4. May 22, 2017
    • [SPARK-20814][MESOS] Restore support for spark.executor.extraClassPath. · df64fa79
      Marcelo Vanzin authored
      Restore code that was removed as part of SPARK-17979, but instead of
      using the deprecated env variable name to propagate the class path, use
      a new one.
      
      Verified by running "./bin/spark-class o.a.s.executor.CoarseGrainedExecutorBackend"
      manually.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18037 from vanzin/SPARK-20814.
  5. May 20, 2017
  6. Apr 24, 2017
  7. Mar 10, 2017
  8. Feb 19, 2017
  9. Feb 16, 2017
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
  10. Jan 18, 2017
  11. Dec 21, 2016
    • [SPARK-17807][CORE] split test-tags into test-JAR · afd9bc1d
      Ryan Williams authored
      Remove spark-tags' compile-scope dependency (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
  12. Dec 14, 2016
  13. Dec 08, 2016
  14. Dec 02, 2016
  15. Nov 16, 2016
    • [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · a36a76ac
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to avoid the problems that come from mixing different versions of the Python code with different versions of the JARs). It does not currently publish to PyPI but that is the natural follow up (SPARK-18129).
      
      Done:
      - pip installable on conda [manually tested]
      - setup.py installed on a non-pip managed system (RHEL) with YARN [manually tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project or should we ask PyPI or should we choose a different name to publish with, like ApachePySpark?)
      - Windows support and or testing ( SPARK-18136 )
      - investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build locally but as a non-committer I've just done local testing.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes tested locally as a non-committer (no testing of upload artifacts to Apache staging websites)
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
  16. Aug 31, 2016
    • [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf · fa634793
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      Allow users to set the sparkr shell command through --conf spark.r.shell.command
      
      ## How was this patch tested?
      
      A unit test is added, and it was also verified manually through
      ```
      bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R
      ```
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14744 from zjffdu/SPARK-17178.
  17. Aug 11, 2016
    • [SPARK-13081][PYSPARK][SPARK_SUBMIT] Allow set pythonExec of driver and executor through conf… · 7a9e25c3
      Jeff Zhang authored
      Before this PR, users had to export an environment variable to specify the Python executable for the driver & executor, which was not so convenient. This PR allows users to specify the Python executable through the configuration options "--pyspark-driver-python" & "--pyspark-executor-python".
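      
      A hedged sketch of driving this from the launcher API; the spark.pyspark.*
      configuration key names below are my assumption about the final form of
      this feature, and the paths are illustrative:
      
      ```java
      import org.apache.spark.launcher.SparkLauncher;
      
      public class PythonExecConfExample {
        public static void main(String[] args) {
          // Assumed config keys (spark.pyspark.*); interpreter paths illustrative.
          SparkLauncher launcher = new SparkLauncher()
              .setAppResource("/path/to/job.py")
              .setConf("spark.pyspark.driver.python", "/usr/bin/python3")
              .setConf("spark.pyspark.python", "/usr/bin/python3");
        }
      }
      ```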
      
      Manually test in local & yarn mode for pyspark-shell and pyspark batch mode.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #13146 from zjffdu/SPARK-13081.
  18. Jul 19, 2016
  19. Jul 11, 2016
    • [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  20. Jun 08, 2016
    • [MINOR] Fix Java Lint errors introduced by #13286 and #13280 · f958c1c3
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      
      revived #13464
      
      Fix Java Lint errors introduced by #13286 and #13280
      Before:
      ```
      Using `mvn` from path: /Users/pichu/Project/spark/build/apache-maven-3.3.9/bin/mvn
      Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[340,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[341,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[342,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/launcher/LauncherServer.java:[343,5] (whitespace) FileTabCharacter: Line contains a tab character.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[61,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.PrimitiveType.
      [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[62,8] (imports) UnusedImports: Unused import - org.apache.parquet.schema.Type.
      ```
      
      ## How was this patch tested?
      ran `dev/lint-java` locally
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #13559 from techaddict/minor-3.
  21. Jun 06, 2016
    • [SPARK-15652][LAUNCHER] Added a new State (LOST) for the listeners of SparkLauncher · c409e23a
      Subroto Sanyal authored
      ## What changes were proposed in this pull request?
      This situation can happen when the LauncherConnection gets an exception while reading from the socket and terminates silently, without any notification, leaving the client/listener thinking that the job is still in its previous state.
      The fix force-sends a notification to the client that the job finished with an unknown status, and lets the client handle it accordingly.
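      
      For illustration, a minimal sketch of a listener reacting to the new
      state; the app resource path and main class are hypothetical:
      
      ```java
      import org.apache.spark.launcher.SparkAppHandle;
      import org.apache.spark.launcher.SparkLauncher;
      
      public class LostStateExample {
        public static void main(String[] args) throws Exception {
          new SparkLauncher()
              .setAppResource("/path/to/app.jar")     // illustrative path
              .setMainClass("com.example.MyApp")      // hypothetical main class
              .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle handle) {
                  if (handle.getState() == SparkAppHandle.State.LOST) {
                    // The connection died without a final state: treat as failed.
                    System.err.println("Lost contact with the launched application");
                  }
                }
      
                @Override
                public void infoChanged(SparkAppHandle handle) { }
              });
        }
      }
      ```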
      
      ## How was this patch tested?
      Added a unit test.
      
      Author: Subroto Sanyal <ssanyal@datameer.com>
      
      Closes #13497 from subrotosanyal/SPARK-15652-handle-spark-submit-jvm-crash.
  22. Jun 03, 2016
    • [SPARK-15665][CORE] spark-submit --kill and --status are not working · efd3b11a
      Devaraj K authored
      ## What changes were proposed in this pull request?
      --kill and --status were not handled in OptionParser, which caused them to fail. They are now handled as part of OptionParser.handle, as in the sketch below.
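      
      A simplified sketch of the shape of the fix (not the exact Spark code;
      the class and field names are illustrative):
      
      ```java
      // Sketch: the parser's handle() callback now recognizes --kill and
      // --status instead of treating them as unknown options.
      class OptionParserSketch {
        String submissionToKill;
        String submissionToRequestStatusFor;
      
        boolean handle(String opt, String value) {
          switch (opt) {
            case "--kill":
              submissionToKill = value;
              return true;
            case "--status":
              submissionToRequestStatusFor = value;
              return true;
            default:
              return false;  // not handled here
          }
        }
      }
      ```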
      
      ## How was this patch tested?
      Added a test, org.apache.spark.launcher.SparkSubmitCommandBuilderSuite.testCliKillAndStatus(), and also verified these manually by running the --kill and --status commands.
      
      Author: Devaraj K <devaraj@apache.org>
      
      Closes #13407 from devaraj-kavali/SPARK-15665.
  23. May 22, 2016
  24. May 20, 2016
    • [SPARK-15360][SPARK-SUBMIT] Should print spark-submit usage when no arguments is specified · fe2fcb48
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      In 2.0, ./bin/spark-submit doesn't print out usage; instead, it raises an exception.
      In this PR, exception handling is added in Main.java: when the exception is thrown and there is no additional argument, the usage text is printed.
      
      ## How was this patch tested?
      
      Manually tested.
      ./bin/spark-submit
      Usage: spark-submit [options] <app jar | python file> [app arguments]
      Usage: spark-submit --kill [submission ID] --master [spark://...]
      Usage: spark-submit --status [submission ID] --master [spark://...]
      Usage: spark-submit run-example [options] example-class [example args]
      
      Options:
        --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
        --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                                    on one of the worker machines inside the cluster ("cluster")
                                    (Default: client).
        --class CLASS_NAME          Your application's main class (for Java / Scala apps).
        --name NAME                 A name of your application.
        --jars JARS                 Comma-separated list of local jars to include on the driver
                                    and executor classpaths.
        --packages                  Comma-separated list of maven coordinates of jars to include
                                    on the driver and executor classpaths. Will search the local
                                    maven repo, then maven central and any additional remote
                                    repositories given by --repositories. The format for the
                                    coordinates should be groupId:artifactId:version.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #13163 from wangmiao1981/submit.
  25. May 17, 2016
  26. May 10, 2016
    • [SPARK-11249][LAUNCHER] Throw error if app resource is not provided. · 0b9cae42
      Marcelo Vanzin authored
      Without this, the code would build an invalid spark-submit command line,
      and a more cryptic error would be presented to the user. Also, expose
      a constant that allows users to set a dummy resource in cases where
      they don't need an actual resource file; for backwards compatibility,
      that uses the same "spark-internal" resource that Spark itself uses.
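      
      A hedged usage sketch of the exposed constant (the main class name is
      hypothetical):
      
      ```java
      import org.apache.spark.launcher.SparkLauncher;
      
      public class NoResourceExample {
        public static void main(String[] args) {
          // For apps whose main class doesn't consume an app resource, the
          // NO_RESOURCE constant stands in for a real file.
          SparkLauncher launcher = new SparkLauncher()
              .setAppResource(SparkLauncher.NO_RESOURCE)
              .setMainClass("com.example.ServerStyleApp");
        }
      }
      ```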
      
      Tested via unit tests, run-example, spark-shell, and running the
      thrift server with mixed spark and hive command line arguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12909 from vanzin/SPARK-11249.
  27. May 09, 2016
  28. Apr 30, 2016
    • [SPARK-14391][LAUNCHER] Fix launcher communication test, take 2. · 73c20bf3
      Marcelo Vanzin authored
      There's actually a race here: the state of the handler was changed before
      the connection was set, so the test code could be notified of the state
      change, wake up, and still see the connection as null, triggering the assert.
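      
      A simplified sketch of the ordering fix described above (types and names
      are illustrative, not the actual test code):
      
      ```java
      // Sketch: publish the connection before the state change that wakes up
      // waiters, so an observer never sees the new state with a null connection.
      class HandleSketch {
        private final Object lock = new Object();
        private Object connection;          // stand-in for the connection type
        private boolean connected = false;
      
        void onConnected(Object conn) {
          synchronized (lock) {
            connection = conn;              // set the connection first...
            connected = true;               // ...then flip the observable state
            lock.notifyAll();
          }
        }
      
        Object awaitConnection() throws InterruptedException {
          synchronized (lock) {
            while (!connected) {
              lock.wait();
            }
            return connection;              // non-null, guaranteed by the ordering
          }
        }
      }
      ```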
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12785 from vanzin/SPARK-14391.
  29. Apr 29, 2016
  30. Apr 28, 2016
  31. Apr 07, 2016
    • [SPARK-12384] Enables spark-clients to set the min(-Xms) and max(*.memory config) j… · 033d8081
      Dhruve Ashar authored
      ## What changes were proposed in this pull request?
      
      Currently Spark clients are started with the same memory setting for -Xms and -Xmx, leading to reserving unnecessarily large amounts of memory.
      This behavior is changed so that clients can now specify an initial heap size using extraJavaOptions in the config for driver, executor and AM individually.
      Note that only -Xms can be provided through this config option; if the client wants to set the max size (-Xmx), this has to be done via the *.memory configuration knobs which are currently supported.
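      
      As an illustration, a hedged sketch of setting both knobs from the
      launcher API (heap values, app resource and main class are illustrative):
      
      ```java
      import org.apache.spark.launcher.SparkLauncher;
      
      public class HeapSizeExample {
        public static void main(String[] args) {
          SparkLauncher launcher = new SparkLauncher()
              // Max heap (-Xmx) still comes from the *.memory settings...
              .setConf("spark.driver.memory", "4g")
              // ...while the initial heap (-Xms) can ride in extraJavaOptions.
              .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS, "-Xms2g")
              .setAppResource("/path/to/app.jar")   // illustrative path
              .setMainClass("com.example.MyApp");   // hypothetical class
        }
      }
      ```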
      
      ## How was this patch tested?
      
      Monitored executor and YARN logs in debug mode to verify the commands through which they are launched in client and cluster mode. The driver memory was verified locally using jps -v. Setting the -Xmx parameter in extraJavaOptions raises an exception with the relevant info.
      
      Author: Dhruve Ashar <dhruveashar@gmail.com>
      
      Closes #12115 from dhruve/impr/SPARK-12384.
  32. Apr 06, 2016
    • [SPARK-14134][CORE] Change the package name used for shading classes. · 21d5ca12
      Marcelo Vanzin authored
      The current package name uses a dash, which is a little weird but seemed
      to work. That is, until a new test tried to mock a class that references
      one of those shaded types, and then things started failing.
      
      Most changes are just noise to fix the logging configs.
      
      For reference, SPARK-8815 also raised this issue, although at the time it
      did not cause any issues in Spark, so it was not addressed.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11941 from vanzin/SPARK-14134.
    • [SPARK-14391][LAUNCHER] Increase test timeouts. · de479260
      Marcelo Vanzin authored
      Most of the time, tests should still pass really quickly; it's just
      when machines are overloaded that the tests may take a little time,
      but that's still preferable to just failing the test.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12210 from vanzin/SPARK-14391.
  33. Apr 04, 2016
    • [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
  34. Apr 03, 2016
    • [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results · 3f749f7e
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR contains the following 5 types of maintenance fixes over 59 files (+94 lines, -93 lines).
      - Fix typos(exception/log strings, testcase name, comments) in 44 lines.
      - Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011)
      - Use diamond operators in 40 lines. (New codes after SPARK-13702)
      - Fix redundant semicolon in 5 lines.
      - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala.
      
      ## How was this patch tested?
      
      Manual and pass the Jenkins tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12139 from dongjoon-hyun/SPARK-14355.
  35. Mar 30, 2016
    • [SPARK-13955][YARN] Also look for Spark jars in the build directory. · bdabfd43
      Marcelo Vanzin authored
      Move the logic to find Spark jars to CommandBuilderUtils and make it
      available for YARN code, so that it's possible to easily launch Spark
      on YARN from a build directory.
      
      Tested by running SparkPi from the build directory on YARN.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11970 from vanzin/SPARK-13955.
  36. Mar 21, 2016
    • [SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule · 20fd2541
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) has a 100-character limit on lines, but it has been disabled for Java since 11/09/15. This PR enables the **LineLength** checkstyle rule again. To help with that, this also introduces **RedundantImport** and **RedundantModifier**, too. The following is the diff on `checkstyle.xml`.
      
      ```xml
      -        <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places -->
      -        <!--
               <module name="LineLength">
                   <property name="max" value="100"/>
                   <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
               </module>
      -        -->
               <module name="NoLineWrap"/>
               <module name="EmptyBlock">
                   <property name="option" value="TEXT"/>
       @@ -167,5 +164,7 @@
               </module>
               <module name="CommentsIndentation"/>
               <module name="UnusedImports"/>
      +        <module name="RedundantImport"/>
      +        <module name="RedundantModifier"/>
      ```
      
      ## How was this patch tested?
      
      Currently, `lint-java` is disabled in Jenkins, so this needs a manual test.
      After passing the Jenkins tests, `dev/lint-java` should pass locally.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11831 from dongjoon-hyun/SPARK-14011.
  37. Mar 16, 2016
    • [SPARK-13823][HOTFIX] Increase tryAcquire timeout and assert it succeeds to fix failure on slow machines · 9412547e
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      I'm seeing several PR builder builds fail after https://github.com/apache/spark/pull/11725/files. Example:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.4/lastFailedBuild/console
      
      ```
      testCommunication(org.apache.spark.launcher.LauncherServerSuite)  Time elapsed: 0.023 sec  <<< FAILURE!
      java.lang.AssertionError: expected:<app-id> but was:<null>
      	at org.apache.spark.launcher.LauncherServerSuite.testCommunication(LauncherServerSuite.java:93)
      ```
      
      However, other builds pass this same test, including the test when run locally and on the Jenkins PR builder. The failure itself concerns a change to how the test waits on a condition, and the wait can time out; therefore I think this is due to fast/slow machine differences.
      
      This is an attempt at a hot fix; it's a little hard to verify, since it passes anyway locally and on the PR builder. The change itself should be harmless.
      
      Why didn't this happen before, if the new logic was supposed to be equivalent to the old? I think this is the sequence:
      
      - First attempt to acquire semaphore for 10ms actually silently times out
      - The change being waited for happens just after that, a bit too late
      - Assertion passes since condition became true just in time
      - `release()` fires from the listener
      - Next `tryAcquire` however immediately succeeds because the first `tryAcquire` didn't acquire anything, but its subsequent condition is not yet true; this would explain why the second one always fails
      
      Versus the original using `notifyAll()`, there's a small difference: `wait()`-ing after `notifyAll()` just results in another wait; it doesn't make it return immediately. So this was a tiny latent issue that was masked by the semantics. Now the test asserts that the event actually happened (semaphore was acquired). (The timeout is still here to prevent the test from hanging forever, and to detect really slow response.) The timeout is increased to a second to allow plenty of time anyway.
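      
      A minimal sketch of the waiting pattern this describes (illustrative,
      not the actual test code):
      
      ```java
      import java.util.concurrent.Semaphore;
      import java.util.concurrent.TimeUnit;
      
      // Sketch: the listener releases a permit per event, and the test asserts
      // that the acquire succeeded instead of silently continuing on timeout.
      class TransitionWaiter {
        private final Semaphore transitions = new Semaphore(0);
      
        void onStateChanged() {             // invoked by the listener under test
          transitions.release();
        }
      
        void expectTransition() throws InterruptedException {
          // Generous timeout for slow machines; a miss is a real failure.
          if (!transitions.tryAcquire(1, TimeUnit.SECONDS)) {
            throw new AssertionError("Timed out waiting for state transition");
          }
        }
      }
      ```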
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11763 from srowen/SPARK-13823.3.