  1. Apr 06, 2016
    • Marcelo Vanzin's avatar
      [SPARK-14134][CORE] Change the package name used for shading classes. · 21d5ca12
      Marcelo Vanzin authored
      The current package name uses a dash, which is a little weird but seemed
      to work. That is, until a new test tried to mock a class that references
      one of those shaded types, and then things started failing.
      
      Most changes are just noise to fix the logging configs.
      
      For reference, SPARK-8815 also raised this issue, although at the time it
      did not cause any issues in Spark, so it was not addressed.
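
      To illustrate why a dash in a package name is fragile, here is a minimal sketch that asks the JDK whether a qualified name is syntactically valid Java. The two package names are assumptions chosen for illustration (a dashed name and an underscore variant); the commit message does not spell them out, and the class `ShadedPackageNames` is hypothetical.

      ```java
      import javax.lang.model.SourceVersion;

      // Illustrative check only; this class is hypothetical and not part of Spark.
      final class ShadedPackageNames {
        /** Returns true if the dot-separated name is a syntactically valid Java qualified name. */
        static boolean isValidPackageName(String name) {
          return SourceVersion.isName(name);
        }

        public static void main(String[] args) {
          // Assumed dashed name: "spark-project" is not a legal Java identifier.
          System.out.println(isValidPackageName("org.spark-project.guava")); // prints "false"
          // Assumed underscore variant: every segment is a legal identifier.
          System.out.println(isValidPackageName("org.spark_project.guava")); // prints "true"
        }
      }
      ```

      The JVM itself tolerates such names in class files, which may be why the dashed name worked until a tool that generates or parses Java source had to refer to it.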
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11941 from vanzin/SPARK-14134.
      21d5ca12
    • Marcelo Vanzin's avatar
      [SPARK-14391][LAUNCHER] Increase test timeouts. · de479260
      Marcelo Vanzin authored
      Most of the time tests should still pass really quickly; it's just
      when machines are overloaded that the tests may take a little time,
      but that's still preferable over just failing the test.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12210 from vanzin/SPARK-14391.
      de479260
  2. Apr 04, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13579][BUILD] Stop building the main Spark assembly. · 24d7d2e4
      Marcelo Vanzin authored
      This change modifies the "assembly/" module to just copy needed
      dependencies to its build directory, and modifies the packaging
      script to pick those up (and remove duplicate jars packaged in the
      examples module).
      
      I also made some minor adjustments to dependencies to remove some
      test jars from the final packaging, and remove jars that conflict with each
      other when packaged separately (e.g. servlet api).
      
      Also note that this change restores guava in applications' classpaths, even
      though it's still shaded inside Spark. This is now needed for the Hadoop
      libraries that are packaged with Spark, which now are not processed by
      the shade plugin.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11796 from vanzin/SPARK-13579.
      24d7d2e4
  3. Apr 03, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static analysis results · 3f749f7e
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR contains the following 5 types of maintenance fixes over 59 files (+94 lines, -93 lines).
      - Fix typos (exception/log strings, testcase names, comments) in 44 lines.
      - Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011)
      - Use diamond operators in 40 lines. (New codes after SPARK-13702)
      - Fix redundant semicolon in 5 lines.
      - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala.
      
      ## How was this patch tested?
      
      Manual testing, plus passing the Jenkins tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12139 from dongjoon-hyun/SPARK-14355.
      3f749f7e
  4. Mar 30, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13955][YARN] Also look for Spark jars in the build directory. · bdabfd43
      Marcelo Vanzin authored
      Move the logic to find Spark jars to CommandBuilderUtils and make it
      available for YARN code, so that it's possible to easily launch Spark
      on YARN from a build directory.
      
      Tested by running SparkPi from the build directory on YARN.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11970 from vanzin/SPARK-13955.
      bdabfd43
  5. Mar 21, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle rule · 20fd2541
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      The [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) sets a 100-character limit on lines, but the rule has been disabled for Java since 11/09/15. This PR re-enables the **LineLength** checkstyle rule and, to support that, also introduces **RedundantImport** and **RedundantModifier**. The following is the diff on `checkstyle.xml`.
      
      ```xml
      -        <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places -->
      -        <!--
               <module name="LineLength">
                   <property name="max" value="100"/>
                   <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
               </module>
      -        -->
               <module name="NoLineWrap"/>
               <module name="EmptyBlock">
                   <property name="option" value="TEXT"/>
      @@ -167,5 +164,7 @@
               </module>
               <module name="CommentsIndentation"/>
               <module name="UnusedImports"/>
      +        <module name="RedundantImport"/>
      +        <module name="RedundantModifier"/>
      ```
      
      ## How was this patch tested?
      
      Currently, `lint-java` is disabled in Jenkins, so it needs a manual test.
      After passing the Jenkins tests, `dev/lint-java` should pass locally.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11831 from dongjoon-hyun/SPARK-14011.
      20fd2541
  6. Mar 16, 2016
    • Sean Owen's avatar
      [SPARK-13823][HOTFIX] Increase tryAcquire timeout and assert it succeeds to... · 9412547e
      Sean Owen authored
      [SPARK-13823][HOTFIX] Increase tryAcquire timeout and assert it succeeds to fix failure on slow machines
      
      ## What changes were proposed in this pull request?
      
      I'm seeing several PR builder builds fail after https://github.com/apache/spark/pull/11725/files. Example:
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.4/lastFailedBuild/console
      
      ```
      testCommunication(org.apache.spark.launcher.LauncherServerSuite)  Time elapsed: 0.023 sec  <<< FAILURE!
      java.lang.AssertionError: expected:<app-id> but was:<null>
      	at org.apache.spark.launcher.LauncherServerSuite.testCommunication(LauncherServerSuite.java:93)
      ```
      
      However, other builds pass this same test, including the test when run locally and on the Jenkins PR builder. The failure itself concerns a change to how the test waits on a condition, and the wait can time out; therefore I think this is due to fast/slow machine differences.
      
      This is an attempt at a hot fix; it's a little hard to verify since locally and on the PR builder, it passes anyway. The change itself should be harmless anyway.
      
      Why didn't this happen before, if the new logic was supposed to be equivalent to the old? I think this is the sequence:
      
      - First attempt to acquire semaphore for 10ms actually silently times out
      - The change being waited for happens just after that, a bit too late
      - Assertion passes since condition became true just in time
      - `release()` fires from the listener
      - Next `tryAcquire` however immediately succeeds because the first `tryAcquire` didn't acquire anything, but its subsequent condition is not yet true; this would explain why the second one always fails
      
      Versus the original using `notifyAll()`, there's a small difference: `wait()`-ing after `notifyAll()` just results in another wait; it doesn't make it return immediately. So this was a tiny latent issue that was masked by the semantics. Now the test asserts that the event actually happened (semaphore was acquired). (The timeout is still here to prevent the test from hanging forever, and to detect really slow response.) The timeout is increased to a second to allow plenty of time anyway.
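
      The semantic difference described above can be sketched with plain JDK classes. This is an illustration, not the code from `LauncherServerSuite`; the class and method names are hypothetical. A `Semaphore` remembers a `release()` that happened earlier, while a `notifyAll()` with no waiter is simply lost.

      ```java
      import java.util.concurrent.Semaphore;
      import java.util.concurrent.TimeUnit;

      // Hypothetical demo class; not taken from the Spark test suite.
      final class SemaphoreMemoryDemo {
        /** A release() that happens before tryAcquire() is kept as a permit. */
        static boolean acquireAfterEarlyRelease() {
          Semaphore semaphore = new Semaphore(0);
          semaphore.release(); // the "event" fires before anyone waits
          try {
            // Succeeds immediately because the permit is already there.
            return semaphore.tryAcquire(10, TimeUnit.MILLISECONDS);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
          }
        }

        /** Returns true only if wait() came back early after an earlier notifyAll(). */
        static boolean waitAfterEarlyNotify() {
          final Object lock = new Object();
          synchronized (lock) {
            lock.notifyAll(); // nobody is waiting yet, so this has no effect
          }
          long start = System.nanoTime();
          synchronized (lock) {
            try {
              lock.wait(50); // barring a rare spurious wakeup, blocks for the full timeout
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          }
          long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
          return elapsedMs < 25;
        }

        public static void main(String[] args) {
          System.out.println(acquireAfterEarlyRelease()); // permit was remembered
          System.out.println(waitAfterEarlyNotify());     // notification was not
        }
      }
      ```

      This is why a stale permit left over from a timed-out `tryAcquire` can make the next `tryAcquire` return before its own condition holds, and why asserting on the return value of `tryAcquire` exposes the problem.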
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11763 from srowen/SPARK-13823.3.
      9412547e
    • Sean Owen's avatar
      [SPARK-13823][SPARK-13397][SPARK-13395][CORE] More warnings, StandardCharset follow up · 3b461d9e
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Follow up to https://github.com/apache/spark/pull/11657
      
      - Also update `String.getBytes("UTF-8")` to use `StandardCharsets.UTF_8`
      - And fix one last new Coverity warning that turned up (use of unguarded `wait()` replaced by simpler/more robust `java.util.concurrent` classes in tests)
      - And while we're here cleaning up Coverity warnings, just fix about 15 more build warnings
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11725 from srowen/SPARK-13823.2.
      3b461d9e
  7. Mar 15, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13576][BUILD] Don't create assembly for examples. · 48978abf
      Marcelo Vanzin authored
      As part of the goal to stop creating assemblies in Spark, this change
      modifies the mvn and sbt builds to not create an assembly for examples.
      
      Instead, dependencies are copied to the build directory (under
      target/scala-xx/jars), and in the final archive, into the "examples/jars"
      directory.
      
      To avoid having to deal too much with Windows batch files, I made examples
      run through the launcher library; the spark-submit launcher now has a
      special mode to run examples, which adds all the necessary jars to the
      spark-submit command line, and replaces the bash and batch scripts that
      were used to run examples. The scripts are now just a thin wrapper around
      spark-submit; another advantage is that now all spark-submit options are
      supported.
      
      There are a few glitches; in the mvn build, a lot of duplicated dependencies
      get copied, because they are promoted to "compile" scope due to extra
      dependencies in the examples module (such as HBase). In the sbt build,
      all dependencies are copied, because there doesn't seem to be an easy
      way to filter things.
      
      I plan to clean some of this up when the rest of the tasks are finished.
      When the main assembly is replaced with jars, we can remove duplicate jars
      from the examples directory during packaging.
      
      Tested by running SparkPi in: maven build, sbt build, dist created by
      make-distribution.sh.
      
      Finally: note that running the "assembly" target in sbt doesn't build
      the examples anymore. You need to run "package" for that.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11452 from vanzin/SPARK-13576.
      48978abf
  8. Mar 14, 2016
    • Marcelo Vanzin's avatar
      [SPARK-13578][CORE] Modify launch scripts to not use assemblies. · 45f8053b
      Marcelo Vanzin authored
      Instead of looking for a specially-named assembly, the scripts now will
      blindly add all jars under the libs directory to the classpath. This
      libs directory is still currently the old assembly dir, so things should
      keep working the same way as before until we make more packaging changes.
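
      As a rough illustration of "add every jar in a directory to the classpath", here is a minimal Java sketch. It is not the logic of the launch scripts themselves (those are shell and batch files); the class name `JarsDirClasspath` and its method are hypothetical.

      ```java
      import java.io.File;
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      // Hypothetical helper; the real scripts are not written in Java.
      final class JarsDirClasspath {
        /** Joins the absolute paths of all *.jar files in libsDir with the platform path separator. */
        static String build(File libsDir) {
          File[] jars = libsDir.listFiles((dir, name) -> name.endsWith(".jar"));
          if (jars == null) {
            throw new IllegalArgumentException("Not a readable directory: " + libsDir);
          }
          Arrays.sort(jars); // deterministic order; no attempt to detect duplicates
          List<String> entries = new ArrayList<>();
          for (File jar : jars) {
            entries.add(jar.getAbsolutePath());
          }
          return String.join(File.pathSeparator, entries);
        }

        public static void main(String[] args) {
          // Usage: pass the directory that holds the jars as the first argument.
          System.out.println(build(new File(args.length > 0 ? args[0] : ".")));
        }
      }
      ```

      Because every jar is picked up without inspection, nothing in this approach notices two conflicting copies of the same library, which matches the lost "multiple assemblies" detection mentioned in the commit message.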
      
      The only lost feature is the detection of multiple assemblies; I consider
      that a minor nicety that only really affects few developers, so it's probably
      ok.
      
      Tested locally by running spark-shell; also did some minor Win32 testing
      (just made sure spark-shell started).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #11591 from vanzin/SPARK-13578.
      45f8053b
  9. Mar 13, 2016
    • Sean Owen's avatar
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <->... · 18408528
      Sean Owen authored
      [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> byte[] conversions (and remaining Coverity items)
      
      ## What changes were proposed in this pull request?
      
      - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8
      - Same for `InputStreamReader` and `OutputStreamWriter` constructors
      - Standardizes on UTF-8 everywhere
      - Standardizes specifying the encoding with `StandardCharsets.UTF_8`, not the Guava constant or "UTF-8" (which means handling `UnsupportedEncodingException`)
      - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c )
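
      A minimal sketch of the conversion style the list above standardizes on. The class `Utf8Conversions` is hypothetical; only JDK calls are used.

      ```java
      import java.nio.charset.StandardCharsets;

      // Hypothetical helper showing charset-explicit conversions.
      final class Utf8Conversions {
        /** Encodes with an explicit charset instead of the platform default. */
        static byte[] toBytes(String s) {
          return s.getBytes(StandardCharsets.UTF_8);
        }

        /** Decodes with the same explicit charset. */
        static String fromBytes(byte[] bytes) {
          return new String(bytes, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
          String text = "h\u00e9llo"; // contains one non-ASCII character
          byte[] encoded = toBytes(text);
          System.out.println(encoded.length);                  // prints "6"
          System.out.println(fromBytes(encoded).equals(text)); // prints "true"
        }
      }
      ```

      Passing a `Charset` object rather than the string "UTF-8" avoids the checked `UnsupportedEncodingException` that `String.getBytes(String)` declares.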
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #11657 from srowen/SPARK-13823.
      18408528
  10. Mar 11, 2016
    • Josh Rosen's avatar
      [SPARK-13294][PROJECT INFRA] Remove MiMa's dependency on spark-class / Spark assembly · 6ca990fb
      Josh Rosen authored
      This patch removes the need to build a full Spark assembly before running the `dev/mima` script.
      
      - I modified the `tools` project to remove a direct dependency on Spark, so `sbt/sbt tools/fullClasspath` will now return the classpath for the `GenerateMIMAIgnore` class itself plus its own dependencies.
         - This required me to delete two classes full of dead code that we don't use anymore
      - `GenerateMIMAIgnore` now uses [ClassUtil](http://software.clapper.org/classutil/) to find all of the Spark classes rather than our homemade JAR traversal code. The problem in our own code was that it didn't handle folders of classes properly, which is necessary in order to generate excludes with an assembly-free Spark build.
      - `./dev/mima` no longer runs through `spark-class`, eliminating the need to reason about classpath ordering between `SPARK_CLASSPATH` and the assembly.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #11178 from JoshRosen/remove-assembly-in-run-tests.
      6ca990fb
  11. Mar 09, 2016
    • Dongjoon Hyun's avatar
      [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. · c3689bc2
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      In order to make `docs/examples` (and other related code) simpler, more readable, and more user-friendly, this PR replaces existing code like the following with the `diamond` operator.
      
      ```
      -    final ArrayList<Product2<Object, Object>> dataToWrite =
      -      new ArrayList<Product2<Object, Object>>();
      +    final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
      ```
      
      Java 7 and higher support the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark's Java code uses it inconsistently.
      
      ## How was this patch tested?
      
      Manual.
      Pass the existing tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11541 from dongjoon-hyun/SPARK-13702.
      c3689bc2
  12. Mar 03, 2016
    • Dongjoon Hyun's avatar
      [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule · b5f02d67
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Since SPARK-6990, `dev/lint-java` has kept Java code healthy and saved time in PR review.
      This issue aims to remove unused imports from Java/Scala code and add the `UnusedImports` checkstyle rule to help developers.
      
      ## How was this patch tested?
      ```
      ./dev/lint-java
      ./build/sbt compile
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11438 from dongjoon-hyun/SPARK-13583.
      b5f02d67
  13. Feb 28, 2016
    • Reynold Xin's avatar
      [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
      9e01dcc6
  14. Feb 25, 2016
  15. Feb 14, 2016
  16. Jan 30, 2016
    • Josh Rosen's avatar
      [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
      289373b2
  17. Jan 14, 2016
  18. Dec 28, 2015
  19. Dec 20, 2015
  20. Dec 19, 2015
  21. Dec 15, 2015
  22. Nov 23, 2015
    • Marcelo Vanzin's avatar
      [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv. · c2467dad
      Marcelo Vanzin authored
      This change abstracts the code that serves jars / files to executors so that
      each RpcEnv can have its own implementation; the akka version uses the existing
      HTTP-based file serving mechanism, while the netty version uses the new
      stream support added to the network lib, which makes file transfers benefit
      from the easier security configuration of the network library, and should also
      reduce overhead overall.
      
      The change includes a small fix to TransportChannelHandler so that it propagates
      user events to downstream handlers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9530 from vanzin/SPARK-11140.
      c2467dad
  23. Nov 17, 2015
  24. Nov 12, 2015
    • Marcelo Vanzin's avatar
      [SPARK-11655][CORE] Fix deadlock in handling of launcher stop(). · 767d288b
      Marcelo Vanzin authored
      The stop() callback was trying to close the launcher connection in the
      same thread that handles connection data, which ended up causing a
      deadlock. So avoid that by dispatching the stop() request in its own
      thread.
      
      On top of that, add some exception safety to a few parts of the code,
      and use "destroyForcibly" from Java 8 if it's available, to force
      kill the child process. The flip side is that "kill()" may not actually
      work if running Java 7.
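
      A minimal sketch of the two ideas above, under stated assumptions: the class `ChildProcessKiller`, its method names, and the thread name are hypothetical, and this is not the launcher's actual code. It looks up `destroyForcibly` reflectively so the class still loads on Java 7, and it runs the stop action on a separate thread.

      ```java
      import java.lang.reflect.Method;

      // Hypothetical helper; names are illustrative only.
      final class ChildProcessKiller {
        /** True when the running JVM offers Process.destroyForcibly() (Java 8 and later). */
        static boolean supportsDestroyForcibly() {
          try {
            Process.class.getMethod("destroyForcibly");
            return true;
          } catch (NoSuchMethodException e) {
            return false;
          }
        }

        /** Uses destroyForcibly() when available, otherwise falls back to destroy(). */
        static void kill(Process child) {
          try {
            // Look the method up on the public Process class, not on the
            // non-public implementation class, so the reflective call is accessible.
            Method destroyForcibly = Process.class.getMethod("destroyForcibly");
            destroyForcibly.invoke(child);
          } catch (ReflectiveOperationException e) {
            child.destroy(); // best effort; may not terminate a stubborn process
          }
        }

        /** Runs the stop action on its own thread so the caller's thread is never blocked by it. */
        static Thread stopAsync(Runnable stopAction) {
          Thread stopper = new Thread(stopAction, "launcher-stop"); // hypothetical thread name
          stopper.setDaemon(true);
          stopper.start();
          return stopper;
        }
      }
      ```

      A caller that handles connection data could invoke `stopAsync(() -> kill(child))` and return immediately, instead of closing the connection from inside its own read loop.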
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9633 from vanzin/SPARK-11655.
      767d288b
  25. Oct 29, 2015
  26. Oct 15, 2015
  27. Oct 09, 2015
    • Marcelo Vanzin's avatar
      [SPARK-8673] [LAUNCHER] API and infrastructure for communicating with child apps. · 015f7ef5
      Marcelo Vanzin authored
      This change adds an API that encapsulates information about an app
      launched using the library. It also creates a socket-based communication
      layer for apps that are launched as child processes; the launching
      application listens for connections from launched apps, and once
      communication is established, the channel can be used to send updates
      to the launching app, or to send commands to the child app.
      
      The change also includes hooks for local, standalone/client and yarn
      masters.
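
      The general shape of such a channel can be sketched with plain sockets. This is only an illustration of the idea (the launching side listens, the launched side connects and reports state); the wire format, the class `ChildAppChannelDemo`, and the use of a thread in place of a real child process are all assumptions, not the library's protocol.

      ```java
      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.io.OutputStreamWriter;
      import java.io.UncheckedIOException;
      import java.io.Writer;
      import java.net.InetAddress;
      import java.net.ServerSocket;
      import java.net.Socket;
      import java.nio.charset.StandardCharsets;

      // Hypothetical demo; a thread stands in for the launched child application.
      final class ChildAppChannelDemo {
        /** Listens on a loopback port, lets the "child" send one update, and returns it. */
        static String exchange(String update) {
          InetAddress loopback = InetAddress.getLoopbackAddress();
          try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            server.setSoTimeout(5000); // do not wait forever for the child
            int port = server.getLocalPort();

            Thread child = new Thread(() -> {
              try (Socket socket = new Socket(loopback, port);
                   Writer out = new OutputStreamWriter(
                       socket.getOutputStream(), StandardCharsets.UTF_8)) {
                out.write(update + "\n"); // one line per update (assumed format)
              } catch (IOException e) {
                throw new UncheckedIOException(e);
              }
            });
            child.start();

            try (Socket connection = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                     connection.getInputStream(), StandardCharsets.UTF_8))) {
              return in.readLine();
            }
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        }

        public static void main(String[] args) {
          System.out.println(exchange("RUNNING"));
        }
      }
      ```

      Since the connection is bidirectional, the same socket could carry commands from the launching side back to the child, as the commit message describes.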
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7052 from vanzin/SPARK-8673.
      015f7ef5
  28. Oct 07, 2015
  29. Oct 06, 2015
  30. Sep 15, 2015
  31. Aug 28, 2015
    • Marcelo Vanzin's avatar
      [SPARK-9284] [TESTS] Allow all tests to run without an assembly. · c53c902f
      Marcelo Vanzin authored
      This change aims at speeding up the dev cycle a little bit, by making
      sure that all tests behave the same w.r.t. where the code to be tested
      is loaded from. Namely, that means that tests don't rely on the assembly
      anymore, rather loading all needed classes from the build directories.
      
      The main change is to make sure all build directories (classes and test-classes)
      are added to the classpath of child processes when running tests.
      
      YarnClusterSuite required some custom code since the executors are run
      differently (i.e. not through the launcher library, like standalone and
      Mesos do).
      
      I also found a couple of tests that could leak a SparkContext on failure,
      and added code to handle those.
      
      With this patch, it's possible to run the following command from a clean
      source directory and have all tests pass:
      
        mvn -Pyarn -Phadoop-2.4 -Phive-thriftserver install
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7629 from vanzin/SPARK-9284.
      c53c902f
  32. Aug 15, 2015
  33. Aug 11, 2015
    • Marcelo Vanzin's avatar
      [SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set. · 5a5bbc29
      Marcelo Vanzin authored
      This change allows any Spark argument to be added to the app to
      be started using SparkLauncher. Known arguments are properly
      validated, while unknown arguments are allowed so that the
      library can launch newer Spark versions (in case SPARK_HOME points
      at one).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:
      
      b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args to be set.
      5a5bbc29
  34. Aug 03, 2015
    • Burak Yavuz's avatar
      [SPARK-9263] Added flags to exclude dependencies when using --packages · 1633d0a2
      Burak Yavuz authored
      While the functionality is there to exclude packages, there are no flags that allow users to exclude dependencies, in case of dependency conflicts. We should provide users with a flag to add dependency exclusions in case the packages are not resolved properly (or not available due to licensing).
      
      The flag I added was --packages-exclude, but I'm open to renaming it. I also added property flags in case people would like to use a conf file to provide dependencies, which is possible if there is a long list of dependencies or exclusions.
      
      cc andrewor14 vanzin pwendell
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #7599 from brkyvz/packages-exclusions and squashes the following commits:
      
      636f410 [Burak Yavuz] addressed nits
      6e54ede [Burak Yavuz] is this the culprit
      b5e508e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into packages-exclusions
      154f5db [Burak Yavuz] addressed initial comments
      1536d7a [Burak Yavuz] Added flags to exclude packages using --packages-exclude
      1633d0a2
    • Timothy Chen's avatar
      [SPARK-8873] [MESOS] Clean up shuffle files if external shuffle service is used · 95dccc63
      Timothy Chen authored
      This patch builds directly on #7820, which is largely written by tnachen. The only addition is one commit for cleaning up the code. There should be no functional differences between this and #7820.
      
      Author: Timothy Chen <tnachen@gmail.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7881 from andrewor14/tim-cleanup-mesos-shuffle and squashes the following commits:
      
      8894f7d [Andrew Or] Clean up code
      2a5fa10 [Andrew Or] Merge branch 'mesos_shuffle_clean' of github.com:tnachen/spark into tim-cleanup-mesos-shuffle
      fadff89 [Timothy Chen] Address comments.
      e4d0f1d [Timothy Chen] Clean up external shuffle data on driver exit with Mesos.
      95dccc63