  1. Mar 03, 2016
    • [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule · b5f02d67
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time.
      This issue aims to remove unused imports from Java/Scala code and add an `UnusedImports` checkstyle rule to help developers.
      
      ## How was this patch tested?
      ```
      ./dev/lint-java
      ./build/sbt compile
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #11438 from dongjoon-hyun/SPARK-13583.
  2. Feb 28, 2016
    • [SPARK-13529][BUILD] Move network/* modules into common/network-* · 9e01dcc6
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top-level, non-user-facing folder.
      
      ## How was this patch tested?
      Compilation and existing tests. We should run both SBT and Maven.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #11409 from rxin/SPARK-13529.
  3. Feb 25, 2016
  4. Feb 14, 2016
  5. Jan 30, 2016
    • [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
  6. Jan 14, 2016
  7. Dec 28, 2015
  8. Dec 20, 2015
  9. Dec 19, 2015
  10. Dec 15, 2015
  11. Nov 23, 2015
    • [SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv. · c2467dad
      Marcelo Vanzin authored
      This change abstracts the code that serves jars / files to executors so that
      each RpcEnv can have its own implementation; the akka version uses the existing
      HTTP-based file serving mechanism, while the netty version uses the new
      stream support added to the network lib. This lets file transfers benefit
      from the easier security configuration of the network library, and should also
      reduce overhead overall.
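
      A minimal sketch of the kind of per-RpcEnv file-serving abstraction described above (the trait and method names are illustrative assumptions, not necessarily the exact Spark API):

      ```scala
      import java.io.File

      // Illustrative sketch only: each RpcEnv implementation plugs in its own way of
      // serving jars / files to executors.
      trait FileServerSketch {
        /** Register a file to be served; returns the URI executors fetch it from. */
        def addFile(file: File): String
        /** Register a jar to be served; returns its URI. */
        def addJar(file: File): String
      }
      ```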
      
      The change includes a small fix to TransportChannelHandler so that it propagates
      user events to downstream handlers.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9530 from vanzin/SPARK-11140.
  12. Nov 17, 2015
  13. Nov 12, 2015
    • [SPARK-11655][CORE] Fix deadlock in handling of launcher stop(). · 767d288b
      Marcelo Vanzin authored
      The stop() callback was trying to close the launcher connection in the
      same thread that handles connection data, which ended up causing a
      deadlock. So avoid that by dispatching the stop() request in its own
      thread.
      
      On top of that, add some exception safety to a few parts of the code,
      and use "destroyForcibly" from Java 8 if it's available, to force
      kill the child process. The flip side is that "kill()" may not actually
      work if running Java 7.
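
      As a rough sketch of the reflection-based fallback described above (illustrative only, not the exact code in the patch):

      ```scala
      // Prefer Java 8's Process#destroyForcibly when it exists; fall back to
      // destroy() on Java 7, where the kill is best-effort only.
      def forceKill(process: Process): Unit = {
        try {
          classOf[Process].getMethod("destroyForcibly").invoke(process)
        } catch {
          case _: NoSuchMethodException => process.destroy()  // Java 7 fallback
        }
      }
      ```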
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9633 from vanzin/SPARK-11655.
  14. Oct 29, 2015
  15. Oct 15, 2015
  16. Oct 09, 2015
    • [SPARK-8673] [LAUNCHER] API and infrastructure for communicating with child apps. · 015f7ef5
      Marcelo Vanzin authored
      This change adds an API that encapsulates information about an app
      launched using the library. It also creates a socket-based communication
      layer for apps that are launched as child processes; the launching
      application listens for connections from launched apps, and once
      communication is established, the channel can be used to send updates
      to the launching app, or to send commands to the child app.
      
      The change also includes hooks for local, standalone/client and yarn
      masters.
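
      In practice, driving a child app through this API looks roughly like the following sketch (paths and class names are placeholders; see the SparkLauncher javadoc for authoritative usage):

      ```scala
      import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

      object LauncherSketch {
        def main(args: Array[String]): Unit = {
          // Launch a child app and listen for state/info updates over the
          // launcher's socket-based channel.
          val handle = new SparkLauncher()
            .setAppResource("/path/to/app.jar")    // placeholder
            .setMainClass("com.example.Main")      // placeholder
            .setMaster("local[*]")
            .startApplication(new SparkAppHandle.Listener {
              override def stateChanged(h: SparkAppHandle): Unit =
                println(s"state: ${h.getState}")
              override def infoChanged(h: SparkAppHandle): Unit =
                println(s"app id: ${h.getAppId}")
            })
          // The handle can also send commands back to the child app, e.g. handle.stop().
        }
      }
      ```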
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7052 from vanzin/SPARK-8673.
  17. Oct 07, 2015
  18. Oct 06, 2015
  19. Sep 15, 2015
  20. Aug 28, 2015
    • [SPARK-9284] [TESTS] Allow all tests to run without an assembly. · c53c902f
      Marcelo Vanzin authored
      This change aims at speeding up the dev cycle a little bit, by making
      sure that all tests behave the same w.r.t. where the code to be tested
      is loaded from. Namely, tests no longer rely on the assembly and instead
      load all needed classes from the build directories.
      
      The main change is to make sure all build directories (classes and test-classes)
      are added to the classpath of child processes when running tests.
      
      YarnClusterSuite required some custom code since the executors are run
      differently (i.e. not through the launcher library, like standalone and
      Mesos do).
      
      I also found a couple of tests that could leak a SparkContext on failure,
      and added code to handle those.
      
      With this patch, it's possible to run the following command from a clean
      source directory and have all tests pass:
      
        mvn -Pyarn -Phadoop-2.4 -Phive-thriftserver install
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7629 from vanzin/SPARK-9284.
  21. Aug 15, 2015
  22. Aug 11, 2015
    • [SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set. · 5a5bbc29
      Marcelo Vanzin authored
      This change allows arbitrary Spark arguments to be added to an app
      started through SparkLauncher. Known arguments are properly
      validated, while unknown arguments are allowed so that the
      library can launch newer Spark versions (in case SPARK_HOME points
      at one).
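
      A brief sketch of how such pass-through arguments might be supplied (the resource, class, and the unknown flag below are placeholders):

      ```scala
      import org.apache.spark.launcher.SparkLauncher

      // Known arguments are validated; unknown ones are forwarded untouched so a
      // newer Spark under SPARK_HOME can still understand them.
      val launcher = new SparkLauncher()
        .setAppResource("/path/to/app.jar")          // placeholder
        .setMainClass("com.example.Main")            // placeholder
        .addSparkArg("--verbose")                    // known flag, validated
        .addSparkArg("--some-future-flag", "value")  // unknown flag, passed through
      ```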
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:
      
      b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args to be set.
  23. Aug 03, 2015
    • [SPARK-9263] Added flags to exclude dependencies when using --packages · 1633d0a2
      Burak Yavuz authored
      While the functionality to exclude packages exists, there are no flags that allow users to exclude dependencies in case of dependency conflicts. We should provide users with a flag to add dependency exclusions in case the packages are not resolved properly (or are not available due to licensing).
      
      The flag I added was --packages-exclude, but I'm open to renaming it. I also added property flags in case people would like to use a conf file to provide dependencies, which is useful when there is a long list of dependencies or exclusions.
      
      cc andrewor14 vanzin pwendell
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #7599 from brkyvz/packages-exclusions and squashes the following commits:
      
      636f410 [Burak Yavuz] addressed nits
      6e54ede [Burak Yavuz] is this the culprit
      b5e508e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into packages-exclusions
      154f5db [Burak Yavuz] addressed initial comments
      1536d7a [Burak Yavuz] Added flags to exclude packages using --packages-exclude
    • [SPARK-8873] [MESOS] Clean up shuffle files if external shuffle service is used · 95dccc63
      Timothy Chen authored
      This patch builds directly on #7820, which was largely written by tnachen. The only addition is one commit for cleaning up the code. There should be no functional differences between this and #7820.
      
      Author: Timothy Chen <tnachen@gmail.com>
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7881 from andrewor14/tim-cleanup-mesos-shuffle and squashes the following commits:
      
      8894f7d [Andrew Or] Clean up code
      2a5fa10 [Andrew Or] Merge branch 'mesos_shuffle_clean' of github.com:tnachen/spark into tim-cleanup-mesos-shuffle
      fadff89 [Timothy Chen] Address comments.
      e4d0f1d [Timothy Chen] Clean up external shuffle data on driver exit with Mesos.
  24. Jul 14, 2015
    • [SPARK-9001] Fixing errors in javadocs that lead to failed build/sbt doc · 20c1434a
      Joseph Gonzalez authored
      These are minor corrections to the documentation of several classes; the javadoc errors there are preventing:
      
      ```bash
      build/sbt publish-local
      ```
      
      I believe this might be an issue associated with running JDK8, as ankurdave does not appear to have this issue on JDK7.
      
      Author: Joseph Gonzalez <joseph.e.gonzalez@gmail.com>
      
      Closes #7354 from jegonzal/FixingJavadocErrors and squashes the following commits:
      
      6664b7e [Joseph Gonzalez] making requested changes
      2e16d89 [Joseph Gonzalez] Fixing errors in javadocs that prevents build/sbt publish-local from completing.
  25. Jul 03, 2015
  26. Jul 02, 2015
    • [SPARK-3071] Increase default driver memory · 3697232b
      Ilya Ganelin authored
      I've updated default values in comments, documentation, and in the command line builder to be 1g based on comments in the JIRA. I've also updated most usages to point at a single variable defined in the Utils.scala and JavaUtils.java files. This wasn't possible in all cases (R, shell scripts etc.) but usage in most code is now pointing at the same place.
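
      A minimal sketch of the pattern described above, with an assumed constant name (the actual name in Utils.scala/JavaUtils.java may differ):

      ```scala
      // One place to define the default driver memory, referenced everywhere else
      // instead of hard-coded strings. The object and constant names are illustrative.
      object DriverMemoryDefaults {
        val DEFAULT_DRIVER_MEM_MB: Int = 1024  // 1g, per the JIRA discussion

        def defaultDriverMemory: String = s"${DEFAULT_DRIVER_MEM_MB}m"
      }
      ```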
      
      Please let me know if I've missed anything.
      
      Will the spark-shell use the value within the command line builder during instantiation?
      
      Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
      
      Closes #7132 from ilganeli/SPARK-3071 and squashes the following commits:
      
      4074164 [Ilya Ganelin] String fix
      271610b [Ilya Ganelin] Merge branch 'SPARK-3071' of github.com:ilganeli/spark into SPARK-3071
      273b6e9 [Ilya Ganelin] Test fix
      fd67721 [Ilya Ganelin] Update JavaUtils.java
      26cc177 [Ilya Ganelin] test fix
      e5db35d [Ilya Ganelin] Fixed test failure
      39732a1 [Ilya Ganelin] merge fix
      a6f7deb [Ilya Ganelin] Created default value for DRIVER MEM in Utils that's now used in almost all locations instead of setting manually in each
      09ad698 [Ilya Ganelin] Update SubmitRestProtocolSuite.scala
      19b6f25 [Ilya Ganelin] Missed one doc update
      2698a3d [Ilya Ganelin] Updated default value for driver memory
  27. Jun 29, 2015
    • [SPARK-8709] Exclude hadoop-client's mockito-all dependency · 27ef8545
      Josh Rosen authored
      This patch excludes `hadoop-client`'s dependency on `mockito-all`.  As of #7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7090 from JoshRosen/SPARK-8709 and squashes the following commits:
      
      e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.
  28. Jun 28, 2015
    • [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all · f5100451
      Josh Rosen authored
      Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers.
      
      See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits:
      
      70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
  29. Jun 10, 2015
  30. Jun 05, 2015
    • [SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1
      Marcelo Vanzin authored
      Reorganize code so that the launcher library handles most of the work
      of printing usage messages, instead of having an awkward protocol between
      the library and the scripts for that.
      
      This mostly applies to SparkSubmit, since the launcher lib does not do
      command line parsing for classes invoked in other ways, and thus cannot
      handle failures for those. Most scripts end up going through SparkSubmit,
      though, so it all works.
      
      The change adds a new, internal command line switch, "--usage-error",
      which prints the usage message and exits with a non-zero status. Scripts
      can override the command printed in the usage message by setting an
      environment variable - this avoids having to grep the output of
      SparkSubmit to remove references to the "spark-submit" script.
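
      As an illustration of that override mechanism (the environment variable name below is an assumption, not necessarily the one used in the patch):

      ```scala
      object UsageSketch {
        // A wrapper script can export this variable before invoking SparkSubmit so
        // the usage text names the wrapper instead of "spark-submit".
        private val usageCommand: String =
          sys.env.getOrElse("SPARK_USAGE_COMMAND", "spark-submit")  // assumed variable name

        def printUsageAndExit(exitCode: Int): Unit = {
          System.err.println(s"Usage: $usageCommand [options] <app jar | python file> [app arguments]")
          System.exit(exitCode)
        }
      }
      ```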
      
      The only sub-optimal part of the change is the special handling for the
      spark-sql usage, which is now done in SparkSubmitArguments.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:
      
      2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
      bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
      c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
      6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.
  31. Jun 03, 2015
    • [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 · 2c4d550e
      Patrick Wendell authored
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
      
      2f42d02 [Patrick Wendell] A few more excludes
      4bebcf0 [Patrick Wendell] Update to RC4
      61aaf46 [Patrick Wendell] Using new release candidate
      55f1610 [Patrick Wendell] Another exclude
      04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
      36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
    • [MINOR] make the launcher project name consistent with others · ccaa8232
      WangTaoTheTonic authored
      I found this by chance while building Spark and think it is better to keep its name consistent with the other sub-projects (Spark Project *).
      
      I am not going to file a JIRA as it is a pretty small issue.
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #6603 from WangTaoTheTonic/projName and squashes the following commits:
      
      994b3ba [WangTaoTheTonic] make the project name consistent
  32. May 30, 2015
    • [SPARK-7945] [CORE] Do trim to values in properties file · 9d8aadb7
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-7945
      
      Currently, applications submitted by org.apache.spark.launcher.Main read the properties file without trimming its values.
      If a user leaves a space after a value (say spark.driver.extraClassPath), it can affect behavior globally (e.g. some jar could not be included in the classpath), so we should trim values the way Utils.getPropertiesFromFile does.
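
      A small sketch of the intended behavior (an illustrative helper, not the patch itself):

      ```scala
      import java.io.FileInputStream
      import java.util.Properties
      import scala.collection.JavaConverters._

      // Load a properties file and trim whitespace from every value, mirroring
      // what Utils.getPropertiesFromFile does.
      def loadTrimmedProperties(path: String): Map[String, String] = {
        val props = new Properties()
        val in = new FileInputStream(path)
        try props.load(in) finally in.close()
        props.stringPropertyNames().asScala.map { k =>
          k -> props.getProperty(k).trim  // trailing spaces would otherwise leak into values
        }.toMap
      }
      ```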
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      Author: Tao Wang <wangtao111@huawei.com>
      
      Closes #6496 from WangTaoTheTonic/SPARK-7945 and squashes the following commits:
      
      bb41b4b [Tao Wang] indent 4 to 2
      6dd1cf2 [WangTaoTheTonic] use a simpler way
      2c053a1 [WangTaoTheTonic] Do trim to values in properties file
  33. May 13, 2015
    • [MINOR] Avoid passing the PermGenSize option to IBM JVMs. · e676fc0c
      Tim Ellison authored
      IBM's Java VM doesn't have the concept of a permgen, so this option shouldn't be passed when the vendor property shows it is an IBM JDK.
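
      A minimal sketch of such a vendor check (the option value is illustrative):

      ```scala
      // Skip the PermGen option entirely when the vendor property identifies an IBM VM.
      def permGenOption: Option[String] = {
        val vendor = System.getProperty("java.vendor", "")
        if (vendor.contains("IBM")) None else Some("-XX:MaxPermSize=256m")  // size illustrative
      }
      ```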
      
      Author: Tim Ellison <t.p.ellison@gmail.com>
      Author: Tim Ellison <tellison@users.noreply.github.com>
      
      Closes #6055 from tellison/MaxPermSize and squashes the following commits:
      
      3a0fb66 [Tim Ellison] Convert tabs back to spaces
      6ad4266 [Tim Ellison] Remove unnecessary else clauses to reduce nesting.
      d27174b [Tim Ellison] Merge branch 'master' of https://github.com/apache/spark into MaxPermSize
      42a8c3f [Tim Ellison] [MINOR] Avoid passing the PermGenSize option to IBM JVMs.
  34. May 02, 2015
    • [SPARK-7031] [THRIFTSERVER] let thrift server take SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS · 49549d5a
      WangTaoTheTonic authored
      We should let the Thrift Server take these two parameters since it is a daemon, and it is better to read driver-related configs as an app submitted by spark-submit.
      
      https://issues.apache.org/jira/browse/SPARK-7031
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #5609 from WangTaoTheTonic/SPARK-7031 and squashes the following commits:
      
      8d3fc16 [WangTaoTheTonic] indent
      035069b [WangTaoTheTonic] better code style
      d3ddfb6 [WangTaoTheTonic] revert the unnecessary changes in suite
      624e652 [WangTaoTheTonic] fix break tests
      0565831 [WangTaoTheTonic] fix failed tests
      4fb25ed [WangTaoTheTonic] let thrift server take SPARK_DAEMON_MEMORY and SPARK_DAEMON_JAVA_OPTS
  35. May 01, 2015
    • [SPARK-5342] [YARN] Allow long running Spark apps to run on secure YARN/HDFS · b1f4ca82
      Hari Shreedharan authored
      Take 2. Does the same thing as #4688, but fixes the Hadoop-1 build.
      
      Author: Hari Shreedharan <hshreedharan@apache.org>
      
      Closes #5823 from harishreedharan/kerberos-longrunning and squashes the following commits:
      
      3c86bba [Hari Shreedharan] Import fixes. Import postfixOps explicitly.
      4d04301 [Hari Shreedharan] Minor formatting fixes.
      b5e7a72 [Hari Shreedharan] Remove reflection, use a method in SparkHadoopUtil to update the token renewer.
      7bff6e9 [Hari Shreedharan] Make sure all required classes are present in the jar. Fix import order.
      e851f70 [Hari Shreedharan] Move the ExecutorDelegationTokenRenewer to yarn module. Use reflection to use it.
      36eb8a9 [Hari Shreedharan] Change the renewal interval config param. Fix a bunch of comments.
      611923a [Hari Shreedharan] Make sure the namenodes are listed correctly for creating tokens.
      09fe224 [Hari Shreedharan] Use token.renew to get token's renewal interval rather than using hdfs-site.xml
      6963bbc [Hari Shreedharan] Schedule renewal in AM before starting user class. Else, a restarted AM cannot access HDFS if the user class tries to.
      072659e [Hari Shreedharan] Fix build failure caused by thread factory getting moved to ThreadUtils.
      f041dd3 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      42eead4 [Hari Shreedharan] Remove RPC part. Refactor and move methods around, use renewal interval rather than max lifetime to create new tokens.
      ebb36f5 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      bc083e3 [Hari Shreedharan] Overload RegisteredExecutor to send tokens. Minor doc updates.
      7b19643 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      8a4f268 [Hari Shreedharan] Added docs in the security guide. Changed some code to ensure that the renewer objects are created only if required.
      e800c8b [Hari Shreedharan] Restore original RegisteredExecutor message, and send new tokens via NewTokens message.
      0e9507e [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      7f1bc58 [Hari Shreedharan] Minor fixes, cleanup.
      bcd11f9 [Hari Shreedharan] Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.
      f74303c [Hari Shreedharan] Move the new logic into specialized classes. Add cleanup for old credentials files.
      2f9975c [Hari Shreedharan] Ensure new tokens are written out immediately on AM restart. Also, pick up the latest suffix from HDFS if the AM is restarted.
      61b2b27 [Hari Shreedharan] Account for AM restarts by making sure lastSuffix is read from the files on HDFS.
      62c45ce [Hari Shreedharan] Relogin from keytab periodically.
      fa233bd [Hari Shreedharan] Adding logging, fixing minor formatting and ordering issues.
      42813b4 [Hari Shreedharan] Remove utils.sh, which was re-added due to merge with master.
      0de27ee [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      55522e3 [Hari Shreedharan] Fix failure caused by Preconditions ambiguity.
      9ef5f1b [Hari Shreedharan] Added explanation of how the credentials refresh works, some other minor fixes.
      f4fd711 [Hari Shreedharan] Fix SparkConf usage.
      2debcea [Hari Shreedharan] Change the file structure for credentials files. I will push a followup patch which adds a cleanup mechanism for old credentials files. The credentials files are small and few enough that they should not cause issues on HDFS.
      af6d5f0 [Hari Shreedharan] Cleaning up files where changes weren't required.
      f0f54cb [Hari Shreedharan] Be more defensive when updating the credentials file.
      f6954da [Hari Shreedharan] Got rid of Akka communication to renew, instead the executors check a known file's modification time to read the credentials.
      5c11c3e [Hari Shreedharan] Move tests to YarnSparkHadoopUtil to fix compile issues.
      b4cb917 [Hari Shreedharan] Send keytab to AM via DistributedCache rather than directly via HDFS
      0985b4e [Hari Shreedharan] Write tokens to HDFS and read them back when required, rather than sending them over the wire.
      d79b2b9 [Hari Shreedharan] Make sure correct credentials are passed to FileSystem#addDelegationTokens()
      8c6928a [Hari Shreedharan] Fix issue caused by direct creation of Actor object.
      fb27f46 [Hari Shreedharan] Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started. Also schedule re-logins in CoarseGrainedSchedulerBackend#start()
      41efde0 [Hari Shreedharan] Merge branch 'master' into kerberos-longrunning
      d282d7a [Hari Shreedharan] Fix ClientSuite to set YARN mode, so that the correct class is used in tests.
      bcfc374 [Hari Shreedharan] Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.
      f8fe694 [Hari Shreedharan] Handle None if keytab-login is not scheduled.
      2b0d745 [Hari Shreedharan] [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
      ccba5bc [Hari Shreedharan] WIP: More changes wrt kerberos
      77914dd [Hari Shreedharan] WIP: Add kerberos principal and keytab to YARN client.