  1. Mar 29, 2015
    • [SPARK-6406] Launch Spark using assembly jar instead of a separate launcher jar · e3eb3939
      Nishkam Ravi authored
      Author: Nishkam Ravi <nravi@cloudera.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      Author: nravi <nravi@c1704.halxg.cloudera.com>
      
      Closes #5085 from nishkamravi2/master_nravi and squashes the following commits:
      
      bad4349 [nishkamravi2] Update Main.java
      36a6f87 [Nishkam Ravi] Minor changes and bug fixes
      b7f4ae7 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      4a45d6a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      458af39 [Nishkam Ravi] Locate the jar using getLocation, obviates the need to pass assembly path as an argument
      d9658d6 [Nishkam Ravi] Changes for SPARK-6406
      ccdc334 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      3faa7a4 [Nishkam Ravi] Launcher library changes (SPARK-6406)
      345206a [Nishkam Ravi] spark-class merge Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ac58975 [Nishkam Ravi] spark-class changes
      06bfeb0 [nishkamravi2] Update spark-class
      35af990 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      32c3ab3 [nishkamravi2] Update AbstractCommandBuilder.java
      4bd4489 [nishkamravi2] Update AbstractCommandBuilder.java
      746f35b [Nishkam Ravi] "hadoop" string in the assembly name should not be mandatory (everywhere else in spark we mandate spark-assembly*hadoop*.jar)
      bfe96e0 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      ee902fa [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      d453197 [nishkamravi2] Update NewHadoopRDD.scala
      6f41a1d [nishkamravi2] Update NewHadoopRDD.scala
      0ce2c32 [nishkamravi2] Update HadoopRDD.scala
      f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi
      ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown.
      71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      494d8c0 [nishkamravi2] Update DiskBlockManager.scala
      3c5ddba [nishkamravi2] Update DiskBlockManager.scala
      f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop
      79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala
      535295a [nishkamravi2] Update TaskSetManager.scala
      3e1b616 [Nishkam Ravi] Modify test for maxResultSize
      9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0)
      5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      636a9ff [nishkamravi2] Update YarnAllocator.scala
      8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
      35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
      5ac2ec1 [Nishkam Ravi] Remove out
      dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
      42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
      362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
      c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
      1cf2d1e [nishkamravi2] Update YarnAllocator.scala
      ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
      2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
      efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
      2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
      3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
      5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
      eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
      df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
      6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
      5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
      681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
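      
      A minimal sketch of the shell side of this change: spark-class now locates the assembly jar, which also carries the launcher classes (the directory layout and variable names are assumptions, not the committed code):
      
      ```
      # Locate the single spark-assembly jar; fail fast if zero or several
      # candidates are found.
      ASSEMBLY_DIR="${SPARK_HOME}/lib"
      num_jars="$(ls -1 "$ASSEMBLY_DIR" | grep -c '^spark-assembly.*\.jar$')"
      if [ "$num_jars" -ne 1 ]; then
        echo "Expected exactly one spark-assembly jar in $ASSEMBLY_DIR, found $num_jars" 1>&2
        exit 1
      fi
      SPARK_ASSEMBLY_JAR="${ASSEMBLY_DIR}/$(ls -1 "$ASSEMBLY_DIR" | grep '^spark-assembly.*\.jar$')"
      ```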
  2. Mar 16, 2015
    • [SPARK-6327] [PySpark] fix launch spark-submit from python · e3f315ac
      Davies Liu authored
      SparkSubmit should be launched without setting PYSPARK_SUBMIT_ARGS
      
      cc JoshRosen: this mode is actually used by the Python unit tests, so I will not add more tests for it.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #5019 from davies/fix_submit and squashes the following commits:
      
      2c20b0c [Davies Liu] fix launch spark-submit from python
  3. Mar 11, 2015
    • [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
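      
      On the shell side, bin/spark-class can then delegate all command construction to the library. A minimal sketch of that handoff (variable names are illustrative, not the committed script):
      
      ```
      # The launcher prints the final child command as NUL-separated arguments
      # (so arguments containing newlines survive); the shell reads them back
      # and execs the result.
      CMD=()
      while IFS= read -d '' -r ARG; do
        CMD+=("$ARG")
      done < <("$JAVA" -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@")
      exec "${CMD[@]}"
      ```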
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
  4. Feb 10, 2015
    • [SPARK-5493] [core] Add option to impersonate user. · ed167e70
      Marcelo Vanzin authored
      Hadoop has a feature that allows users to impersonate other users
      when submitting applications or talking to HDFS, for example. These
      impersonated users are generally referred to as "proxy users".
      
      Services such as Oozie or Hive use this feature to run applications
      as the requesting user.
      
      This change makes SparkSubmit accept a new command line option to
      run the application as a proxy user. It also fixes the plumbing
      of the user name through the UI (and a couple of other places) to
      refer to the correct user running the application, which can be
      different from `sys.props("user.name")` even without proxies (e.g.
      when using Kerberos).
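      
      For example (the class and jar names are hypothetical; the submitting user must be configured as a Hadoop proxy user):
      
      ```
      ./bin/spark-submit \
        --proxy-user alice \
        --master yarn-cluster \
        --class com.example.MyApp \
        my-app.jar
      ```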
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #4405 from vanzin/SPARK-5493 and squashes the following commits:
      
      df82427 [Marcelo Vanzin] Clarify the reason for the special exception handling.
      05bfc08 [Marcelo Vanzin] Remove unneeded annotation.
      4840de9 [Marcelo Vanzin] Review feedback.
      8af06ff [Marcelo Vanzin] Fix usage string.
      2e4fa8f [Marcelo Vanzin] Merge branch 'master' into SPARK-5493
      b6c947d [Marcelo Vanzin] Merge branch 'master' into SPARK-5493
      0540d38 [Marcelo Vanzin] [SPARK-5493] [core] Add option to impersonate user.
  5. Feb 06, 2015
    • [SPARK-5396] Syntax error in spark scripts on windows. · c01b9852
      Masayoshi TSUZUKI authored
      Fixed a syntax error in spark-submit2.cmd; the command prompt doesn't have a "defined" operator.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #4428 from tsudukim/feature/SPARK-5396 and squashes the following commits:
      
      ec18465 [Masayoshi TSUZUKI] [SPARK-5396] Syntax error in spark scripts on windows.
    • [Minor] Remove permission for execution from spark-shell.cmd · f6ba813a
      Kousuke Saruta authored
      The .cmd files in bin do not have the execute permission set, except for spark-shell.cmd.
      Let's unify that.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3983 from sarutak/fix-mode-of-cmd and squashes the following commits:
      
      9d6eedc [Kousuke Saruta] Removed permission for execution from spark-shell.cmd
  6. Feb 04, 2015
    • [SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit · 6aed719e
      Burak Yavuz authored
      This PR adds support for using maven coordinates as dependencies to spark-shell.
      Coordinates can be provided as a comma-delimited string after the flag `--packages`.
      Additional remote repositories (like Sonatype) can be supplied as a comma-delimited string after the flag
      `--repositories`.
      
      This uses the Ivy library to resolve dependencies. Unfortunately, the library has little usable documentation, so solving more complex dependency issues can be difficult.
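      
      For example (the coordinate below is purely illustrative):
      
      ```
      # Resolve a package and its transitive dependencies through Ivy at startup:
      ./bin/spark-shell \
        --packages com.databricks:spark-csv_2.10:1.0.3 \
        --repositories https://oss.sonatype.org/content/repositories/releases
      ```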
      
      pwendell, mateiz, mengxr
      
      **Note: This is still a WIP. The following need to be handled:**
      - [x] add docs for the methods
      - [x] take local ivy cache path as an argument
      - [x] add tests
      - [x] add Windows compatibility
      - [x] exclude unused Ivy dependencies
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #4215 from brkyvz/SPARK-5341ivy and squashes the following commits:
      
      9215851 [Burak Yavuz] ready to merge
      db2a5cc [Burak Yavuz] changed logging to printStream
      9dae87f [Burak Yavuz] file separators changed
      71c374d [Burak Yavuz] merge conflicts fixed
      c08dc9f [Burak Yavuz] fixed merge conflicts
      3ada19a [Burak Yavuz] fixed Jenkins error (hopefully) and added comment on oro
      43c2290 [Burak Yavuz] fixed that ONE line
      231f72f [Burak Yavuz] addressed code review
      2cd6562 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5341ivy
      85ec5a3 [Burak Yavuz] added oro as a dependency explicitly
      ea44ca4 [Burak Yavuz] add oro back to dependencies
      cef0e24 [Burak Yavuz] IntelliJ is just messing things up
      97c4a92 [Burak Yavuz] fix more weird IntelliJ formatting
      9cf077d [Burak Yavuz] fix weird IntelliJ formatting
      dcf5e13 [Burak Yavuz] fix windows command line flags
      3a23f21 [Burak Yavuz] excluded ivy dependencies
      53423e0 [Burak Yavuz] tests added
      3705907 [Burak Yavuz] remove ivy-repo as a command line argument. Use global ivy cache as default
      c04d885 [Burak Yavuz] take path to ivy cache as a conf
      2edc9b5 [Burak Yavuz] managed to exclude Spark and it's dependencies
      a0870af [Burak Yavuz] add docs. remove unnecesary new lines
      6645af4 [Burak Yavuz] [SPARK-5341] added base implementation
      882c4c8 [Burak Yavuz] added maven dependency download
  7. Feb 01, 2015
    • [SPARK-3996]: Shade Jetty in Spark deliverables · a15f6e31
      Patrick Wendell authored
      (v2 of this patch with a fix that was only relevant for the maven build).
      
      This patch piggy-backs on vanzin's work to simplify the Guava shading,
      and adds Jetty as a shaded library in Spark. Other than adding Jetty,
      it consolidates the <artifactSet>s into the root pom. I found it was
      a bit easier to follow that way, since you don't need to look into
      child poms to find out which artifact sets are included in shading.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4285 from pwendell/jetty and squashes the following commits:
      
      d3e7f4e [Patrick Wendell] Fix for shaded deps causing compile errors
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
  8. Jan 29, 2015
    • Revert "[WIP] [SPARK-3996]: Shade Jetty in Spark deliverables" · d2071e8f
      Patrick Wendell authored
      This reverts commit f240fe39.
    • [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables · f240fe39
      Patrick Wendell authored
      This patch piggy-backs on vanzin's work to simplify the Guava shading,
      and adds Jetty as a shaded library in Spark. Other than adding Jetty,
      it consolidates the \<artifactSet\>s into the root pom. I found it was
      a bit easier to follow that way, since you don't need to look into
      child poms to find out which artifact sets are included in shading.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #4252 from pwendell/jetty and squashes the following commits:
      
      19f0710 [Patrick Wendell] More code review feedback
      961452d [Patrick Wendell] Responding to feedback from Marcello
      6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables
  9. Jan 19, 2015
    • [SPARK-4504][Examples] fix run-example failure if multiple assembly jars exist · 74de94ea
      Venkata Ramana Gollamudi authored
      Fix the run-example script to fail fast with a useful error message if multiple
      example assembly JARs are present.
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #3377 from gvramana/run-example_fails and squashes the following commits:
      
      fa7f481 [Venkata Ramana Gollamudi] Fixed review comments, avoiding ls output scanning.
      6aa1ab7 [Venkata Ramana Gollamudi] Fix run-examples script error during multiple jars
    • [SPARK-5088] Use spark-class for running executors directly · 4a4f9ccb
      Jongyoul Lee authored
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits:
      
      8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases
      932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master
      613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases
      ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import
      97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement
  10. Jan 09, 2015
    • [SPARK-4990][Deploy] To find the default properties file, search SPARK_CONF_DIR first · 8782eb99
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-4990
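      
      In effect (the path is hypothetical):
      
      ```
      # spark-submit now looks for spark-defaults.conf under SPARK_CONF_DIR first,
      # falling back to SPARK_HOME/conf when the variable is unset:
      export SPARK_CONF_DIR=/etc/spark/conf
      ./bin/spark-submit --version
      ```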
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #3823 from WangTaoTheTonic/SPARK-4990 and squashes the following commits:
      
      133c43e [WangTao] Update spark-submit2.cmd
      b1ab402 [WangTao] Update spark-submit
      4cc7f34 [WangTaoTheTonic] rebase
      55300bc [WangTaoTheTonic] use export to make it global
      d8d3cb7 [WangTaoTheTonic] remove blank line
      07b9ebf [WangTaoTheTonic] check SPARK_CONF_DIR instead of checking properties file
      c5a85eb [WangTaoTheTonic] to find default properties file, search SPARK_CONF_DIR first
  11. Jan 08, 2015
    • [SPARK-4048] Enhance and extend hadoop-provided profile. · 48cecf67
      Marcelo Vanzin authored
      This change does a few things to make the hadoop-provided profile more useful:
      
      - Create new profiles for other libraries / services that might be provided by the infrastructure
      - Simplify and fix the poms so that the profiles are only activated while building assemblies.
      - Fix tests so that they're able to run when the profiles are activated
      - Add a new env variable to be used by distributions that use these profiles to provide the runtime
        classpath for Spark jobs and daemons.
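      
      A minimal sketch of that last point, assuming the `hadoop` CLI is on the PATH:
      
      ```
      # Let a "hadoop-provided" build pick up the cluster's own Hadoop jars at runtime:
      export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
      ```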
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
      
      82eb688 [Marcelo Vanzin] Add a comment.
      eb228c0 [Marcelo Vanzin] Fix borked merge.
      4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
      371ebee [Marcelo Vanzin] Review feedback.
      52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      322f882 [Marcelo Vanzin] Fix merge fail.
      f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
      9640503 [Marcelo Vanzin] Cleanup child process log message.
      115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
      e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
      7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
      1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
      d1399ed [Marcelo Vanzin] Restore jetty dependency.
      82a54b9 [Marcelo Vanzin] Remove unused profile.
      5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
      1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
      f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
      9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
      d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
      4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
      417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
      2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
      1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
      284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
    • [SPARK-5130][Deploy] Take yarn-cluster as cluster mode in spark-submit · 0760787d
      WangTaoTheTonic authored
      https://issues.apache.org/jira/browse/SPARK-5130
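      
      In effect, these two invocations are now handled the same way (class and jar names are hypothetical):
      
      ```
      ./bin/spark-submit --master yarn-cluster --class com.example.MyApp my-app.jar
      ./bin/spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar
      ```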
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      
      Closes #3929 from WangTaoTheTonic/SPARK-5130 and squashes the following commits:
      
      c490648 [WangTaoTheTonic] take yarn-cluster as cluster mode in spark-submit
  12. Dec 10, 2014
    • [SPARK-4793] [Deploy] ensure .jar at end of line · e230da18
      Daoyuan Wang authored
      Sometimes I switch between different versions and do not want to rebuild Spark, so I rename the assembly jar to .jar.bak, but it still gets picked up by `compute-classpath.sh`.
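      
      A sketch of the tightened match (the pattern is illustrative, not the committed one):
      
      ```
      # Anchoring the pattern at end-of-line skips renamed files such as
      # spark-assembly-1.2.0-hadoop2.4.0.jar.bak:
      ls "$SPARK_HOME/lib" | grep '^spark-assembly.*hadoop.*\.jar$'
      ```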
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3641 from adrian-wang/jar and squashes the following commits:
      
      45cbfd0 [Daoyuan Wang] ensure .jar at end of line
    • [SPARK-4161] Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf · 742e7093
      GuoQiang Li authored
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #3050 from witgo/SPARK-4161 and squashes the following commits:
      
      abb6fa4 [GuoQiang Li] move usejavacp opt to spark-shell
      89e39e7 [GuoQiang Li] review commit
      c2a6f04 [GuoQiang Li] Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf
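      
      A sketch of a configuration that triggered the problem (the path is hypothetical):
      
      ```
      # With extraClassPath set in spark-defaults.conf, spark-shell previously took a
      # different launch path and ended up with an incorrectly set class path:
      echo "spark.driver.extraClassPath /opt/libs/extra.jar" >> conf/spark-defaults.conf
      ./bin/spark-shell
      ```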
  13. Nov 30, 2014
    • [SPARK-4623] Add some error information if using spark-sql in yarn-cluster mode · aea7a997
      carlmartin authored
      When spark-sql is used in yarn-cluster mode, print an error message, just as spark-shell does in yarn-cluster mode.
      
      Author: carlmartin <carlmartinmax@gmail.com>
      Author: huangzhaowei <carlmartinmax@gmail.com>
      
      Closes #3479 from SaintBacchus/sparkSqlShell and squashes the following commits:
      
      35829a9 [carlmartin] improve the description of comment
      e6c1eb7 [carlmartin] add a comment in bin/spark-sql to remind user who wants to change the class
      f1c5c8d [carlmartin] Merge branch 'master' into sparkSqlShell
      8e112c5 [huangzhaowei] singular form
      ec957bc [carlmartin] Add the some error infomation if using spark-sql in yarn-cluster mode
      7bcecc2 [carlmartin] Merge branch 'master' of https://github.com/apache/spark into codereview
      4fad75a [carlmartin] Add the Error infomation using spark-sql in yarn-cluster mode
  14. Nov 18, 2014
    • [SPARK-4017] show progress bar in console · e34f38ff
      Davies Liu authored
      The progress bar will look like this:
      
      ![1___spark_job__85_250_finished__4_are_running___java_](https://cloud.githubusercontent.com/assets/40902/4854813/a02f44ac-6099-11e4-9060-7c73a73151d6.png)
      
      In the right corner, the numbers are: finished tasks, running tasks, total tasks.
      
      After the stage has finished, it will disappear.
      
      The progress bar is only shown if the logging level is WARN or higher (progress in the title is still shown); it can be turned off via spark.driver.showConsoleProgress.
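      
      For example:
      
      ```
      # Disable the console progress bar:
      ./bin/spark-shell --conf spark.driver.showConsoleProgress=false
      ```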
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3029 from davies/progress and squashes the following commits:
      
      95336d5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress
      fc49ac8 [Davies Liu] address commentse
      2e90f75 [Davies Liu] show multiple stages in same time
      0081bcc [Davies Liu] address comments
      38c42f1 [Davies Liu] fix tests
      ab87958 [Davies Liu] disable progress bar during tests
      30ac852 [Davies Liu] re-implement progress bar
      b3f34e5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress
      6fd30ff [Davies Liu] show progress bar if no task finished in 500ms
      e4e7344 [Davies Liu] refactor
      e1f524d [Davies Liu] revert unnecessary change
      a60477c [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress
      5cae3f2 [Davies Liu] fix style
      ea49fe0 [Davies Liu] address comments
      bc53d99 [Davies Liu] refactor
      e6bb189 [Davies Liu] fix logging in sparkshell
      7e7d4e7 [Davies Liu] address commments
      5df26bb [Davies Liu] fix style
      9e42208 [Davies Liu] show progress bar in console and title
  15. Nov 14, 2014
    • [SPARK-4415] [PySpark] JVM should exit after Python exit · 7fe08b43
      Davies Liu authored
      When the JVM is started from a Python process, it should exit once its stdin is closed.
      
      Test: add spark.driver.memory to conf/spark-defaults.conf
      
      ```
      daviesdm:~/work/spark$ cat conf/spark-defaults.conf
      spark.driver.memory       8g
      daviesdm:~/work/spark$ bin/pyspark
      >>> quit
      daviesdm:~/work/spark$ jps
      4931 Jps
      286
      daviesdm:~/work/spark$ python wc.py
      943738
      0.719928026199
      daviesdm:~/work/spark$ jps
      286
      4990 Jps
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3274 from davies/exit and squashes the following commits:
      
      df0e524 [Davies Liu] address comments
      ce8599c [Davies Liu] address comments
      050651f [Davies Liu] JVM should exit after Python exit
  16. Nov 11, 2014
    • Support cross building for Scala 2.11 · daaca14c
      Prashant Sharma authored
      Let's give this another go using a version of Hive that shades its JLine dependency.
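      
      The build docs added in this change describe the cross-build roughly as follows (the exact script and profile names are an assumption from that era's docs):
      
      ```
      # Rewrite the poms for Scala 2.11, then build (the thriftserver was
      # initially excluded from the 2.11 build):
      ./dev/change-version-to-2.11.sh
      mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
      ```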
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
      
      e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
      f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
      a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
      7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
      583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
      3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
      935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
      925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
      2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
      8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
      5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
      2121071 [Patrick Wendell] Migrating version detection to PySpark
      b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
      1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
      f5cad4e [Patrick Wendell] Add Scala 2.11 docs
      210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
      48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
      e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
      67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
      8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
      e22b104 [Patrick Wendell] Small fix in pom file
      ec402ab [Patrick Wendell] Various fixes
      0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
      4eaec65 [Prashant Sharma] Changed scripts to ignore target.
      5167bea [Prashant Sharma] small correction
      a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
      80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
      034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
      d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
      6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
      e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
      937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
      cb059b0 [Prashant Sharma] Code review
      0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
  17. Oct 31, 2014
    • [SPARK-3870] EOL character enforcement · 55ab7770
      Kousuke Saruta authored
      We have shell scripts and Windows batch files, so we should enforce proper EOL characters.
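      
      A sketch of the corresponding .gitattributes entries (assumed, not the verbatim committed file):
      
      ```
      # Force CRLF endings for Windows scripts regardless of the committer's platform:
      cat >> .gitattributes <<'EOF'
      *.bat text eol=crlf
      *.cmd text eol=crlf
      EOF
      ```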
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2726 from sarutak/eol-enforcement and squashes the following commits:
      
      9748c3f [Kousuke Saruta] Fixed make.bat
      252de89 [Kousuke Saruta] Removed extra characters from make.bat
      5b81c00 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      8633ed2 [Kousuke Saruta] merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      5d630d8 [Kousuke Saruta] Merged
      ba10797 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      7407515 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
      772fd4e [Kousuke Saruta] Normized EOL character in make.bat and compute-classpath.cmd
      ac7f873 [Kousuke Saruta] Added an entry for .gitattributes to .rat-excludes
      1570e77 [Kousuke Saruta] Added .gitattributes
  18. Oct 30, 2014
    • [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH instead of -Djava.library.path · cd739bd7
      GuoQiang Li authored
      - [X] Standalone
      - [X] YARN
      - [X] Mesos
      - [X]  Mac OS X
      - [X] Linux
      - [ ]  Windows
      
      This is an alternative implementation of #1031
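      
      In effect (the library path is hypothetical):
      
      ```
      # Native libraries now reach child processes via LD_LIBRARY_PATH (or the
      # platform equivalent) instead of -Djava.library.path:
      ./bin/spark-submit \
        --conf spark.executor.extraLibraryPath=/opt/hadoop/lib/native \
        --class com.example.MyApp \
        my-app.jar
      ```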
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #2711 from witgo/SPARK-1719 and squashes the following commits:
      
      c7b26f6 [GuoQiang Li] review commits
      4488e41 [GuoQiang Li] Refactoring CommandUtils
      a444094 [GuoQiang Li] review commits
      40c0b4a [GuoQiang Li] Add buildLocalCommand method
      c1a0ddd [GuoQiang Li] fix comments
      156ce88 [GuoQiang Li] review commit
      38aa377 [GuoQiang Li] Refactor CommandUtils.scala
      4269e00 [GuoQiang Li] Refactor SparkSubmitDriverBootstrapper.scala
      7a1d634 [GuoQiang Li] use LD_LIBRARY_PATH instead of -Djava.library.path
  19. Oct 28, 2014
    • [SPARK-4065] Add check for IPython on Windows · 2f254dac
      Michael Griffiths authored
      This fix employs logic similar to the bash launcher (pyspark): it checks
      whether IPYTHON=1 and, if so, launches ipython with the options in IPYTHON_OPTS.
      The fix assumes that ipython is available on the system Path and can
      be invoked with a plain "ipython" command.
      
      Author: Michael Griffiths <msjgriffiths@gmail.com>
      
      Closes #2910 from msjgriffiths/pyspark-windows and squashes the following commits:
      
      ef34678 [Michael Griffiths] Change build message to comply with [SPARK-3775]
      361e3d8 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows
      9ce72d1 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows
  20. Oct 14, 2014
    • [SPARK-3943] Some scripts bin\*.cmd pollute environment variables in Windows · 66af8e25
      Masayoshi TSUZUKI authored
      Modified the scripts so that they no longer pollute environment variables.
      The main logic is simply moved from `XXX.cmd` into `XXX2.cmd`, and `XXX.cmd` calls `XXX2.cmd` via the cmd command.
      `pyspark.cmd` and `spark-class.cmd` already use this approach, but `spark-shell.cmd`, `spark-submit.cmd` and `/python/docs/make.bat` do not.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2797 from tsudukim/feature/SPARK-3943 and squashes the following commits:
      
      b397a7d [Masayoshi TSUZUKI] [SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows
    • [SPARK-3869] ./bin/spark-class misses Java version with _JAVA_OPTIONS set · 7b4f39f6
      cocoatomo authored
      When the _JAVA_OPTIONS environment variable is set, the command "java -version" outputs a message like "Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8".
      ./bin/spark-class reads the Java version from the first line of the "java -version" output, so it misreads the Java version when _JAVA_OPTIONS is set.
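      
      A sketch of the kind of fix: probe the line that actually contains the version string instead of blindly taking the first line:
      
      ```
      # The "Picked up _JAVA_OPTIONS: ..." notice does not match "version":
      java -version 2>&1 | grep -i version | head -n1
      ```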
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2725 from cocoatomo/issues/3869-mistake-java-version and squashes the following commits:
      
      f894ebd [cocoatomo] [SPARK-3869] ./bin/spark-class miss Java version with _JAVA_OPTIONS set
  21. Oct 09, 2014
    • [SPARK-3772] Allow `ipython` to be used by Pyspark workers; IPython support improvements: · 4e9b551a
      Josh Rosen authored
      This pull request addresses a few issues related to PySpark's IPython support:
      
      - Fix the remaining uses of the '-u' flag, which IPython doesn't support (see SPARK-3772).
      - Change PYSPARK_PYTHON_OPTS to PYSPARK_DRIVER_PYTHON_OPTS, so that the old name is reserved in case we ever want to allow the worker Python options to be customized (this variable was introduced in #2554 and hasn't landed in a release yet, so this doesn't break any compatibility).
      - Introduce a PYSPARK_DRIVER_PYTHON option that allows the driver to use `ipython` while the workers use a different Python version.
      - Attempt to use Python 2.7 by default if PYSPARK_PYTHON is not specified.
      - Retain the old semantics for IPYTHON=1 and IPYTHON_OPTS (to avoid breaking existing example programs).
      
      There are more details in a block comment in `bin/pyspark`.
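      
      For example (assuming `ipython` is on the PATH):
      
      ```
      # Run the driver REPL under IPython while workers keep the default python:
      PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
      ```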
      
      Author: Josh Rosen <joshrosen@apache.org>
      
      Closes #2651 from JoshRosen/SPARK-3772 and squashes the following commits:
      
      7b8eb86 [Josh Rosen] More changes to PySpark python executable configuration:
      c4f5778 [Josh Rosen] [SPARK-3772] Allow ipython to be used by Pyspark workers; IPython fixes:
  22. Oct 07, 2014
    • [SPARK-3808] PySpark fails to start in Windows · 12e2551e
      Masayoshi TSUZUKI authored
      Fixed a syntax error in the *.cmd scripts.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2669 from tsudukim/feature/SPARK-3808 and squashes the following commits:
      
      7f804e6 [Masayoshi TSUZUKI] [SPARK-3808] PySpark fails to start in Windows
  23. Oct 03, 2014
    • [SPARK-3774] typo comment in bin/utils.sh · e5566e05
      Masayoshi TSUZUKI authored
      Fixed a typo in a comment in bin/utils.sh.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2639 from tsudukim/feature/SPARK-3774 and squashes the following commits:
      
      707b779 [Masayoshi TSUZUKI] [SPARK-3774] typo comment in bin/utils.sh
    • [SPARK-3775] Unsuitable error message in spark-shell.cmd · 358d7ffd
      Masayoshi TSUZUKI authored
      Improved some of the error message wording in bin\*.cmd.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #2640 from tsudukim/feature/SPARK-3775 and squashes the following commits:
      
      3458afb [Masayoshi TSUZUKI] [SPARK-3775] Not suitable error message in spark-shell.cmd
    • SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR · f0811f92
      EugenCepoi authored
      Update of PR #997.
      
      With this PR, setting SPARK_CONF_DIR overrides SPARK_HOME/conf (not only spark-defaults.conf and spark-env).
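      
      For example (the directory is hypothetical):
      
      ```
      # The whole directory now takes precedence over SPARK_HOME/conf, including
      # files like log4j.properties, not just spark-defaults.conf and spark-env.sh:
      export SPARK_CONF_DIR=/etc/spark/custom-conf
      ./bin/spark-shell
      ```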
      
      Author: EugenCepoi <cepoi.eugen@gmail.com>
      
      Closes #2481 from EugenCepoi/SPARK-2058 and squashes the following commits:
      
      0bb32c2 [EugenCepoi] use orElse orNull and fixing trailing percent in compute-classpath.cmd
      77f35d7 [EugenCepoi] SPARK-2058: Overriding SPARK_HOME/conf with SPARK_CONF_DIR
  24. Oct 02, 2014
    • [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset · 5b4a5b1a
      cocoatomo authored
      ### Problem
      
      The section "Using the shell" in Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run pyspark REPL through IPython.
      But a folloing command does not run IPython but a default Python executable.
      
      ```
      $ IPYTHON=1 ./bin/pyspark
      Python 2.7.8 (default, Jul  2 2014, 10:14:46)
      ...
      ```
      
      The spark/bin/pyspark script, as of commit b235e013, decides which executable and options to use in the following way:
      
      1. if PYSPARK_PYTHON unset
         * → defaulting to "python"
      2. if IPYTHON_OPTS set
         * → set IPYTHON "1"
      3. some python scripts passed to ./bin/pyspark → run them with ./bin/spark-submit
         * out of this issue's scope
      4. if IPYTHON set as "1"
         * → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
         * otherwise execute $PYSPARK_PYTHON
      
      Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is "1".
      In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no effect on which command is used.
      
      PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command
      ---- | ---- | ----- | ----- | -----
      (unset → defaults to python) | (unset) | (unset) | python | (same)
      (unset → defaults to python) | (unset) | 1 | python | ipython
      (unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option
      (unset → defaults to python) | an_option | 1 | python an_option | ipython an_option
      ipython | (unset) | (unset) | ipython | (same)
      ipython | (unset) | 1 | ipython | (same)
      ipython | an_option | (unset → set to 1) | ipython an_option | (same)
      ipython | an_option | 1 | ipython an_option | (same)
      
      ### Suggestion
      
      The pyspark script should first determine whether the user wants to run IPython or another executable.
      
      1. if IPYTHON_OPTS set
         * set IPYTHON "1"
      2. if IPYTHON has the value "1"
         * PYSPARK_PYTHON defaults to "ipython" if not set
      3. PYSPARK_PYTHON defaults to "python" if not set
      
      See the pull request for the detailed modifications.
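      
      With the suggested ordering, IPython can be selected without presetting PYSPARK_PYTHON:
      
      ```
      IPYTHON=1 ./bin/pyspark
      ```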
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and squashes the following commits:
      
      d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option
      264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables
      42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL
      10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset