  1. Oct 02, 2014
    • [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset · 5b4a5b1a
      cocoatomo authored
      ### Problem
      
      The section "Using the shell" in Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run pyspark REPL through IPython.
      But a folloing command does not run IPython but a default Python executable.
      
      ```
      $ IPYTHON=1 ./bin/pyspark
      Python 2.7.8 (default, Jul  2 2014, 10:14:46)
      ...
      ```
      
      The spark/bin/pyspark script at commit b235e013 decides which executable and options to use in the following way:
      
      1. if PYSPARK_PYTHON is unset
         * → default it to "python"
      2. if IPYTHON_OPTS is set
         * → set IPYTHON to "1"
      3. if a python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
         * out of this issue's scope
      4. if IPYTHON is set to "1"
         * → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
         * otherwise execute $PYSPARK_PYTHON
      
      Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is "1".
      In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no effect on which command is used.
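      A minimal sketch of this ordering (a paraphrase of the script's logic as described above, not the actual source):
      
      ```bash
      # Step 1 runs first, so PYSPARK_PYTHON is never empty afterwards.
      if [ -z "$PYSPARK_PYTHON" ]; then
        PYSPARK_PYTHON="python"
      fi
      
      # Step 2: IPYTHON_OPTS implies IPYTHON=1.
      if [ -n "$IPYTHON_OPTS" ]; then
        IPYTHON=1
      fi
      
      # Step 4: the "default: ipython" fallback can never fire, because
      # PYSPARK_PYTHON was already defaulted to "python" in step 1.
      if [ "$IPYTHON" = "1" ]; then
        exec "${PYSPARK_PYTHON:-ipython}" $IPYTHON_OPTS
      else
        exec "$PYSPARK_PYTHON"
      fi
      ```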
      
      PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command
      ---- | ---- | ----- | ----- | -----
      (unset → defaults to python) | (unset) | (unset) | python | (same)
      (unset → defaults to python) | (unset) | 1 | python | ipython
      (unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option
      (unset → defaults to python) | an_option | 1 | python an_option | ipython an_option
      ipython | (unset) | (unset) | ipython | (same)
      ipython | (unset) | 1 | ipython | (same)
      ipython | an_option | (unset → set to 1) | ipython an_option | (same)
      ipython | an_option | 1 | ipython an_option | (same)
      
      ### Suggestion
      
      The pyspark script should first determine whether the user wants to run IPython or another executable, and only then apply the default (see the sketch after this list):
      
      1. if IPYTHON_OPTS is set
         * set IPYTHON to "1"
      2. if IPYTHON has the value "1"
         * PYSPARK_PYTHON defaults to "ipython" if not set
      3. otherwise, PYSPARK_PYTHON defaults to "python" if not set
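      A minimal sketch of the suggested ordering, again a paraphrase rather than the actual patch:
      
      ```bash
      # Decide first whether IPython is wanted ...
      if [ -n "$IPYTHON_OPTS" ]; then
        IPYTHON=1
      fi
      
      # ... and only then pick the default executable.
      if [ "$IPYTHON" = "1" ]; then
        PYSPARK_PYTHON="${PYSPARK_PYTHON:-ipython}"
      else
        PYSPARK_PYTHON="${PYSPARK_PYTHON:-python}"
      fi
      
      exec "$PYSPARK_PYTHON" $IPYTHON_OPTS
      ```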
      
      See the pull request for the detailed modifications.
      
      Author: cocoatomo <cocoatomo77@gmail.com>
      
      Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and squashes the following commits:
      
      d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option
      264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables
      42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL
      10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset
  2. Oct 01, 2014
  3. Sep 18, 2014
  4. Sep 15, 2014
  5. Sep 12, 2014
    • [SPARK-3217] Add Guava to classpath when SPARK_PREPEND_CLASSES is set. · af258382
      Marcelo Vanzin authored
      When that option is used, the compiled classes from the build directory
      are prepended to the classpath. Now that we avoid packaging Guava, that
      means we have classes referencing the original Guava location in the app's
      classpath, so errors happen.
      
      For that case, add Guava manually to the classpath.
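      A hypothetical sketch of the shape of the fix (the jar lookup, paths, and variable names here are illustrative, not the actual patch):
      
      ```bash
      # Illustrative only: when SPARK_PREPEND_CLASSES is set, also add a Guava
      # jar by hand, since Guava is no longer packaged into the assembly.
      if [ -n "$SPARK_PREPEND_CLASSES" ]; then
        CLASSPATH="$SPARK_HOME/core/target/classes:$CLASSPATH"
        GUAVA_JAR=$(ls "$HOME"/.m2/repository/com/google/guava/guava/*/guava-*.jar 2>/dev/null | head -n 1)
        [ -n "$GUAVA_JAR" ] && CLASSPATH="$GUAVA_JAR:$CLASSPATH"
      fi
      ```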
      
      Note: if Spark is compiled with "-Phadoop-provided", it's tricky to
      make things work with SPARK_PREPEND_CLASSES, because you need to add
      the Hadoop classpath using SPARK_CLASSPATH and that means the older
      Hadoop Guava overrides the newer one Spark needs. So someone using
      SPARK_PREPEND_CLASSES needs to remember to not use that profile.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #2141 from vanzin/SPARK-3217 and squashes the following commits:
      
      b967324 [Marcelo Vanzin] [SPARK-3217] Add Guava to classpath when SPARK_PREPEND_CLASSES is set.
  6. Sep 08, 2014
    • SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested! TBH, it isn't a great idea to have a directory with spaces in it: emacs doesn't like it, then hadoop doesn't like it, and so on...
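      The failure mode being guarded against is plain word splitting: an unquoted expansion breaks a path with spaces into several words. A minimal illustration:
      
      ```bash
      SPARK_HOME="/opt/spark install"   # note the space
      
      # Broken: expands to two words, /opt/spark and install/bin/spark-class
      ls $SPARK_HOME/bin/spark-class
      
      # Paranoid quoting keeps the path as a single word
      ls "$SPARK_HOME/bin/spark-class"
      ```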
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
  7. Sep 05, 2014
  8. Aug 28, 2014
    • [HOTFIX] Wait for EOF only for the PySpark shell · dafe3434
      Andrew Or authored
      In `SparkSubmitDriverBootstrapper`, we wait for the parent process to send us an `EOF` before finishing the application. This is applicable for the PySpark shell because we terminate the application the same way. However if we run a python application, for instance, the JVM actually never exits unless it receives a manual EOF from the user. This is causing a few tests to timeout.
      
      We only need to do this for the PySpark shell because Spark submit runs as a python subprocess only in this case. Thus, the normal Spark shell doesn't need to go through this case even though it is also a REPL.
      
      Thanks davies for reporting this.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2170 from andrewor14/bootstrap-hotfix and squashes the following commits:
      
      42963f5 [Andrew Or] Do not wait for EOF unless this is the pyspark shell
  9. Aug 27, 2014
    • SPARK-3265 Allow using custom ipython executable with pyspark · f38fab97
      Rob O'Dwyer authored
      Although you can make pyspark use ipython with `IPYTHON=1`, and also change the python executable with `PYSPARK_PYTHON=...`, you can't use both at the same time because it hardcodes the default ipython script.
      
      This makes it use the `PYSPARK_PYTHON` variable if present and fall back to default python, similarly to how the default python executable is handled.
      
      So you can use a custom ipython like so:
      `PYSPARK_PYTHON=./anaconda/bin/ipython IPYTHON_OPTS="notebook" pyspark`
      
      Author: Rob O'Dwyer <odwyerrob@gmail.com>
      
      Closes #2167 from robbles/patch-1 and squashes the following commits:
      
      d98e8a9 [Rob O'Dwyer] Allow using custom ipython executable with pyspark
    • [SPARK-3167] Handle special driver configs in Windows · 7557c4cf
      Andrew Or authored
      This is an effort to bring the Windows scripts up to speed after recent splashing changes in #1845.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #2129 from andrewor14/windows-config and squashes the following commits:
      
      881a8f0 [Andrew Or] Add reference to Windows taskkill
      92e6047 [Andrew Or] Update a few comments (minor)
      22b1acd [Andrew Or] Fix style again (minor)
      afcffea [Andrew Or] Fix style (minor)
      72004c2 [Andrew Or] Actually respect --driver-java-options
      803218b [Andrew Or] Actually respect SPARK_*_CLASSPATH
      eeb34a0 [Andrew Or] Update outdated comment (minor)
      35caecc [Andrew Or] In Windows, actually kill Java processes on exit
      f97daa2 [Andrew Or] Fix Windows spark shell stdin issue
      83ebe60 [Andrew Or] Parse special driver configs in Windows (broken)
  10. Aug 26, 2014
    • [SPARK-2964] [SQL] Remove duplicated code from spark-sql and start-thriftserver.sh · faeb9c0e
      Cheng Lian authored
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #1886 from sarutak/SPARK-2964 and squashes the following commits:
      
      8ef8751 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2964
      26e7c95 [Kousuke Saruta] Revert "Shorten timeout to more reasonable value"
      ffb68fa [Kousuke Saruta] Modified spark-sql and start-thriftserver.sh to use bin/utils.sh
      8c6f658 [Kousuke Saruta] Merge branch 'spark-3026' of https://github.com/liancheng/spark into SPARK-2964
      81b43a8 [Cheng Lian] Shorten timeout to more reasonable value
      a89e66d [Cheng Lian] Fixed command line options quotation in scripts
      9c894d3 [Cheng Lian] Fixed bin/spark-sql -S option typo
      be4736b [Cheng Lian] Report better error message when running JDBC/CLI without hive-thriftserver profile enabled
    • [SPARK-3225]Typo in script · 2ffd3290
      WangTao authored
      use_conf_dir => user_conf_dir in load-spark-env.sh.
      
      Author: WangTao <barneystinson@aliyun.com>
      
      Closes #1926 from WangTaoTheTonic/TypoInScript and squashes the following commits:
      
      0c104ad [WangTao] Typo in script
  11. Aug 24, 2014
  12. Aug 23, 2014
    • [SPARK-3068]remove MaxPermSize option for jvm 1.8 · f3d65cd0
      Daoyuan Wang authored
      In JVM 1.8.0, MaxPermSize is no longer supported.
      In spark `stderr` output, there would be a line of
      
          Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
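      The fix (per the commits below, "direct get java version in runtime") gates the flag on the detected Java version; a sketch, with illustrative parsing:
      
      ```bash
      # Only pass MaxPermSize to JVMs older than 1.8, where PermGen still exists.
      JAVA_VERSION=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
      case "$JAVA_VERSION" in
        1.[0-7]*) JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128m" ;;
        *) : ;;  # JVM 1.8+ would only print the "ignoring option" warning
      esac
      ```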
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #2011 from adrian-wang/maxpermsize and squashes the following commits:
      
      ef1d660 [Daoyuan Wang] direct get java version in runtime
      37db9c1 [Daoyuan Wang] code refine
      3c1d554 [Daoyuan Wang] remove MaxPermSize option for jvm 1.8
  13. Aug 20, 2014
    • [SPARK-2849] Handle driver configs separately in client mode · b3ec51bf
      Andrew Or authored
      In client deploy mode, the driver is launched from within `SparkSubmit`'s JVM. This means by the time we parse Spark configs from `spark-defaults.conf`, it is already too late to control certain properties of the driver's JVM. We currently ignore these configs in client mode altogether.
      ```
      spark.driver.memory
      spark.driver.extraJavaOptions
      spark.driver.extraClassPath
      spark.driver.extraLibraryPath
      ```
      This PR handles these properties before launching the driver JVM. It achieves this by spawning a separate JVM that runs a new class called `SparkSubmitDriverBootstrapper`, which spawns `SparkSubmit` as a sub-process with the appropriate classpath, library paths, java opts and memory.
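      A hypothetical bash-side sketch of the routing; the real split between bash and the new Scala class may differ, and variable names here are illustrative:
      
      ```bash
      # Route through the bootstrapper only when spark-defaults.conf contains
      # driver configs that must be applied before the driver JVM starts.
      PROPS_FILE="${SPARK_CONF_DIR:-$SPARK_HOME/conf}/spark-defaults.conf"
      if [ -f "$PROPS_FILE" ] && \
         grep -Eq '^spark\.driver\.(memory|extraJavaOptions|extraClassPath|extraLibraryPath)' "$PROPS_FILE"; then
        RUNNER=org.apache.spark.deploy.SparkSubmitDriverBootstrapper
      else
        RUNNER=org.apache.spark.deploy.SparkSubmit
      fi
      exec java -cp "$CLASSPATH" $JAVA_OPTS "$RUNNER" "$@"
      ```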
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1845 from andrewor14/handle-configs-bash and squashes the following commits:
      
      bed4bdf [Andrew Or] Change a few comments / messages (minor)
      24dba60 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      08fd788 [Andrew Or] Warn against external usages of SparkSubmitDriverBootstrapper
      ff34728 [Andrew Or] Minor comments
      51aeb01 [Andrew Or] Filter out JVM memory in Scala rather than Bash (minor)
      9a778f6 [Andrew Or] Fix PySpark: actually kill driver on termination
      d0f20db [Andrew Or] Don't pass empty library paths, classpath, java opts etc.
      a78cb26 [Andrew Or] Revert a few changes in utils.sh (minor)
      9ba37e2 [Andrew Or] Don't barf when the properties file does not exist
      8867a09 [Andrew Or] A few more naming things (minor)
      19464ad [Andrew Or] SPARK_SUBMIT_JAVA_OPTS -> SPARK_SUBMIT_OPTS
      d6488f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      1ea6bbe [Andrew Or] SparkClassLauncher -> SparkSubmitDriverBootstrapper
      a91ea19 [Andrew Or] Fix precedence of library paths, classpath, java opts and memory
      158f813 [Andrew Or] Remove "client mode" boolean argument
      c84f5c8 [Andrew Or] Remove debug print statement (minor)
      b71f52b [Andrew Or] Revert a few more changes (minor)
      7d94a8d [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      3a8235d [Andrew Or] Only parse the properties file if special configs exist
      c37e08d [Andrew Or] Revert a few more changes
      a396eda [Andrew Or] Nullify my own hard work to simplify bash
      0effa1e [Andrew Or] Add code in Scala that handles special configs
      c886568 [Andrew Or] Fix lines too long + a few comments / style (minor)
      7a4190a [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      7396be2 [Andrew Or] Explicitly comment that multi-line properties are not supported
      fa11ef8 [Andrew Or] Parse the properties file only if the special configs exist
      371cac4 [Andrew Or] Add function prefix (minor)
      be99eb3 [Andrew Or] Fix tests to not include multi-line configs
      bd0d468 [Andrew Or] Simplify parsing config file by ignoring multi-line arguments
      56ac247 [Andrew Or] Use eval and set to simplify splitting
      8d4614c [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      aeb79c7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into handle-configs-bash
      2732ac0 [Andrew Or] Integrate BASH tests into dev/run-tests + log error properly
      8d26a5c [Andrew Or] Add tests for bash/utils.sh
      4ae24c3 [Andrew Or] Fix bug: escape properly in quote_java_property
      b3c4cd5 [Andrew Or] Fix bug: count the number of quotes instead of detecting presence
      c2273fc [Andrew Or] Fix typo (minor)
      e793e5f [Andrew Or] Handle multi-line arguments
      5d8f8c4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      c7b9926 [Andrew Or] Minor changes to spark-defaults.conf.template
      a992ae2 [Andrew Or] Escape spark.*.extraJavaOptions correctly
      aabfc7e [Andrew Or] escape -> split (minor)
      45a1eb9 [Andrew Or] Fix bug: escape escaped backslashes and quotes properly...
      1cdc6b1 [Andrew Or] Fix bug: escape escaped double quotes properly
      c854859 [Andrew Or] Add small comment
      c13a2cb [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      8e552b7 [Andrew Or] Include an example of spark.*.extraJavaOptions
      de765c9 [Andrew Or] Print spark-class command properly
      a4df3c4 [Andrew Or] Move parsing and escaping logic to utils.sh
      dec2343 [Andrew Or] Only export variables if they exist
      fa2136e [Andrew Or] Escape Java options + parse java properties files properly
      ef12f74 [Andrew Or] Minor formatting
      4ec22a1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      e5cfb46 [Andrew Or] Collapse duplicate code + fix potential whitespace issues
      4edcaa8 [Andrew Or] Redirect stdout to stderr for python
      130f295 [Andrew Or] Handle spark.driver.memory too
      98dd8e3 [Andrew Or] Add warning if properties file does not exist
      8843562 [Andrew Or] Fix compilation issues...
      75ee6b4 [Andrew Or] Remove accidentally added file
      63ed2e9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-driver-extra
      0025474 [Andrew Or] Revert SparkSubmit handling of --driver-* options for only cluster mode
      a2ab1b0 [Andrew Or] Parse spark.driver.extra* in bash
      250cb95 [Andrew Or] Do not ignore spark.driver.extra* for client mode
  14. Aug 14, 2014
    • [SPARK-2925] [sql]fix spark-sql and start-thriftserver shell bugs when set --driver-java-options · 267fdffe
      wangfei authored
      https://issues.apache.org/jira/browse/SPARK-2925
      
      Running a command like the following produces the error:
      
          bin/spark-sql --driver-java-options '-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y'
      
          Error: Unrecognized option '-Xnoagent'.
          Run with --help for usage help or --verbose for debug output
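      The fix ("quote variables") is the standard remedy: inside the wrapper scripts, quote the expansions of the received arguments so values containing spaces stay single words. A schematic sketch (FWDIR stands in for the script's resolved Spark home):
      
      ```bash
      # Inside a wrapper script such as spark-sql: forwarding the received
      # arguments unquoted re-splits any value that contains spaces.
      exec "$FWDIR"/bin/spark-submit $@     # broken: -Xnoagent becomes its own arg
      exec "$FWDIR"/bin/spark-submit "$@"   # fixed: each argument stays intact
      ```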
      
      Author: wangfei <wangfei_hello@126.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #1851 from scwf/patch-2 and squashes the following commits:
      
      516554d [wangfei] quote variables to fix this issue
      8bd40f2 [wangfei] quote variables to fix this problem
      e6d79e3 [wangfei] fix start-thriftserver bug when set driver-java-options
      948395d [wangfei] fix spark-sql error when set --driver-java-options
    • [SPARK-3006] Failed to execute spark-shell in Windows OS · 9497b12d
      Masayoshi TSUZUKI authored
      Modified the order of the options and arguments in spark-shell.cmd
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #1918 from tsudukim/feature/SPARK-3006 and squashes the following commits:
      
      8bba494 [Masayoshi TSUZUKI] [SPARK-3006] Failed to execute spark-shell in Windows OS
      1a32410 [Masayoshi TSUZUKI] [SPARK-3006] Failed to execute spark-shell in Windows OS
  15. Aug 09, 2014
    • [SPARK-2894] spark-shell doesn't accept flags · 4f4a9884
      Kousuke Saruta authored
      As sryza reported, spark-shell doesn't accept any flags.
      The root cause is incorrect usage of spark-submit in spark-shell, and it came to the surface with #1801.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1715, Closes #1864, and Closes #1861
      
      Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:
      
      47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
      2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
      98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
      513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
      28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
      5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
      e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
  16. Aug 07, 2014
    • SPARK-2905 Fixed path sbin => bin · 80ec5bad
      Oleg Danilov authored
      Author: Oleg Danilov <oleg.danilov@wandisco.com>
      
      Closes #1835 from dosoft/SPARK-2905 and squashes the following commits:
      
      4df423c [Oleg Danilov] SPARK-2905 Fixed path sbin => bin
  17. Aug 06, 2014
    • [SPARK-2678][Core][SQL] A workaround for SPARK-2678 · a6cd3110
      Cheng Lian authored
      JIRA issues:
      
      - Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      - Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)
      
      Related PR:
      
      - #1715
      
      This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in the Spark 1.1 release. As SPARK-2678 currently only affects Spark SQL scripts, this workaround is enough for Spark 1.1. The command line option handling logic in the bash scripts looks somewhat dirty and duplicated, but it helps provide a cleaner user interface as well as retaining full backward compatibility for now.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1801 from liancheng/spark-2874 and squashes the following commits:
      
      8045d7a [Cheng Lian] Make sure test suites pass
      8493a9e [Cheng Lian] Using eval to retain quoted arguments
      aed523f [Cheng Lian] Fixed typo in bin/spark-sql
      f12a0b1 [Cheng Lian] Worked arount SPARK-2678
      daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
  18. Aug 02, 2014
    • [SPARK-1981] Add AWS Kinesis streaming support · 91f9504e
      Chris Fregly authored
      Author: Chris Fregly <chris@fregly.com>
      
      Closes #1434 from cfregly/master and squashes the following commits:
      
      4774581 [Chris Fregly] updated docs, renamed retry to retryRandom to be more clear, removed retries around store() method
      0393795 [Chris Fregly] moved Kinesis examples out of examples/ and back into extras/kinesis-asl
      691a6be [Chris Fregly] fixed tests and formatting, fixed a bug with JavaKinesisWordCount during union of streams
      0e1c67b [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      74e5c7c [Chris Fregly] updated per TD's feedback.  simplified examples, updated docs
      e33cbeb [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      bf614e9 [Chris Fregly] per matei's feedback:  moved the kinesis examples into the examples/ dir
      d17ca6d [Chris Fregly] per TD's feedback:  updated docs, simplified the KinesisUtils api
      912640c [Chris Fregly] changed the foundKinesis class to be a publically-avail class
      db3eefd [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      21de67f [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      6c39561 [Chris Fregly] parameterized the versions of the aws java sdk and kinesis client
      338997e [Chris Fregly] improve build docs for kinesis
      828f8ae [Chris Fregly] more cleanup
      e7c8978 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      cd68c0d [Chris Fregly] fixed typos and backward compatibility
      d18e680 [Chris Fregly] Merge remote-tracking branch 'upstream/master'
      b3b0ff1 [Chris Fregly] [SPARK-1981] Add AWS Kinesis streaming support
  19. Jul 29, 2014
  20. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module was defined outside the `hive-thriftserver` profile. Thus every pull request that didn't touch SQL code would also execute the test suites defined in `hive-thriftserver`, and those tests failed because the related .class files were not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
  21. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  22. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back after it passes tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit by a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert changes to this bug since it involves more subtle considerations and worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  23. Jul 10, 2014
    • [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
      This patch introduces the new way of working while retaining the existing ways of doing things.
      
      For example, the Maven build instruction for YARN is
      `mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      In sbt this can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      It also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
      72651ca [Prashant Sharma] Addresses code reivew comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
      65cf06c [Prashant Sharma] Servelet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
  24. Jul 03, 2014
    • [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work. · 731f683b
      Prashant Sharma authored
      Trivial fix.
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #1050 from ScrapCodes/SPARK-2109/pyspark-script-bug and squashes the following commits:
      
      77072b9 [Prashant Sharma] Changed echos to redirect to STDERR.
      13f48a0 [Prashant Sharma] [SPARK-2109] Setting SPARK_MEM for bin/pyspark does not work.
  25. Jun 23, 2014
  26. Jun 12, 2014
    • SPARK-1843: Replace assemble-deps with env variable. · 1c04652c
      Patrick Wendell authored
      (This change is actually small, I moved some logic into
      compute-classpath that was previously in spark-class).
      
      Assemble deps has existed for a while to allow developers to
      run local code with new changes quickly. When I'm developing I
      typically use a simpler approach which just prepends the Spark
      classes to the classpath before the assembly jar. This is well
      defined in the JVM and the Spark classes take precedence over those
      in the assembly.
      
      This approach is portable across both builds which is the main reason I'd
      like to switch to it. It's also a bit easier to toggle on and off quickly.
      
      The way you use this is the following:
      ```
      $ ./bin/spark-shell # Use spark with the normal assembly
      $ export SPARK_PREPEND_CLASSES=true
      $ ./bin/spark-shell # Now it's using compiled classes
      $ unset SPARK_PREPEND_CLASSES
      $ ./bin/spark-shell # Back to normal
      ```
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #877 from pwendell/assemble-deps and squashes the following commits:
      
      8a11345 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into assemble-deps
      faa3168 [Patrick Wendell] Adding a warning for compatibility
      3f151a7 [Patrick Wendell] Small fix
      bbfb73c [Patrick Wendell] Review feedback
      328e9f8 [Patrick Wendell] SPARK-1843: Replace assemble-deps with env variable.
  27. Jun 11, 2014
    • HOTFIX: A few PySpark tests were not actually run · fe78b8b6
      Andrew Or authored
      This is a hot fix for the hot fix in fb499be1. The changes in that commit did not actually cause the `doctest` module in python to be loaded for the following tests:
      - pyspark/broadcast.py
      - pyspark/accumulators.py
      - pyspark/serializers.py
      
      (@pwendell I might have told you the wrong thing)
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1053 from andrewor14/python-test-fix and squashes the following commits:
      
      d2e5401 [Andrew Or] Explain why these tests are handled differently
      0bd6fdd [Andrew Or] Fix 3 pyspark tests not being invoked
  28. Jun 10, 2014
    • HOTFIX: Fix Python tests on Jenkins. · fb499be1
      Patrick Wendell authored
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #1036 from pwendell/jenkins-test and squashes the following commits:
      
      9c99856 [Patrick Wendell] Better output during tests
      71e7b74 [Patrick Wendell] Removing incorrect python path
      74984db [Patrick Wendell] HOTFIX: Allow PySpark tests to run on Jenkins.
  29. Jun 08, 2014
    • Update run-example · e9261d08
      maji2014 authored
      The old code can only be run from spark_home using "bin/run-example".
      The error "./run-example: line 55: ./bin/spark-submit: No such file or directory" appears when running from anywhere else, so change this.
      
      Author: maji2014 <maji3@asiainfo-linkage.com>
      
      Closes #1011 from maji2014/master and squashes the following commits:
      
      2cc1af6 [maji2014] Update run-example
      
      Closes #988.
  30. May 25, 2014
    • spark-submit: add exec at the end of the script · 6e9fb632
      Colin Patrick Mccabe authored
      Add an 'exec' at the end of the spark-submit script, to avoid keeping a
      bash process hanging around while it runs.  This makes ps look a little
      bit nicer.
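      Schematically, the difference is just the final line of the script:
      
      ```bash
      # Without exec: a bash process lingers as the parent of the JVM
      "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
      
      # With exec: bash is replaced by the child, so ps shows one process
      exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
      ```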
      
      Author: Colin Patrick Mccabe <cmccabe@cloudera.com>
      
      Closes #858 from cmccabe/SPARK-1907 and squashes the following commits:
      
      7023b64 [Colin Patrick Mccabe] spark-submit: add exec at the end of the script
  31. May 21, 2014
    • [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class · 6e337380
      Sumedh Mungee authored
      Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.
      
      Author: Sumedh Mungee <smungee@gmail.com>
      
      Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
      
      26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
  32. May 19, 2014
    • SPARK-1879. Increase MaxPermSize since some of our builds have many classes · 5af99d76
      Matei Zaharia authored
      See https://issues.apache.org/jira/browse/SPARK-1879 -- builds with Hadoop2 and Hive ran out of PermGen space in spark-shell, when those things added up with the Scala compiler.
      
      Note that users can still override it by setting their own Java options with this change. Their options will come later in the command string than the -XX:MaxPermSize=128m.
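      This works because user options come later in the command string and the JVM honors the last occurrence of a repeated -XX flag; schematically (variable names illustrative):
      
      ```bash
      # Defaults first, user options later: with JAVA_OPTS="-XX:MaxPermSize=256m",
      # the user's 256m wins over the built-in 128m.
      java -XX:MaxPermSize=128m $JAVA_OPTS -cp "$CLASSPATH" "$MAIN_CLASS" "$@"
      ```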
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #823 from mateiz/spark-1879 and squashes the following commits:
      
      6bc0ee8 [Matei Zaharia] Increase MaxPermSize to 128m since some of our builds have lots of classes
    • [SPARK-1876] Windows fixes to deal with latest distribution layout changes · 7b70a707
      Matei Zaharia authored
      - Look for JARs in the right place
      - Launch examples the same way as on Unix
      - Load datanucleus JARs if they exist
      - Don't attempt to parse local paths as URIs in SparkSubmit, since paths with C:\ are not valid URIs
      - Also fixed POM exclusion rules for datanucleus (it wasn't properly excluding it, whereas SBT was)
      
      Author: Matei Zaharia <matei@databricks.com>
      
      Closes #819 from mateiz/win-fixes and squashes the following commits:
      
      d558f96 [Matei Zaharia] Fix comment
      228577b [Matei Zaharia] Review comments
      d3b71c7 [Matei Zaharia] Properly exclude datanucleus files in Maven assembly
      144af84 [Matei Zaharia] Update Windows scripts to match latest binary package layout
  33. May 18, 2014
    • Fix spark-submit path in spark-shell & pyspark · ebcd2d68
      Neville Li authored
      Author: Neville Li <neville@spotify.com>
      
      Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:
      
      0dc33ed [Neville Li] Fix spark-submit path in pyspark
      becec64 [Neville Li] Fix spark-submit path in spark-shell
  34. May 17, 2014
    • [SPARK-1808] Route bin/pyspark through Spark submit · 4b8ec6fc
      Andrew Or authored
      **Problem.** For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.
      
      **Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.
      
      **Details.** `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply pass the file directly to Spark submit and let it handle the rest.
      
      For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change was to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case.
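      A rough sketch of the two code paths, assuming bin/pyspark-style bash (simplified; the real script and Spark submit handle more cases):
      
      ```bash
      if [[ "$1" == *.py ]]; then
        # Case (1): a python application; hand it straight to Spark submit.
        exec "$SPARK_HOME"/bin/spark-submit "$@"
      else
        # Case (2): the shell; the python process launches the JVM itself via
        # spark-submit, reading the arguments from PYSPARK_SUBMIT_ARGS.
        export PYSPARK_SUBMIT_ARGS="$*"
        exec "${PYSPARK_PYTHON:-python}"
      fi
      ```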
      
      This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #799 from andrewor14/pyspark-submit and squashes the following commits:
      
      bf37e36 [Andrew Or] Minor changes
      01066fa [Andrew Or] bin/pyspark for Windows
      c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
      1866f85 [Andrew Or] Windows is not cooperating
      456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
      7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      b7ba0d8 [Andrew Or] Address a few comments (minor)
      06eb138 [Andrew Or] Use shlex instead of writing our own parser
      05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
      6fba412 [Andrew Or] Deal with quotes + address various comments
      fe4c8a7 [Andrew Or] Update --help for bin/pyspark
      afe47bf [Andrew Or] Fix spark shell
      f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
      a371d26 [Andrew Or] Route bin/pyspark through Spark submit