  1. May 15, 2015
    • [SPARK-5412] [DEPLOY] Cannot bind Master to a specific hostname as per the documentation · 8ab1450d
      Sean Owen authored
      Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that things like --host have effect on start-master.sh as per docs
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #6185 from srowen/SPARK-5412 and squashes the following commits:
      
      b3ce9da [Sean Owen] Pass args to start-master.sh through to start-daemon.sh, as other scripts do, so that things like --host have effect on start-master.sh as per docs
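
      The pass-through fix can be sketched with a self-contained stand-in (function names are hypothetical; the real scripts are sbin/start-master.sh and sbin/spark-daemon.sh):

      ```bash
      # Sketch of the SPARK-5412 fix: forward "$@" so that flags such as
      # --host actually reach the daemon script instead of being dropped.
      spark_daemon() {
        # Stand-in for sbin/spark-daemon.sh: report what it received.
        echo "spark-daemon.sh received: $*"
      }

      start_master() {
        # Before the fix, start-master.sh invoked the daemon script
        # without "$@", so user-supplied options were silently ignored.
        spark_daemon start org.apache.spark.deploy.master.Master 1 "$@"
      }

      start_master --host 192.168.1.10
      # prints: spark-daemon.sh received: start org.apache.spark.deploy.master.Master 1 --host 192.168.1.10
      ```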
  2. Apr 28, 2015
    • [SPARK-5338] [MESOS] Add cluster mode support for Mesos · 53befacc
      Timothy Chen authored
      This patch adds support for running Spark in cluster mode on Mesos.
      It introduces a new Mesos framework dedicated to launching apps/drivers, which can be used through the spark-submit script by pointing the --master flag at the cluster mode REST interface instead of at the Mesos master.
      
      Example:
      ./bin/spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkPi --master mesos://10.0.0.206:8077 --executor-memory 1G --total-executor-cores 100 examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar 30
      
      Part of this patch is also to abstract the StandaloneRestServer so it can have different implementations of the REST endpoints.
      
      Features of the cluster mode in this PR:
      - Supports supervise mode, where the scheduler keeps trying to reschedule exited jobs.
      - Adds a new UI for the cluster mode scheduler, showing all running jobs, finished jobs, and supervised jobs waiting to be retried.
      - Supports state persistence to ZooKeeper, so when the cluster scheduler fails over it can pick up all the queued and running jobs.
      
      Author: Timothy Chen <tnachen@gmail.com>
      Author: Luc Bourlier <luc.bourlier@typesafe.com>
      
      Closes #5144 from tnachen/mesos_cluster_mode and squashes the following commits:
      
      069e946 [Timothy Chen] Fix rebase.
      e24b512 [Timothy Chen] Persist submitted driver.
      390c491 [Timothy Chen] Fix zk conf key for mesos zk engine.
      e324ac1 [Timothy Chen] Fix merge.
      fd5259d [Timothy Chen] Address review comments.
      1553230 [Timothy Chen] Address review comments.
      c6c6b73 [Timothy Chen] Pass spark properties to mesos cluster tasks.
      f7d8046 [Timothy Chen] Change app name to spark cluster.
      17f93a2 [Timothy Chen] Fix head of line blocking in scheduling drivers.
      6ff8e5c [Timothy Chen] Address comments and add logging.
      df355cd [Timothy Chen] Add metrics to mesos cluster scheduler.
      20f7284 [Timothy Chen] Address review comments
      7252612 [Timothy Chen] Fix tests.
      a46ad66 [Timothy Chen] Allow zk cli param override.
      920fc4b [Timothy Chen] Fix scala style issues.
      862b5b5 [Timothy Chen] Support asking driver status when it's retrying.
      7f214c2 [Timothy Chen] Fix RetryState visibility
      e0f33f7 [Timothy Chen] Add supervise support and persist retries.
      371ce65 [Timothy Chen] Handle cluster mode recovery and state persistence.
      3d4dfa1 [Luc Bourlier] Adds support to kill submissions
      febfaba [Timothy Chen] Bound the finished drivers in memory
      543a98d [Timothy Chen] Schedule multiple jobs
      6887e5e [Timothy Chen] Support looking at SPARK_EXECUTOR_URI env variable in schedulers
      8ec76bc [Timothy Chen] Fix Mesos dispatcher UI.
      d57d77d [Timothy Chen] Add documentation
      825afa0 [Luc Bourlier] Supports more spark-submit parameters
      b8e7181 [Luc Bourlier] Adds a shutdown latch to keep the deamon running
      0fa7780 [Luc Bourlier] Launch task through the mesos scheduler
      5b7a12b [Timothy Chen] WIP: Making a cluster mode a mesos framework.
      4b2f5ef [Timothy Chen] Specify user jar in command to be replaced with local.
      e775001 [Timothy Chen] Support fetching remote uris in driver runner.
      7179495 [Timothy Chen] Change Driver page output and add logging
      880bc27 [Timothy Chen] Add Mesos Cluster UI to display driver results
      9986731 [Timothy Chen] Kill drivers when shutdown
      67cbc18 [Timothy Chen] Rename StandaloneRestClient to RestClient and add sbin scripts
      e3facdd [Timothy Chen] Add Mesos Cluster dispatcher
    • [SPARK-4286] Add an external shuffle service that can be run as a daemon. · 8aab94d8
      Iulian Dragos authored
      This allows Mesos deployments to use the shuffle service (and implicitly dynamic allocation). It does so by adding a new "main" class and two corresponding scripts in `sbin`:
      
      - `sbin/start-shuffle-service.sh`
      - `sbin/stop-shuffle-service.sh`
      
      Specific options can be passed in `SPARK_SHUFFLE_OPTS`.
      
      This is picking up work from #3861 /cc tnachen
      
      Author: Iulian Dragos <jaguarul@gmail.com>
      
      Closes #4990 from dragos/feature/external-shuffle-service and squashes the following commits:
      
      6c2b148 [Iulian Dragos] Import order and wrong name fixup.
      07804ad [Iulian Dragos] Moved ExternalShuffleService to the `deploy` package + other minor tweaks.
      4dc1f91 [Iulian Dragos] Reviewer’s comments:
      8145429 [Iulian Dragos] Add an external shuffle service that can be run as a daemon.
  3. Apr 17, 2015
    • [SPARK-6952] Handle long args when detecting PID reuse · f6a9a57a
      Punya Biswal authored
      sbin/spark-daemon.sh used
      
          ps -p "$TARGET_PID" -o args=
      
      to figure out whether the process running with the expected PID is actually a Spark
      daemon. When running with a large classpath, the output of ps gets
      truncated and the check fails spuriously.
      
      This weakens the check to see if it's a java command (which is something
      we do in other parts of the script) rather than looking for the specific
      main class name. This means that SPARK-4832 might happen under a
      slightly broader range of circumstances (a java program happened to
      reuse the same PID), but it seems worthwhile compared to failing
      consistently with a large classpath.
      
      Author: Punya Biswal <pbiswal@palantir.com>
      
      Closes #5535 from punya/feature/SPARK-6952 and squashes the following commits:
      
      7ea12d1 [Punya Biswal] Handle long args when detecting PID reuse
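
      The weakened check can be sketched like this (a hypothetical simplification of the spark-daemon.sh logic):

      ```bash
      # Check only that the process holding the PID is a java command.
      # "ps -o comm=" prints just the executable name, which, unlike
      # "ps -o args=", cannot be truncated by a long classpath.
      pid_is_java() {
        case "$(ps -p "$1" -o comm= 2>/dev/null)" in
          *java*) return 0 ;;
          *)      return 1 ;;
        esac
      }
      ```

      For a Spark daemon PID this returns success; for a reused PID running, say, bash, it fails and the stale pid file is ignored.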
  4. Apr 13, 2015
    • [Spark-4848] Allow different Worker configurations in standalone cluster · 435b8779
      Nathan Kronenfeld authored
      This refixes #3699 with the latest code.
      This fixes SPARK-4848
      
      I've changed the standalone cluster scripts to allow different workers to have different numbers of instances, with both the port and the web-ui port following along appropriately.
      
      I did this by moving the loop over instances from start-slaves and stop-slaves (on the master) to start-slave and stop-slave (on the worker).
      
      While I was at it, I changed SPARK_WORKER_PORT to work the same way as SPARK_WORKER_WEBUI_PORT, since the new method works fine for both.
      
      Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>
      
      Closes #5140 from nkronenfeld/feature/spark-4848 and squashes the following commits:
      
      cf5f47e [Nathan Kronenfeld] Merge remote branch 'upstream/master' into feature/spark-4848
      044ca6f [Nathan Kronenfeld] Documentation and formatting as requested by by andrewor14
      d739640 [Nathan Kronenfeld] Move looping through instances from the master to the workers, so that each worker respects its own number of instances and web-ui port
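
      The per-worker loop can be sketched as follows (a hypothetical simplification; variable names and base ports are illustrative):

      ```bash
      # Each worker host loops over its own instance count, bumping both
      # the worker port and the web-ui port per instance, so different
      # hosts can run different numbers of workers.
      start_worker_instances() {
        local num="${1:-1}" port="${2:-7078}" webui="${3:-8081}" i
        for ((i = 0; i < num; i++)); do
          # Stand-in for: "$SPARK_HOME/sbin/start-slave.sh" --port ... --webui-port ...
          echo "worker $((i + 1)): --port $((port + i)) --webui-port $((webui + i))"
        done
      }

      start_worker_instances 2 7078 8081
      # prints:
      # worker 1: --port 7078 --webui-port 8081
      # worker 2: --port 7079 --webui-port 8082
      ```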
    • [SPARK-6671] Add status command for spark daemons · 240ea03f
      Pradeep Chanumolu authored
      SPARK-6671
      Currently, using the spark-daemon.sh script we can start and stop the Spark daemons, but we cannot query their status. It would be nice to include a status command in the spark-daemon.sh script, through which we can tell whether a Spark daemon is alive or not.
      
      Author: Pradeep Chanumolu <pchanumolu@maprtech.com>
      
      Closes #5327 from pchanumolu/master and squashes the following commits:
      
      d3a1f05 [Pradeep Chanumolu] Make status command check consistent with Stop command
      5062926 [Pradeep Chanumolu] Fix indentation in spark-daemon.sh
      3e66bc8 [Pradeep Chanumolu] SPARK-6671 : Add status command to spark daemons
      1ac3918 [Pradeep Chanumolu] Add status command to spark-daemon
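
      A status subcommand along these lines (a hypothetical sketch, not the merged code) checks the pid file and probes the process:

      ```bash
      # status: report whether the daemon whose pid is recorded in the
      # pid file ($1) is alive.
      daemon_status() {
        local pidfile="$1" pid
        if [ -f "$pidfile" ]; then
          pid="$(cat "$pidfile")"
          if kill -0 "$pid" 2>/dev/null; then
            echo "daemon is running (pid $pid)"
            return 0
          fi
          echo "pid file exists but daemon is not running"
          return 1
        fi
        echo "daemon is not running"
        return 2
      }
      ```

      Usage would look like `daemon_status /tmp/spark-master.pid` (the path is illustrative).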
  5. Mar 30, 2015
    • [HOTFIX] Update start-slave.sh · 19d4c392
      Jose Manuel Gomez authored
      Without this change, the error below happens when I execute sbin/start-all.sh:
      
      localhost: /spark-1.3/sbin/start-slave.sh: line 32: unexpected EOF while looking for matching `"'
      localhost: /spark-1.3/sbin/start-slave.sh: line 33: syntax error: unexpected end of file
      
      My operating system is Linux Mint 17.1 Rebecca.
      
      Author: Jose Manuel Gomez <jmgomez@stratio.com>
      
      Closes #5262 from josegom/patch-2 and squashes the following commits:
      
      453af8b [Jose Manuel Gomez] Update start-slave.sh
      2c456bd [Jose Manuel Gomez] Update start-slave.sh
  7. Mar 11, 2015
    • [SPARK-4924] Add a library for launching Spark jobs programmatically. · 517975d8
      Marcelo Vanzin authored
      This change encapsulates all the logic involved in launching a Spark job
      into a small Java library that can be easily embedded into other applications.
      
      The overall goal of this change is twofold, as described in the bug:
      
      - Provide a public API for launching Spark processes. This is a common request
        from users and currently there's no good answer for it.
      
      - Remove a lot of the duplicated code and other coupling that exists in the
        different parts of Spark that deal with launching processes.
      
      A lot of the duplication was due to different code needed to build an
      application's classpath (and the bootstrapper needed to run the driver in
      certain situations), and also different code needed to parse spark-submit
      command line options in different contexts. The change centralizes those
      as much as possible so that all code paths can rely on the library for
      handling those appropriately.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #3916 from vanzin/SPARK-4924 and squashes the following commits:
      
      18c7e4d [Marcelo Vanzin] Fix make-distribution.sh.
      2ce741f [Marcelo Vanzin] Add lots of quotes.
      3b28a75 [Marcelo Vanzin] Update new pom.
      a1b8af1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      897141f [Marcelo Vanzin] Review feedback.
      e2367d2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      28cd35e [Marcelo Vanzin] Remove stale comment.
      b1d86b0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      00505f9 [Marcelo Vanzin] Add blurb about new API in the programming guide.
      5f4ddcc [Marcelo Vanzin] Better usage messages.
      92a9cfb [Marcelo Vanzin] Fix Win32 launcher, usage.
      6184c07 [Marcelo Vanzin] Rename field.
      4c19196 [Marcelo Vanzin] Update comment.
      7e66c18 [Marcelo Vanzin] Fix pyspark tests.
      0031a8e [Marcelo Vanzin] Review feedback.
      c12d84b [Marcelo Vanzin] Review feedback. And fix spark-submit on Windows.
      e2d4d71 [Marcelo Vanzin] Simplify some code used to launch pyspark.
      43008a7 [Marcelo Vanzin] Don't make builder extend SparkLauncher.
      b4d6912 [Marcelo Vanzin] Use spark-submit script in SparkLauncher.
      28b1434 [Marcelo Vanzin] Add a comment.
      304333a [Marcelo Vanzin] Fix propagation of properties file arg.
      bb67b93 [Marcelo Vanzin] Remove unrelated Yarn change (that is also wrong).
      8ec0243 [Marcelo Vanzin] Add missing newline.
      95ddfa8 [Marcelo Vanzin] Fix handling of --help for spark-class command builder.
      72da7ec [Marcelo Vanzin] Rename SparkClassLauncher.
      62978e4 [Marcelo Vanzin] Minor cleanup of Windows code path.
      9cd5b44 [Marcelo Vanzin] Make all non-public APIs package-private.
      e4c80b6 [Marcelo Vanzin] Reorganize the code so that only SparkLauncher is public.
      e50dc5e [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      de81da2 [Marcelo Vanzin] Fix CommandUtils.
      86a87bf [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      2061967 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      46d46da [Marcelo Vanzin] Clean up a test and make it more future-proof.
      b93692a [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      ad03c48 [Marcelo Vanzin] Revert "Fix a thread-safety issue in "local" mode."
      0b509d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      23aa2a9 [Marcelo Vanzin] Read java-opts from conf dir, not spark home.
      7cff919 [Marcelo Vanzin] Javadoc updates.
      eae4d8e [Marcelo Vanzin] Fix new unit tests on Windows.
      e570fb5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      44cd5f7 [Marcelo Vanzin] Add package-info.java, clean up javadocs.
      f7cacff [Marcelo Vanzin] Remove "launch Spark in new thread" feature.
      7ed8859 [Marcelo Vanzin] Some more feedback.
      54cd4fd [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      61919df [Marcelo Vanzin] Clean leftover debug statement.
      aae5897 [Marcelo Vanzin] Use launcher classes instead of jars in non-release mode.
      e584fc3 [Marcelo Vanzin] Rework command building a little bit.
      525ef5b [Marcelo Vanzin] Rework Unix spark-class to handle argument with newlines.
      8ac4e92 [Marcelo Vanzin] Minor test cleanup.
      e946a99 [Marcelo Vanzin] Merge PySparkLauncher into SparkSubmitCliLauncher.
      c617539 [Marcelo Vanzin] Review feedback round 1.
      fc6a3e2 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      f26556b [Marcelo Vanzin] Fix a thread-safety issue in "local" mode.
      2f4e8b4 [Marcelo Vanzin] Changes needed to make this work with SPARK-4048.
      799fc20 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      bb5d324 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      53faef1 [Marcelo Vanzin] Merge branch 'master' into SPARK-4924
      a7936ef [Marcelo Vanzin] Fix pyspark tests.
      656374e [Marcelo Vanzin] Mima fixes.
      4d511e7 [Marcelo Vanzin] Fix tools search code.
      7a01e4a [Marcelo Vanzin] Fix pyspark on Yarn.
      1b3f6e9 [Marcelo Vanzin] Call SparkSubmit from spark-class launcher for unknown classes.
      25c5ae6 [Marcelo Vanzin] Centralize SparkSubmit command line parsing.
      27be98a [Marcelo Vanzin] Modify Spark to use launcher lib.
      6f70eea [Marcelo Vanzin] [SPARK-4924] Add a library for launching Spark jobs programatically.
  8. Mar 07, 2015
    • [Minor]fix the wrong description · 729c05bd
      WangTaoTheTonic authored
      Found it by accident. I'm not going to file a JIRA for this, as it is a very tiny fix.
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #4936 from WangTaoTheTonic/wrongdesc and squashes the following commits:
      
      fb8a8ec [WangTaoTheTonic] fix the wrong description
      aca5596 [WangTaoTheTonic] fix the wrong description
  9. Mar 06, 2015
    • [CORE, DEPLOY][minor] align arguments order with docs of worker · d8b3da9d
      Zhang, Liye authored
      The help message for starting `worker` is `Usage: Worker [options] <master>`. In `start-slaves.sh`, however, the argument order is not aligned with that, which is confusing at first glance.
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #4924 from liyezhang556520/startSlaves and squashes the following commits:
      
      7fd5deb [Zhang, Liye] align arguments order with docs of worker
  10. Feb 19, 2015
    • [Spark-5889] Remove pid file after stopping service. · ad6b169d
      Zhan Zhang authored
      Currently the pid file is not deleted, which can potentially cause problems after the service is stopped. The fix removes the pid file after the service has stopped.
      
      Author: Zhan Zhang <zhazhan@gmail.com>
      
      Closes #4676 from zhzhan/spark-5889 and squashes the following commits:
      
      eb01be1 [Zhan Zhang] solve review comments
      b4c009e [Zhan Zhang] solve review comments
      018110a [Zhan Zhang] spark-5889: remove pid file after stopping service
      088d2a2 [Zhan Zhang] squash all commits
      c1f1fa5 [Zhan Zhang] test
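
      The stop path can be sketched like so (a hypothetical simplification; the actual change is in sbin/spark-daemon.sh):

      ```bash
      # stop: kill the daemon, then delete the pid file so a stale file
      # cannot confuse later start/status invocations (the SPARK-5889 fix
      # is the rm at the end).
      stop_daemon() {
        local pidfile="$1"
        if [ -f "$pidfile" ]; then
          kill "$(cat "$pidfile")" 2>/dev/null || true
          rm -f "$pidfile"
        fi
      }
      ```

      e.g. `stop_daemon /tmp/spark-master.pid` (the path is illustrative).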
    • [SPARK-5825] [Spark Submit] Remove the double checking instance name when stopping the service · 94cdb05f
      Cheng Hao authored
      `spark-daemon.sh` confirms the process id by fuzzy-matching the class name when stopping the service; however, this fails if the java process arguments are very long (greater than 4096 characters).
      This PR loosens the check for the service process.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4611 from chenghao-intel/stopping_service and squashes the following commits:
      
      a0051f6 [Cheng Hao] loosen the process checking while stopping a service
  11. Feb 13, 2015
    • [SPARK-4832][Deploy]some other processes might take the daemon pid · 1768bd51
      WangTaoTheTonic authored
      Some other process might be using the pid saved in the pid file. In that case we should ignore the stale pid file and launch the daemons.
      
      JIRA is down for maintenance. I will file one once it returns.
      
      Author: WangTaoTheTonic <barneystinson@aliyun.com>
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #3683 from WangTaoTheTonic/otherproc and squashes the following commits:
      
      daa86a1 [WangTaoTheTonic] some bash style fix
      8befee7 [WangTaoTheTonic] handle the mistake scenario
      cf4ecc6 [WangTaoTheTonic] remove redundant condition
      f36cfb4 [WangTaoTheTonic] some other processes might take the pid
  12. Feb 01, 2015
    • [SPARK-5176] The thrift server does not support cluster mode · 1ca0a101
      Tom Panning authored
      Output an error message if the thrift server is started in cluster mode.
      
      Author: Tom Panning <tom.panning@nextcentury.com>
      
      Closes #4137 from tpanningnextcen/spark-5176-thrift-cluster-mode-error and squashes the following commits:
      
      f5c0509 [Tom Panning] [SPARK-5176] The thrift server does not support cluster mode
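
      The guard amounts to something like this (a hypothetical shell sketch; the actual check and wording live in Spark's submit path):

      ```bash
      # Refuse to launch the Thrift server with cluster deploy mode.
      check_deploy_mode() {
        if [ "$1" = "cluster" ]; then
          echo "Error: the Thrift server does not support cluster deploy mode." >&2
          return 1
        fi
        return 0
      }
      ```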
  13. Jan 19, 2015
    • [SPARK-5088] Use spark-class for running executors directly · 4a4f9ccb
      Jongyoul Lee authored
      Author: Jongyoul Lee <jongyoul@gmail.com>
      
      Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits:
      
      8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases
      932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master
      613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases
      ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import
      97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement
  14. Dec 09, 2014
    • [SPARK-874] adding a --wait flag · 61f1a702
      jbencook authored
      This PR adds a --wait flag to the `./sbin/stop-all.sh` script.
      
      Author: jbencook <jbenjamincook@gmail.com>
      
      Closes #3567 from jbencook/master and squashes the following commits:
      
      d05c5bb [jbencook] [SPARK-874] adding a --wait flag
  15. Oct 28, 2014
    • [SPARK-4110] Wrong comments about default settings in spark-daemon.sh · 44d8b45a
      Kousuke Saruta authored
      In spark-daemon.sh, there are the following comments:
      
          #   SPARK_CONF_DIR  Alternate conf dir. Default is ${SPARK_PREFIX}/conf.
          #   SPARK_LOG_DIR   Where log files are stored.  PWD by default.
      
      But I think the default value for SPARK_CONF_DIR is `${SPARK_HOME}/conf`, and for SPARK_LOG_DIR it is `${SPARK_HOME}/logs`.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2972 from sarutak/SPARK-4110 and squashes the following commits:
      
      5a171a2 [Kousuke Saruta] Fixed wrong comments
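
      Expressed as shell defaults, the corrected behavior is (a sketch; the real assignments are in sbin/spark-daemon.sh, and the SPARK_HOME value here is only an example):

      ```bash
      # Start from a clean slate for the demonstration, then apply the
      # defaults described above.
      unset SPARK_CONF_DIR SPARK_LOG_DIR
      SPARK_HOME="${SPARK_HOME:-/opt/spark}"                  # example location
      SPARK_CONF_DIR="${SPARK_CONF_DIR:-${SPARK_HOME}/conf}"  # default: $SPARK_HOME/conf
      SPARK_LOG_DIR="${SPARK_LOG_DIR:-${SPARK_HOME}/logs}"    # default: $SPARK_HOME/logs
      ```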
  16. Oct 24, 2014
    • [SPARK-4076] Parameter expansion in spark-config is wrong · 30ea2868
      Kousuke Saruta authored
      In sbin/spark-config.sh, parameter expansion is used to extract the source root as follows:
      
          this="${BASH_SOURCE-$0}"
      
      I think the parameter expansion should use ":-" instead of "-".
      If we use "-" and BASH_SOURCE="" (set to the empty string, not unset),
      then "" (the empty string) is assigned to $this.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2930 from sarutak/SPARK-4076 and squashes the following commits:
      
      32a0370 [Kousuke Saruta] Fixed wrong parameter expansion
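
      The difference between the two expansions can be demonstrated directly (self-contained; the variable name is arbitrary):

      ```bash
      VAR=""                             # set, but empty: the problematic case
      echo "dash:  [${VAR-fallback}]"    # prints "dash:  []" - "-" keeps the empty value
      echo "colon: [${VAR:-fallback}]"   # prints "colon: [fallback]" - ":-" also covers empty
      unset VAR                          # when unset, both forms fall back
      echo "dash:  [${VAR-fallback}]"    # prints "dash:  [fallback]"
      ```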
  17. Oct 03, 2014
  18. Oct 01, 2014
  19. Sep 25, 2014
    • [SPARK-3584] sbin/slaves doesn't work when we use password authentication for SSH · 0dc868e7
      Kousuke Saruta authored
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2444 from sarutak/slaves-scripts-modification and squashes the following commits:
      
      eff7394 [Kousuke Saruta] Improve the description about Cluster Launch Script in docs/spark-standalone.md
      7858225 [Kousuke Saruta] Modified sbin/slaves to use the environment variable "SPARK_SSH_FOREGROUND" as a flag
      53d7121 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
      e570431 [Kousuke Saruta] Added a description for SPARK_SSH_FOREGROUND variable
      7120a0c [Kousuke Saruta] Added a description about default host for sbin/slaves
      1bba8a9 [Kousuke Saruta] Added SPARK_SSH_FOREGROUND flag to sbin/slaves
      88e2f17 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into slaves-scripts-modification
      297e75d [Kousuke Saruta] Modified sbin/slaves not to export HOSTLIST
  21. Sep 08, 2014
    • SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within. · e16a8e7d
      Prashant Sharma authored
      ...
      
      Tested! TBH, it isn't a great idea to have a directory with spaces in it, because Emacs doesn't like it, then Hadoop doesn't like it, and so on...
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      
      Closes #2229 from ScrapCodes/SPARK-3337/quoting-shell-scripts and squashes the following commits:
      
      d4ad660 [Prashant Sharma] SPARK-3337 Paranoid quoting in shell to allow install dirs with spaces within.
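
      The failure mode being guarded against is plain word splitting (a self-contained demonstration; the path is made up):

      ```bash
      dir="/opt/my spark"       # hypothetical install dir containing a space
      printf '%s\n' $dir        # unquoted: word-split into "/opt/my" and "spark"
      printf '%s\n' "$dir"      # quoted: one word, the intended path
      ```

      Paranoid quoting means every expansion in the scripts gets the second form.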
  22. Aug 26, 2014
    • [SPARK-2964] [SQL] Remove duplicated code from spark-sql and start-thriftserver.sh · faeb9c0e
      Cheng Lian authored
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #1886 from sarutak/SPARK-2964 and squashes the following commits:
      
      8ef8751 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2964
      26e7c95 [Kousuke Saruta] Revert "Shorten timeout to more reasonable value"
      ffb68fa [Kousuke Saruta] Modified spark-sql and start-thriftserver.sh to use bin/utils.sh
      8c6f658 [Kousuke Saruta] Merge branch 'spark-3026' of https://github.com/liancheng/spark into SPARK-2964
      81b43a8 [Cheng Lian] Shorten timeout to more reasonable value
      a89e66d [Cheng Lian] Fixed command line options quotation in scripts
      9c894d3 [Cheng Lian] Fixed bin/spark-sql -S option typo
      be4736b [Cheng Lian] Report better error message when running JDBC/CLI without hive-thriftserver profile enabled
  25. Aug 14, 2014
    • [SPARK-2925] [sql]fix spark-sql and start-thriftserver shell bugs when set --driver-java-options · 267fdffe
      wangfei authored
      https://issues.apache.org/jira/browse/SPARK-2925
      
      Running a command like this produces the error below:

          bin/spark-sql --driver-java-options '-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y'

          Error: Unrecognized option '-Xnoagent'.
          Run with --help for usage help or --verbose for debug output
      
      Author: wangfei <wangfei_hello@126.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #1851 from scwf/patch-2 and squashes the following commits:
      
      516554d [wangfei] quote variables to fix this issue
      8bd40f2 [wangfei] quote variables to fix this problem
      e6d79e3 [wangfei] fix start-thriftserver bug when set driver-java-options
      948395d [wangfei] fix spark-sql error when set --driver-java-options
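
      The underlying problem and fix can be sketched as follows (`count_args` is a made-up helper that just reports how many arguments it receives):

      ```bash
      # A multi-word option value must be passed as ONE argument; an
      # unquoted expansion re-splits it, so "-Xnoagent" is seen as a
      # separate, unrecognized option.
      count_args() { echo $#; }

      opts='-Xdebug -Xnoagent'
      count_args $opts      # prints 2: unquoted expansion re-splits the value (the bug)
      count_args "$opts"    # prints 1: quoting keeps it a single argument (the fix)
      ```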
  26. Aug 06, 2014
    • [SPARK-2678][Core][SQL] A workaround for SPARK-2678 · a6cd3110
      Cheng Lian authored
      JIRA issues:
      
      - Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      - Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)
      
      Related PR:
      
      - #1715
      
      This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1801 from liancheng/spark-2874 and squashes the following commits:
      
      8045d7a [Cheng Lian] Make sure test suites pass
      8493a9e [Cheng Lian] Using eval to retain quoted arguments
      aed523f [Cheng Lian] Fixed typo in bin/spark-sql
      f12a0b1 [Cheng Lian] Worked arount SPARK-2678
      daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
  28. Jul 28, 2014
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server (with Maven profile fix) · a7a9d144
      Cheng Lian authored
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Another try for #1399 & #1600. Those two PRs broke Jenkins builds because we made a separate profile `hive-thriftserver` in sub-project `assembly`, but the `hive-thriftserver` module is defined outside the `hive-thriftserver` profile. Thus every pull request, even one that doesn't touch SQL code, also executes the test suites defined in `hive-thriftserver`, and the tests fail because the related .class files are not included in the assembly jar.
      
      In the most recent commit, module `hive-thriftserver` is moved into its own profile to fix this problem. All previous commits are squashed for clarity.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1620 from liancheng/jdbc-with-maven-fix and squashes the following commits:
      
      629988e [Cheng Lian] Moved hive-thriftserver module definition into its own profile
      ec3c7a7 [Cheng Lian] Cherry picked the Hive Thrift server
  29. Jul 27, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · e5bbce9a
      Patrick Wendell authored
      This reverts commit f6ff2a61.
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · f6ff2a61
      Cheng Lian authored
      (This is a replacement of #1399, trying to fix potential `HiveThriftServer2` port collision between parallel builds. Please refer to [these comments](https://github.com/apache/spark/pull/1399#issuecomment-50212572) for details.)
      
      JIRA issue: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      
      Merging the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1600 from liancheng/jdbc and squashes the following commits:
      
      ac4618b [Cheng Lian] Uses random port for HiveThriftServer2 to avoid collision with parallel builds
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
  30. Jul 25, 2014
    • Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server" · afd757a2
      Michael Armbrust authored
      This reverts commit 06dc0d2c.
      
      #1399 is making Jenkins fail. We should investigate and put this back once it passes tests.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1594 from marmbrus/revertJDBC and squashes the following commits:
      
      59748da [Michael Armbrust] Revert "[SPARK-2410][SQL] Merging Hive Thrift/JDBC server"
    • [SPARK-2410][SQL] Merging Hive Thrift/JDBC server · 06dc0d2c
      Cheng Lian authored
      JIRA issue:
      
      - Main: [SPARK-2410](https://issues.apache.org/jira/browse/SPARK-2410)
      - Related: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
      
      Cherry picked the Hive Thrift/JDBC server from [branch-1.0-jdbc](https://github.com/apache/spark/tree/branch-1.0-jdbc).
      
      (Thanks chenghao-intel for his initial contribution of the Spark SQL CLI.)
      
      TODO
      
      - [x] Use `spark-submit` to launch the server, the CLI and beeline
      - [x] Migration guideline draft for Shark users
      
      ----
      
      Hit a bug in `SparkSubmitArguments` while working on this PR: all application options that are recognized by `SparkSubmitArguments` are stolen as `SparkSubmit` options. For example:
      
      ```bash
      $ spark-submit --class org.apache.hive.beeline.BeeLine spark-internal --help
      ```
      
      This actually shows usage information of `SparkSubmit` rather than `BeeLine`.
      
      ~~Fixed this bug here since the `spark-internal` related stuff also touches `SparkSubmitArguments` and I'd like to avoid conflict.~~
      
      **UPDATE** The bug mentioned above is now tracked by [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678). Decided to revert the changes for this bug, since it involves more subtle considerations and is worth a separate PR.
      
      Author: Cheng Lian <lian.cs.zju@gmail.com>
      
      Closes #1399 from liancheng/thriftserver and squashes the following commits:
      
      090beea [Cheng Lian] Revert changes related to SPARK-2678, decided to move them to another PR
      21c6cf4 [Cheng Lian] Updated Spark SQL programming guide docs
      fe0af31 [Cheng Lian] Reordered spark-submit options in spark-shell[.cmd]
      199e3fb [Cheng Lian] Disabled MIMA for hive-thriftserver
      1083e9d [Cheng Lian] Fixed failed test suites
      7db82a1 [Cheng Lian] Fixed spark-submit application options handling logic
      9cc0f06 [Cheng Lian] Starts beeline with spark-submit
      cfcf461 [Cheng Lian] Updated documents and build scripts for the newly added hive-thriftserver profile
      061880f [Cheng Lian] Addressed all comments by @pwendell
      7755062 [Cheng Lian] Adapts test suites to spark-submit settings
      40bafef [Cheng Lian] Fixed more license header issues
      e214aab [Cheng Lian] Added missing license headers
      b8905ba [Cheng Lian] Fixed minor issues in spark-sql and start-thriftserver.sh
      f975d22 [Cheng Lian] Updated docs for Hive compatibility and Shark migration guide draft
      3ad4e75 [Cheng Lian] Starts spark-sql shell with spark-submit
      a5310d1 [Cheng Lian] Make HiveThriftServer2 play well with spark-submit
      61f39f4 [Cheng Lian] Starts Hive Thrift server via spark-submit
      2c4c539 [Cheng Lian] Cherry picked the Hive Thrift server
      06dc0d2c
  31. Jun 23, 2014
    • Marcelo Vanzin's avatar
      [SPARK-1768] History server enhancements. · 21ddd7d1
      Marcelo Vanzin authored
      Two improvements to the history server:
      
      - Separate the HTTP handling from history fetching, so that it's easy to add
        new backends later (thinking about SPARK-1537 in the long run)
      
      - Avoid loading all UIs in memory. Do lazy loading instead, keeping a few in
        memory for faster access. This allows the app limit to go away, since holding
        just the listing in memory shouldn't be too expensive unless the user has millions
        of completed apps in the history (at which point I'd expect other issues to arise
        aside from history server memory usage, such as FileSystem.listStatus()
        starting to become ridiculously expensive).
      
      I also fixed a few minor things along the way that aren't really worth mentioning,
      and removed the app's log path from the UI, since that information may not even
      exist depending on which backend is used (even though there is only one now).
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #718 from vanzin/hist-server and squashes the following commits:
      
      53620c9 [Marcelo Vanzin] Add mima exclude, fix scaladoc wording.
      c21f8d8 [Marcelo Vanzin] Feedback: formatting, docs.
      dd8cc4b [Marcelo Vanzin] Standardize on using spark.history.* configuration.
      4da3a52 [Marcelo Vanzin] Remove UI from ApplicationHistoryInfo.
      2a7f68d [Marcelo Vanzin] Address review feedback.
      4e72c77 [Marcelo Vanzin] Remove comment about ordering.
      249bcea [Marcelo Vanzin] Remove offset / count from provider interface.
      ca5d320 [Marcelo Vanzin] Remove code that deals with unfinished apps.
      6e2432f [Marcelo Vanzin] Second round of feedback.
      b2c570a [Marcelo Vanzin] Make class package-private.
      4406f61 [Marcelo Vanzin] Cosmetic change to listing header.
      e852149 [Marcelo Vanzin] Initialize new app array to expected size.
      e8026f4 [Marcelo Vanzin] Review feedback.
      49d2fd3 [Marcelo Vanzin] Fix a comment.
      91e96ca [Marcelo Vanzin] Fix scalastyle issues.
      6fbe0d8 [Marcelo Vanzin] Better handle failures when loading app info.
      eee2f5a [Marcelo Vanzin] Ensure server.stop() is called when shutting down.
      bda2fa1 [Marcelo Vanzin] Rudimentary paging support for the history UI.
      b284478 [Marcelo Vanzin] Separate history server from history backend.
      21ddd7d1
  32. May 08, 2014
    • Bouke van der Bijl's avatar
      Include the sbin/spark-config.sh in spark-executor · 2fd2752e
      Bouke van der Bijl authored
      This is needed because broadcast values are broken in PySpark on Mesos: the executor tries to import pyspark but can't, since the PYTHONPATH is not set due to changes in ff5be9a4
      
      https://issues.apache.org/jira/browse/SPARK-1725
      
      Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
      
      Closes #651 from bouk/include-spark-config-in-mesos-executor and squashes the following commits:
      
      b2f1295 [Bouke van der Bijl] Inline PYTHONPATH in spark-executor
      eedbbcc [Bouke van der Bijl] Include the sbin/spark-config.sh in spark-executor
      2fd2752e
  33. Apr 30, 2014
    • Sandy Ryza's avatar
      SPARK-1004. PySpark on YARN · ff5be9a4
      Sandy Ryza authored
      This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #30 from sryza/sandy-spark-1004 and squashes the following commits:
      
      89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time
      5165a02 [Sandy Ryza] Fix docs
      fd0df79 [Sandy Ryza] PySpark on YARN
      ff5be9a4
  34. Apr 10, 2014
    • Andrew Or's avatar
      [SPARK-1276] Add a HistoryServer to render persisted UI · 79820fe8
      Andrew Or authored
      The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
      Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the use case of this new feature as many people also run Spark on Yarn / Mesos.
      
      This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike the Master, the HistoryServer need not be running while the application is still running. It is relatively lightweight in that it only maintains static information about applications and performs no scheduling.
      
      To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.
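      The quick test above can be sketched as a small script. This is an illustrative sketch, not part of the PR: the log directory path and the `spark-submit` invocation shown in the comments are assumptions; only `spark.eventLog.enabled` and `sbin/start-history-server.sh <log-dir-path>` come from the description above.

      ```shell
      #!/bin/sh
      # Sketch: persist event logs to a directory, then serve them with the
      # HistoryServer. LOG_DIR is an illustrative choice; any path readable
      # by the server works.
      LOG_DIR="/tmp/spark-events"
      mkdir -p "$LOG_DIR"

      # 1. Run applications with event logging enabled, e.g.:
      #      spark-submit --conf spark.eventLog.enabled=true your-app.jar
      # 2. Start the HistoryServer against the same directory:
      #      sbin/start-history-server.sh "$LOG_DIR"
      # 3. Browse the reconstructed UIs on port 18080.
      echo "event logs will be read from: $LOG_DIR"
      ```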
      
      Comments and feedback are most welcome.
      
      ---
      
      A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.
      
      A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #204 from andrewor14/master and squashes the following commits:
      
      7b7234c [Andrew Or] Finished -> Completed
      b158d98 [Andrew Or] Address Patrick's comments
      69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
      19d5dd0 [Andrew Or] Merge github.com:apache/spark
      f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
      2dfb494 [Andrew Or] Decouple checking for application completion from replaying
      d02dbaa [Andrew Or] Expose Spark version and include it in event logs
      2282300 [Andrew Or] Add documentation for the HistoryServer
      567474a [Andrew Or] Merge github.com:apache/spark
      6edf052 [Andrew Or] Merge github.com:apache/spark
      19e1fb4 [Andrew Or] Address Thomas' comments
      248cb3d [Andrew Or] Limit number of live applications + add configurability
      a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
      bc46fc8 [Andrew Or] Merge github.com:apache/spark
      e2f4ff9 [Andrew Or] Merge github.com:apache/spark
      050419e [Andrew Or] Merge github.com:apache/spark
      81b568b [Andrew Or] Fix strange error messages...
      0670743 [Andrew Or] Decouple page rendering from loading files from disk
      1b2f391 [Andrew Or] Minor changes
      a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
      d5154da [Andrew Or] Styling and comments
      5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
      60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
      7584418 [Andrew Or] Report application start/end times to HistoryServer
      8aac163 [Andrew Or] Add basic application table
      c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface
      79820fe8
  35. Mar 25, 2014
    • Aaron Davidson's avatar
      SPARK-1286: Make usage of spark-env.sh idempotent · 007a7334
      Aaron Davidson authored
      Various Spark scripts load spark-env.sh. Loading it more than once can cause variables that are appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS) to grow, and it makes the precedence order for options specified in spark-env.sh less clear.
      
      One use-case for the latter is that we want to set options from the command line of spark-shell, but these options would be overridden by a subsequent loading of spark-env.sh. If we load spark-env.sh first and then set our command-line options, we can guarantee the correct precedence order.
      
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
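      The idempotency guard described above can be sketched as follows. The guard-variable name `SPARK_ENV_LOADED` and the exact expansion are illustrative assumptions; the real logic lives in the PR's load-spark-env.sh.

      ```shell
      #!/bin/sh
      # Sketch of an idempotent env loader: a guard variable (name is
      # illustrative) ensures spark-env.sh is sourced at most once per
      # process, so append-style variables such as SPARK_CLASSPATH cannot
      # grow on repeated loads.
      if [ -z "$SPARK_ENV_LOADED" ]; then
        export SPARK_ENV_LOADED=1
        # Prefer SPARK_CONF_DIR (the sbin/ scripts always set it via
        # sbin/spark-config.sh); otherwise default to ../conf as usual.
        conf_dir="${SPARK_CONF_DIR:-$(dirname "$0")/../conf}"
        if [ -f "$conf_dir/spark-env.sh" ]; then
          . "$conf_dir/spark-env.sh"
        fi
      fi
      ```

      Sourcing this file a second time is then a no-op, which is what makes the command-line-first precedence described above possible.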
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #184 from aarondav/idem and squashes the following commits:
      
      e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
      8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
      007a7334