Skip to content
Snippets Groups Projects
  1. May 08, 2014
    • Bouke van der Bijl's avatar
      Include the sbin/spark-config.sh in spark-executor · 2fd2752e
      Bouke van der Bijl authored
      This is needed because broadcast values are broken on pyspark on Mesos, it tries to import pyspark but can't, as the PYTHONPATH is not set due to changes in ff5be9a4
      
      https://issues.apache.org/jira/browse/SPARK-1725
      
      Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
      
      Closes #651 from bouk/include-spark-config-in-mesos-executor and squashes the following commits:
      
      b2f1295 [Bouke van der Bijl] Inline PYTHONPATH in spark-executor
      eedbbcc [Bouke van der Bijl] Include the sbin/spark-config.sh in spark-executor
      2fd2752e
  2. Apr 30, 2014
    • Sandy Ryza's avatar
      SPARK-1004. PySpark on YARN · ff5be9a4
      Sandy Ryza authored
      This reopens https://github.com/apache/incubator-spark/pull/640 against the new repo
      
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #30 from sryza/sandy-spark-1004 and squashes the following commits:
      
      89889d4 [Sandy Ryza] Move unzipping py4j to the generate-resources phase so that it gets included in the jar the first time
      5165a02 [Sandy Ryza] Fix docs
      fd0df79 [Sandy Ryza] PySpark on YARN
      ff5be9a4
  3. Apr 10, 2014
    • Andrew Or's avatar
      [SPARK-1276] Add a HistoryServer to render persisted UI · 79820fe8
      Andrew Or authored
      The new feature of event logging, introduced in #42, allows the user to persist the details of his/her Spark application to storage, and later replay these events to reconstruct an after-the-fact SparkUI.
      Currently, however, a persisted UI can only be rendered through the standalone Master. This greatly limits the use case of this new feature as many people also run Spark on Yarn / Mesos.
      
      This PR introduces a new entity called the HistoryServer, which, given a log directory, keeps track of all completed applications independently of a Spark Master. Unlike Master, the HistoryServer needs not be running while the application is still running. It is relatively light-weight in that it only maintains static information of applications and performs no scheduling.
      
      To quickly test it out, generate event logs with ```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh <log-dir-path>```. Your HistoryServer awaits on port 18080.
      
      Comments and feedback are most welcome.
      
      ---
      
      A few other changes introduced in this PR include refactoring the WebUI interface, which is beginning to have a lot of duplicate code now that we have added more functionality to it. Two new SparkListenerEvents have been introduced (SparkListenerApplicationStart/End) to keep track of application name and start/finish times. This PR also clarifies the semantics of the ReplayListenerBus introduced in #42.
      
      A potential TODO in the future (not part of this PR) is to render live applications in addition to just completed applications. This is useful when applications fail, a condition that our current HistoryServer does not handle unless the user manually signals application completion (by creating the APPLICATION_COMPLETION file). Handling live applications becomes significantly more challenging, however, because it is now necessary to render the same SparkUI multiple times. To avoid reading the entire log every time, which is inefficient, we must handle reading the log from where we previously left off, but this becomes fairly complicated because we must deal with the arbitrary behavior of each input stream.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #204 from andrewor14/master and squashes the following commits:
      
      7b7234c [Andrew Or] Finished -> Completed
      b158d98 [Andrew Or] Address Patrick's comments
      69d1b41 [Andrew Or] Do not block on posting SparkListenerApplicationEnd
      19d5dd0 [Andrew Or] Merge github.com:apache/spark
      f7f5bf0 [Andrew Or] Make history server's web UI port a Spark configuration
      2dfb494 [Andrew Or] Decouple checking for application completion from replaying
      d02dbaa [Andrew Or] Expose Spark version and include it in event logs
      2282300 [Andrew Or] Add documentation for the HistoryServer
      567474a [Andrew Or] Merge github.com:apache/spark
      6edf052 [Andrew Or] Merge github.com:apache/spark
      19e1fb4 [Andrew Or] Address Thomas' comments
      248cb3d [Andrew Or] Limit number of live applications + add configurability
      a3598de [Andrew Or] Do not close file system with ReplayBus + fix bind address
      bc46fc8 [Andrew Or] Merge github.com:apache/spark
      e2f4ff9 [Andrew Or] Merge github.com:apache/spark
      050419e [Andrew Or] Merge github.com:apache/spark
      81b568b [Andrew Or] Fix strange error messages...
      0670743 [Andrew Or] Decouple page rendering from loading files from disk
      1b2f391 [Andrew Or] Minor changes
      a9eae7e [Andrew Or] Merge branch 'master' of github.com:apache/spark
      d5154da [Andrew Or] Styling and comments
      5dbfbb4 [Andrew Or] Merge branch 'master' of github.com:apache/spark
      60bc6d5 [Andrew Or] First complete implementation of HistoryServer (only for finished apps)
      7584418 [Andrew Or] Report application start/end times to HistoryServer
      8aac163 [Andrew Or] Add basic application table
      c086bd5 [Andrew Or] Add HistoryServer and scripts ++ Refactor WebUI interface
      79820fe8
  4. Mar 25, 2014
    • Aaron Davidson's avatar
      SPARK-1286: Make usage of spark-env.sh idempotent · 007a7334
      Aaron Davidson authored
      Various spark scripts load spark-env.sh. This can cause growth of any variables that may be appended to (SPARK_CLASSPATH, SPARK_REPL_OPTS) and it makes the precedence order for options specified in spark-env.sh less clear.
      
      One use-case for the latter is that we want to set options from the command-line of spark-shell, but these options will be overridden by subsequent loading of spark-env.sh. If we were to load the spark-env.sh first and then set our command-line options, we could guarantee correct precedence order.
      
      Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #184 from aarondav/idem and squashes the following commits:
      
      e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
      8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
      93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
      007a7334
  5. Mar 19, 2014
    • Nick Lanham's avatar
      Bundle tachyon: SPARK-1269 · a18ea00f
      Nick Lanham authored
      This should all work as expected with the current version of the tachyon tarball (0.4.1)
      
      Author: Nick Lanham <nick@afternight.org>
      
      Closes #137 from nicklan/bundle-tachyon and squashes the following commits:
      
      2eee15b [Nick Lanham] Put back in exec, start tachyon first
      738ba23 [Nick Lanham] Move tachyon out of sbin
      f2f9bc6 [Nick Lanham] More checks for tachyon script
      111e8e1 [Nick Lanham] Only try tachyon operations if tachyon script exists
      0561574 [Nick Lanham] Copy over web resources so web interface can run
      4dc9809 [Nick Lanham] Update to tachyon 0.4.1
      0a1a20c [Nick Lanham] Add scripts using tachyon tarball
      a18ea00f
  6. Feb 22, 2014
    • CodingCat's avatar
      [SPARK-1041] remove dead code in start script, remind user to set that in spark-env.sh · 437b62fc
      CodingCat authored
      the lines in start-master.sh and start-slave.sh no longer work
      
      in ec2, the host name has changed, e.g.
      
      ubuntu@ip-172-31-36-93:~$ hostname
      ip-172-31-36-93
      
      also, the URL to fetch public DNS name also changed, e.g.
      
      ubuntu@ip-172-31-36-93:~$ wget -q -O - http://instance-data.ec2.internal/latest/meta-data/public-hostname
      ubuntu@ip-172-31-36-93:~$  (returns nothing)
      
      since we have spark-ec2 project, we don't need to have such ec2-specific lines here, instead, user only need to set in spark-env.sh
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #588 from CodingCat/deadcode_in_sbin and squashes the following commits:
      
      e4236e0 [CodingCat] remove dead code in start script, remind user set that in spark-env.sh
      437b62fc
  7. Jan 06, 2014
    • sproblvem's avatar
      Update stop-slaves.sh · dea4ba9d
      sproblvem authored
      The most recently version has changed the directory structure, but this script "sbin/stop-all.sh" doesn't change with it accordingly. This mistake makes "sbin/stop-all.sh" can't stop the slave node.
      dea4ba9d
  8. Jan 03, 2014
  9. Oct 12, 2013
  10. Sep 29, 2013
  11. Sep 23, 2013
  12. Sep 22, 2013
Loading