  1. May 25, 2014
  2. May 24, 2014
    • [SPARK-1886] check executor id existence when executor exit · b5e96869
      Zhen Peng authored
      
      Author: Zhen Peng <zhenpeng01@baidu.com>
      
      Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits:
      
      cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit
      
      (cherry picked from commit 4e4831b8)
      Signed-off-by: Aaron Davidson <aaron@databricks.com>
    • Revert "[maven-release-plugin] prepare release v1.0.0-rc10" · 9ff42249
      Tathagata Das authored
      This reverts commit d8070234.
    • f856b8ca
    • Updated CHANGES.txt · 84060927
      Tathagata Das authored
    • SPARK-1911: Emphasize that Spark jars should be built with Java 6. · 217bd562
      Patrick Wendell authored
      
      This commit requires the user to manually say "yes" when building Spark
      without Java 6. The prompt can be bypassed with a flag (e.g. if the user
      is scripting around make-distribution).
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #859 from pwendell/java6 and squashes the following commits:
      
      4921133 [Patrick Wendell] Adding Pyspark Notice
      fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6.
      
      (cherry picked from commit 75a03277)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    • [SPARK-1900 / 1918] PySpark on YARN is broken · 12f5ecc8
      Andrew Or authored
      
      If I run the following on a YARN cluster
      ```
      bin/spark-submit sheep.py --master yarn-client
      ```
      it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
      ```
      bin/spark-submit file:/path/to/sheep.py --master yarn-client
      ```
      However, this also fails. This time it is because python does not understand URI schemes.
      
      This PR fixes this by automatically resolving all paths passed as command line arguments to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.
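The resolution described above can be sketched in plain Python. This is a hypothetical illustration, not Spark's actual code; the names `resolve_uri` and `strip_scheme` are assumptions:

```python
from urllib.parse import urlparse

def resolve_uri(path):
    # Give bare local paths an explicit file: scheme, so every
    # primary resource carries a URI. Paths that already have a
    # scheme (hdfs:, file:, ...) are left untouched.
    if urlparse(path).scheme:
        return path
    return "file:" + path

def strip_scheme(uri):
    # Python itself does not understand URI schemes, so strip the
    # scheme off before handing the file to the interpreter.
    parsed = urlparse(uri)
    return parsed.path if parsed.scheme else uri

print(resolve_uri("/path/to/sheep.py"))        # file:/path/to/sheep.py
print(strip_scheme("file:/path/to/sheep.py"))  # /path/to/sheep.py
```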
      
      Much of the code is originally written by @mengxr. Tested on YARN cluster. More tests pending.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #853 from andrewor14/submit-paths and squashes the following commits:
      
      0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
      323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
      3c36587 [Andrew Or] Improve error messages (minor)
      854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
      6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
      3bb0359 [Andrew Or] Update more comments (minor)
      2a1f8a0 [Andrew Or] Update comments (minor)
      6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
      a68c4d1 [Andrew Or] Handle Windows python file path correctly
      427a250 [Andrew Or] Resolve paths properly for Windows
      a591a4a [Andrew Or] Update tests for resolving URIs
      6c8621c [Andrew Or] Move resolveURIs to Utils
      db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
      f542dce [Andrew Or] Fix outdated tests
      691c4ce [Andrew Or] Ignore special primary resource names
      5342ac7 [Andrew Or] Add missing space in error message
      02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly
      
      (cherry picked from commit 5081a0a9)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
  3. May 23, 2014
  4. May 22, 2014
    • Updated scripts for auditing releases · 6541ca24
      Tathagata Das authored
      
      - Added script to automatically generate change list CHANGES.txt
      - Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive`
      - Added SBT projects for testing functionality of `spark-sql` and `spark-hive`
      - Fixed issues in existing tests that might have come up because of changes in Spark 1.0
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #844 from tdas/update-dev-scripts and squashes the following commits:
      
      25090ba [Tathagata Das] Added missing license
      e2e20b3 [Tathagata Das] Updated tests for auditing releases.
      
      (cherry picked from commit b2bdd0e5)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    • [SPARK-1896] Respect spark.master (and --master) before MASTER in spark-shell · c3b40651
      Andrew Or authored
      The hierarchy for configuring the Spark master in the shell is as follows:
      ```
      MASTER > --master > spark.master (spark-defaults.conf)
      ```
      This is inconsistent with the way we run normal applications, which is:
      ```
      --master > spark.master (spark-defaults.conf) > MASTER
      ```
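A minimal sketch of the desired precedence, with hypothetical argument names (not the actual REPL code) and an assumed fallback:

```python
def resolve_master(cli_master=None, conf_master=None, env_master=None):
    # Desired order: --master beats spark.master (spark-defaults.conf),
    # which beats the MASTER environment variable.
    for candidate in (cli_master, conf_master, env_master):
        if candidate:
            return candidate
    return "local[*]"  # assumed default, for illustration only

print(resolve_master(conf_master="spark://conf:7077", env_master="local"))
# spark://conf:7077
```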
      
      I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect, considering that this is the way we tell users to set their masters [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark).
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #846 from andrewor14/shell-master and squashes the following commits:
      
      2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL
      
      (cherry picked from commit cce77457)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    • [SPARK-1897] Respect spark.jars (and --jars) in spark-shell · 23cc40e3
      Andrew Or authored
      Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jars` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way.
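The fix amounts to merging the two sources rather than letting one clobber the other; a rough sketch with a hypothetical helper (not the SparkILoop code):

```python
def effective_jars(spark_jars=None, add_jars_env=None):
    # Combine spark.jars (populated by --jars) with the ADD_JARS
    # environment variable, instead of letting ADD_JARS overwrite it.
    jars = []
    for source in (spark_jars, add_jars_env):
        if source:  # also skips spark.jars == ""
            jars.extend(j for j in source.split(",") if j)
    return ",".join(jars)

print(effective_jars("a.jar,b.jar", "c.jar"))  # a.jar,b.jar,c.jar
```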
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #849 from andrewor14/shell-jars and squashes the following commits:
      
      928a7e6 [Andrew Or] ',' -> "," (minor)
      afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
      c6da113 [Andrew Or] Do not set spark.jars to ""
      d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell
      
      (cherry picked from commit 8edbee7d)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    • Fix UISuite unit test that fails under Jenkins contention · a5662162
      Aaron Davidson authored
      
      Due to perhaps zombie processes on Jenkins, it seems that at least 10
      Spark ports are in use. It also doesn't matter whether the port increases
      when retried -- it could in fact go down. The only part that matters is
      that a different port is selected rather than failing to bind.
      The test has been changed to match this.
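The behavior the test now checks -- pick another port instead of failing to bind -- can be sketched like this (a simplified illustration, not the actual Spark utility):

```python
import socket

def bind_to_free_port(start_port, max_retries=10):
    # Try start_port, then successive ports, until one binds.
    # All that matters is that a different port gets selected,
    # not the direction in which the port number moves.
    for offset in range(max_retries + 1):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            return sock, port
        except OSError:
            sock.close()
    raise OSError("no free port in range")
```

Binding on a busy port raises `EADDRINUSE`; the loop simply moves on to the next candidate.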
      
      Thanks to @andrewor14 for helping diagnose this.
      
      Author: Aaron Davidson <aaron@databricks.com>
      
      Closes #857 from aarondav/tiny and squashes the following commits:
      
      c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention
      
      (cherry picked from commit f9f5fd5f)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode. · 79cd26c5
      Xiangrui Meng authored
      
      Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0).
      
      `spark-submit --jars` also works in standalone mode and `yarn-client`. Thanks to @andrewor14 for testing!
      
      I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet.
      
      CC: @dbtsai @sryza
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #848 from mengxr/yarn-classpath and squashes the following commits:
      
      23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid conflicts; append $CWD/ and $CWD/* to the classpath; remove unused methods
      a40f6ed [Xiangrui Meng] standalone -> cluster
      65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
      11e5354 [Xiangrui Meng] minor changes
      3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
      dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn
      
      (cherry picked from commit dba31402)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
  5. May 21, 2014
    • Configuration documentation updates · 75af8bd3
      Reynold Xin authored
      1. Add `<code>` to configuration options.
      2. List env variables in tabular format to be consistent with other pages.
      3. Moved Viewing Spark Properties section up.
      
      This is against branch-1.0, but should be cherry picked into master as well.
      
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #851 from rxin/doc-config and squashes the following commits:
      
      28ac0d3 [Reynold Xin] Add <code> to configuration options, and list env variables in a table.
    • [SPARK-1889] [SQL] Apply splitConjunctivePredicates to join condition while finding join keys · 6e7934ed
      Takuya UESHIN authored

      When tables are equi-joined by multiple keys, `HashJoin` should be used, but `CartesianProduct` followed by `Filter` is used instead.
      The join keys are paired by `And` expressions, so we need to apply `splitConjunctivePredicates` to the join condition while finding join keys.
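The idea of `splitConjunctivePredicates` -- flattening nested `And` expressions into a list of conjuncts -- can be sketched in a toy model, with tuples standing in for Catalyst expression trees:

```python
def split_conjunctive_predicates(expr):
    # An expression is either a leaf (any value) or a tuple
    # ("And", left, right); recursively flatten all And nodes.
    if isinstance(expr, tuple) and expr and expr[0] == "And":
        _, left, right = expr
        return (split_conjunctive_predicates(left)
                + split_conjunctive_predicates(right))
    return [expr]

cond = ("And", ("And", "a = x", "b = y"), "c = z")
print(split_conjunctive_predicates(cond))  # ['a = x', 'b = y', 'c = z']
```

Each equality conjunct can then be examined on its own, letting the planner pair join keys and choose `HashJoin` instead of `CartesianProduct` plus `Filter`.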
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits:
      
      fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.
      
      (cherry picked from commit bb88875a)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark · 30d1df5e
      Kan Zhang authored
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #697 from kanzhang/SPARK-1519 and squashes the following commits:
      
      4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark
      
      (cherry picked from commit f18fd05b)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [Typo] Stoped -> Stopped · 9b8f7725
      Andrew Or authored
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #847 from andrewor14/yarn-typo and squashes the following commits:
      
      c1906af [Andrew Or] Stoped -> Stopped
      
      (cherry picked from commit ba5d4a99)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [Minor] Move JdbcRDDSuite to the correct package · bc6bbfa6
      Andrew Or authored
      
      It was in the wrong package.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #839 from andrewor14/jdbc-suite and squashes the following commits:
      
      f948c5a [Andrew Or] cache -> cache()
      b215279 [Andrew Or] Move JdbcRDDSuite to the correct package
      
      (cherry picked from commit 7c79ef7d)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [Docs] Correct example of creating a new SparkConf · 7295dd94
      Andrew Or authored
      
      The example code on the configuration page currently does not compile.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #842 from andrewor14/conf-docs and squashes the following commits:
      
      aabff57 [Andrew Or] Correct example of creating a new SparkConf
      
      (cherry picked from commit 1014668f)
      Signed-off-by: Reynold Xin <rxin@apache.org>
    • [SPARK-1250] Fixed misleading comments in bin/pyspark, bin/spark-class · 364c14af
      Sumedh Mungee authored
      
      Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.
      
      Author: Sumedh Mungee <smungee@gmail.com>
      
      Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:
      
      26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class
      
      (cherry picked from commit 6e337380)
      Signed-off-by: Reynold Xin <rxin@apache.org>
  6. May 20, 2014
  7. May 19, 2014
    • [SPARK-1874][MLLIB] Clean up MLlib sample data · 1c6c8b5b
      Xiangrui Meng authored
      
      1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
      2. Embedded instructions in the help message of those example apps.
      
      Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
      
      59f0a18 [Xiangrui Meng] add sample binary classification data
      3c2f92f [Xiangrui Meng] add linear regression data
      050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
      
      (cherry picked from commit bcb9dce6)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>