- May 25, 2014
-
-
Patrick Wendell authored
Self explanatory.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #878 from pwendell/java-constructor and squashes the following commits:

2cc1605 [Patrick Wendell] HOTFIX: Add no-arg SparkContext constructor in Java

(cherry picked from commit b6d22af0)
Signed-off-by: Aaron Davidson <aaron@databricks.com>
-
Aaron Davidson authored
```scala
rdd.aggregate(Sum('val))
```
is just shorthand for
```scala
rdd.groupBy()(Sum('val))
```
but seems more natural than doing a groupBy with no grouping expressions when you really just want an aggregation over all rows.

Did not add a JavaSchemaRDD or Python API, as these seem to be lacking several other methods like groupBy() already -- leaving that cleanup for future patches.

Author: Aaron Davidson <aaron@databricks.com>

Closes #874 from aarondav/schemardd and squashes the following commits:

e9e68ee [Aaron Davidson] Add comment
db6afe2 [Aaron Davidson] Introduce SchemaRDD#aggregate() for simple aggregations

(cherry picked from commit c3576ffc)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Andrew Ash authored
https://issues.apache.org/jira/browse/SPARK-1903

Author: Andrew Ash <andrew@andrewash.com>

Closes #856 from ash211/SPARK-1903 and squashes the following commits:

6e7782a [Andrew Ash] Add the technology used on each port
1d9b5d3 [Andrew Ash] Document port for history server
56193ee [Andrew Ash] spark.ui.port becomes worker.ui.port and master.ui.port
a774c07 [Andrew Ash] Wording in network section
90e8237 [Andrew Ash] Use real :toc instead of the hand-written one
edaa337 [Andrew Ash] Master -> Standalone Cluster Master
57e8869 [Andrew Ash] Port -> Default Port
3d4d289 [Andrew Ash] Title to title case
c7d42d9 [Andrew Ash] [WIP] SPARK-1903 Add initial port listing for documentation
a416ae9 [Andrew Ash] Word wrap to 100 lines

(cherry picked from commit 06595296)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Reynold Xin authored
Author: Reynold Xin <rxin@apache.org>

Closes #871 from rxin/mllib-pep8 and squashes the following commits:

848416f [Reynold Xin] Fixed a typo in the previous cleanup (c -> sc).
a8db4cd [Reynold Xin] Fix PEP8 violations in Python mllib.

(cherry picked from commit d33d3c61)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Reynold Xin authored
Mostly related to the following two rules in PEP8 and PEP257:
- Line length < 72 chars.
- First line should be a concise description of the function/class.

Author: Reynold Xin <rxin@apache.org>

Closes #869 from rxin/docstring-schemardd and squashes the following commits:

7cf0cbc [Reynold Xin] Updated sql.py for pep8 docstring.
0a4aef9 [Reynold Xin] Merge branch 'master' into docstring-schemardd
6678937 [Reynold Xin] Python docstring update for sql.py.

(cherry picked from commit 14f0358b)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Reynold Xin authored
Author: Reynold Xin <rxin@apache.org>

Closes #870 from rxin/examples-python-pep8 and squashes the following commits:

2829e84 [Reynold Xin] Fix PEP8 violations in examples/src/main/python.

(cherry picked from commit d79c2b28)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Tathagata Das authored
-
Tathagata Das authored
-
Reynold Xin authored
(cherry picked from commit fa541f32c5b92e6868a9c99cbb2c87115d624d23)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Reynold Xin authored
Minor cleanup following #841.

Author: Reynold Xin <rxin@apache.org>

Closes #868 from rxin/schema-count and squashes the following commits:

5442651 [Reynold Xin] SPARK-1822: Some minor cleanup work on SchemaRDD.count()

(cherry picked from commit d66642e3)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Reynold Xin authored
This sets the max line length to 100 as a PEP8 exception.

Author: Reynold Xin <rxin@apache.org>

Closes #872 from rxin/pep8 and squashes the following commits:

2f26029 [Reynold Xin] Added PEP8 style configuration file.

(cherry picked from commit 5c7faecd)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Kan Zhang authored
Author: Kan Zhang <kzhang@apache.org>

Closes #841 from kanzhang/SPARK-1822 and squashes the following commits:

2f8072a [Kan Zhang] [SPARK-1822] Minor style update
cf4baa4 [Kan Zhang] [SPARK-1822] Adding Scaladoc
e67c910 [Kan Zhang] [SPARK-1822] SchemaRDD.count() should use optimizer

(cherry picked from commit 6052db9d)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Colin Patrick Mccabe authored
Add an 'exec' at the end of the spark-submit script, to avoid keeping a bash process hanging around while it runs. This makes ps look a little bit nicer.

Author: Colin Patrick Mccabe <cmccabe@cloudera.com>

Closes #858 from cmccabe/SPARK-1907 and squashes the following commits:

7023b64 [Colin Patrick Mccabe] spark-submit: add exec at the end of the script

(cherry picked from commit 6e9fb632)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
- May 24, 2014
-
-
Zhen Peng authored
Author: Zhen Peng <zhenpeng01@baidu.com>

Closes #827 from zhpengg/bugfix-executor-id-not-found and squashes the following commits:

cd8bb65 [Zhen Peng] bugfix: check executor id existence when executor exit

(cherry picked from commit 4e4831b8)
Signed-off-by: Aaron Davidson <aaron@databricks.com>
-
Tathagata Das authored
This reverts commit d8070234.
-
Tathagata Das authored
This reverts commit 67dd53d2.
-
Tathagata Das authored
-
Patrick Wendell authored
This commit requires the user to manually say "yes" when building Spark without Java 6. The prompt can be bypassed with a flag (e.g. if the user is scripting around make-distribution).

Author: Patrick Wendell <pwendell@gmail.com>

Closes #859 from pwendell/java6 and squashes the following commits:

4921133 [Patrick Wendell] Adding Pyspark Notice
fee8c9e [Patrick Wendell] SPARK-1911: Emphasize that Spark jars should be built with Java 6.

(cherry picked from commit 75a03277)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
-
Andrew Or authored
If I run the following on a YARN cluster
```
bin/spark-submit sheep.py --master yarn-client
```
it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the file:
```
bin/spark-submit file:/path/to/sheep.py --master yarn-client
```
However, this also fails. This time it is because python does not understand URI schemes.

This PR fixes this by automatically resolving all paths passed as command line arguments to `spark-submit` properly. This has the added benefit of keeping file and jar paths consistent across different cluster modes. For python, we strip the URI scheme before we actually try to run it.

Much of the code was originally written by @mengxr. Tested on YARN cluster. More tests pending.

Author: Andrew Or <andrewor14@gmail.com>

Closes #853 from andrewor14/submit-paths and squashes the following commits:

0bb097a [Andrew Or] Format path correctly before adding it to PYTHONPATH
323b45c [Andrew Or] Include --py-files on PYTHONPATH for pyspark shell
3c36587 [Andrew Or] Improve error messages (minor)
854aa6a [Andrew Or] Guard against NPE if user gives pathological paths
6638a6b [Andrew Or] Fix spark-shell jar paths after #849 went in
3bb0359 [Andrew Or] Update more comments (minor)
2a1f8a0 [Andrew Or] Update comments (minor)
6af2c77 [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
a68c4d1 [Andrew Or] Handle Windows python file path correctly
427a250 [Andrew Or] Resolve paths properly for Windows
a591a4a [Andrew Or] Update tests for resolving URIs
6c8621c [Andrew Or] Move resolveURIs to Utils
db8255e [Andrew Or] Merge branch 'master' of github.com:apache/spark into submit-paths
f542dce [Andrew Or] Fix outdated tests
691c4ce [Andrew Or] Ignore special primary resource names
5342ac7 [Andrew Or] Add missing space in error message
02f77f3 [Andrew Or] Resolve command line arguments to spark-submit properly

(cherry picked from commit 5081a0a9)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
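The path handling described in this commit can be illustrated with a small Python sketch. This is not the Spark implementation (the commit list mentions a Scala `resolveURIs` helper in `Utils`); the function names below are hypothetical, and the rule "a one-letter scheme is a Windows drive letter" is an assumption made for illustration.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Return `path` as a URI, defaulting to the local `file:` scheme.

    Hypothetical helper: paths that already carry a scheme (hdfs:, file:, ...)
    are returned unchanged; bare paths are made absolute and prefixed.
    """
    parsed = urlparse(path)
    # Assumption: a one-character "scheme" is a Windows drive letter (C:\...),
    # so only longer schemes count as real URI schemes.
    if len(parsed.scheme) > 1:
        return path
    return "file:" + os.path.abspath(path)


def resolve_uris(paths):
    """Resolve a comma-separated list of paths, skipping empty entries."""
    return ",".join(resolve_uri(p) for p in paths.split(",") if p)


def strip_scheme_for_python(uri):
    """Drop a `file:` scheme so the Python interpreter receives a plain path."""
    parsed = urlparse(uri)
    if parsed.scheme == "file":
        return parsed.path
    return uri
```

With these helpers, `sheep.py` and `file:/path/to/sheep.py` both end up as explicit local URIs for the submission logic, while the interpreter still sees an ordinary filesystem path.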
-
- May 23, 2014
-
-
baishuo(白硕) authored
the same reason as https://github.com/apache/spark/pull/588

Author: baishuo(白硕) <vc_java@hotmail.com>

Closes #815 from baishuo/master and squashes the following commits:

6876c1e [baishuo(白硕)] Update LBFGSSuite.scala

(cherry picked from commit a08262d8)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
- May 22, 2014
-
-
Tathagata Das authored
- Added script to automatically generate change list CHANGES.txt
- Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive`
- Added SBT projects for testing functionality of `spark-sql` and `spark-hive`
- Fixed issues in existing tests that might have come up because of changes in Spark 1.0

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #844 from tdas/update-dev-scripts and squashes the following commits:

25090ba [Tathagata Das] Added missing license
e2e20b3 [Tathagata Das] Updated tests for auditing releases.

(cherry picked from commit b2bdd0e5)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
-
Andrew Or authored
The hierarchy for configuring the Spark master in the shell is as follows:
```
MASTER > --master > spark.master (spark-defaults.conf)
```
This is inconsistent with the way we run normal applications, which is:
```
--master > spark.master (spark-defaults.conf) > MASTER
```

I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect, considering that this is the way we tell users to set their masters [here](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark).

Author: Andrew Or <andrewor14@gmail.com>

Closes #846 from andrewor14/shell-master and squashes the following commits:

2cb81c9 [Andrew Or] Respect spark.master before MASTER in REPL

(cherry picked from commit cce77457)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
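The precedence this commit settles on (`--master`, then `spark.master`, then `MASTER`) can be sketched in Python as a first-match lookup. The function name, its parameters, and the `local[*]` fallback are assumptions for illustration only; the actual change lives in the Scala REPL code.

```python
def resolve_master(cli_master=None, conf=None, env=None, default="local[*]"):
    """Pick the Spark master using --master > spark.master > MASTER.

    Hypothetical helper: `cli_master` stands for the --master flag, `conf` for
    properties loaded from spark-defaults.conf, and `env` for the environment.
    The `default` fallback is an assumption, not taken from the commit.
    """
    conf = conf or {}
    env = env or {}
    candidates = (cli_master, conf.get("spark.master"), env.get("MASTER"))
    for candidate in candidates:
        # Empty strings are treated the same as unset values.
        if candidate:
            return candidate
    return default
```

For example, `resolve_master("spark://host:7077", env={"MASTER": "local"})` returns the command-line value, which is the behavior the ec2 scenario above was missing.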
-
Andrew Or authored
Spark shell currently overwrites `spark.jars` with `ADD_JARS`. In all modes except yarn-cluster, this means the `--jars` flag passed to `bin/spark-shell` is also discarded. However, in the [docs](http://people.apache.org/~pwendell/spark-1.0.0-rc7-docs/scala-programming-guide.html#initializing-spark), we explicitly tell the users to add the jars this way.

Author: Andrew Or <andrewor14@gmail.com>

Closes #849 from andrewor14/shell-jars and squashes the following commits:

928a7e6 [Andrew Or] ',' -> "," (minor)
afc357c [Andrew Or] Handle spark.jars == "" in SparkILoop, not SparkSubmit
c6da113 [Andrew Or] Do not set spark.jars to ""
d8549f7 [Andrew Or] Respect spark.jars and --jars in spark-shell

(cherry picked from commit 8edbee7d)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
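One way to picture the fix is as a merge of the comma-separated jar lists instead of an overwrite. The Python sketch below is only an illustration: `merge_jars` is a hypothetical name, and dropping duplicates is an assumption rather than something the commit states.

```python
def merge_jars(*sources):
    """Combine comma-separated jar lists, keeping first-seen order.

    Hypothetical helper: each source may be None or "" (for example an unset
    ADD_JARS or an empty spark.jars), in which case it contributes nothing.
    """
    merged = []
    for source in sources:
        for jar in (source or "").split(","):
            jar = jar.strip()
            # Skip blanks (from "" or trailing commas) and repeated entries.
            if jar and jar not in merged:
                merged.append(jar)
    return ",".join(merged)
```

Called as `merge_jars(spark_jars, add_jars)`, an empty `ADD_JARS` no longer wipes out jars supplied through `--jars`.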
-
Aaron Davidson authored
Perhaps due to zombie processes on Jenkins, it seems that at least 10 Spark ports are in use. It also doesn't matter whether the port number increases when the original port is in use; it could in fact go down. The only part that matters is that a different port is selected rather than failing to bind. Changed test to match this.

Thanks to @andrewor14 for helping diagnose this.

Author: Aaron Davidson <aaron@databricks.com>

Closes #857 from aarondav/tiny and squashes the following commits:

c199ec8 [Aaron Davidson] Fix UISuite unit test that fails under Jenkins contention

(cherry picked from commit f9f5fd5f)
Signed-off-by: Reynold Xin <rxin@apache.org>
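The property the test now checks (a busy port leads to a different port, not a bind failure) can be sketched with plain Python sockets. This is an illustrative stand-in, not Spark's UI server code; `bind_with_retry`, the upward scan, and the retry count are assumptions.

```python
import socket


def bind_with_retry(start_port, max_retries=10, host="127.0.0.1"):
    """Bind a TCP socket to the first free port found, starting at start_port.

    Hypothetical helper: scans upward for simplicity, although any strategy
    that yields a different free port would satisfy the same property.
    Returns the bound socket and the port it ended up on.
    """
    for offset in range(max_retries + 1):
        port = start_port + offset
        if port > 65535:
            break
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError:
            # Port is in use (or otherwise unavailable); try the next one.
            sock.close()
    raise OSError(f"no free port found starting from {start_port}")
```

A test written against this helper would assert only that the returned port differs from the occupied one, which stays true even when several neighboring ports are taken.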
-
Xiangrui Meng authored
Send secondary jars to the distributed cache of all containers and add the cached jars to the classpath before executors start. Tested on a YARN cluster (CDH-5.0).

`spark-submit --jars` also works in standalone server and `yarn-client`. Thanks to @andrewor14 for testing!

I removed "Doesn't work for drivers in standalone mode with "cluster" deploy mode." from `spark-submit`'s help message, though we haven't tested Mesos yet.

CC: @dbtsai @sryza

Author: Xiangrui Meng <meng@databricks.com>

Closes #848 from mengxr/yarn-classpath and squashes the following commits:

23e7df4 [Xiangrui Meng] rename spark.jar to __spark__.jar and app.jar to __app__.jar to avoid conflicts; append $CWD/ and $CWD/* to the classpath; remove unused methods
a40f6ed [Xiangrui Meng] standalone -> cluster
65e04ad [Xiangrui Meng] update spark-submit help message and add a comment for yarn-client
11e5354 [Xiangrui Meng] minor changes
3e7e1c4 [Xiangrui Meng] use sparkConf instead of hadoop conf
dc3c825 [Xiangrui Meng] add secondary jars to classpath in yarn

(cherry picked from commit dba31402)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
-
- May 21, 2014
-
-
Reynold Xin authored
1. Add `<code>` to configuration options
2. List env variables in tabular format to be consistent with other pages.
3. Moved Viewing Spark Properties section up.

This is against branch-1.0, but should be cherry picked into master as well.

Author: Reynold Xin <rxin@apache.org>

Closes #851 from rxin/doc-config and squashes the following commits:

28ac0d3 [Reynold Xin] Add `<code>` to configuration options, and list env variables in a table.
-
Takuya UESHIN authored
...ys.

When tables are equi-joined by multiple keys, `HashJoin` should be used, but `CartesianProduct` followed by `Filter` is used instead. The join keys are combined by an `And` expression, so we need to apply `splitConjunctivePredicates` to the join condition while finding join keys.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes #836 from ueshin/issues/SPARK-1889 and squashes the following commits:

fe1c387 [Takuya UESHIN] Apply splitConjunctivePredicates to join condition while finding join keys.

(cherry picked from commit bb88875a)
Signed-off-by: Reynold Xin <rxin@apache.org>
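The idea behind the fix can be modeled in a few lines of Python: flatten the `And` tree into its conjuncts, then collect every equality that compares one column from each side. The classes and function names here are a toy model invented for illustration, not Catalyst's expression types or its planner code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """A column reference, identified by its table and name (toy model)."""
    table: str
    name: str


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


def split_conjunctive_predicates(condition):
    """Flatten nested And nodes into a flat list of conjuncts."""
    if isinstance(condition, And):
        return (split_conjunctive_predicates(condition.left)
                + split_conjunctive_predicates(condition.right))
    return [condition]


def find_join_keys(condition, left_table, right_table):
    """Return (key pairs, leftover predicates) for an equi-join.

    Each key pair is ordered as (left-side column, right-side column).
    """
    keys, rest = [], []
    for pred in split_conjunctive_predicates(condition):
        if (isinstance(pred, Eq) and isinstance(pred.left, Attr)
                and isinstance(pred.right, Attr)):
            tables = (pred.left.table, pred.right.table)
            if tables == (left_table, right_table):
                keys.append((pred.left, pred.right))
                continue
            if tables == (right_table, left_table):
                keys.append((pred.right, pred.left))
                continue
        rest.append(pred)
    return keys, rest
```

Without the flattening step, a condition such as `a.k1 = b.k1 AND a.k2 = b.k2` is a single `And` node rather than an equality, so no keys are found and a planner has to fall back to a cartesian product plus filter.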
-
Kan Zhang authored
Author: Kan Zhang <kzhang@apache.org>

Closes #697 from kanzhang/SPARK-1519 and squashes the following commits:

4f8d1ed [Kan Zhang] [SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark

(cherry picked from commit f18fd05b)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Andrew Or authored
Author: Andrew Or <andrewor14@gmail.com>

Closes #847 from andrewor14/yarn-typo and squashes the following commits:

c1906af [Andrew Or] Stoped -> Stopped

(cherry picked from commit ba5d4a99)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Andrew Or authored
It was in the wrong package

Author: Andrew Or <andrewor14@gmail.com>

Closes #839 from andrewor14/jdbc-suite and squashes the following commits:

f948c5a [Andrew Or] cache -> cache()
b215279 [Andrew Or] Move JdbcRDDSuite to the correct package

(cherry picked from commit 7c79ef7d)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Andrew Or authored
The example code on the configuration page currently does not compile.

Author: Andrew Or <andrewor14@gmail.com>

Closes #842 from andrewor14/conf-docs and squashes the following commits:

aabff57 [Andrew Or] Correct example of creating a new SparkConf

(cherry picked from commit 1014668f)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
Sumedh Mungee authored
Fixed a couple of misleading comments in bin/pyspark and bin/spark-class. The comments make it seem like the script is looking for the Scala installation when in fact it is looking for Spark.

Author: Sumedh Mungee <smungee@gmail.com>

Closes #843 from smungee/spark-1250-fix-comments and squashes the following commits:

26870f3 [Sumedh Mungee] [SPARK-1250] Fixed misleading comments in bin/pyspark and bin/spark-class

(cherry picked from commit 6e337380)
Signed-off-by: Reynold Xin <rxin@apache.org>
-
- May 20, 2014
-
-
Tathagata Das authored
-
Tathagata Das authored
-
Tathagata Das authored
`lateral_view_outer` query sometimes returns a different set of 10 rows.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #838 from tdas/hive-test-fix2 and squashes the following commits:

9128a0d [Tathagata Das] Blacklisted flaky HiveCompatibility test.

(cherry picked from commit 7f0cfe47)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
-
Tathagata Das authored
This reverts commit 920f947e.
-
Tathagata Das authored
This reverts commit f8e61195.
-
Tathagata Das authored
-
Tathagata Das authored
Updated version of #821

Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: Ghidireac <bogdang@u448a5b0a73d45358d94a.ant.amazon.com>

Closes #835 from tdas/SPARK-1877 and squashes the following commits:

f346f71 [Tathagata Das] Addressed Patrick's comments.
fee0c5d [Ghidireac] SPARK-1877: ClassNotFoundException when loading RDD with serialized objects

(cherry picked from commit 52eb54d0)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
-
- May 19, 2014
-
-
Xiangrui Meng authored
1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
2. Embedded instructions in the help message of those example apps.

Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.

Author: Xiangrui Meng <meng@databricks.com>

Closes #833 from mengxr/mllib-sample-data and squashes the following commits:

59f0a18 [Xiangrui Meng] add sample binary classification data
3c2f92f [Xiangrui Meng] add linear regression data
050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example

(cherry picked from commit bcb9dce6)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
-