  1. Apr 23, 2014
  2. Apr 22, 2014
    • SPARK-1562 Fix visibility / annotation of Spark SQL APIs · aa77f8a6
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #489 from marmbrus/sqlDocFixes and squashes the following commits:
      
      acee4f3 [Michael Armbrust] Fix visibility / annotation of Spark SQL APIs
    • [FIX: SPARK-1376] use --arg instead of --args in SparkSubmit to avoid warning messages · 662c860e
      Xiangrui Meng authored
      Even if users use `--arg`, `SparkSubmit` still uses `--args` for child args internally, which triggers a warning message that may confuse users:
      
      ~~~
      --args is deprecated. Use --arg instead.
      ~~~
      
      @sryza Does it look good to you?
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #485 from mengxr/submit-arg and squashes the following commits:
      
      5e1b9fe [Xiangrui Meng] update test
      cebbeb7 [Xiangrui Meng] use --arg instead of --args in SparkSubmit to avoid warning messages
    • [streaming][SPARK-1578] Removed requirement for TTL in StreamingContext. · f3d19a9f
      Tathagata Das authored
      Since shuffles and RDDs that are out of context are automatically cleaned by Spark core (using ContextCleaner) there is no need for setting the cleaner TTL while creating a StreamingContext.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #491 from tdas/ttl-fix and squashes the following commits:
      
      cf01dc7 [Tathagata Das] Removed requirement for TTL in StreamingContext.
    • [Spark-1538] Fix SparkUI incorrectly hiding persisted RDDs · 2de57387
      Andrew Or authored
      **Bug**: After the following command `sc.parallelize(1 to 1000).persist.map(_ + 1).count()` is run, the persisted RDD is missing from the storage tab of the SparkUI.
      
      **Cause**: The command creates two RDDs in one stage, a `ParallelCollectionRDD` and a `MappedRDD`. However, the existing StageInfo only keeps the RDDInfo of the last RDD associated with the stage (`MappedRDD`), and so all RDD information regarding the first RDD (`ParallelCollectionRDD`) is discarded. In this case, we persist the first RDD,  but the StorageTab doesn't know about this RDD because it is not encoded in the StageInfo.
      
      **Fix**: Record information of all RDDs in StageInfo, instead of just the last RDD (i.e. `stage.rdd`). Since stage boundaries are marked by shuffle dependencies, the solution is to traverse the last RDD's dependency tree, visiting only ancestor RDDs related through a sequence of narrow dependencies.
      
      ---
      
      This PR also moves RDDInfo to its own file, includes a few style fixes, and adds a unit test for constructing StageInfos.
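The traversal described in the fix can be sketched as follows. This is a hedged, illustrative Python stand-in for the Scala `getNarrowAncestors` helper; the class and field names are invented for the sketch, and the visited-set guard mirrors the cycle handling mentioned in the commit list:

```python
class RDD:
    def __init__(self, name, narrow_parents=(), shuffle_parents=()):
        self.name = name
        self.narrow_parents = list(narrow_parents)    # narrow dependencies
        self.shuffle_parents = list(shuffle_parents)  # shuffle deps mark stage boundaries

def narrow_ancestors(rdd):
    """Names of all ancestors reachable through narrow dependencies only,
    i.e. every RDD belonging to the same stage as `rdd`."""
    seen, stack = set(), list(rdd.narrow_parents)
    while stack:
        parent = stack.pop()
        if parent.name not in seen:        # guard against dependency cycles
            seen.add(parent.name)
            stack.extend(parent.narrow_parents)
    return seen

# sc.parallelize(...).persist.map(...): two RDDs in one stage
parallel = RDD("ParallelCollectionRDD")
mapped = RDD("MappedRDD", narrow_parents=[parallel])
print(narrow_ancestors(mapped))  # the persisted first RDD is now visible
```

With this, StageInfo can record both RDDs instead of only `stage.rdd`, so the StorageTab sees the persisted `ParallelCollectionRDD`.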
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #469 from andrewor14/storage-ui-fix and squashes the following commits:
      
      07fc7f0 [Andrew Or] Add back comment that was accidentally removed (minor)
      5d799fe [Andrew Or] Add comment to justify testing of getNarrowAncestors with cycles
      9d0e2b8 [Andrew Or] Hide details of getNarrowAncestors from outsiders
      d2bac8a [Andrew Or] Deal with cycles in RDD dependency graph + add extensive tests
      2acb177 [Andrew Or] Move getNarrowAncestors to RDD.scala
      bfe83f0 [Andrew Or] Backtrace RDD dependency tree to find all RDDs that belong to a Stage
    • Assorted clean-up for Spark-on-YARN. · 995fdc96
      Patrick Wendell authored
      In particular when the HADOOP_CONF_DIR is not specified.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #488 from pwendell/hadoop-cleanup and squashes the following commits:
      
      fe95f13 [Patrick Wendell] Changes based on Andrew's feeback
      18d09c1 [Patrick Wendell] Review comments from Andrew
      17929cc [Patrick Wendell] Assorted clean-up for Spark-on-YARN.
    • [SPARK-1570] Fix classloading in JavaSQLContext.applySchema · ea8cea82
      Kan Zhang authored
      I think I hit a class loading issue when running JavaSparkSQL example using spark-submit in local mode.
      
      Author: Kan Zhang <kzhang@apache.org>
      
      Closes #484 from kanzhang/SPARK-1570 and squashes the following commits:
      
      feaaeba [Kan Zhang] [SPARK-1570] Fix classloading in JavaSQLContext.applySchema
    • Fix compilation on Hadoop 2.4.x. · 0ea0b1a2
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #483 from vanzin/yarn-2.4 and squashes the following commits:
      
      0fc57d8 [Marcelo Vanzin] Fix compilation on Hadoop 2.4.x.
    • [Fix #204] Eliminate delay between binding and log checking · 745e496c
      Andrew Or authored
      **Bug**: In the existing history server, there is a `spark.history.updateInterval` seconds delay before application logs show up on the UI.
      
      **Cause**: This is because the following events happen in this order: (1) The background thread that checks for logs starts, but realizes the server has not yet bound and so waits for N seconds, (2) server binds, (3) N seconds later the background thread finds that the server has finally bound to a port, and so finally checks for application logs.
      
      **Fix**: This PR forces the log checking thread to start immediately after binding. It also documents two relevant environment variables that are currently missing.
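The ordering problem and its fix can be sketched with an event instead of polling. This is a generic illustration of the "signal the checker when binding completes" pattern, not the HistoryServer's actual code:

```python
import threading

bound = threading.Event()
checked = []

def check_for_logs():
    checked.append("scanned log directory")

def log_checker():
    # Before the fix: poll every `spark.history.updateInterval` seconds,
    # missing the moment the server binds by up to one full interval.
    # After: block until binding is signalled, then check immediately.
    bound.wait()
    check_for_logs()

t = threading.Thread(target=log_checker)
t.start()
bound.set()      # server binds; the checker wakes with no N-second delay
t.join()
print(checked)
```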
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #441 from andrewor14/history-server-fix and squashes the following commits:
      
      b2eb46e [Andrew Or] Document SPARK_PUBLIC_DNS and SPARK_HISTORY_OPTS for the history server
      e8d1fbc [Andrew Or] Eliminate delay between binding and checking for logs
    • [SPARK-1506][MLLIB] Documentation improvements for MLlib 1.0 · 26d35f3f
      Xiangrui Meng authored
      Preview: http://54.82.240.23:4000/mllib-guide.html
      
      Table of contents:
      
      * Basics
        * Data types
        * Summary statistics
      * Classification and regression
        * linear support vector machine (SVM)
        * logistic regression
        * linear least squares, Lasso, and ridge regression
        * decision tree
        * naive Bayes
      * Collaborative Filtering
        * alternating least squares (ALS)
      * Clustering
        * k-means
      * Dimensionality reduction
        * singular value decomposition (SVD)
        * principal component analysis (PCA)
      * Optimization
        * stochastic gradient descent
        * limited-memory BFGS (L-BFGS)
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #422 from mengxr/mllib-doc and squashes the following commits:
      
      944e3a9 [Xiangrui Meng] merge master
      f9fda28 [Xiangrui Meng] minor
      9474065 [Xiangrui Meng] add alpha to ALS examples
      928e630 [Xiangrui Meng] initialization_mode -> initializationMode
      5bbff49 [Xiangrui Meng] add imports to labeled point examples
      c17440d [Xiangrui Meng] fix python nb example
      28f40dc [Xiangrui Meng] remove localhost:4000
      369a4d3 [Xiangrui Meng] Merge branch 'master' into mllib-doc
      7dc95cc [Xiangrui Meng] update linear methods
      053ad8a [Xiangrui Meng] add links to go back to the main page
      abbbf7e [Xiangrui Meng] update ALS argument names
      648283e [Xiangrui Meng] level down statistics
      14e2287 [Xiangrui Meng] add sample libsvm data and use it in guide
      8cd2441 [Xiangrui Meng] minor updates
      186ab07 [Xiangrui Meng] update section names
      6568d65 [Xiangrui Meng] update toc, level up lr and svm
      162ee12 [Xiangrui Meng] rename section names
      5c1e1b1 [Xiangrui Meng] minor
      8aeaba1 [Xiangrui Meng] wrap long lines
      6ce6a6f [Xiangrui Meng] add summary statistics to toc
      5760045 [Xiangrui Meng] claim beta
      cc604bf [Xiangrui Meng] remove classification and regression
      92747b3 [Xiangrui Meng] make section titles consistent
      e605dd6 [Xiangrui Meng] add LIBSVM loader
      f639674 [Xiangrui Meng] add python section to migration guide
      c82ffb4 [Xiangrui Meng] clean optimization
      31660eb [Xiangrui Meng] update linear algebra and stat
      0a40837 [Xiangrui Meng] first pass over linear methods
      1fc8271 [Xiangrui Meng] update toc
      906ed0a [Xiangrui Meng] add a python example to naive bayes
      5f0a700 [Xiangrui Meng] update collaborative filtering
      656d416 [Xiangrui Meng] update mllib-clustering
      86e143a [Xiangrui Meng] remove data types section from main page
      8d1a128 [Xiangrui Meng] move part of linear algebra to data types and add Java/Python examples
      d1b5cbf [Xiangrui Meng] merge master
      72e4804 [Xiangrui Meng] one pass over tree guide
      64f8995 [Xiangrui Meng] move decision tree guide to a separate file
      9fca001 [Xiangrui Meng] add first version of linear algebra guide
      53c9552 [Xiangrui Meng] update dependencies
      f316ec2 [Xiangrui Meng] add migration guide
      f399f6c [Xiangrui Meng] move linear-algebra to dimensionality-reduction
      182460f [Xiangrui Meng] add guide for naive Bayes
      137fd1d [Xiangrui Meng] re-organize toc
      a61e434 [Xiangrui Meng] update mllib's toc
    • [SPARK-1281] Improve partitioning in ALS · bf9d49b6
      Tor Myklebust authored
      ALS was using HashPartitioner and explicit uses of `%` together.  Further, the naked use of `%` meant that, if the number of partitions corresponded with the stride of arithmetic progressions appearing in user and product ids, users and products could be mapped into buckets in an unfair or unwise way.
      
      This pull request:
      1) Makes the Partitioner an instance variable of ALS.
      2) Replaces the direct uses of `%` with calls to a Partitioner.
      3) Defines an anonymous Partitioner that scrambles the bits of the object's hashCode before reducing to the number of present buckets.
      
      This pull request does not make the partitioner user-configurable.
      
      I'm not all that happy about the way I did (1).  It introduces an icky lifetime issue and dances around it by nulling something.  However, I don't know a better way to make the partitioner visible everywhere it needs to be visible.
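The hazard of the naked `%` is easy to demonstrate: when the ids form an arithmetic progression whose stride shares a factor with the partition count, everything collapses into a few buckets. A minimal Python sketch follows; the multiply-and-xor scrambler is an illustrative stand-in, not the exact hash used in the PR:

```python
def naive_bucket(item_id, num_buckets):
    # Direct use of `%`: strided ids pile into few buckets.
    return item_id % num_buckets

def scrambled_bucket(item_id, num_buckets):
    # Mix the bits before reducing modulo the bucket count
    # (Fibonacci-hash-style multiplier; illustrative only).
    h = (item_id * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    h ^= h >> 31
    return h % num_buckets

ids = range(0, 800, 8)                 # user ids with stride 8, 8 partitions
naive = {naive_bucket(i, 8) for i in ids}
scrambled = {scrambled_bucket(i, 8) for i in ids}
print(len(naive))      # 1 -- every id lands in the same bucket
print(len(scrambled))  # several buckets are used
```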
      
      Author: Tor Myklebust <tmyklebu@gmail.com>
      
      Closes #407 from tmyklebu/master and squashes the following commits:
      
      dcf583a [Tor Myklebust] Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.
      23d6f91 [Tor Myklebust] Stop making the partitioner configurable.
      495784f [Tor Myklebust] Merge branch 'master' of https://github.com/apache/spark
      674933a [Tor Myklebust] Fix style.
      40edc23 [Tor Myklebust] Fix missing space.
      f841345 [Tor Myklebust] Fix daft bug creating 'pairs', also for -> foreach.
      5ec9e6c [Tor Myklebust] Clean a couple of things up using 'map'.
      36a0f43 [Tor Myklebust] Make the partitioner private.
      d872b09 [Tor Myklebust] Add negative id ALS test.
      df27697 [Tor Myklebust] Support custom partitioners.  Currently we use the same partitioner for users and products.
      c90b6d8 [Tor Myklebust] Scramble user and product ids before bucketing.
      c774d7d [Tor Myklebust] Make the partitioner a member variable and use it instead of modding directly.
    • fix bugs of dot in python · c919798f
      Xusen Yin authored
      Without a `transpose()` on `self.theta`, a
      
      *ValueError: matrices are not aligned*
      
      occurs. The former test case simply missed this situation.
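The shape mismatch is easy to reproduce with plain NumPy. The shapes below are hypothetical stand-ins for the naive Bayes model (`theta` plays the role of `self.theta`, a class-by-feature matrix); the exact error wording varies across NumPy versions:

```python
import numpy as np

theta = np.matrix(np.zeros((2, 4)))   # 2 classes x 4 features (stand-in for self.theta)
x = np.matrix(np.ones((1, 4)))        # one sample as a row vector

try:
    x * theta                         # (1,4) x (2,4): inner dimensions disagree
    aligned = True
except ValueError:                    # "matrices are not aligned" on older NumPy
    aligned = False

scores = x * theta.T                  # (1,4) x (4,2) -> (1,2): aligned
print(aligned, scores.shape)
```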
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #463 from yinxusen/python-naive-bayes and squashes the following commits:
      
      fcbe3bc [Xusen Yin] fix bugs of dot in python
    • [SPARK-1560]: Updated Pyrolite Dependency to be Java 6 compatible · 0f87e6ad
      Ahir Reddy authored
      Changed the Pyrolite dependency to a build which targets Java 6.
      
      Author: Ahir Reddy <ahirreddy@gmail.com>
      
      Closes #479 from ahirreddy/java6-pyrolite and squashes the following commits:
      
      8ea25d3 [Ahir Reddy] Updated maven build to use java 6 compatible pyrolite
      dabc703 [Ahir Reddy] Updated Pyrolite dependency to be Java 6 compatible
    • [HOTFIX] SPARK-1399: remove outdated comments · 87de2908
      CodingCat authored
      As the original PR was merged before this mistake was found, the fix lands here.
      
      Sorry about that @pwendell, @andrewor14, I will be more careful next time
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #474 from CodingCat/hotfix_1399 and squashes the following commits:
      
      f3a8ba9 [CodingCat] move outdated comments
    • SPARK-1496: Have jarOfClass return Option[String] · 83084d3b
      Patrick Wendell authored
      A simple change, mostly had to change a bunch of example code.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #438 from pwendell/jar-of-class and squashes the following commits:
      
      aa010ff [Patrick Wendell] SPARK-1496: Have jarOfClass return Option[String]
    • [SPARK-1459] Use local path (and not complete URL) when opening local log file. · ac164b79
      Marcelo Vanzin authored
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #375 from vanzin/event-file and squashes the following commits:
      
      f673029 [Marcelo Vanzin] [SPARK-1459] Use local path (and not complete URL) when opening local log file.
    • [Fix #274] Document + fix annotation usages · b3e5366f
      Andrew Or authored
      ... so that we don't follow an unspoken set of forbidden rules for adding **@AlphaComponent**, **@DeveloperApi**, and **@Experimental** annotations in the code.
      
      In addition, this PR
      (1) removes unnecessary `:: * ::` tags,
      (2) adds missing `:: * ::` tags, and
      (3) removes annotations for internal APIs.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #470 from andrewor14/annotations-fix and squashes the following commits:
      
      92a7f42 [Andrew Or] Document + fix annotation usages
  3. Apr 21, 2014
    • [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs · fc783847
      Matei Zaharia authored
      I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is a SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.
      
      Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #457 from mateiz/better-docs and squashes the following commits:
      
      a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
      5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
      f05abc0 [Matei Zaharia] Don't include java.lang package names
      995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
      a14a93c [Matei Zaharia] typo
      76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
      ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
      acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
    • [SPARK-1332] Improve Spark Streaming's Network Receiver and InputDStream API [WIP] · 04c37b6f
      Tathagata Das authored
      The current Network Receiver API makes it slightly complicated to write a new receiver, as one needs to create an instance of BlockGenerator as shown in SocketReceiver
      https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L51
      
      Exposing the BlockGenerator interface has made it harder to improve the receiving process. The API of NetworkReceiver (which was not a very stable API anyway) needs to be changed if we are to ensure future stability.
      
      Additionally, the functions like streamingContext.socketStream that create input streams, return DStream objects. That makes it hard to expose functionality (say, rate limits) unique to input dstreams. They should return InputDStream or NetworkInputDStream. This is still not yet implemented.
      
      This PR is blocked on the graceful shutdown PR #247
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #300 from tdas/network-receiver-api and squashes the following commits:
      
      ea27b38 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
      3a4777c [Tathagata Das] Renamed NetworkInputDStream to ReceiverInputDStream, and ActorReceiver related stuff.
      838dd39 [Tathagata Das] Added more events to the StreamingListener to report errors and stopped receivers.
      a75c7a6 [Tathagata Das] Address some PR comments and fixed other issues.
      91bfa72 [Tathagata Das] Fixed bugs.
      8533094 [Tathagata Das] Scala style fixes.
      028bde6 [Tathagata Das] Further refactored receiver to allow restarting of a receiver.
      43f5290 [Tathagata Das] Made functions that create input streams return InputDStream and NetworkInputDStream, for both Scala and Java.
      2c94579 [Tathagata Das] Fixed graceful shutdown by removing interrupts on receiving thread.
      9e37a0b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
      3223e95 [Tathagata Das] Refactored the code that runs the NetworkReceiver into further classes and traits to make them more testable.
      a36cc48 [Tathagata Das] Refactored the NetworkReceiver API for future stability.
    • Dev script: include RC name in git tag · 5a5b3346
      Patrick Wendell authored
    • SPARK-1399: show stage failure reason in UI · 43e4a29d
      CodingCat authored
      https://issues.apache.org/jira/browse/SPARK-1399
      
      refactor StageTable a bit to support additional column for failed stage
      
      Author: CodingCat <zhunansjtu@gmail.com>
      Author: Nan Zhu <CodingCat@users.noreply.github.com>
      
      Closes #421 from CodingCat/SPARK-1399 and squashes the following commits:
      
      2caba36 [CodingCat] remove dummy tag
      77cf305 [CodingCat] create dummy element to wrap columns
      3989ce2 [CodingCat] address Aaron's comments
      18fc09f [Nan Zhu] fix compile error
      00ea30a [Nan Zhu] address Kay's comments
      16ac83d [CodingCat] set a default value of failureReason
      35df3df [CodingCat] address andrew's comments
      06d21a4 [CodingCat] address andrew's comments
      25a6db6 [CodingCat] style fix
      dc8856d [CodingCat] show stage failure reason in UI
    • SPARK-1539: RDDPage.scala contains RddPage class · b7df31eb
      Xiangrui Meng authored
      SPARK-1386 changed RDDPage to RddPage but didn't change the filename. I tried sbt/sbt publish-local. Inside the spark-core jar, the unit name is RDDPage.class and hence I got the following error:
      
      ~~~
      [error] (run-main) java.lang.NoClassDefFoundError: org/apache/spark/ui/storage/RddPage
      java.lang.NoClassDefFoundError: org/apache/spark/ui/storage/RddPage
      	at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:59)
      	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:52)
      	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:42)
      	at org.apache.spark.SparkContext.<init>(SparkContext.scala:215)
      	at MovieLensALS$.main(MovieLensALS.scala:38)
      	at MovieLensALS.main(MovieLensALS.scala)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.storage.RddPage
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
      	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
      	at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:59)
      	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:52)
      	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:42)
      	at org.apache.spark.SparkContext.<init>(SparkContext.scala:215)
      	at MovieLensALS$.main(MovieLensALS.scala:38)
      	at MovieLensALS.main(MovieLensALS.scala)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      ~~~
      
      This can be fixed after renaming RddPage to RDDPage, or renaming RDDPage.scala to RddPage.scala. I chose the former since the name `RDD` is common in Spark code.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #454 from mengxr/rddpage-fix and squashes the following commits:
      
      f75e544 [Xiangrui Meng] rename RddPage to RDDPage
    • [Hot Fix] Ignore org.apache.spark.ui.UISuite tests · af46f1fd
      Andrew Or authored
      #446 faced a connection refused exception from these tests, causing them to time out and fail after a long time. For now, let's disable these tests.
      
      (We recently disabled the corresponding test in streaming in 7863ecca. These tests are very similar).
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #466 from andrewor14/ignore-ui-tests and squashes the following commits:
      
      6f5a362 [Andrew Or] Ignore org.apache.spark.ui.UISuite tests
    • Clean up and simplify Spark configuration · fb98488f
      Patrick Wendell authored
      Over time, as we've added more deployment modes, things have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements:
      
      1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file.
      2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath.
      3. Adds ability to set these same variables for the driver using `spark-submit`.
      4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`.
      5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node.
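Item 4's behaviour, loading key-value defaults from a file and letting explicit command-line flags win, can be sketched as follows. The file format and precedence shown here are assumptions for illustration, not the exact `spark-submit` parser:

```python
def load_defaults(text):
    """Parse a properties-style defaults file: one `key value` (or
    `key=value`) pair per line; '#' starts a comment line."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.replace("=", " ", 1).partition(" ")
        props[key] = value.strip()
    return props

defaults = load_defaults("""
# spark-defaults.conf (illustrative contents)
spark.executor.extraJavaOptions  -XX:+PrintGCDetails
spark.executor.extraClassPath    /opt/libs/*
""")

# Options given on the spark-submit command line override file defaults.
cli = {"spark.executor.extraClassPath": "/custom/*"}
conf = {**defaults, **cli}
print(conf["spark.executor.extraClassPath"])  # /custom/*
```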
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #299 from pwendell/config-cleanup and squashes the following commits:
      
      127f301 [Patrick Wendell] Improvements to testing
      a006464 [Patrick Wendell] Moving properties file template.
      b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf
      0086939 [Patrick Wendell] Minor style fixes
      af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs
      b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide
      af0adf7 [Patrick Wendell] Automatically add user jar
      a56b125 [Patrick Wendell] Responses to Tom's review
      d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup
      a762901 [Patrick Wendell] Fixing test failures
      ffa00fe [Patrick Wendell] Review feedback
      fda0301 [Patrick Wendell] Note
      308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN
      e83cd8f [Patrick Wendell] Changes to allow re-use of test applications
      be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set
      c2a2909 [Patrick Wendell] Test compile fixes
      4ee6f9d [Patrick Wendell] Making YARN doc changes consistent
      afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors.
      b08893b [Patrick Wendell] Additional improvements.
      ace4ead [Patrick Wendell] Responses to review feedback.
      b72d183 [Patrick Wendell] Review feedback for spark env file
      46555c1 [Patrick Wendell] Review feedback and import clean-ups
      437aed1 [Patrick Wendell] Small fix
      761ebcd [Patrick Wendell] Library path and classpath for drivers
      7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script
      5b0ba8e [Patrick Wendell] Don't ship executor envs
      84cc5e5 [Patrick Wendell] Small clean-up
      1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings
      4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH
      6eaf7d0 [Patrick Wendell] executorJavaOpts
      0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN
      ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
  4. Apr 19, 2014
    • REPL cleanup. · 3a390bfd
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #451 from marmbrus/replCleanup and squashes the following commits:
      
      088526a [Michael Armbrust] REPL cleanup.
    • [SPARK-1535] ALS: Avoid the garbage-creating ctor of DoubleMatrix · 25fc3188
      Tor Myklebust authored
      `new DoubleMatrix(double[])` creates a garbage `double[]` of the same length as its argument and immediately throws it away.  This pull request avoids that constructor in the ALS code.
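The same copy-vs-wrap distinction exists in NumPy, which makes a convenient illustration of why a wrapping helper matters in a hot loop. This is an analogy only, not the jblas API:

```python
import numpy as np

data = [1.0, 2.0, 3.0]
arr = np.array(data)

copied = np.array(arr)     # allocates and fills a fresh buffer
wrapped = np.asarray(arr)  # reuses the existing buffer: no throwaway allocation

print(copied is arr)       # False -- a new object backed by new storage
print(wrapped is arr)      # True  -- the very same array object
```

In the PR, a small helper that wraps an `Array[Double]` in a `DoubleMatrix` without copying plays the role of `np.asarray` here.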
      
      Author: Tor Myklebust <tmyklebu@gmail.com>
      
      Closes #442 from tmyklebu/foo2 and squashes the following commits:
      
      2784fc5 [Tor Myklebust] Mention that this is probably fixed as of jblas 1.2.4; repunctuate.
      a09904f [Tor Myklebust] Helper function for wrapping Array[Double]'s with DoubleMatrix's.
    • Add insertInto and saveAsTable to Python API. · 10d04213
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #447 from marmbrus/pythonInsert and squashes the following commits:
      
      c7ab692 [Michael Armbrust] Keep docstrings < 72 chars.
      ff62870 [Michael Armbrust] Add insertInto and saveAsTable to Python API.
    • Use scala deprecation instead of java. · 5d0f58b2
      Michael Armbrust authored
      This gets rid of a warning when compiling core (since we were depending on a deprecated interface with a non-deprecated function).  I also tested with javac, and this does the right thing when compiling java code.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #452 from marmbrus/scalaDeprecation and squashes the following commits:
      
      f628b4d [Michael Armbrust] Use scala deprecation instead of java.
    • README update · 28238c81
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #443 from rxin/readme and squashes the following commits:
      
      16853de [Reynold Xin] Updated SBT and Scala instructions.
      3ac3ceb [Reynold Xin] README update
  5. Apr 18, 2014
  6. Apr 17, 2014