Commits · 65869f843d5eb2b9b686d07aadaa7f1a0f16e8c7 · cs525-sp18-g07 / spark

Jan 21, 2014
- Removed SPARK_MEM from run-examples. · 65869f84
  Tathagata Das authored 11 years ago
  
  65869f84
Jan 20, 2014
- Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM. · e0b741d0
  Tathagata Das authored 11 years ago
  
  e0b741d0
Jan 19, 2014

Merge pull request #470 from tgravescs/fix_spark_examples_yarn · 792d9084

Patrick Wendell authored 11 years ago

Only log error on missing jar to allow spark examples to jar.

Right now, to run the Spark examples on YARN you have to use the --addJars option and put the jar in HDFS. To make that nicer, so the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
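
The "log an error instead of throwing" behaviour described above can be illustrated with a minimal plain-Python sketch. The function name `check_app_jar` and the logger name are hypothetical, and this is not the YARN client code touched by the commit.

```python
import logging
import os

logger = logging.getLogger("jar_check")  # hypothetical logger name


def check_app_jar(path):
    """Return True if the jar exists; log an error instead of raising if it does not."""
    if path and os.path.isfile(path):
        return True
    # Previously this would have been a raised exception; now the caller can go on.
    logger.error("Application jar not found: %r; continuing without it", path)
    return False
```

A caller can then decide whether a missing jar is fatal, rather than having the check abort the launch.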

792d9084

Merge pull request #458 from tdas/docs-update · 256a3553

Patrick Wendell authored 11 years ago

Updated java API docs for streaming, along with very minor changes in the code examples.

Docs updated for:
Scala: StreamingContext, DStream, PairDStreamFunctions
Java: JavaStreamingContext, JavaDStream, JavaPairDStream

Example updated:
JavaQueueStream: Do not use deprecated method
ActorWordCount: Use the public interface the right way.

256a3553

update comment · dd56b212
Thomas Graves authored 11 years ago

dd56b212
Only log error on missing jar to allow spark examples to jar. · ceb79a39
Thomas Graves authored 11 years ago

ceb79a39

Jan 18, 2014

Merge pull request #459 from srowen/UpdaterL2Regularization · fe8a3546

Patrick Wendell authored 11 years ago

Correct L2 regularized weight update with canonical form

Per thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected. See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
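
As a rough illustration of the canonical L2-regularized gradient step mentioned above, here is a minimal sketch in plain Python. It assumes the objective is loss(w) + (reg_param / 2) * ||w||^2, so the regularization term contributes reg_param * w to the gradient. The function name `l2_update` and its parameters are hypothetical; this is not the MLlib updater code.

```python
def l2_update(weights, gradient, step_size, reg_param):
    """One gradient step for loss(w) + (reg_param / 2) * ||w||^2.

    w' = w - step_size * (gradient + reg_param * w)
       = (1 - step_size * reg_param) * w - step_size * gradient
    """
    shrink = 1.0 - step_size * reg_param  # multiplicative weight decay factor
    return [shrink * w - step_size * g for w, g in zip(weights, gradient)]


if __name__ == "__main__":
    print(l2_update([1.0, -2.0], [0.5, 0.5], step_size=0.1, reg_param=0.1))
```

Writing the step as a shrink of the old weights followed by the plain gradient step makes the weight-decay effect of the regularizer explicit.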

fe8a3546

Merge pull request #437 from mridulm/master · 73dfd42f

Patrick Wendell authored 11 years ago

Minor api usability changes

- Expose checkpoint directory - since it is autogenerated now
- null check for jars
- Expose SparkHadoopUtil, so that configuration creation is abstracted even from user code, avoiding duplication of functionality already in Spark.

73dfd42f

Merge pull request #426 from mateiz/py-ml-tests · 4c16f79c

Patrick Wendell authored 11 years ago

Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)

We disabled these earlier because Jenkins didn't have these versions.

4c16f79c

Merge pull request #462 from mateiz/conf-file-fix · bf569954

Patrick Wendell authored 11 years ago

Remove Typesafe Config usage and conf files to fix nested property names

With Typesafe Config we had the subtle problem of no longer allowing
nested property names, which are used for a few of our properties:
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html



This PR is for branch 0.9 but should be added into master too.
(cherry picked from commit 34e911ce)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
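
The nested-property problem described in this commit message can be sketched in plain Python, assuming a config library that parses dotted keys into a tree, where a key cannot hold a value and also be the parent of other keys. The helper `to_tree` is hypothetical, and the two property names are illustrative picks rather than names taken from the commit.

```python
def to_tree(flat):
    """Build a nested dict from dotted keys; raise if a key is both a leaf and a parent."""
    tree = {}
    for key, value in flat.items():
        node = tree
        parts = key.split(".")
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r}: {part!r} already holds a value")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} is already a parent of other keys")
        node[leaf] = value
    return tree


if __name__ == "__main__":
    # Illustrative names: one property is a prefix of the other.
    flat = {"spark.speculation": "true", "spark.speculation.interval": "100"}
    print(flat["spark.speculation.interval"])  # a flat map has no conflict
    try:
        to_tree(flat)
    except ValueError as err:
        print("tree-structured config rejects this:", err)
```

A flat string-to-string map sidesteps the conflict entirely, which is consistent with the move away from a tree-structured config described above.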

bf569954

Merge pull request #461 from pwendell/master · aa981e4e
Patrick Wendell authored 11 years ago
```
Use renamed shuffle spill config in CoGroupedRDD.scala

This one got missed when it was renamed.
```
aa981e4e
Use renamed shuffle spill config in CoGroupedRDD.scala · 5316bcac
Patrick Wendell authored 11 years ago

5316bcac
Correct L2 regularized weight update with canonical form · e91ad3f1
Sean Owen authored 11 years ago

e91ad3f1

Jan 17, 2014
- Address review comment · b690e11d
  Mridul Muralidharan authored 11 years ago
  
  b690e11d
- Merge pull request #451 from Qiuzhuang/master · d749d472
  Patrick Wendell authored 11 years ago
  
  Fixed Windows spark shell launch script error. JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
  d749d472
- Merge pull request #438 from ScrapCodes/clone-records-java-api · d4fd89e3
  Patrick Wendell authored 11 years ago
  
  Clone records java api
  d4fd89e3
Jan 16, 2014

adding clone records field to equivalent java apis · fcb4fc65
Prashant Sharma authored 11 years ago

fcb4fc65
Updated java API docs for streaming, along with very minor changes in the code examples. · 11e6534d
Tathagata Das authored 11 years ago

11e6534d
Use method, not variable · edd82c58
Mridul Muralidharan authored 11 years ago

edd82c58
Address review comments · 1a0da892
Mridul Muralidharan authored 11 years ago

1a0da892
Fixed Window spark shell launch script error. · 4e510b0b
Qiuzhuang Lian authored 11 years ago
```
 JIRA SPARK-1029:https://spark-project.atlassian.net/browse/SPARK-1029
```
4e510b0b

Merge pull request #445 from kayousterhout/exec_lost · c06a307c

Reynold Xin authored 11 years ago

Fail rather than hanging if a task crashes the JVM.

Prior to this commit, if a task crashes the JVM, the task (and
all other tasks running on that executor) is marked as KILLED rather
than FAILED.  As a result, the TaskSetManager will retry the task
indefinitely rather than failing the job after maxFailures. Eventually,
this makes the job hang, because the Standalone Scheduler removes
the application after 10 workers have failed, and then the app is left
in a state where it's disconnected from the master and waiting to reconnect.
This commit fixes that problem by marking tasks as FAILED rather than
killed when an executor is lost.

The downside of this commit is that if task A fails because another
task running on the same executor caused the VM to crash, the failure
will incorrectly be counted as a failure of task A. This should not
be an issue because we typically set maxFailures to 3, and it is
unlikely that a task will be co-located with a JVM-crashing task
multiple times.
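
The difference between the two task states can be illustrated with a toy model in plain Python. It assumes, as the message above describes, that only FAILED attempts count toward maxFailures while KILLED attempts are simply retried. The class `ToyTaskSetManager` and the helper `attempts_until_abort` are hypothetical and do not mirror the real scheduler code.

```python
from enum import Enum


class TaskState(Enum):
    FAILED = "FAILED"
    KILLED = "KILLED"


class ToyTaskSetManager:
    """Toy model: only FAILED attempts count toward max_failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0
        self.aborted = False

    def handle_lost_task(self, state):
        """Record a lost attempt; return True if the task should be retried."""
        if state is TaskState.FAILED:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.aborted = True
                return False
        return True


def attempts_until_abort(state_on_executor_loss, max_failures=3, cap=100):
    """Count attempts until the job aborts; None if it never aborts within cap."""
    manager = ToyTaskSetManager(max_failures)
    for attempt in range(1, cap + 1):
        if not manager.handle_lost_task(state_on_executor_loss):
            return attempt
    return None
```

In this model, a task that always crashes its executor aborts the job after `max_failures` attempts when the loss is reported as FAILED, but is retried up to the cap when it is reported as KILLED.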

c06a307c

Updated unit test comment · 718a13c1
Kay Ousterhout authored 11 years ago

718a13c1

Jan 15, 2014

Merge pull request #414 from soulmachine/code-style · 84595ea3

Reynold Xin authored 11 years ago

Code clean up for mllib

* Removed unnecessary parentheses
* Removed unused imports
* Simplified `filter...size()` to `count ...`
* Removed comments for obsolete parameters

84595ea3

Merge pull request #439 from CrazyJvm/master · 0675ca50

Reynold Xin authored 11 years ago

SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide

remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.

0675ca50

Fail rather than hanging if a task crashes the JVM. · a268d634

Kay Ousterhout authored 11 years ago

Prior to this commit, if a task crashes the JVM, the task (and
all other tasks running on that executor) is marked as KILLED rather
than FAILED.  As a result, the TaskSetManager will retry the task
indefinitely rather than failing the job after maxFailures. This
commit fixes that problem by marking tasks as FAILED rather than
killed when an executor is lost.

The downside of this commit is that if task A fails because another
task running on the same executor caused the VM to crash, the failure
will incorrectly be counted as a failure of task A. This should not
be an issue because we typically set maxFailures to 3, and it is
unlikely that a task will be co-located with a JVM-crashing task
multiple times.

a268d634

Merge pull request #444 from mateiz/py-version · 4f0c361b
Patrick Wendell authored 11 years ago
```
Clarify that Python 2.7 is only needed for MLlib
```
4f0c361b
Clarify that Python 2.7 is only needed for MLlib · 2ffdaefb
Matei Zaharia authored 11 years ago

2ffdaefb

Merge pull request #442 from pwendell/standalone · 59f475c7

Patrick Wendell authored 11 years ago

Workers should use working directory as spark home if it's not specified

If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.
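
The fallback described above can be sketched in a few lines of plain Python. The function `resolve_spark_home` and its parameters are hypothetical; the sketch assumes the application's environment is available as a dict and that the worker's own directory is the fallback.

```python
import os


def resolve_spark_home(app_env, worker_dir=None):
    """Return the app's SPARK_HOME if set and non-empty, else the worker's directory."""
    if worker_dir is None:
        worker_dir = os.getcwd()  # assumption: worker runs from its Spark home
    return app_env.get("SPARK_HOME") or worker_dir
```

For example, `resolve_spark_home({}, "/srv/worker")` falls back to `/srv/worker`, while an explicit `SPARK_HOME` in the application environment takes precedence.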

59f475c7

Merge pull request #443 from tdas/filestream-fix · 2a05403a

Patrick Wendell authored 11 years ago

Made some classes private[streaming] and deprecated a method in JavaStreamingContext.

Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were there only as support classes for an obscure example. One of the classes, `RawTextSender`, has a main function that can be executed using bin/spark-class even if it is made private[streaming]. In the future, I will probably remove these classes completely. For the time being, I am just converting them to private[streaming].

The underlying JavaSparkContext in JavaStreamingContext was accessed through `JavaStreamingContext.sc`. This is now deprecated, and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.

2a05403a

Made some classes private[streaming] and deprecated a method in JavaStreamingContext. · 9e637534
Tathagata Das authored 11 years ago

9e637534

Merge pull request #441 from pwendell/graphx-build · 5fecd251

Patrick Wendell authored 11 years ago

GraphX shouldn't list Spark as provided.

I noticed this when building an application against GraphX to audit the released artifacts.

5fecd251

Workers should use working directory as spark home if it's not specified · 00a3f7ee
Patrick Wendell authored 11 years ago

00a3f7ee
GraphX shouldn't list Spark as provided · 9259d706
Patrick Wendell authored 11 years ago

9259d706
Merge pull request #433 from markhamstra/debFix · 494d3c07
Patrick Wendell authored 11 years ago
```
Updated Debian packaging
```
494d3c07

Merge pull request #366 from colorant/yarn-dev · cef2af9c

Thomas Graves authored 11 years ago

More yarn code refactor

Try to extract the common code in yarn alpha/stable for client and workerRunnable to reduce duplicated code, by putting it into a trait in the common dir and extending that trait.

The same could be done for the remaining files in alpha/stable, but those files have much more overlapping code with different API calls here and there within functions, and would need much closer review. It might also split functions into pieces that are too small and trivial, so it might not be worth doing this way.

So, for now, just do this for these two files.

cef2af9c

remove "-XX:+UseCompressedStrings" option · 263933da

CrazyJvm authored 11 years ago

remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.

263933da

Merge pull request #436 from ankurdave/VertexId-case · 3d9e66d9
Reynold Xin authored 11 years ago
```
Rename VertexID -> VertexId in GraphX
```
3d9e66d9
Expose method and class - so that we can use it from user code (particularly... · 0aea33d3
Mridul Muralidharan authored 11 years ago
```
Expose method and class - so that we can use it from user code (particularly since checkpoint directory is autogenerated now)
```
0aea33d3

Merge pull request #435 from tdas/filestream-fix · 139c24ef

Patrick Wendell authored 11 years ago

Fixed the flaky tests by making SparkConf not serializable

SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused an OptionalDataException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to a conversation with Matei). This change fixes that.

@mateiz @pwendell

139c24ef