Commits · e91ad3f164b64e727f41ced6ae20d70ca4c92521 · cs525-sp18-g07 / spark

Jan 18, 2014
- Correct L2 regularized weight update with canonical form · e91ad3f1
  Sean Owen authored 11 years ago
  
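The title refers to the canonical form of an L2-regularized gradient step. Below is a minimal sketch that assumes plain Python lists instead of Spark's vector types. The helper name `l2_update` is hypothetical.

The step w ← w − η(g + λw) is rewritten as w ← (1 − ηλ)w − ηg, where η is the step size and λ is the regularization parameter. The sketch illustrates the formula only; it is not Spark's updater code.

```python
def l2_update(weights, gradient, step_size, reg_param):
    """One L2-regularized gradient step in canonical form (hypothetical helper).

    Equivalent to w - step_size * (gradient + reg_param * w), but written as
    (1 - step_size * reg_param) * w - step_size * gradient.
    """
    # Shrinkage factor applied to the old weights by the L2 penalty.
    shrink = 1.0 - step_size * reg_param
    return [shrink * w - step_size * g for w, g in zip(weights, gradient)]
```

Both forms give the same result. The canonical one makes explicit the shrinkage factor (1 − ηλ) that the penalty applies to the old weights.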
Jan 17, 2014
- Merge pull request #451 from Qiuzhuang/master · d749d472
  Patrick Wendell authored 11 years ago
  
  Fixed Windows spark shell launch script error. JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
- Merge pull request #438 from ScrapCodes/clone-records-java-api · d4fd89e3
  Patrick Wendell authored 11 years ago
  
  Clone records java api
Jan 16, 2014

Adding clone records field to equivalent Java APIs · fcb4fc65
Prashant Sharma authored 11 years ago

Fixed Windows spark shell launch script error. · 4e510b0b
Qiuzhuang Lian authored 11 years ago
JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029

Merge pull request #445 from kayousterhout/exec_lost · c06a307c

Reynold Xin authored 11 years ago

Fail rather than hanging if a task crashes the JVM.

Prior to this commit, if a task crashes the JVM, the task (and
all other tasks running on that executor) is marked as KILLED rather
than FAILED. As a result, the TaskSetManager will retry the task
indefinitely rather than failing the job after maxFailures. Eventually,
this makes the job hang, because the Standalone Scheduler removes
the application after 10 workers have failed, and the app is then left
in a state where it's disconnected from the master and waiting to reconnect.
This commit fixes that problem by marking tasks as FAILED rather than
KILLED when an executor is lost.

The downside of this commit is that if task A fails because another
task running on the same executor caused the VM to crash, the failure
will incorrectly be counted as a failure of task A. This should not
be an issue because we typically set maxFailures to 3, and it is
unlikely that a task will be co-located with a JVM-crashing task
multiple times.

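The counting behavior described above can be sketched as follows. This is a hypothetical model, not Spark's TaskSetManager. The names `TaskState`, `JobAborted` and `record_executor_lost` are illustrative, and `max_failures=3` mirrors the typical setting mentioned in the message. Only FAILED outcomes increment a task's failure count, so tasks marked KILLED can be resubmitted forever.

```python
from enum import Enum


class TaskState(Enum):
    FAILED = "FAILED"
    KILLED = "KILLED"


class JobAborted(Exception):
    """Raised when a task has failed max_failures times (hypothetical)."""


def record_executor_lost(failure_counts, task_ids, state, max_failures=3):
    """Update per-task failure counts for tasks that ran on a lost executor.

    KILLED tasks are not counted, so they would simply be retried again;
    FAILED tasks count toward max_failures and eventually abort the job.
    """
    for task_id in task_ids:
        if state is TaskState.FAILED:
            failure_counts[task_id] = failure_counts.get(task_id, 0) + 1
            if failure_counts[task_id] >= max_failures:
                raise JobAborted(
                    f"task {task_id} failed {failure_counts[task_id]} times")
    return failure_counts
```

With KILLED, the loop never raises no matter how many executors are lost. With FAILED, a task that keeps crashing aborts the job on its third failure.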

Updated unit test comment · 718a13c1
Kay Ousterhout authored 11 years ago


Jan 15, 2014

Merge pull request #414 from soulmachine/code-style · 84595ea3

Reynold Xin authored 11 years ago

Code clean up for mllib

* Removed unnecessary parentheses
* Removed unused imports
* Simplified `filter...size()` to `count ...`
* Removed comments for obsolete parameters


Merge pull request #439 from CrazyJvm/master · 0675ca50

Reynold Xin authored 11 years ago

SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide

Remove the "-XX:+UseCompressedStrings" option from the tuning guide, since JDK 7 no longer supports it.


Fail rather than hanging if a task crashes the JVM. · a268d634

Kay Ousterhout authored 11 years ago

Prior to this commit, if a task crashes the JVM, the task (and
all other tasks running on that executor) is marked as KILLED rather
than FAILED. As a result, the TaskSetManager will retry the task
indefinitely rather than failing the job after maxFailures. This
commit fixes that problem by marking tasks as FAILED rather than
KILLED when an executor is lost.

The downside of this commit is that if task A fails because another
task running on the same executor caused the VM to crash, the failure
will incorrectly be counted as a failure of task A. This should not
be an issue because we typically set maxFailures to 3, and it is
unlikely that a task will be co-located with a JVM-crashing task
multiple times.


Merge pull request #444 from mateiz/py-version · 4f0c361b
Patrick Wendell authored 11 years ago
Clarify that Python 2.7 is only needed for MLlib
Clarify that Python 2.7 is only needed for MLlib · 2ffdaefb
Matei Zaharia authored 11 years ago


Merge pull request #442 from pwendell/standalone · 59f475c7

Patrick Wendell authored 11 years ago

Workers should use working directory as spark home if it's not specified

If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.

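Below is a minimal sketch of that fallback. It assumes a hypothetical `resolve_spark_home` helper, not the actual Worker code. The helper receives the application's SPARK_HOME, if any, and the worker's working directory.

```python
import os


def resolve_spark_home(app_spark_home=None, worker_dir=None):
    """Pick the Spark home for an executor (hypothetical helper).

    Use the application's SPARK_HOME when it was set (non-empty);
    otherwise default to the worker's working directory.
    """
    if app_spark_home:
        return app_spark_home
    # No SPARK_HOME from the application: fall back to the worker's directory.
    return worker_dir if worker_dir is not None else os.getcwd()
```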

Merge pull request #443 from tdas/filestream-fix · 2a05403a

Patrick Wendell authored 11 years ago

Made some classes private[streaming] and deprecated a method in JavaStreamingContext.

Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were there as support classes for an obscure example. One of the classes, RawTextSender, has a main function that can still be executed using bin/spark-class even if it is made private[streaming]. In the future, I will probably remove these classes completely. For the time being, I am just converting them to private[streaming].

The underlying JavaSparkContext in JavaStreamingContext used to be accessed through `JavaStreamingContext.sc`. This is now deprecated, and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.

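In Scala, the old accessor would typically carry an `@deprecated` annotation. The Python below is only an analogy of the same pattern, using a hypothetical `StreamingContextSketch` class rather than Spark's API. It provides a preferred `spark_context` accessor and a deprecated `sc` alias that returns the same object but emits a warning.

```python
import warnings


class StreamingContextSketch:
    """Hypothetical stand-in for JavaStreamingContext; not Spark's API."""

    def __init__(self, spark_context):
        self._spark_context = spark_context

    @property
    def spark_context(self):
        # Preferred accessor, named consistently with the Scala context.
        return self._spark_context

    @property
    def sc(self):
        # Old accessor kept for compatibility; warns on every use.
        warnings.warn("sc is deprecated; use spark_context instead",
                      DeprecationWarning, stacklevel=2)
        return self._spark_context
```

Keeping the alias, rather than removing it, lets existing callers compile and run while they migrate.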

Made some classes private[streaming] and deprecated a method in JavaStreamingContext. · 9e637534
Tathagata Das authored 11 years ago


Merge pull request #441 from pwendell/graphx-build · 5fecd251

Patrick Wendell authored 11 years ago

GraphX shouldn't list Spark as provided.

I noticed this when building an application against GraphX to audit the released artifacts.


Workers should use working directory as spark home if it's not specified · 00a3f7ee
Patrick Wendell authored 11 years ago

GraphX shouldn't list Spark as provided · 9259d706
Patrick Wendell authored 11 years ago

Merge pull request #433 from markhamstra/debFix · 494d3c07
Patrick Wendell authored 11 years ago
Updated Debian packaging

Merge pull request #366 from colorant/yarn-dev · cef2af9c

Thomas Graves authored 11 years ago

More yarn code refactor

Try to extract the common code in YARN alpha/stable for the client and workerRunnable to reduce duplicated code, by putting it into a trait in the common dir and extending that trait.

The same work could be done for the remaining files in alpha/stable. However, those files have much more overlapping code, with different API calls here and there within functions, and would need much closer review. It might also split functions into too many trivial pieces, so it may not be worth doing this way.

So, for now, this is done only for these two files.

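The refactoring pattern can be sketched with a Python abstract base class standing in for the shared Scala trait. All class and method names here are hypothetical and do not correspond to Spark's YARN modules. The common logic lives once in the base class, and each YARN API version implements only the part that differs.

```python
from abc import ABC, abstractmethod


class ClientBase(ABC):
    """Hypothetical shared 'trait' for the alpha and stable YARN clients."""

    def submit(self, app_name):
        # Common logic, written once in the shared base.
        request = self.build_request(app_name)
        return f"submitted {request}"

    @abstractmethod
    def build_request(self, app_name):
        """API-specific step that each YARN version must implement."""


class AlphaClient(ClientBase):
    def build_request(self, app_name):
        return f"alpha:{app_name}"


class StableClient(ClientBase):
    def build_request(self, app_name):
        return f"stable:{app_name}"
```

This works cleanly only when the differences can be isolated into a few methods. That matches the message's reason for not applying the same approach to files whose API differences are scattered inside functions.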

remove "-XX:+UseCompressedStrings" option · 263933da

CrazyJvm authored 11 years ago

Remove the "-XX:+UseCompressedStrings" option from the tuning guide, since JDK 7 no longer supports it.


Merge pull request #436 from ankurdave/VertexId-case · 3d9e66d9
Reynold Xin authored 11 years ago
Rename VertexID -> VertexId in GraphX

Merge pull request #435 from tdas/filestream-fix · 139c24ef

Patrick Wendell authored 11 years ago

Fixed the flaky tests by making SparkConf not serializable

SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused an OptionalDataException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to a conversation with Matei). This change fixes that.

@mateiz @pwendell

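Here is a rough Python analogy of why a non-serializable config object helps. If the object refuses serialization, accidentally capturing it in a task fails immediately at serialization time, instead of surfacing later as a confusing deserialization error. `ConfSketch` and `TaskSketch` are hypothetical classes, not SparkConf or ShuffleMapTask.

```python
import pickle


class ConfSketch:
    """Hypothetical config holder that refuses to be pickled."""

    def __init__(self, settings):
        self.settings = dict(settings)

    def __reduce__(self):
        # pickle reaches this via __reduce_ex__; raising blocks serialization.
        raise TypeError("ConfSketch is not serializable; copy out the values you need")


class TaskSketch:
    """Hypothetical task object that would be pickled and shipped to a worker."""

    def __init__(self, payload):
        self.payload = payload


conf = ConfSketch({"spark.app.name": "demo"})

# Capturing the whole config fails fast, on the sending side.
try:
    pickle.dumps(TaskSketch(conf))
except TypeError as err:
    print("refused:", err)

# Capturing only the value the task needs works.
data = pickle.dumps(TaskSketch(conf.settings["spark.app.name"]))
print(pickle.loads(data).payload)
```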

Merge pull request #434 from rxin/graphxmaven · 087487e9
Patrick Wendell authored 11 years ago
Fixed SVDPlusPlusSuite in Maven build.

This should go into 0.9.0 also.
Merge remote-tracking branch 'apache/master' into filestream-fix · 0e15bd78
Tathagata Das authored 11 years ago

Changed SparkConf to not be serializable. And also fixed unit-test log paths... · 1f4718c4
Tathagata Das authored 11 years ago
Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules.
Fixed SVDPlusPlusSuite in Maven build. · dfb15244
Reynold Xin authored 11 years ago

Removed repl-bin and updated maven build doc. · 147a943d
Mark Hamstra authored 11 years ago

VertexID -> VertexId · f4d9019a
Ankur Dave authored 11 years ago

Add deb profile to assembly/pom.xml · 148757e8
Mark Hamstra authored 11 years ago


Jan 14, 2014
- Merge pull request #424 from jegonzal/GraphXProgrammingGuide · 3a386e23
  Reynold Xin authored 11 years ago
  
  Additional edits for clarity in the graphx programming guide. Added an overview of the Graph and GraphOps functions and fixed numerous typos.
- Merge pull request #431 from ankurdave/graphx-caching-doc · ad294db3
  Reynold Xin authored 11 years ago
  
  Describe caching and uncaching in GraphX programming guide
- Describe GraphX caching and uncaching in guide · 1210ec29
  Ankur Dave authored 11 years ago
  
- Merge pull request #428 from pwendell/writeable-objects · 74b46acd
  Reynold Xin authored 11 years ago
  
  Don't clone records for text files
- Merge pull request #429 from ankurdave/graphx-examples-pom.xml · 193a0757
  Reynold Xin authored 11 years ago
  
  Add GraphX dependency to examples/pom.xml
- Merge pull request #427 from pwendell/deprecate-aggregator · d601a76d
  Reynold Xin authored 11 years ago
  
  Deprecate rather than remove old combineValuesByKey function
- Add GraphX dependency to examples/pom.xml · 8ea056d7
  Ankur Dave authored 11 years ago
  
- Style fix · b1b22b7a
  Patrick Wendell authored 11 years ago
  
- Adding fix covering combineCombinersByKey as well · 8ea2cd56
  Patrick Wendell authored 11 years ago
  
- Merge pull request #425 from rxin/scaladoc · 2ce23a55
  Reynold Xin authored 11 years ago
  
  API doc update & make Broadcast public. In #413, Broadcast was mistakenly made private[spark]; I changed it to public again. Also exposes id publicly, since the R frontend requires it. Copied some of the documentation from the programming guide to the API doc for Broadcast and Accumulator. This should be cherry-picked into branch-0.9 as well for the 0.9.0 release.