Commits · 22374559a23adbcb5c286e0aadc7cd40c228726f · cs525-sp18-g07 / spark

Jan 09, 2014
- Remove GraphX README · 22374559
  Ankur Dave authored 11 years ago
  
  22374559
Jan 08, 2014

Fix AbstractMethodError by inlining zip{Edge,Vertex}Partitions · 74fdfac1

Ankur Dave authored 11 years ago

The zip{Edge,Vertex}Partitions methods created doubly-nested closures
and passed them to zipPartitions. For some reason this caused an
AbstractMethodError when zipPartitions tried to invoke the closure. This
commit works around the problem by inlining these methods wherever they
are called, eliminating the doubly-nested closure.

74fdfac1

Take SparkConf in constructor of Serializer subclasses · ab861d84
Ankur Dave authored 11 years ago

ab861d84
Manifest -> Tag in variable names · 0ad75cdf
Ankur Dave authored 11 years ago

0ad75cdf
ClassManifest -> ClassTag · ac536345
Ankur Dave authored 11 years ago

ac536345
Fix mis-merge in 44fd30d3 · 78d6b13a
Ankur Dave authored 11 years ago

78d6b13a

Merge remote-tracking branch 'spark-upstream/master' into HEAD · 91227566

Ankur Dave authored 11 years ago

Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
	pom.xml
	project/SparkBuild.scala
	repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala

91227566

Merge pull request #360 from witgo/master · 04d83fc3
Reynold Xin authored 11 years ago
```
fix make-distribution.sh show version: command not found
```
04d83fc3

Merge pull request #357 from hsaputra/set_boolean_paramname · 56ebfeaa

Reynold Xin authored 11 years ago

Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID

Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID to make it clear which param being set.

56ebfeaa

Merge pull request #358 from pwendell/add-cdh · bdeaeafb

Patrick Wendell authored 11 years ago

Add CDH Repository to Maven Build

At some point this was removed from the Maven build... so I'm adding it back. It's needed for the Hadoop2 tests we run on Jenkins and it's also included in the SBT build.

bdeaeafb

Merge pull request #356 from hsaputra/remove_deprecated_cleanup_method · 5cae05f5

Reynold Xin authored 11 years ago

Remove calls to deprecated mapred's OutputCommitter.cleanupJob

Since Hadoop 1.0.4 the mapred OutputCommitter.commitJob should do cleanup job via call to OutputCommitter.cleanupJob,

Remove SparkHadoopWriter.cleanup since it is used only by PairRDDFunctions.

In fact the implementation of mapred OutputCommitter.commitJob looks like this:

  public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
  }

5cae05f5

fix make-distribution.sh show version: command not found · cf4aaf92
liguoqiang authored 11 years ago

cf4aaf92

Merge pull request #345 from colorant/yarn · 6eef78d7

Thomas Graves authored 11 years ago

support distributing extra files to worker for yarn client mode

So that user doesn't need to package all dependency into one assemble jar as spark app jar

6eef78d7

Add CDH Repository to Maven Build · 3209a86f
Patrick Wendell authored 11 years ago

3209a86f
Resolve PR review over 100 chars · aa56585d
Henry Saputra authored 11 years ago

aa56585d
Set boolean param name for two files call to SparkHadoopMapReduceUtil.newTaskAttemptID to make · f6b6f883
Henry Saputra authored 11 years ago
```
it clear which param being set.
```
f6b6f883

Remove calls to deprecated mapred's OutputCommitter.cleanupJob because since Hadoop 1.0.4 · 4517326e

Henry Saputra authored 11 years ago

the mapred OutputCommitter.commitJob should do cleanup job.

In fact the implementation of mapred OutputCommitter.commitJob looks like this:

  public void commitJob(JobContext jobContext) throws IOException {
    cleanupJob(jobContext);
  }

(The jobContext input argument is type of org.apache.hadoop.mapred.JobContext)

4517326e

Merge pull request #322 from falaki/MLLibDocumentationImprovement · bb6a39a6

Patrick Wendell authored 11 years ago

SPARK-1009 Updated MLlib docs to show how to use it in Python

In addition added detailed examples for regression, clustering and recommendation algorithms in a separate Scala section. Fixed a few minor issues with existing documentation.

bb6a39a6

Merge pull request #355 from ScrapCodes/patch-1 · cb1b9273
Patrick Wendell authored 11 years ago
```
Update README.md

The link does not work otherwise.
```
cb1b9273

Merge pull request #313 from tdas/project-refactor · c0f0155e

Patrick Wendell authored 11 years ago

Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.

At a high level, these are the following changes.

1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.

2. To avail the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._` . For Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.

3. Each external project has its own scala and java unit tests. Note the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in the SparkBuild.scala . In the streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see comment inside the pom.xml for more information).

4. Jars of the external projects have been added to examples project but not to the assembly project.

5. In some files, imports have been rearrange to conform to the Spark coding guidelines.

c0f0155e

Update README.md · d1f28057
Prashant Sharma authored 11 years ago
```
The link does not work otherwise.
```
d1f28057

Jan 07, 2014

Merge pull request #336 from liancheng/akka-remote-lookup · f5f12dc2

Patrick Wendell authored 11 years ago

Get rid of `Either[ActorRef, ActorSelection]'

In this pull request, instead of returning an `Either[ActorRef, ActorSelection]`, `registerOrLookup` identifies the remote actor blockingly to obtain an `ActorRef`, or throws an exception if the remote actor doesn't exist or the lookup times out (configured by `spark.akka.lookupTimeout`).  This function is only called when an `SparkEnv` is constructed (instantiating driver or executor), so the blocking call is considered acceptable.  Executor side `ActorSelection`s/`ActorRef`s to driver side `MapOutputTrackerMasterActor` and `BlockManagerMasterActor` are affected by this pull request.

`ActorSelection` is dangerous and should be used with care.  It's only absolutely safe to send messages via an `ActorSelection` when the remote actor is stateless, so that actor incarnation is irrelevant.  But as pointed by @ScrapCodes in the comments below, executor exits immediately once the connection to the driver lost, `ActorSelection`s are not harmful in this scenario.  So this pull request is mostly a code style patch.

f5f12dc2

Merge pull request #327 from lucarosellini/master · 11891e68

Matei Zaharia authored 11 years ago

Added ‘-i’ command line option to Spark REPL

We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark.
Any new Spark specific command line option could now be added to org.apache.spark.repl.SparkRunnerSettings class.

Since the behavior of loading a script from the command line should be the same as loading it using the “:load” command inside the shell, the script should be loaded when the SparkContext is available, that’s why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.

11891e68

Merge pull request #354 from hsaputra/addasfheadertosbt · 7d0aac91
Matei Zaharia authored 11 years ago
```
Add ASF header to the new sbt script.

Add ASF header to the new sbt script.
```
7d0aac91

Merge pull request #350 from mateiz/standalone-limit · d75dc428

Matei Zaharia authored 11 years ago

Add way to limit default # of cores used by apps in standalone mode

Also documents the spark.deploy.spreadOut option, and fixes a config option that had a dash in its name.

d75dc428

Fixed merge conflict · 46cb980a
Hossein Falaki authored 11 years ago

46cb980a
Add ASF header to the new sbt script. · 226b58ad
Henry Saputra authored 11 years ago

226b58ad

Merge pull request #352 from markhamstra/oldArch · 61674bca

Patrick Wendell authored 11 years ago

Don't leave os.arch unset after BlockManagerSuite

Recent SparkConf changes meant that BlockManagerSuite was now leaving the os.arch System.property unset.  That's a problem for any subsequent tests that rely upon having a valid os.arch.  This is true for CompressionCodecSuite in the usual maven build test order, even though it isn't usually true for the sbt build.

61674bca

Merge pull request #328 from falaki/MatrixFactorizationModel-fix · b2e690f8

Patrick Wendell authored 11 years ago

SPARK-1012: DAGScheduler Exception Fix

Added a predict method to MatrixFactorizationModel to enable bulk prediction. This method takes and RDD[(Int, Int)] of users and products and return an RDD with a Rating element per each element in the input RDD.

Also added python bindings to the new bulk prediction methods to address SPARK-1011 issue.

This is ready to be merged now.

b2e690f8

Fix BlockManagerSuite#after · 86ed1ad2
Mark Hamstra authored 11 years ago

86ed1ad2
Address review comments · 2c421749
Matei Zaharia authored 11 years ago

2c421749

Merge pull request #351 from pwendell/maven-fix · 6ccf8ce7

Patrick Wendell authored 11 years ago

Add log4j exclusion rule to maven.

To make this work I had to rename the defaults file. Otherwise
maven's pattern matching rules included it when trying to match
other log4j.properties files.

I also fixed a bug in the existing maven build where two
<transformers> tags were present in assembly/pom.xml
such that one overwrote the other.

6ccf8ce7

Merge branch 'master' into MatrixFactorizationModel-fix · 3a8beb46
Hossein Falaki authored 11 years ago

3a8beb46
Fix unit test compilation · 044c8ad3
Matei Zaharia authored 11 years ago

044c8ad3

Add log4j exclusion rule to maven. · e688e112

Patrick Wendell authored 11 years ago

To make this work I had to rename the defaults file. Otherwise
maven's pattern matching rules included it when trying to match
other log4j.properties files.

I also fixed a bug in the existing maven build where two
<transformers> tags were present in assembly/pom.xml
such that one overwrote the other.

e688e112

Add way to limit default # of cores used by applications on standalone mode · d8bcc8e9
Matei Zaharia authored 11 years ago
```
Also documents the spark.deploy.spreadOut option.
```
d8bcc8e9

Merge pull request #337 from yinxusen/mllib-16-bugfix · 7d5fa175

Reynold Xin authored 11 years ago

Mllib 16 bugfix

Bug fix: https://spark-project.atlassian.net/browse/MLLIB-16

Hi, I fixed the bug and added a test suite for `GradientDescent`. There are 2 checks in the test case. First, the final loss must be lower than the initial one. Second, the trend of loss sequence should be decreasing, i.e., at least 80% iterations have lower losses than their prior iterations.

Thanks!

7d5fa175

Merge pull request #349 from CodingCat/support-worker_dir · 71fc1135

Reynold Xin authored 11 years ago

add the comments about SPARK_WORKER_DIR

this env variable seems to be forgotten

in many cases we need to set this variable, e.g. in EC2, we have to move the large application log files from the EBS to the ephemeral storage

71fc1135

Fixed examples/pom.xml and run-example based on Patrick's suggestions. · 8f02f1c3
Tathagata Das authored 11 years ago

8f02f1c3
add the comments about SPARK_WORKER_DIR · 3633172e
CodingCat authored 11 years ago
```
this env variable seems to be forgotten …
```
3633172e