- Jan 03, 2014
  - Patrick Wendell authored: Spark-915 segregate scripts
  - Prashant Sharma authored
  - Prashant Sharma authored
  - Patrick Wendell authored: Yarn refactor
  - Raymond Liu authored
- Jan 02, 2014
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Raymond Liu authored
  - Patrick Wendell authored: fix spark on yarn after the sparkConf changes
    This fixes it so that Spark on YARN now compiles and works after the sparkConf changes. There are also other issues I discovered along the way that are broken:
    - mvn builds for yarn don't assemble correctly
    - unset SPARK_EXAMPLES_JAR isn't handled properly anymore
    - I'm pretty sure spark.conf doesn't actually work, as it's not distributed with yarn
    Those things can be fixed in a separate PR unless others disagree.
  - Reynold Xin authored: Remove erroneous FAILED state for killed tasks
    Currently, when tasks are killed, the Executor first sends a status update for the task with a "KILLED" state, and then sends a second status update with a "FAILED" state saying that the task failed due to an exception. The second FAILED state is misleading/unnecessary, and occurs due to a NonLocalReturnControl exception that gets thrown due to the way we kill tasks. This commit eliminates that problem. I'm not at all sure that this is the best way to fix this problem, so alternate suggestions welcome. @rxin guessing you're the right person to look at this.
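    The spurious FAILED update is a side effect of Scala's non-local returns rather than a real task failure. A minimal, self-contained sketch (not Spark's actual Executor code) of how a `return` inside a closure surfaces as scala.runtime.NonLocalReturnControl and gets swallowed by a catch-all handler:

    ```scala
    import scala.runtime.NonLocalReturnControl

    object NonLocalReturnDemo {
      // A `return` inside the function literal below is a non-local return:
      // Scala implements it by throwing NonLocalReturnControl back up to runTask.
      def runTask(killed: () => Boolean): Unit = {
        val body = () => {
          if (killed()) return // bail out of runTask, implemented via an exception
          println("task finished normally")
        }
        try {
          body()
        } catch {
          // Handling the control-flow exception explicitly (and before any
          // catch-all) is what keeps a kill from being misreported as a failure.
          case _: NonLocalReturnControl[_] =>
            println("caught NonLocalReturnControl: this was a kill, not a failure")
          case t: Throwable =>
            println(s"would report task as FAILED: $t")
        }
      }

      def main(args: Array[String]): Unit = {
        runTask(() => true)  // simulated kill
        runTask(() => false) // normal completion
      }
    }
    ```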
  - Thomas Graves authored
  - Thomas Graves authored
  - Patrick Wendell authored: Improvements to DStream window ops and refactoring of Spark's CheckpointSuite
    - Added a new RDD, PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, partition 0 of the unified RDD will be located where most of the parent RDDs' partition 0 are located (a rough sketch of why this matters follows this entry).
    - Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads.
    - Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary.
    - Added mapSideCombine option to combineByKeyAndWindow.
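    For context on the first bullet: a plain union of co-partitioned RDDs forgets the partitioner, which is the extra shuffle the new RDD avoids. A rough sketch of that behaviour on the public RDD API (the SparkContext setup and the 4-partition HashPartitioner are just for illustration; PartitionerAwareUnionRDD itself is internal to Spark and only referenced in the comments):

    ```scala
    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits in the 0.9-era API

    object WindowUnionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sketch"))
        val part = new HashPartitioner(4)

        // Two "per-batch" RDDs, already hash-partitioned the same way, standing in
        // for the per-batch reduceByKey results of a windowed DStream.
        val batch1 = sc.parallelize(Seq("a" -> 1, "b" -> 2)).partitionBy(part)
        val batch2 = sc.parallelize(Seq("a" -> 3, "c" -> 4)).partitionBy(part)

        // A plain union drops the partitioner, so combining values across the
        // window afterwards needs another shuffle.
        val plain = batch1.union(batch2)
        println(plain.partitioner) // None

        // The commit's PartitionerAwareUnionRDD instead keeps `part` on the unified
        // RDD (p partitions in, p partitions out), which is what lets
        // reduceByKeyAndWindow skip that extra shuffle.
        sc.stop()
      }
    }
    ```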
  - Reynold Xin authored: Removed redundant TaskSetManager.error() function
    This function was left over from a while ago, and now just passes all calls through to the abort() function, so this commit deletes it.
  - Matei Zaharia authored: SPARK-991: Report information gleaned from a Python stacktrace in the UI
    Scala:
    - Added setCallSite/clearCallSite to SparkContext and JavaSparkContext. These functions mutate a LocalProperty called "externalCallSite".
    - Added a wrapper, getCallSite, that checks for an externalCallSite and, if none is found, calls the usual Utils.formatSparkCallSite.
    - Changed everything that calls Utils.formatSparkCallSite to call getCallSite instead, except getCallSite itself.
    - Added setCallSite/clearCallSite wrappers to JavaSparkContext.
    Python:
    - Added a gruesome hack to rdd.py that inspects the traceback and guesses what you want to see in the UI.
    - Added a RAII wrapper around said gruesome hack that calls setCallSite/clearCallSite as appropriate.
    - Wired said RAII wrapper up around three calls into the Scala code.
    I'm not sure that I hit all the spots with the RAII wrapper. I'm also not sure that my gruesome hack does exactly what we want. One could also approach this change by refactoring runJob/submitJob/runApproximateJob to take a call site, then threading that parameter through everything that needs to know it. One might object to the pointless-looking wrappers in JavaSparkContext. Unfortunately, I can't directly access the SparkContext from Python (or, if I can, I don't know how), so I need to wrap everything that matters in JavaSparkContext.
    Conflicts:
      core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
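    A small sketch of the call-site override pattern this commit describes, using the setCallSite/clearCallSite methods it adds to SparkContext; the withCallSite helper, the job, and the "count at script.py:12" label are made up for illustration (on the Python side, the RAII wrapper drives the same two calls through JavaSparkContext):

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object CallSiteSketch {
      // Poor man's RAII: set the external call site for the duration of `body`,
      // then clear it so later jobs fall back to the usual Scala call site.
      def withCallSite[T](sc: SparkContext, site: String)(body: => T): T = {
        sc.setCallSite(site)
        try body finally sc.clearCallSite()
      }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("callsite"))
        // The stage for this job shows the overridden label in the web UI
        // instead of the internal Scala call site.
        val n = withCallSite(sc, "count at script.py:12") {
          sc.parallelize(1 to 100).count()
        }
        println(n)
        sc.stop()
      }
    }
    ```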
  - Kay Ousterhout authored
    Currently, when tasks are killed, the Executor first sends a status update for the task with a "KILLED" state, and then sends a second status update with a "FAILED" state saying that the task failed due to an exception. The second FAILED state is misleading/unnecessary, and occurs due to a NonLocalReturnControl exception that gets thrown due to the way we kill tasks. This commit eliminates that problem.
  - Kay Ousterhout authored
    This function was left over from a while ago, and now just passes all calls through to the abort() function, so this commit deletes it.
  - Prashant Sharma authored
  - Prashant Sharma authored
  - Prashant Sharma authored
  - Prashant Sharma authored
  - Prashant Sharma authored: Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts
    Conflicts:
      bin/spark-shell
      core/pom.xml
      core/src/main/scala/org/apache/spark/SparkContext.scala
      core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
      core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
      core/src/test/scala/org/apache/spark/DriverSuite.scala
      python/run-tests
      sbin/compute-classpath.sh
      sbin/spark-class
      sbin/stop-slaves.sh