Commits · e211f405bcb3cf02c3ae589cf81d9c9dfc70bc03 · cs525-sp18-g07 / spark

Feb 01, 2013
- Use spark.local.dir for PySpark temp files (SPARK-580). · e211f405
  Josh Rosen authored 12 years ago
  
  e211f405
- Merge pull request #438 from JoshRosen/spark-674 · b6a60921
  Matei Zaharia authored 12 years ago
  
  Do not launch JavaGateways on workers (SPARK-674).
  b6a60921
- Do not launch JavaGateways on workers (SPARK-674). · 9cc6ff9c
  Josh Rosen authored 12 years ago
  
  The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in a ~3-4x performance improvement when running the PySpark unit tests.
  9cc6ff9c
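
The lazy-initialization idea in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PySpark code: `Gateway`, `Context`, and `_ensure_initialized` are hypothetical names, and the lock is an assumption added for thread safety.

```python
import threading


class Gateway:
    """Hypothetical stand-in for an expensive resource such as a JVM gateway."""

    launches = 0  # counts how many gateways have been started

    def __init__(self):
        Gateway.launches += 1


class Context:
    """Hypothetical context class that creates its gateway lazily."""

    _gateway = None           # shared by all instances, private by convention
    _lock = threading.Lock()  # assumption: guard first-time initialization

    def __init__(self):
        Context._ensure_initialized()

    @classmethod
    def _ensure_initialized(cls):
        # Start the gateway on first construction, not when the module loads.
        with cls._lock:
            if cls._gateway is None:
                cls._gateway = Gateway()


# Defining (or importing) the classes starts nothing.
assert Gateway.launches == 0

# Constructing several contexts starts the gateway exactly once.
Context()
Context()
assert Gateway.launches == 1
```

A process that only imports the module, as a worker would, never pays the startup cost in this sketch.
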
- Merge pull request #433 from rxin/master · 571af313
  Matei Zaharia authored 12 years ago
  
  Changed PartitionPruningRDD's split to make sure it returns the correct split index.
  571af313
- Merge pull request #435 from JoshRosen/pyspark_stdout_fix · 5ce5efec
  Matei Zaharia authored 12 years ago
  
  Fix stdout redirection in PySpark.
  5ce5efec
- Fix stdout redirection in PySpark. · 57b64d0d
  Josh Rosen authored 12 years ago
  
  57b64d0d
- Moved PruneDependency into PartitionPruningRDD.scala. · f9af9cee
  Reynold Xin authored 12 years ago
  
  f9af9cee
Jan 31, 2013
- Merge pull request #434 from pwendell/python-exceptions · 7e2e046e
  Matei Zaharia authored 12 years ago
  
  SPARK-673: Capture and re-throw Python exceptions
  7e2e046e
- Small fix from last commit · 39ab83e9
  Patrick Wendell authored 12 years ago
  
  39ab83e9
- Some style cleanup · c33f0ef4
  Patrick Wendell authored 12 years ago
  
  c33f0ef4
- Merge pull request #431 from mbautin/revert_default_profile · 95e14fbc
  Matei Zaharia authored 12 years ago
  
  Remove activation of profiles by default
  95e14fbc
- SPARK-673: Capture and re-throw Python exceptions · 3446d5c8
  Patrick Wendell authored 12 years ago
  
  This patch alters the Python <-> executor protocol to pass on exception data when exceptions occur in user Python code.
  3446d5c8
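
One way to pass exception data over a worker protocol like the one described above is to reserve a sentinel in a length-prefixed stream. The sketch below is an assumption-laden illustration, not the actual wire format: `ERROR_MARKER`, `RemoteUserError`, and every function name are hypothetical.

```python
import io
import struct
import traceback

ERROR_MARKER = -2  # hypothetical sentinel, chosen for this sketch only


def write_frame(stream, payload):
    """Write a 4-byte big-endian length followed by the payload bytes."""
    stream.write(struct.pack("!i", len(payload)))
    stream.write(payload)


def run_user_code(func, arg, stream):
    """Worker side: send the result, or the marker plus the traceback text."""
    try:
        result = repr(func(arg)).encode("utf-8")
    except Exception:
        stream.write(struct.pack("!i", ERROR_MARKER))
        write_frame(stream, traceback.format_exc().encode("utf-8"))
    else:
        write_frame(stream, result)


class RemoteUserError(Exception):
    """Raised on the reading side when the worker reported a failure."""


def read_result(stream):
    """Reader side: return the result text or re-raise the worker's error."""
    (length,) = struct.unpack("!i", stream.read(4))
    if length == ERROR_MARKER:
        (length,) = struct.unpack("!i", stream.read(4))
        raise RemoteUserError(stream.read(length).decode("utf-8"))
    return stream.read(length).decode("utf-8")


def roundtrip(func, arg):
    """Run both sides against an in-memory buffer."""
    buf = io.BytesIO()
    run_user_code(func, arg, buf)
    buf.seek(0)
    return read_result(buf)
```

With this sketch, `roundtrip(lambda x: 1 // x, 0)` raises `RemoteUserError` carrying the original Python traceback text, so the caller sees where the user code failed rather than a generic stream error.
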
- Removed the TODO comment from PartitionPruningRDD. · 6289d965
  Reynold Xin authored 12 years ago
  
  6289d965
- Changed PartitionPruningRDD's split to make sure it returns the correct split index. · 5b0fc265
  Reynold Xin authored 12 years ago
  
  5b0fc265
- Remove activation of profiles by default · fe3eceab
  Mikhail Bautin authored 12 years ago
  
  See the discussion at https://github.com/mesos/spark/pull/355 for why default profile activation is a problem.
  fe3eceab
Jan 30, 2013
- Merge pull request #430 from pwendell/pyspark-guide · 55327a28
  Matei Zaharia authored 12 years ago
  
  Minor improvements to PySpark docs
  55327a28
- Make module help available in python shell. · 3f945e3b
  Patrick Wendell authored 12 years ago
  
  Also adds a line to the docs explaining how to use it.
  3f945e3b
- Include packaging and launching pyspark in guide. · 58a7d320
  Patrick Wendell authored 12 years ago
  
  It's nicer if all the commands you need are made explicit.
  58a7d320
- Merge pull request #426 from woggling/conn-manager-ips · d12330bd
  Matei Zaharia authored 12 years ago
  
  Remember ConnectionManagerId used to initiate SendingConnections
  d12330bd
- Merge pull request #428 from woggling/mesos-exec-id · 612a9fee
  Matei Zaharia authored 12 years ago
  
  Make ExecutorIDs include SlaveIDs when running Mesos
  612a9fee
- Merge pull request #429 from stephenh/includemessage · dfb721b9
  Matei Zaharia authored 12 years ago
  
  Include message and exitStatus if available.
  dfb721b9
- Include message and exitStatus if available. · 871476d5
  Stephen Haberman authored 12 years ago
  
  871476d5
- Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend · 252845d3
  Charles Reiss authored 12 years ago
  
  252845d3
- Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use the Mesos ExecutorID as a Spark ExecutorID. · f7de6978
  Charles Reiss authored 12 years ago
  
  f7de6978
Jan 29, 2013

- Remember ConnectionManagerId used to initiate SendingConnections. · 16a0789e
  Charles Reiss authored 12 years ago
  
  This prevents ConnectionManager from getting confused if a machine has multiple host names and the one getHostName() finds happens not to be the one that was passed from, e.g., the BlockManagerMaster.
  16a0789e
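
The idea of remembering the identifier a connection was opened with, instead of rebuilding it from a resolved host name, might look like the sketch below. It is a toy model under stated assumptions: `ManagerId`, `SendingConnection`, `Manager`, and the `resolve` callback are hypothetical and do not mirror the Scala classes.

```python
from collections import namedtuple

# Hypothetical stand-in for an identifier made of host and port.
ManagerId = namedtuple("ManagerId", ["host", "port"])


class SendingConnection:
    def __init__(self, remote_id, resolved_host):
        # Keep the id the caller asked for, even if name resolution
        # reports a different name for the same machine.
        self.remote_id = remote_id
        self.resolved_host = resolved_host


class Manager:
    def __init__(self, resolve):
        self._resolve = resolve  # hypothetical host-name lookup
        self._by_id = {}         # remembered id -> connection

    def connect(self, remote_id):
        if remote_id not in self._by_id:
            self._by_id[remote_id] = SendingConnection(
                remote_id, self._resolve(remote_id.host))
        return self._by_id[remote_id]

    def on_closed(self, conn):
        # Look the connection up by the remembered id, not by an id
        # rebuilt from the resolved host name.
        return self._by_id.pop(conn.remote_id, None)
```

Had `on_closed` rebuilt the key from `resolved_host`, a machine known under two names would leave a stale entry in the table.
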

- Merge remote-tracking branch 'stephenh/removefailedjob' · d54b10b6
  Matei Zaharia authored 12 years ago
  
  Conflicts:
  	core/src/main/scala/spark/deploy/master/Master.scala
  d54b10b6
- Merge pull request #425 from stephenh/toDebugString · ccb67ff2
  Matei Zaharia authored 12 years ago
  
  Add RDD.toDebugString.
  ccb67ff2
- Merge pull request #415 from stephenh/driver · 9ae11603
  Matei Zaharia authored 12 years ago
  
  Replace old 'master' term with 'driver'.
  9ae11603

- Simplify checkpointing code and RDD class a little: · 64ba6a8c
  Matei Zaharia authored 12 years ago
  
  - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results.
  - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class.
  - A few of the RDD subclasses are simpler.
  - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents.
  - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).
  64ba6a8c
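
Two of the points in this commit message, the call-once guarantee for getSplits and the cache-before-checkpoint lookup in RDD.iterator, can be modelled with a toy class. This is a rough sketch, not Spark's implementation: `SketchRDD` and all of its members are hypothetical, and a plain dict stands in for checkpoint files.

```python
class SketchRDD:
    """Toy model of the lookup order described above; not Spark's code."""

    def __init__(self, data_by_index):
        self._data = data_by_index  # split index -> list of records
        self._splits = None         # memoized result of get_splits()
        self._cache = {}            # split index -> cached records
        self._checkpoint = None     # split index -> checkpointed records
        self.get_splits_calls = 0
        self.checkpoint_reads = 0

    def get_splits(self):
        # Subclasses may do real work here; it runs at most once.
        self.get_splits_calls += 1
        return sorted(self._data)

    def splits(self):
        if self._splits is None:
            self._splits = self.get_splits()
        return self._splits

    def compute(self, index):
        return list(self._data[index])

    def checkpoint(self):
        # Stand-in for writing each split to reliable storage.
        self._checkpoint = {i: self.compute(i) for i in self.splits()}

    def iterator(self, index):
        # 1. Reuse cached data, even if the RDD was checkpointed later.
        if index in self._cache:
            return iter(self._cache[index])
        # 2. Otherwise read the checkpoint; only the split index is needed.
        if self._checkpoint is not None:
            self.checkpoint_reads += 1
            records = self._checkpoint[index]
        else:
            # 3. Otherwise compute from the original data.
            records = self.compute(index)
        self._cache[index] = records
        return iter(records)
```

In this model a split that was computed and cached before `checkpoint()` is served from the cache afterwards, and the checkpoint is read only for splits that were never cached.
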

- Fix code that depended on metadata cleaner interval being in minutes · b29599e5
  Matei Zaharia authored 12 years ago
  
  b29599e5
- Include name, if set, in RDD.toString(). · cbf72bff
  Stephen Haberman authored 12 years ago
  
  cbf72bff
- Add number of splits. · 3cda14af
  Stephen Haberman authored 12 years ago
  
  3cda14af
- Merge branch 'master' of github.com:mesos/spark · a1ecec8d
  Matei Zaharia authored 12 years ago
  
  a1ecec8d
- Add JavaRDDLike.toDebugString(). · 951cfd9b
  Stephen Haberman authored 12 years ago
  
  951cfd9b
- Merge pull request #413 from pwendell/stage-logging · f6eb1f08
  Matei Zaharia authored 12 years ago
  
  SPARK-658: Adding logging of stage duration
  f6eb1f08

Jan 28, 2013

- Add RDD.toDebugString. · b45857c9
  Stephen Haberman authored 12 years ago
  
  Original idea by Nathan Kronenfeld.
  b45857c9
- Units from ms -> s · 7ee824e4
  Patrick Wendell authored 12 years ago
  
  7ee824e4
- Merge branch 'master' into driver · 13368818
  Stephen Haberman authored 12 years ago
  
  Conflicts:
  	core/src/main/scala/spark/SparkContext.scala
  	core/src/main/scala/spark/SparkEnv.scala
  	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
  	core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
  	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
  	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
  	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
  	core/src/main/scala/spark/storage/BlockManagerMaster.scala
  	core/src/main/scala/spark/storage/ThreadingTest.scala
  	core/src/test/scala/spark/MapOutputTrackerSuite.scala
  13368818
- Merge pull request #424 from pwendell/logging-cleanup · dda2ce01
  Matei Zaharia authored 12 years ago
  
  Some DEBUG-level log cleanup.
  dda2ce01
- Merge pull request #423 from squito/long_float_accums · 8160f03a
  Matei Zaharia authored 12 years ago
  
  add long and float accumulatorparams
  8160f03a