- Feb 01, 2013
-
-
Josh Rosen authored
-
Matei Zaharia authored
Do not launch JavaGateways on workers (SPARK-674).
-
Josh Rosen authored
The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in a ~3-4x performance improvement when running the PySpark unit tests.
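The lazy-initialization pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the actual PySpark code: the class name `LazyGatewayContext`, the `_launch_gateway` placeholder, and the `launch_count` counter are all hypothetical, and the sketch makes no attempt at thread safety.

```python
class LazyGatewayContext:
    """Hypothetical stand-in for a context class that needs a costly gateway."""

    _gateway = None   # private and shared; stays None until first construction
    launch_count = 0  # demonstration-only counter of gateway launches

    @classmethod
    def _launch_gateway(cls):
        # Placeholder for the expensive step (for example, starting a JVM).
        cls.launch_count += 1
        return object()

    @classmethod
    def _ensure_initialized(cls):
        # Launch the gateway only if no earlier instance has done so.
        if cls._gateway is None:
            cls._gateway = cls._launch_gateway()

    def __init__(self):
        # Nothing happens at import time; the cost is paid here, once.
        self._ensure_initialized()
```

Under these assumptions, a process that merely imports the module (such as a worker) never triggers the launch, while a driver that constructs several contexts triggers it exactly once.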
-
Matei Zaharia authored
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
-
Matei Zaharia authored
Fix stdout redirection in PySpark.
-
Josh Rosen authored
-
Reynold Xin authored
-
- Jan 31, 2013
-
-
Matei Zaharia authored
SPARK-673: Capture and re-throw Python exceptions
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Matei Zaharia authored
Remove activation of profiles by default
-
Patrick Wendell authored
This patch alters the Python <-> executor protocol to pass on exception data when exceptions occur in user Python code.
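One way such a protocol change could work is sketched below. The framing is an assumption made for illustration only: the `ERROR_MARKER` sentinel, the length-prefixed layout, and the names `run_user_function`, `read_result`, and `UserCodeError` are hypothetical and are not taken from the patch.

```python
import io
import struct
import traceback

ERROR_MARKER = -2  # hypothetical sentinel length that signals an error frame


class UserCodeError(Exception):
    """Raised on the reading side when the worker reported a failure."""


def run_user_function(func, arg, out):
    """Worker side: write either a result frame or an error frame to `out`."""
    try:
        payload = repr(func(arg)).encode("utf-8")
        out.write(struct.pack("!i", len(payload)))
        out.write(payload)
    except Exception:
        # Capture the traceback text and send it instead of a result.
        tb = traceback.format_exc().encode("utf-8")
        out.write(struct.pack("!i", ERROR_MARKER))
        out.write(struct.pack("!i", len(tb)))
        out.write(tb)


def read_result(stream):
    """Reader side: return the result text, or re-raise the reported error."""
    (length,) = struct.unpack("!i", stream.read(4))
    if length == ERROR_MARKER:
        (tb_len,) = struct.unpack("!i", stream.read(4))
        raise UserCodeError(stream.read(tb_len).decode("utf-8"))
    return stream.read(length).decode("utf-8")
```

In this sketch the reader sees the original Python traceback in the message of the re-raised exception, rather than an unexplained end of stream.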
-
Reynold Xin authored
-
Reynold Xin authored
-
Mikhail Bautin authored
See the discussion at https://github.com/mesos/spark/pull/355 for why default profile activation is a problem.
-
- Jan 30, 2013
-
-
Matei Zaharia authored
Minor improvements to PySpark docs
-
Patrick Wendell authored
Also, adds a line in doc explaining how to use.
-
Patrick Wendell authored
It's nicer if all the commands you need are made explicit.
-
Matei Zaharia authored
Remember ConnectionManagerId used to initiate SendingConnections
-
Matei Zaharia authored
Make ExecutorIDs include SlaveIDs when running Mesos
-
Matei Zaharia authored
Include message and exitStatus if available.
-
Stephen Haberman authored
-
Charles Reiss authored
-
Charles Reiss authored
the Mesos ExecutorID as a Spark ExecutorID.
-
- Jan 29, 2013
-
-
Charles Reiss authored
This prevents ConnectionManager from getting confused if a machine has multiple host names and the one getHostName() finds happens not to be the one that was passed from, e.g., the BlockManagerMaster.
-
Matei Zaharia authored
Conflicts: core/src/main/scala/spark/deploy/master/Master.scala
-
Matei Zaharia authored
Add RDD.toDebugString.
-
Matei Zaharia authored
Replace old 'master' term with 'driver'.
-
Matei Zaharia authored
  - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results.
  - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class.
  - A few of the RDD subclasses are simpler.
  - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents.
  - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).
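The "called only once" guarantee from the first point can be illustrated with a small memoizing base class. This is a Python sketch of the idea only, not the Scala implementation: `BaseDataset`, `RangeDataset`, and the `calls` counter are hypothetical names, and the checkpoint handling mentioned above is deliberately left out.

```python
class BaseDataset:
    """Hypothetical base class that caches the result of get_splits()."""

    def __init__(self):
        self._splits = None  # filled in on first access

    def get_splits(self):
        # Subclasses override this and may do real computation here.
        raise NotImplementedError

    @property
    def splits(self):
        # The base class owns the caching, so get_splits() runs at most once.
        if self._splits is None:
            self._splits = self.get_splits()
        return self._splits


class RangeDataset(BaseDataset):
    """Splits the integers [0, n) into at most num_splits contiguous ranges."""

    def __init__(self, n, num_splits):
        super().__init__()
        self.n = n
        self.num_splits = num_splits  # assumed to be a positive integer
        self.calls = 0                # demonstration-only call counter

    def get_splits(self):
        self.calls += 1
        step = -(-self.n // self.num_splits)  # ceiling division
        return [range(i, min(i + step, self.n)) for i in range(0, self.n, step)]
```

Because the caching lives in the base class, a subclass such as `RangeDataset` does not need its own cache field, which is the simplification the commit message describes for RDD subclasses.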
-
Matei Zaharia authored
-
Stephen Haberman authored
-
Stephen Haberman authored
-
Matei Zaharia authored
-
Stephen Haberman authored
-
Matei Zaharia authored
SPARK-658: Adding logging of stage duration
-
- Jan 28, 2013
-
-
Stephen Haberman authored
Original idea by Nathan Kronenfeld.
-
Patrick Wendell authored
-
Stephen Haberman authored
Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala
-
Matei Zaharia authored
Some DEBUG-level log cleanup.
-
Matei Zaharia authored
Add long and float AccumulatorParams.
-