- Sep 08, 2013
-
-
Stephen Haberman authored
Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally, `RDD.repartition` would automatically choose a coalesce if numPartitions is less than the current number of partitions, and a shuffle if it is greater.)
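The selection rule suggested in the note above can be sketched as a small decision function. This is a minimal, hypothetical sketch: `choose_repartition_strategy` is not part of Spark, the returned strings are illustrative, and the "noop" case for equal counts is an assumption the commit does not mention.

```python
def choose_repartition_strategy(current_partitions, num_partitions):
    """Hypothetical helper: decide how a generic repartition would reach num_partitions."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if num_partitions < current_partitions:
        # Fewer partitions: existing partitions can be merged without a shuffle.
        return "coalesce"
    if num_partitions > current_partitions:
        # More partitions: data must be redistributed, i.e. coalesce with shuffle=true.
        return "shuffle"
    # Assumption: an equal count needs no work at all.
    return "noop"
```

For example, shrinking from 10 to 4 partitions yields "coalesce", while growing from 4 to 10 yields "shuffle". The latter corresponds to calling Spark's `coalesce(numPartitions, shuffle = true)`.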
-
Matei Zaharia authored
SPARK-660: Add StorageLevel support in Python
-
Aaron Davidson authored
-
Matei Zaharia authored
Provide docs to describe running on CDH/HDP clusters.
-
- Sep 07, 2013
-
-
Patrick Wendell authored
Adding Apache license to two files
-
Patrick Wendell authored
-
Aaron Davidson authored
-
Patrick Wendell authored
-
Matei Zaharia authored
0.8 Doc changes for make-distribution.sh
-
Matei Zaharia authored
Fixed a bug where ResultTask was not properly deserializing outputId.
-
Patrick Wendell authored
-
Aaron Davidson authored
The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell would have to call a private method of SparkContext. Having StorageLevel available in sc also doesn't seem like the end of the world. There may be a better solution, though. As for creating the StorageLevel object itself, this seems to be the best way in Python 2 for creating singleton, enum-like objects: http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
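The enum-like singleton pattern referenced above can be sketched as follows. Each named level is created once as a class attribute and then shared. This is a hedged illustration only: the field names (`use_disk`, `use_memory`, `deserialized`, `replication`) and the particular set of levels are assumptions, and the sketch is not the code from the commit.

```python
class StorageLevel(object):
    """Enum-like storage levels (sketch; field names and levels are illustrative)."""

    def __init__(self, use_disk, use_memory, deserialized, replication=1):
        self.use_disk = use_disk
        self.use_memory = use_memory
        self.deserialized = deserialized
        self.replication = replication

    def __repr__(self):
        return "StorageLevel(%s, %s, %s, %d)" % (
            self.use_disk, self.use_memory, self.deserialized, self.replication)


# Each constant is built once at import time, so every reference to
# StorageLevel.MEMORY_ONLY is the same object (one singleton per level).
StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
StorageLevel.MEMORY_ONLY_SER = StorageLevel(False, True, False)
StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)
StorageLevel.MEMORY_ONLY_2 = StorageLevel(False, True, True, 2)
```

Attaching the constants after the class body works in both Python 2 and 3. It also lets each constant be an instance of the class itself.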
-
Evan Chan authored
-
Matei Zaharia authored
YARN build fixes
-
Reynold Xin authored
-
- Sep 06, 2013
-
-
Aaron Davidson authored
-
Patrick Wendell authored
Docs describing Spark monitoring and instrumentation
-
Evan Chan authored
-
Evan Chan authored
-
Evan Chan authored
-
Evan Chan authored
-
Patrick Wendell authored
-
Patrick Wendell authored
This doc consolidates information relevant to CDH/HDP users in a single place.
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Jey Kottalam authored
-
Reynold Xin authored
-
Patrick Wendell authored
SPARK-821: Don't cache results when action run locally on driver
-
Aaron Davidson authored
It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff).
-
- Sep 05, 2013
-
-
Aaron Davidson authored
-
Matei Zaharia authored
[SPARK-864] DAGScheduler exception if we delete Worker and StandaloneExecutorBackend, then add Worker
-
Aaron Davidson authored
-
Aaron Davidson authored
Caching the results of local actions (e.g., rdd.first()) causes the driver to store entire partitions in its own memory, which may be highly constrained. This patch simply makes the CacheManager avoid caching the result of all locally-run computations.
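The policy described above, which avoids caching any computation run locally on the driver, can be sketched with a toy cache manager. This is a hypothetical model, not Spark's CacheManager. The `running_locally` flag, the method name `get_or_compute`, and the dict-based cache are all assumptions made for illustration.

```python
class ToyCacheManager(object):
    """Hypothetical cache manager that never caches driver-local computations."""

    def __init__(self):
        self.cache = {}  # (rdd_id, partition) -> materialized partition

    def get_or_compute(self, key, compute, running_locally):
        if key in self.cache:
            return self.cache[key]
        result = list(compute())  # materialize the partition
        if not running_locally:
            # Cache only on executors; the driver's memory may be highly constrained.
            self.cache[key] = result
        return result
```

Under this sketch, a local action such as `rdd.first()` still returns its data. However, it does not leave a whole partition pinned in the driver's memory.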
-
Andrew xia authored
-
Patrick Wendell authored
SPARK-884: Add unit test to validate Spark JSON output
-
Aaron Davidson authored
-
- Sep 04, 2013
-
-
Aaron Davidson authored
-
Matei Zaharia authored
Updating assembly README to reflect recent changes in the build.
-
Konstantin Boudnik authored
-
Aaron Davidson authored
This unit test simply validates that the outputs of the JsonProtocol methods are syntactically valid JSON.
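A syntax-only check like the one described above can be sketched with Python's standard `json` module. `stage_info_to_json` is a hypothetical stand-in for a JsonProtocol method, and its field names are invented for illustration. The check confirms only that the text parses, not that it has the expected schema.

```python
import json


def stage_info_to_json(stage_id, name, num_tasks):
    """Hypothetical stand-in for a JsonProtocol serialization method."""
    return json.dumps({"Stage ID": stage_id, "Stage Name": name,
                       "Number of Tasks": num_tasks})


def is_valid_json(text):
    """Return True if text parses as JSON (syntax only, no schema checks)."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True
```

In a test, one would assert `is_valid_json(...)` on each method's output. A truncated string such as `'{"a": 1,'` would fail the check.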
-