- Sep 09, 2013
-
-
Evan Chan authored
-
Evan Chan authored
listFiles() could return null if the I/O fails, and this currently results in an ugly NPE which is hard to diagnose.
-
Y.CORP.YAHOO.COM\tgraves authored
-
Stephen Haberman authored
-
Stephen Haberman authored
-
- Sep 08, 2013
-
-
Matei Zaharia authored
…StandaloneSchedulerBackend instead of the smaller IDs used within Spark (which lack the application name). This was reported by ClearStory in https://github.com/clearstorydata/spark/pull/9. Also fixed some messages that said "slave" instead of "executor".
-
Matei Zaharia authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Stephen Haberman authored
Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally, such an `RDD.repartition` could automatically choose a coalesce when numPartitions is less than the current number of partitions, and a shuffle when it is greater.)
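The behavior described above can be illustrated with a toy model in plain Python. This is only a sketch, not Spark's implementation: partitions are modeled as lists of lists, and the `coalesce` and `repartition` functions here are hypothetical stand-ins written for illustration.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], num_partitions: int,
             shuffle: bool = False) -> List[List[T]]:
    """Toy model of coalesce over a list of partitions (assumption: not Spark code)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    if shuffle:
        # With a shuffle every element is redistributed, so the target count
        # may be larger or smaller than the current number of partitions.
        shuffled: List[List[T]] = [[] for _ in range(num_partitions)]
        i = 0
        for part in partitions:
            for item in part:
                shuffled[i % num_partitions].append(item)
                i += 1
        return shuffled
    # Without a shuffle, existing partitions can only be merged, never split,
    # so the result has at most len(partitions) partitions.
    target = min(num_partitions, len(partitions))
    if target == 0:
        return []
    merged: List[List[T]] = [[] for _ in range(target)]
    for idx, part in enumerate(partitions):
        merged[idx * target // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Hypothetical helper: shuffle only when the partition count must grow."""
    grow = num_partitions > len(partitions)
    return coalesce(partitions, num_partitions, shuffle=grow)
```

In this model, asking for more partitions without a shuffle silently leaves the count unchanged, which is the pitfall the tip is meant to flag.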
-
Patrick Wendell authored
-
Matei Zaharia authored
Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.
-
Matei Zaharia authored
  - Add job scheduling docs
  - Rename some fair scheduler properties
  - Organize intro page better
  - Link to Apache wiki for "contributing to Spark"
-
- Sep 07, 2013
-
-
Aaron Davidson authored
-
Aaron Davidson authored
The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell would have to call a private method of SparkContext. Having StorageLevel available in sc also doesn't seem like the end of the world. There may be a better solution, though. As for creating the StorageLevel object itself, this seems to be the best way in Python 2 for creating singleton, enum-like objects: http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
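A minimal sketch of the singleton, enum-like pattern mentioned above, in plain Python. The attribute names (`use_disk`, `use_memory`, `deserialized`, `replication`) and the flag values are assumptions chosen for illustration; the actual PySpark class in this commit may differ.

```python
class StorageLevel(object):
    """Illustrative flags object; field names and values are assumptions."""

    def __init__(self, use_disk, use_memory, deserialized, replication=1):
        self.use_disk = use_disk
        self.use_memory = use_memory
        self.deserialized = deserialized
        self.replication = replication

    def __repr__(self):
        return "StorageLevel(%s, %s, %s, %s)" % (
            self.use_disk, self.use_memory, self.deserialized, self.replication)


# Enum-like constants are attached after the class body, because the class
# cannot instantiate itself while it is still being defined.
StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)
```

Each constant is created once at import time, so callers can compare levels by identity (`level is StorageLevel.MEMORY_ONLY`).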
-
Reynold Xin authored
-
- Sep 06, 2013
-
-
Aaron Davidson authored
-
Reynold Xin authored
-
Aaron Davidson authored
It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff).
-
- Sep 05, 2013
-
-
Aaron Davidson authored
-
Aaron Davidson authored
-
Aaron Davidson authored
Caching the results of local actions (e.g., rdd.first()) causes the driver to store entire partitions in its own memory, which may be highly constrained. This patch simply makes the CacheManager avoid caching the result of all locally-run computations.
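The idea can be sketched with a toy cache in plain Python. `ToyCacheManager` and its `run_locally` flag are hypothetical names used only for illustration; they are not the actual CacheManager API.

```python
class ToyCacheManager(object):
    """Toy stand-in for a cache manager (assumption: not Spark's class)."""

    def __init__(self):
        self._cache = {}

    def get_or_compute(self, key, compute, run_locally=False):
        # Serve from the cache when a previous non-local run stored the value.
        if key in self._cache:
            return self._cache[key]
        result = compute()
        # Skip caching for locally-run computations so the driver does not
        # end up holding whole partitions in its own memory.
        if not run_locally:
            self._cache[key] = result
        return result

    def is_cached(self, key):
        return key in self._cache
```

The trade-off in this sketch is that a locally-run computation is recomputed on every call, in exchange for keeping the local process's memory footprint small.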
-
Andrew xia authored
-
Aaron Davidson authored
-
- Sep 04, 2013
-
-
Aaron Davidson authored
-
Aaron Davidson authored
This unit test simply validates that the outputs of the JsonProtocol methods are syntactically valid JSON.
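A check of this kind could look roughly like the following Python sketch, which only verifies that each string parses as JSON. The helper names and the sample inputs are hypothetical; the real test exercises the JsonProtocol methods, which are not reproduced here.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def invalid_outputs(outputs):
    """Given a dict of name -> JSON string, return the names that fail to parse."""
    return [name for name, text in outputs.items() if not is_valid_json(text)]
```

A syntactic check like this says nothing about whether the fields are correct; it only catches malformed output such as unbalanced braces or unquoted keys.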
-
- Sep 03, 2013
-
-
Mridul Muralidharan authored
-
Mridul Muralidharan authored
-
Patrick Wendell authored
-
Y.CORP.YAHOO.COM\tgraves authored
-
Y.CORP.YAHOO.COM\tgraves authored
-
Ali Ghodsi authored
-
- Sep 02, 2013
-
-
Ali Ghodsi authored
-
Ali Ghodsi authored
-
Ali Ghodsi authored
-
Ali Ghodsi authored
-
Matei Zaharia authored
-
Matei Zaharia authored
-
Matei Zaharia authored
-
- Sep 01, 2013
-
-
Matei Zaharia authored
-