- Nov 10, 2013
Josh Rosen authored
For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
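For illustration, a minimal sketch of how a custom serializer is plugged in, assuming the serializer keyword argument on SparkContext that this change introduces (the master and app name are placeholders):

    from pyspark import SparkContext
    from pyspark.serializers import MarshalSerializer

    # MarshalSerializer uses Python's marshal module; it is faster than the
    # default pickle-based serializer but supports fewer object types.
    sc = SparkContext("local", "serializer-demo", serializer=MarshalSerializer())
    print(sc.parallelize(range(10)).map(lambda x: 2 * x).collect())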
- Oct 19, 2013
Ewen Cheslack-Postava authored
Add a regular add() method for adding a term to accumulators in PySpark. Currently, if you have a non-global accumulator, adding to it is awkward: the += operator can't be used on an accumulator captured via closure, because += involves an assignment, so the only way to do it is to call __iadd__ directly. Adding this method lets you write code like this:

    def main():
        sc = SparkContext()
        accum = sc.accumulator(0)
        rdd = sc.parallelize([1, 2, 3])

        def f(x):
            accum.add(x)

        rdd.foreach(f)
        print accum.value

where using accum += x instead would have raised UnboundLocalError exceptions in the workers; previously this had to be written as accum.__iadd__(x).
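The UnboundLocalError mentioned above is a plain Python scoping rule rather than anything Spark-specific; a minimal Spark-free reproduction:

    def outer():
        count = 0
        def inc():
            # `count += 1` assigns to `count`, which makes the name local to
            # inc(); the read before assignment raises UnboundLocalError.
            count += 1
        inc()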
- Jul 16, 2013
Matei Zaharia authored
- Feb 03, 2013
Josh Rosen authored
- Jan 23, 2013
Josh Rosen authored
cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689. This seems hard to fix, and the ABCMeta wasn't necessary, so I removed it.
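With ABCMeta gone, custom accumulator types are defined by plain subclassing. A hedged sketch under that assumption: the zero()/addInPlace() interface is PySpark's AccumulatorParam contract, while VectorAccumulatorParam and the demo values are illustrative, and the add() call relies on the method from the Oct 19, 2013 entry above.

    from pyspark import SparkContext
    from pyspark.accumulators import AccumulatorParam

    class VectorAccumulatorParam(AccumulatorParam):
        # zero() returns the identity value, shaped like the initial value.
        def zero(self, value):
            return [0.0] * len(value)

        # addInPlace() merges two values; mutating and returning v1 is allowed.
        def addInPlace(self, v1, v2):
            for i in range(len(v1)):
                v1[i] += v2[i]
            return v1

    sc = SparkContext("local", "accum-demo")
    vec = sc.accumulator([0.0, 0.0, 0.0], VectorAccumulatorParam())
    rdd = sc.parallelize([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    rdd.foreach(lambda row: vec.add(row))
    print(vec.value)  # [5.0, 7.0, 9.0]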
- Jan 22, 2013
Josh Rosen authored
- Jan 20, 2013
Josh Rosen authored
Matei Zaharia authored
Matei Zaharia authored