  1. Nov 10, 2013
    • Add custom serializer support to PySpark. · cbb7f04a
      Josh Rosen authored
      For now, this only adds MarshalSerializer, but it lays the groundwork
      for supporting other custom serializers.  Many of these mechanisms
      can also be used to support deserialization of different data formats
      sent by Java, such as data encoded by MsgPack.
      
      This also fixes a bug in SparkContext.union().
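The MarshalSerializer described above is backed by Python's built-in marshal module, which trades generality for speed. A minimal sketch of the round-trip such a serializer performs, using a hypothetical stand-in class (not PySpark's actual implementation):

```python
import marshal

class MarshalSerializer:
    """Hypothetical stand-in for a marshal-backed serializer."""

    def dumps(self, obj):
        # marshal is fast but only handles simple built-in types
        # (ints, strings, lists, dicts, ...) -- the trade-off a
        # marshal-based serializer makes versus pickle.
        return marshal.dumps(obj)

    def loads(self, data):
        return marshal.loads(data)

ser = MarshalSerializer()
partition = [1, 2, 3]
assert ser.loads(ser.dumps(partition)) == partition
```

The same dumps/loads interface is what makes it possible to plug in other formats (such as MsgPack) for data exchanged with Java.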
  2. Oct 19, 2013
    • Add an add() method to pyspark accumulators. · 7eaa56de
      Ewen Cheslack-Postava authored
      Add a regular method for adding a term to accumulators in
      pyspark. Currently if you have a non-global accumulator, adding to it
      is awkward. The += operator can't be used for non-global accumulators
      captured via closure because it involves an assignment. The only way
      to do it is using __iadd__ directly.
      
      Adding this method lets you write code like this:
      
      def main():
          sc = SparkContext()
          accum = sc.accumulator(0)
      
          rdd = sc.parallelize([1,2,3])
          def f(x):
              accum.add(x)
          rdd.foreach(f)
          print(accum.value)
      
      where using accum += x instead would have caused UnboundLocalError
      exceptions in workers. Currently it would have to be written as
      accum.__iadd__(x).
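The closure pitfall this commit works around can be shown without Spark at all. In the sketch below, `Accum` is a hypothetical stand-in for a PySpark accumulator: augmented assignment on a closed-over name raises UnboundLocalError, while a method call on a mutable object only reads the name and so works fine.

```python
class Accum:
    """Hypothetical stand-in for a PySpark accumulator."""
    def __init__(self, value=0):
        self.value = value

    def add(self, term):
        self.value += term  # mutates the object; no name rebinding

def broken():
    total = 0
    def f(x):
        # 'total += x' makes 'total' local to f, so it is read
        # before assignment -> UnboundLocalError
        total += x
    f(1)

def works():
    accum = Accum(0)
    def f(x):
        accum.add(x)  # only *reads* the closed-over name 'accum'
    for v in [1, 2, 3]:
        f(v)
    return accum.value

try:
    broken()
    raised = False
except UnboundLocalError:
    raised = True

assert raised          # += on a closed-over local fails
assert works() == 6    # .add() on a mutable object succeeds
```

This is exactly why `accum.add(x)` composes with closures shipped to workers while `accum += x` does not.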