  1. Nov 10, 2013
    • Add custom serializer support to PySpark. · cbb7f04a
      Josh Rosen authored
      For now, this only adds MarshalSerializer, but it lays the groundwork
      for supporting other custom serializers.  Many of these mechanisms
      can also be used to support deserialization of different data formats
      sent by Java, such as data encoded by MsgPack.
      
      This also fixes a bug in SparkContext.union().
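The MarshalSerializer described above is backed by Python's built-in marshal module, which trades generality for speed. A minimal sketch of the round-trip such a serializer performs, using a hypothetical stand-in class (not PySpark's actual implementation):

```python
import marshal

class MarshalSerializer:
    """Hypothetical stand-in for a marshal-backed serializer."""

    def dumps(self, obj):
        # marshal is fast but only handles simple built-in types
        # (ints, strings, lists, dicts, ...) -- the trade-off a
        # marshal-based serializer makes versus pickle.
        return marshal.dumps(obj)

    def loads(self, data):
        return marshal.loads(data)

ser = MarshalSerializer()
partition = [1, 2, 3]
assert ser.loads(ser.dumps(partition)) == partition
```

The same dumps/loads interface is what makes it possible to plug in other formats (such as MsgPack) for data exchanged with Java.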
  2. Oct 19, 2013
    • Add an add() method to pyspark accumulators. · 7eaa56de
      Ewen Cheslack-Postava authored
      Add a regular method for adding a term to accumulators in
      pyspark. Currently if you have a non-global accumulator, adding to it
      is awkward. The += operator can't be used for non-global accumulators
      captured via closure because it involves an assignment. The only way
      to do it is using __iadd__ directly.
      
      Adding this method lets you write code like this:
      
      def main():
          sc = SparkContext()
          accum = sc.accumulator(0)
      
          rdd = sc.parallelize([1,2,3])
          def f(x):
              accum.add(x)
          rdd.foreach(f)
          print(accum.value)
      
      where using accum += x instead would have caused UnboundLocalError
      exceptions in workers. Currently it would have to be written as
      accum.__iadd__(x).
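The closure pitfall this commit works around can be shown without Spark at all. In the sketch below, `Accum` is a hypothetical stand-in for a PySpark accumulator: augmented assignment on a closed-over name raises UnboundLocalError, while a method call on a mutable object only reads the name and so works fine.

```python
class Accum:
    """Hypothetical stand-in for a PySpark accumulator."""
    def __init__(self, value=0):
        self.value = value

    def add(self, term):
        self.value += term  # mutates the object; no name rebinding

def broken():
    total = 0
    def f(x):
        # 'total += x' makes 'total' local to f, so it is read
        # before assignment -> UnboundLocalError
        total += x
    f(1)

def works():
    accum = Accum(0)
    def f(x):
        accum.add(x)  # only *reads* the closed-over name 'accum'
    for v in [1, 2, 3]:
        f(v)
    return accum.value

try:
    broken()
    raised = False
except UnboundLocalError:
    raised = True

assert raised          # += on a closed-over local fails
assert works() == 6    # .add() on a mutable object succeeds
```

This is exactly why `accum.add(x)` composes with closures shipped to workers while `accum += x` does not.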