7eaa56de · Add an add() method to pyspark accumulators.
    Ewen Cheslack-Postava authored
    Add a regular method for adding a term to accumulators in
    pyspark. Currently, if you have a non-global accumulator, adding to it
    is awkward. The += operator can't be used on an accumulator
    captured via closure because it involves an assignment. The only way
    to do it is to call __iadd__ directly.
    
    Adding this method lets you write code like this:
    
    def main():
        sc = SparkContext()
        accum = sc.accumulator(0)
    
        rdd = sc.parallelize([1,2,3])
        def f(x):
            accum.add(x)
        rdd.foreach(f)
        print(accum.value)
    
    where using accum += x instead would have raised UnboundLocalError
    in the workers. Without this method, it would have to be written as
    accum.__iadd__(x).
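
    The failure mode above can be reproduced in plain Python. This is a
    minimal sketch, not pyspark's actual Accumulator implementation: the
    toy class below only mimics the relevant behavior (__iadd__ plus an
    add() method) to show why += breaks inside a closure while a method
    call does not.

    ```python
    class Accumulator:
        """Toy stand-in for pyspark's Accumulator (illustration only)."""

        def __init__(self, value):
            self.value = value

        def __iadd__(self, term):
            self.value += term
            return self

        def add(self, term):
            """Plain method: no assignment at the call site, so it is
            safe to use on an accumulator captured via closure."""
            self.value += term


    def run():
        accum = Accumulator(0)

        def f(x):
            # accum += x  would raise UnboundLocalError here: it desugars
            # to accum = accum.__iadd__(x), and that assignment makes
            # `accum` a local variable of f, unbound at the point of use.
            accum.add(x)  # a method call needs no assignment

        for x in [1, 2, 3]:
            f(x)
        return accum.value


    print(run())  # 6
    ```

    The same fix could be spelled accum.__iadd__(x), but calling a dunder
    directly is unidiomatic; add() gives the closure-safe path a proper name.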