  1. Nov 10, 2013
    • Add custom serializer support to PySpark. · cbb7f04a
      Josh Rosen authored
      For now, this only adds MarshalSerializer, but it lays the groundwork
      for supporting other custom serializers. Many of these mechanisms
      can also be used to support deserialization of different data formats
      sent by Java, such as data encoded by MsgPack.

      This also fixes a bug in SparkContext.union().
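      A minimal usage sketch for the MarshalSerializer this commit introduces,
      assuming a local PySpark installation (the master URL and app name are
      illustrative):

          # Pick the marshal-based serializer when building the context.
          from pyspark import SparkContext
          from pyspark.serializers import MarshalSerializer

          # Marshal is faster than the default pickle serializer but only
          # handles a restricted set of Python types (no custom classes).
          sc = SparkContext("local", "marshal-demo", serializer=MarshalSerializer())

          rdd = sc.parallelize(range(1000))
          print(rdd.map(lambda x: x * 2).take(5))  # [0, 2, 4, 6, 8]

          sc.stop()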
  2. Nov 03, 2013
  3. Oct 22, 2013
    • Pass self to SparkContext._ensure_initialized. · 317a9eb1
      Ewen Cheslack-Postava authored
      The constructor for SparkContext should pass in self so that we track
      the current context and produce errors if another one is created. Add
      a doctest to make sure creating multiple contexts triggers the
      exception (see the sketch below this date's entries).
    • Add classmethod to SparkContext to set system properties. · 56d230e6
      Ewen Cheslack-Postava authored
      Add a new classmethod to SparkContext to set system properties, as is
      possible in Scala/Java. Unlike the Java/Scala implementations, there's
      no access to System until the JVM bridge is created. Since
      SparkContext handles that, move the initialization of the JVM
      connection to a separate classmethod that can safely be called
      repeatedly as long as the same instance (or no instance) is provided.
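      A sketch of the behavior the 317a9eb1 doctest guards: constructing a
      second SparkContext in the same process should raise an error (master
      URLs and app names are illustrative):

          from pyspark import SparkContext

          sc = SparkContext("local", "first")
          try:
              SparkContext("local", "second")  # a second context is rejected
          except ValueError as e:
              print("second context rejected:", e)
          finally:
              sc.stop()

      And a sketch of the classmethod added in 56d230e6: set a JVM system
      property before any SparkContext exists (the property name and value
      here are only examples):

          from pyspark import SparkContext

          # Must run before the first SparkContext is created; it goes through
          # the same JVM-bridge initialization path the commit factors out.
          SparkContext.setSystemProperty("spark.executor.memory", "2g")

          sc = SparkContext("local", "props-demo")
          # ... run jobs ...
          sc.stop()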
  4. Sep 08, 2013
  5. Sep 07, 2013
  6. Sep 06, 2013
  7. Sep 01, 2013
  8. Aug 16, 2013
  9. Jul 29, 2013
    • SPARK-815. Python parallelize() should split lists before batching · feba7ee5
      Matei Zaharia authored
      One unfortunate consequence of this fix is that we materialize any
      collections that are given to us as generators, but this seems necessary
      to get reasonable behavior on small collections. We could add a
      batchSize parameter later to bypass auto-computation of batch size if
      this becomes a problem (e.g. if users really want to parallelize big
      generators nicely).
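      A sketch of the behavior SPARK-815 targets: parallelize() splits a small
      collection across the requested partitions instead of folding it into a
      single batch (the partition boundaries shown are only an example):

          from pyspark import SparkContext

          sc = SparkContext("local", "parallelize-demo")

          # glom() shows how elements land in partitions; the 10 elements are
          # spread across 4 partitions rather than ending up in one batch.
          print(sc.parallelize(range(10), 4).glom().collect())
          # e.g. [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]

          # Generators are accepted too, but (per the commit message) they are
          # materialized into a list first so the elements can be split up front.
          print(sc.parallelize((x * x for x in range(6)), 3).glom().collect())

          sc.stop()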
  10. Jul 16, 2013
  11. Feb 03, 2013
  12. Feb 01, 2013
  13. Jan 23, 2013
  14. Jan 22, 2013
  15. Jan 21, 2013
  16. Jan 20, 2013
  17. Jan 10, 2013
  18. Jan 03, 2013
  19. Jan 01, 2013
  20. Dec 29, 2012
  21. Dec 27, 2012
  22. Dec 26, 2012
  23. Dec 24, 2012
  24. Oct 19, 2012
  25. Aug 27, 2012
  26. Aug 21, 2012