- Feb 01, 2013
Josh Rosen authored
The problem was that the Py4J gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization, so the gateway is created only when a SparkContext instance is actually constructed. I also made the gateway and jvm variables private. This change makes the PySpark unit tests run roughly 3-4x faster.
- Jan 23, 2013
Josh Rosen authored
Fix minor documentation formatting issues.
- Jan 22, 2013
Josh Rosen authored
Josh Rosen authored
- Jan 21, 2013
Josh Rosen authored
This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
- Jan 20, 2013
Josh Rosen authored
Josh Rosen authored
Josh Rosen authored
Matei Zaharia authored
- Jan 10, 2013
Josh Rosen authored
- Jan 03, 2013
Josh Rosen authored
- Jan 01, 2013
Josh Rosen authored
- Dec 29, 2012
Josh Rosen authored
Josh Rosen authored
- Dec 27, 2012
Josh Rosen authored
Add options to pyspark.SparkContext constructor.
Josh Rosen authored
- Dec 26, 2012
Josh Rosen authored
- Dec 24, 2012
Josh Rosen authored
Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.
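A minimal sketch of the file-based handoff, in Python only: `write_records` stands in for the JVM side writing a batch of results to local disk, and `read_records` streams them back in Python. The 4-byte length-prefixed pickle framing and all function names here are assumptions for illustration, not the format this commit used.

```python
import os
import pickle
import struct
import tempfile


def write_records(path, records):
    """Stand-in for the JVM side: write each record as a 4-byte
    big-endian length followed by its pickled bytes."""
    with open(path, "wb") as f:
        for rec in records:
            data = pickle.dumps(rec)
            f.write(struct.pack(">i", len(data)))
            f.write(data)


def read_records(path):
    """Python side: stream length-prefixed records back from the file."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:  # clean end of file
                break
            (length,) = struct.unpack(">i", header)
            yield pickle.loads(f.read(length))


def collect_via_tempfile(records):
    """Round-trip a batch through a temporary local file instead of a socket/RPC call."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, records)
        return list(read_records(path))
    finally:
        os.remove(path)
```

The design point is that one bulk file write and a sequential read avoid per-object round trips through the RPC bridge; the temporary file is removed even if reading fails.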
- Oct 19, 2012
Josh Rosen authored
Josh Rosen authored
- Aug 27, 2012
Josh Rosen authored
Josh Rosen authored
Josh Rosen authored
- Aug 21, 2012
Josh Rosen authored
Objects serialized with JSON can be compared for equality, but JSON can be slow to serialize and only supports a limited range of data types.
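A small standard-library illustration of that trade-off. With `sort_keys=True`, equal dicts produce identical JSON text, so the serialized form itself can be compared for equality; pickle output follows dict insertion order, but pickle round-trips tuples and sets that JSON either converts to lists or rejects. The helper names are hypothetical, and serialization speed is not measured here.

```python
import json
import pickle


def json_key(obj):
    """Canonical JSON text: equal dicts give identical strings regardless of key order."""
    return json.dumps(obj, sort_keys=True)


def json_roundtrip(obj):
    return json.loads(json.dumps(obj))


def pickle_roundtrip(obj):
    return pickle.loads(pickle.dumps(obj))


def json_supports(obj):
    """Return True if the json module can serialize obj at all."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False


# JSON only has arrays, so a tuple comes back as a list; pickle preserves the type.
print(json_roundtrip((1, 2)), pickle_roundtrip((1, 2)))
```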
- Aug 19, 2012
Josh Rosen authored