Commit 4608902f authored by Josh Rosen

Use filesystem to collect RDDs in PySpark.

Passing large volumes of data through Py4J seems
to be slow.  It appears to be faster to write the
data to the local filesystem and read it back from
Python.
parent ccd075cf
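The pattern the commit message describes, sketched below in plain Python: instead of pushing every object through the Py4J socket, serialize the data as length-prefixed pickles into a temporary file on the local filesystem and stream it back. The helper names here are illustrative only (they are not PySpark API); the 4-byte big-endian length framing mirrors what writeAsPickle and read_with_length use in the diff below.

import os
import pickle
import struct
from tempfile import NamedTemporaryFile

def write_pickles_to_file(items, path):
    # Hypothetical helper: 4-byte big-endian length, then the pickled bytes.
    with open(path, 'wb') as f:
        for item in items:
            data = pickle.dumps(item, 2)
            f.write(struct.pack('>i', len(data)))
            f.write(data)

def read_pickles_from_file(path):
    # Hypothetical helper: read frames back until end-of-file.
    with open(path, 'rb') as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return
            (length,) = struct.unpack('>i', header)
            yield pickle.loads(f.read(length))

tempFile = NamedTemporaryFile(delete=False)
tempFile.close()
write_pickles_to_file(range(10), tempFile.name)
print(list(read_pickles_from_file(tempFile.name)))  # [0, 1, ..., 9]
os.unlink(tempFile.name)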
package spark.api.python
import java.io._
import java.util.{List => JList}
import scala.collection.Map
import scala.collection.JavaConversions._
@@ -59,36 +60,7 @@ trait PythonRDDBase {
}
out.flush()
for (elem <- parent.iterator(split)) {
if (elem.isInstanceOf[Array[Byte]]) {
val arr = elem.asInstanceOf[Array[Byte]]
dOut.writeInt(arr.length)
dOut.write(arr)
} else if (elem.isInstanceOf[scala.Tuple2[_, _]]) {
val t = elem.asInstanceOf[scala.Tuple2[_, _]]
val t1 = t._1.asInstanceOf[Array[Byte]]
val t2 = t._2.asInstanceOf[Array[Byte]]
val length = t1.length + t2.length - 3 - 3 + 4 // stripPickle() removes 3 bytes
dOut.writeInt(length)
dOut.writeByte(Pickle.PROTO)
dOut.writeByte(Pickle.TWO)
dOut.write(PythonRDD.stripPickle(t1))
dOut.write(PythonRDD.stripPickle(t2))
dOut.writeByte(Pickle.TUPLE2)
dOut.writeByte(Pickle.STOP)
} else if (elem.isInstanceOf[String]) {
// For uniformity, strings are wrapped into Pickles.
val s = elem.asInstanceOf[String].getBytes("UTF-8")
val length = 2 + 1 + 4 + s.length + 1
dOut.writeInt(length)
dOut.writeByte(Pickle.PROTO)
dOut.writeByte(Pickle.TWO)
dOut.writeByte(Pickle.BINUNICODE)
dOut.writeInt(Integer.reverseBytes(s.length))
dOut.write(s)
dOut.writeByte(Pickle.STOP)
} else {
throw new Exception("Unexpected RDD type")
}
PythonRDD.writeAsPickle(elem, dOut)
}
dOut.flush()
out.flush()
@@ -174,36 +146,45 @@ object PythonRDD {
arr.slice(2, arr.length - 1)
}
def asPickle(elem: Any) : Array[Byte] = {
val baos = new ByteArrayOutputStream();
val dOut = new DataOutputStream(baos);
/**
* Write strings, pickled Python objects, or pairs of pickled objects to a data output stream.
* The data format is a 32-bit integer representing the pickled object's length (in bytes),
* followed by the pickled data.
* @param elem the object to write
* @param dOut a data output stream
*/
def writeAsPickle(elem: Any, dOut: DataOutputStream) {
if (elem.isInstanceOf[Array[Byte]]) {
elem.asInstanceOf[Array[Byte]]
val arr = elem.asInstanceOf[Array[Byte]]
dOut.writeInt(arr.length)
dOut.write(arr)
} else if (elem.isInstanceOf[scala.Tuple2[Array[Byte], Array[Byte]]]) {
val t = elem.asInstanceOf[scala.Tuple2[Array[Byte], Array[Byte]]]
val length = t._1.length + t._2.length - 3 - 3 + 4 // stripPickle() removes 3 bytes
dOut.writeInt(length)
dOut.writeByte(Pickle.PROTO)
dOut.writeByte(Pickle.TWO)
dOut.write(PythonRDD.stripPickle(t._1))
dOut.write(PythonRDD.stripPickle(t._2))
dOut.writeByte(Pickle.TUPLE2)
dOut.writeByte(Pickle.STOP)
baos.toByteArray()
} else if (elem.isInstanceOf[String]) {
// For uniformity, strings are wrapped into Pickles.
val s = elem.asInstanceOf[String].getBytes("UTF-8")
val length = 2 + 1 + 4 + s.length + 1
dOut.writeInt(length)
dOut.writeByte(Pickle.PROTO)
dOut.writeByte(Pickle.TWO)
dOut.write(Pickle.BINUNICODE)
dOut.writeInt(Integer.reverseBytes(s.length))
dOut.write(s)
dOut.writeByte(Pickle.STOP)
baos.toByteArray()
} else {
throw new Exception("Unexpected RDD type")
}
}
def pickleFile(sc: JavaSparkContext, filename: String, parallelism: Int) :
def readRDDFromPickleFile(sc: JavaSparkContext, filename: String, parallelism: Int) :
JavaRDD[Array[Byte]] = {
val file = new DataInputStream(new FileInputStream(filename))
val objs = new collection.mutable.ArrayBuffer[Array[Byte]]
@@ -221,11 +202,12 @@ object PythonRDD {
JavaRDD.fromRDD(sc.sc.parallelize(objs, parallelism))
}
def arrayAsPickle(arr : Any) : Array[Byte] = {
val pickles : Array[Byte] = arr.asInstanceOf[Array[Any]].map(asPickle).map(stripPickle).flatten
Array[Byte](Pickle.PROTO, Pickle.TWO, Pickle.EMPTY_LIST, Pickle.MARK) ++ pickles ++
Array[Byte] (Pickle.APPENDS, Pickle.STOP)
def writeArrayToPickleFile[T](items: Array[T], filename: String) {
val file = new DataOutputStream(new FileOutputStream(filename))
for (item <- items) {
writeAsPickle(item, file)
}
file.close()
}
}
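The Scaladoc on writeAsPickle above describes the outer framing (a 32-bit length followed by the pickled bytes). For (key, value) pairs the method also splices two existing pickles into a single pickled 2-tuple: stripPickle drops each pickle's 2-byte protocol header and 1-byte STOP opcode, and the payloads are re-wrapped with PROTO/TWO ... TUPLE2/STOP, which is where the t._1.length + t._2.length - 3 - 3 + 4 arithmetic comes from. A pure-Python demonstration of that splicing (not part of this patch; the opcode constants mirror the Scala Pickle object):

import pickle

PROTO, TWO, TUPLE2, STOP = b'\x80', b'\x02', b'\x86', b'.'

def strip_pickle(p):
    # Mirrors PythonRDD.stripPickle: drop the PROTO+version prefix (2 bytes)
    # and the trailing STOP opcode (1 byte).
    return p[2:-1]

key = pickle.dumps("a key", 2)
value = pickle.dumps([1, 2, 3], 2)
stitched = PROTO + TWO + strip_pickle(key) + strip_pickle(value) + TUPLE2 + STOP
assert len(stitched) == len(key) + len(value) - 3 - 3 + 4
print(pickle.loads(stitched))  # ('a key', [1, 2, 3])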
......
@@ -14,9 +14,8 @@ class SparkContext(object):
gateway = launch_gateway()
jvm = gateway.jvm
pickleFile = jvm.spark.api.python.PythonRDD.pickleFile
asPickle = jvm.spark.api.python.PythonRDD.asPickle
arrayAsPickle = jvm.spark.api.python.PythonRDD.arrayAsPickle
readRDDFromPickleFile = jvm.PythonRDD.readRDDFromPickleFile
writeArrayToPickleFile = jvm.PythonRDD.writeArrayToPickleFile
def __init__(self, master, name, defaultParallelism=None):
self.master = master
@@ -45,11 +44,11 @@ class SparkContext(object):
# because it sends O(n) Py4J commands. As an alternative, serialized
# objects are written to a file and loaded through textFile().
tempFile = NamedTemporaryFile(delete=False)
atexit.register(lambda: os.unlink(tempFile.name))
for x in c:
write_with_length(dump_pickle(x), tempFile)
tempFile.close()
atexit.register(lambda: os.unlink(tempFile.name))
jrdd = self.pickleFile(self._jsc, tempFile.name, numSlices)
jrdd = self.readRDDFromPickleFile(self._jsc, tempFile.name, numSlices)
return RDD(jrdd, self)
def textFile(self, name, minSplits=None):
......
import atexit
from base64 import standard_b64encode as b64enc
from collections import defaultdict
from itertools import chain, ifilter, imap
import os
import shlex
from subprocess import Popen, PIPE
from tempfile import NamedTemporaryFile
from threading import Thread
from pyspark import cloudpickle
from pyspark.serializers import dump_pickle, load_pickle
from pyspark.serializers import dump_pickle, load_pickle, read_from_pickle_file
from pyspark.join import python_join, python_left_outer_join, \
python_right_outer_join, python_cogroup
@@ -145,10 +147,30 @@ class RDD(object):
self.map(f).collect() # Force evaluation
def collect(self):
# To minimize the number of transfers between Python and Java, we'll
# flatten each partition into a list before collecting it. Due to
# pipelining, this should add minimal overhead.
def asList(iterator):
yield list(iterator)
pickles = self.mapPartitions(asList)._jrdd.rdd().collect()
return list(chain.from_iterable(load_pickle(bytes(p)) for p in pickles))
picklesInJava = self.mapPartitions(asList)._jrdd.rdd().collect()
return list(chain.from_iterable(self._collect_array_through_file(picklesInJava)))
def _collect_array_through_file(self, array):
# Transferring lots of data through Py4J can be slow because
# socket.readline() is inefficient. Instead, we'll dump the data to a
# file and read it back.
tempFile = NamedTemporaryFile(delete=False)
tempFile.close()
def clean_up_file():
try: os.unlink(tempFile.name)
except: pass
atexit.register(clean_up_file)
self.ctx.writeArrayToPickleFile(array, tempFile.name)
# Read the data into Python and deserialize it:
with open(tempFile.name, 'rb') as tempFile:
for item in read_from_pickle_file(tempFile):
yield item
os.unlink(tempFile.name)
def reduce(self, f):
"""
@@ -220,15 +242,15 @@ class RDD(object):
>>> sc.parallelize([2, 3, 4]).take(2)
[2, 3]
"""
pickle = self.ctx.arrayAsPickle(self._jrdd.rdd().take(num))
return load_pickle(bytes(pickle))
picklesInJava = self._jrdd.rdd().take(num)
return list(self._collect_array_through_file(picklesInJava))
def first(self):
"""
>>> sc.parallelize([2, 3, 4]).first()
2
"""
return load_pickle(bytes(self.ctx.asPickle(self._jrdd.first())))
return self.take(1)[0]
def saveAsTextFile(self, path):
def func(iterator):
......
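Hypothetical interactive usage of the changed code paths, assuming a running SparkContext named sc as in the doctests above; collect(), take() and first() now move their results through a temporary pickle file rather than through the Py4J socket:

nums = sc.parallelize(range(100), 4)
print(nums.collect()[:5])   # [0, 1, 2, 3, 4]
print(nums.take(2))         # [0, 1]
print(nums.first())         # 0
pairs = sc.parallelize([("a", 1), ("b", 2)])
print(pairs.collect())      # [('a', 1), ('b', 2)]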
@@ -33,3 +33,11 @@ def read_with_length(stream):
if obj == "":
raise EOFError
return obj
def read_from_pickle_file(stream):
try:
while True:
yield load_pickle(read_with_length(stream))
except EOFError:
return
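The new read_from_pickle_file generator is the consumer side of the length framing that parallelize and writeAsPickle produce. A small round-trip sketch using only helpers this diff imports from pyspark.serializers (assumed to behave as they are used in context.py above):

import os
from tempfile import NamedTemporaryFile
from pyspark.serializers import dump_pickle, write_with_length, \
    read_from_pickle_file

tempFile = NamedTemporaryFile(delete=False)
for x in ["a", "b", "c"]:
    write_with_length(dump_pickle(x), tempFile)
tempFile.close()

with open(tempFile.name, 'rb') as stream:
    print(list(read_from_pickle_file(stream)))  # ['a', 'b', 'c']
os.unlink(tempFile.name)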
@@ -8,7 +8,7 @@ from base64 import standard_b64decode
from pyspark.broadcast import Broadcast, _broadcastRegistry
from pyspark.cloudpickle import CloudPickler
from pyspark.serializers import write_with_length, read_with_length, \
read_long, read_int, dump_pickle, load_pickle
read_long, read_int, dump_pickle, load_pickle, read_from_pickle_file
# Redirect stdout to stderr so that users must return values from functions.
@@ -20,14 +20,6 @@ def load_obj():
return load_pickle(standard_b64decode(sys.stdin.readline().strip()))
def read_input():
try:
while True:
yield load_pickle(read_with_length(sys.stdin))
except EOFError:
return
def main():
num_broadcast_variables = read_int(sys.stdin)
for _ in range(num_broadcast_variables):
@@ -40,7 +32,7 @@ def main():
dumps = lambda x: x
else:
dumps = dump_pickle
for obj in func(read_input()):
for obj in func(read_from_pickle_file(sys.stdin)):
write_with_length(dumps(obj), old_stdout)
......