- Sep 29, 2012
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Added mapPartitionsWithSplit to the programming guide.
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
number of open files. Also optimized sending of disk-based blocks.
Reynold Xin authored
Matei Zaharia authored
Allow controlling number of splits in distinct().
Josh Rosen authored
Josh Rosen authored
Matei Zaharia authored
Matei Zaharia authored
- Sep 28, 2012
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Log message which records RDD origin
Matei Zaharia authored
custom serializers or Kryo registrators can be loaded.
Patrick Wendell authored
Patrick Wendell authored
Patrick Wendell authored
This adds tracking to determine the "origin" of an RDD. Origin is defined as the boundary between the user's code and the Spark code at the point where an RDD is instantiated. It is meant to help users understand where a Spark RDD comes from in their code. This patch also logs origin data when stages are submitted to the scheduler. Finally, it adds a new log message to fix an inconsistency in the way that dependent stages (those with missing parents) and independent stages (those without) are logged during submission.
Matei Zaharia authored
- Sep 27, 2012
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
Added MapPartitionsWithSplitRDD.
Matei Zaharia authored
- Sep 26, 2012
Matei Zaharia authored
Matei Zaharia authored
Reynold Xin authored
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored
number of samples used for sorting
Matei Zaharia authored
Matei Zaharia authored
Matei Zaharia authored