[SPARK-12004] Preserve the RDD partitioner through RDD checkpointing
The solution is to save the RDD partitioner in a separate file in the RDD checkpoint directory, that is, `<checkpoint dir>/_partitioner`. In most cases, whether or not the RDD partitioner is recovered does not affect correctness. A lost partitioner only costs performance, for example when a later key-based operation has to shuffle the data again. So this solution makes a best-effort attempt to save and recover the partitioner. If either step fails, the checkpointing itself is not affected, which makes this patch safe and backward compatible.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #9983 from tdas/SPARK-12004.
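A minimal sketch of the best-effort save/recover pattern described above. It assumes plain Java serialization to a local directory through `java.nio`, not Spark's Hadoop `FileSystem` and configured serializer. `SketchHashPartitioner`, `writePartitioner` and `readPartitioner` are hypothetical names, not Spark's API. Only the `_partitioner` file name comes from the commit message. Every failure is turned into `false` or `None`, so a caller that is checkpointing data is never interrupted. Requires Scala 2.13+ for `scala.util.Using`.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.nio.file.{Files, Path}
import scala.util.Using

// Hypothetical stand-in for a Spark Partitioner (case classes are Serializable).
case class SketchHashPartitioner(numPartitions: Int)

object PartitionerCheckpointSketch {
  // File name taken from the commit message: <checkpoint dir>/_partitioner
  val PartitionerFileName = "_partitioner"

  // Best-effort save: returns false on any non-fatal error instead of throwing,
  // so the surrounding checkpoint is unaffected.
  def writePartitioner(p: SketchHashPartitioner, checkpointDir: Path): Boolean =
    Using.Manager { use =>
      val file = checkpointDir.resolve(PartitionerFileName)
      val out = use(new ObjectOutputStream(use(Files.newOutputStream(file))))
      out.writeObject(p)
    }.isSuccess

  // Best-effort recovery: a missing, corrupt, or wrongly typed file yields None,
  // and the caller simply proceeds without a partitioner.
  def readPartitioner(checkpointDir: Path): Option[SketchHashPartitioner] =
    Using.Manager { use =>
      val file = checkpointDir.resolve(PartitionerFileName)
      val in = use(new ObjectInputStream(use(Files.newInputStream(file))))
      in.readObject().asInstanceOf[SketchHashPartitioner]
    }.toOption
}
```

Usage: call `writePartitioner` after the checkpoint data has been written, and `readPartitioner` when rebuilding the checkpointed RDD. A `None` result only means the recovered RDD has no partitioner, which matches the correctness argument above.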
- core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala: 116 additions, 6 deletions
- core/src/main/scala/org/apache/spark/rdd/ReliableRDDCheckpointData.scala: 1 addition, 20 deletions
- core/src/test/scala/org/apache/spark/CheckpointSuite.scala: 56 additions, 5 deletions