[SPARK-1415] Hadoop min split for wholeTextFiles()
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-1415).

The new Hadoop `InputFormat` API does not provide a `minSplits` parameter, which makes the APIs of `HadoopRDD` and `NewHadoopRDD` incompatible. This PR makes the two APIs compatible. Although `minSplits` is deprecated in the new Hadoop API, we think it is better to keep the APIs compatible here.

**Note** that `minSplits` in `wholeTextFiles` can only be treated as a *suggestion*: because `isSplitable()` returns `false` (each file is read whole), the actual number of splits may be smaller than `minSplits`.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #376 from yinxusen/hadoop-min-split and squashes the following commits:

- 76417f6 [Xusen Yin] refine comments
- c10af60 [Xusen Yin] refine comments and rewrite new class for wholeTextFile
- 766d05b [Xusen Yin] refine Java API and comments
- 4875755 [Xusen Yin] add minSplits for WholeTextFiles
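To show why `minSplits` can only be a suggestion when files are unsplittable, here is a minimal, self-contained sketch with no Spark or Hadoop dependency. It assumes, as a simplification, that the hint is turned into a maximum split size by dividing the total input length by `minSplits`, and that whole files are then packed greedily into splits because they cannot be cut (`isSplitable()` = `false`). The names `MinSplitsSketch`, `maxSplitSize` and `packWholeFiles` are hypothetical; this is neither the code from this commit nor Hadoop's actual `CombineFileInputFormat` packing algorithm.

```scala
// Hypothetical sketch (not Spark/Hadoop code): converting a minSplits hint into a
// maximum split size, then packing whole, unsplittable files into splits.
object MinSplitsSketch {

  /** Max bytes per split so that totalLen is spread over roughly minSplits splits. */
  def maxSplitSize(fileLengths: Seq[Long], minSplits: Int): Long = {
    val totalLen = fileLengths.sum
    val divisor = if (minSplits <= 0) 1 else minSplits // guard against division by zero
    math.ceil(totalLen.toDouble / divisor).toLong
  }

  /**
   * Greedily packs whole files into splits, closing a split once it reaches maxSize
   * bytes. Files are never cut, mimicking isSplitable() = false, so a single large
   * file can consume a whole split on its own and leave fewer splits than requested.
   */
  def packWholeFiles(fileLengths: Seq[Long], maxSize: Long): Seq[Seq[Long]] = {
    val splits = scala.collection.mutable.ArrayBuffer.empty[Seq[Long]]
    var current = Vector.empty[Long]
    var currentSize = 0L
    for (len <- fileLengths) {
      current :+= len
      currentSize += len
      if (currentSize >= maxSize) {
        splits += current
        current = Vector.empty
        currentSize = 0L
      }
    }
    if (current.nonEmpty) splits += current // leftover small files form a last split
    splits.toSeq
  }

  def main(args: Array[String]): Unit = {
    // One large file and two small ones; we ask for at least 4 splits.
    val files = Seq(900L, 50L, 50L)
    val maxSize = maxSplitSize(files, minSplits = 4) // ceil(1000 / 4) = 250
    val splits = packWholeFiles(files, maxSize)
    println(s"maxSplitSize = $maxSize, splits = ${splits.size}")
  }
}
```

Running `main` prints `maxSplitSize = 250, splits = 2`: the 900-byte file cannot be divided, so asking for four splits still yields only two. When the input consists of many similarly sized files, the same hint produces close to the requested number of splits.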
- core/src/main/scala/org/apache/spark/SparkContext.scala: 12 additions, 5 deletions
- core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala: 13 additions, 1 deletion
- core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala: 14 additions, 0 deletions
- core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala: 49 additions, 11 deletions
- core/src/test/java/org/apache/spark/JavaAPISuite.java: 1 addition, 1 deletion
- core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala: 1 addition, 1 deletion