Commit ada384b7, authored by Sean Owen, committed by Andrew Or
[SPARK-8437] [DOCS] Corrected: Using directory path without wildcard for...

[SPARK-8437] [DOCS] Corrected: Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using HTML entity for *)

Author: Sean Owen <sowen@cloudera.com>

Closes #7126 from srowen/SPARK-8437.2 and squashes the following commits:

7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 'dir/' (now fixed scaladoc by using HTML entity for *)
parent 689da28a
@@ -831,7 +831,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * }}}
    *
    * @note Small files are preferred, large file is also allowable, but may cause bad performance.
-   *
+   * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
+   *       in a directory rather than `.../path/` or `.../path`
    * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   def wholeTextFiles(
@@ -878,9 +879,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    *   (a-hdfs-path/part-nnnnn, its content)
    * }}}
    *
-   * @param minPartitions A suggestion value of the minimal splitting number for input data.
-   *
    * @note Small files are preferred; very large files may cause bad performance.
+   * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
+   *       in a directory rather than `.../path/` or `.../path`
+   * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   @Experimental
   def binaryFiles(