Commit 9a0272fb authored by Yuhao Yang, committed by Sean Owen

[SPARK-6177][MLlib]Add note in LDA example to remind possible coalesce

JIRA: https://issues.apache.org/jira/browse/SPARK-6177
Add a comment to the LDA example suggesting coalesce() to avoid the potentially large number of partitions created by `sc.textFile`.

`sc.textFile` creates an RDD with at least one partition per input file, so a corpus made of many files can produce a very large number of partitions, which degrades LDA performance.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #4899 from hhbyyh/adjustPartition and squashes the following commits:

a499630 [Yuhao Yang] update comment
9a2d7b6 [Yuhao Yang] move to comment
f7fd5d4 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into adjustPartition
26a564a [Yuhao Yang] add coalesce to LDAExample
parent 8767565c
......@@ -173,7 +173,9 @@ object LDAExample {
stopwordFile: String): (RDD[(Long, Vector)], Array[String], Long) = {
// Get dataset of document texts
-// One document per line in each text file.
+// One document per line in each text file. If the input consists of many small files,
+// this can result in a large number of small partitions, which can degrade performance.
+// In this case, consider using coalesce() to create fewer, larger partitions.
val textRDD: RDD[String] = sc.textFile(paths.mkString(","))
// Split text into words
......
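The added comment recommends coalesce() but does not show the call. The following is a minimal sketch of the idea, assuming a Spark 1.x/2.x-era `SparkContext` API and a local master. The object `CoalesceSketch`, the helpers `writeSmallFiles` and `run`, the 50-file corpus and the target of 4 partitions are hypothetical choices for illustration and are not part of `LDAExample`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; not part of LDAExample.
object CoalesceSketch {

  // Writes `numFiles` one-line text files into a fresh temp directory,
  // standing in for a corpus made of many small documents.
  def writeSmallFiles(numFiles: Int): Path = {
    val dir = Files.createTempDirectory("lda-corpus")
    for (i <- 0 until numFiles) {
      val bytes = s"document $i text".getBytes(StandardCharsets.UTF_8)
      Files.write(dir.resolve(s"doc-$i.txt"), bytes)
    }
    dir
  }

  // Returns (partitions from textFile, partitions after coalesce, line count).
  def run(numFiles: Int, targetPartitions: Int): (Int, Int, Long) = {
    val conf = new SparkConf().setAppName("CoalesceSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val dir = writeSmallFiles(numFiles)
      // Each small file becomes its own input split, so this RDD
      // has at least one partition per file.
      val textRDD: RDD[String] = sc.textFile(dir.toString)
      // coalesce with the default shuffle = false merges existing
      // partitions into at most `targetPartitions` larger ones.
      val compacted: RDD[String] = textRDD.coalesce(targetPartitions)
      (textRDD.partitions.length, compacted.partitions.length, compacted.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after, docs) = run(numFiles = 50, targetPartitions = 4)
    println(s"partitions before: $before, after: $after, documents: $docs")
  }
}
```

`coalesce(n)` without a shuffle only merges neighbouring partitions, which is cheap but can leave partitions unevenly sized; `coalesce(n, shuffle = true)`, equivalent to `repartition(n)`, rebalances the data at the cost of a full shuffle.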