Commit 2f68631f authored by Liang-Chi Hsieh's avatar Liang-Chi Hsieh Committed by Wenchen Fan

[SPARK-20848][SQL] Shutdown the pool after reading parquet files

## What changes were proposed in this pull request?

From JIRA: On each call to `spark.read.parquet`, a new ForkJoinPool is created. One of the threads in the pool is kept in the WAITING state and never stopped, which leads to unbounded growth in the number of threads.

We should shutdown the pool after reading parquet files.
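The fix pattern here is to keep a reference to the pool and shut it down in a `finally` block once the parallel work completes. A minimal standalone sketch of that pattern (`readFootersInParallel` and the `"footer:"` result strings are illustrative stand-ins, not Spark APIs):

```scala
import java.util.concurrent.{Callable, ForkJoinPool}

object PoolShutdownSketch {
  // Runs one task per path on a dedicated ForkJoinPool, then shuts the
  // pool down even if a task throws, so no worker threads are leaked.
  def readFootersInParallel(paths: Seq[String]): Seq[String] = {
    val pool = new ForkJoinPool(8)
    try {
      val tasks = paths.map { p =>
        pool.submit(new Callable[String] {
          def call(): String = s"footer:$p" // stand-in for reading a footer
        })
      }
      tasks.map(_.get())
    } finally {
      pool.shutdown() // releases the pool's worker threads
    }
  }
}
```

Without the `finally`, each call would leave an idle pool (and its waiting worker thread) alive for the life of the JVM, which is exactly the leak the patch addresses.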

## How was this patch tested?

Added a test to ParquetFileFormatSuite.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #18073 from viirya/SPARK-20848.

(cherry picked from commit f72ad303)
Signed-off-by: default avatarWenchen Fan <wenchen@databricks.com>
parent 13adc0fc
@@ -496,7 +496,8 @@ object ParquetFileFormat extends Logging {
       partFiles: Seq[FileStatus],
       ignoreCorruptFiles: Boolean): Seq[Footer] = {
     val parFiles = partFiles.par
-    parFiles.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(8))
+    val pool = new ForkJoinPool(8)
+    parFiles.tasksupport = new ForkJoinTaskSupport(pool)
     parFiles.flatMap { currentFile =>
       try {
         // Skips row group information since we only need the schema.
@@ -512,6 +513,8 @@ object ParquetFileFormat extends Logging {
         } else {
           throw new IOException(s"Could not read footer for file: $currentFile", e)
         }
+      } finally {
+        pool.shutdown()
       }
     }.seq
   }