Commit 400b2f86 authored 8 years ago by Davies Liu Committed by Davies Liu 8 years ago

[SPARK-14259] [SQL] Merging small files together based on the cost of opening

## What changes were proposed in this pull request?

This PR redoes the work in #12068 using a different model, which should work better when the small files vary in size.
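As a rough illustration of what a cost-of-opening model can look like, the sketch below greedily packs files into partitions and charges each file its size plus a fixed per-file open cost. It is an assumption-laden sketch, not the code from this patch: the function `pack_files` and the parameters `max_split_bytes` and `open_cost_bytes` are hypothetical names chosen for the example, and sorting the largest files first is an assumed design choice.

```python
from typing import List


def pack_files(file_sizes: List[int],
               max_split_bytes: int,
               open_cost_bytes: int) -> List[List[int]]:
    """Greedily group file sizes into partitions.

    Hypothetical sketch: each file added to a partition is charged its
    size plus a fixed open cost, so many tiny files fill a partition
    faster than their raw byte count alone would suggest.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0

    # Assumed ordering: largest files first, so later partitions
    # collect the smaller leftovers.
    for size in sorted(file_sizes, reverse=True):
        # Close the current partition if this file would overflow it.
        if current and current_size + size > max_split_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(size)
        # Charge the file's bytes plus the estimated cost of opening it.
        current_size += size + open_cost_bytes

    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    print(pack_files([10, 90, 20, 50, 5], max_split_bytes=100, open_cost_bytes=10))
```

With a non-zero open cost, the small files are spread over more partitions than their byte sizes alone would require. Because the largest files are placed first in this sketch, the partitions created last tend to be the lightest.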

## How was this patch tested?

Updated existing tests.

Ran a query over thousands of small partitioned files locally with all default settings (under which the cost of opening a file is probably overestimated). Task durations became progressively shorter, which is good: the last few tasks are the shortest.

Author: Davies Liu <davies@databricks.com>

Closes #12095 from davies/file_cost.

parent cc70f174


Showing with 21 additions and 19 deletions
