Commit 20a4d7db authored by Cheng Lian

[SPARK-8501] [SQL] Avoids reading schema from empty ORC files

ORC writes an empty schema (`struct<>`) to ORC files that contain zero rows. This is fine for Hive, since the table schema is managed by the metastore, but it causes trouble when reading raw ORC files via Spark SQL, where we have to discover the schema from the files themselves.

Note that the ORC data source itself never writes empty ORC files, but the problem still shows up when reading Hive tables that contain empty part-files.
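
To make the failure mode concrete, the sketch below (adapted from the regression test added in this patch; the table name `empty_orc`, the temp table `empty`, and `path` are illustrative placeholders) writes a single empty ORC part-file through Hive's ORC SerDe and then points the ORC data source at the same directory:

    // Assumes a HiveContext available as `sqlContext` and a scratch directory `path`.
    import sqlContext.implicits._

    sqlContext.sql(
      s"""CREATE TABLE empty_orc(key INT, value STRING)
         |STORED AS ORC
         |LOCATION '$path'
       """.stripMargin)

    // Inserting an empty DataFrame through the Hive ORC SerDe leaves behind a zero-row
    // ORC file whose schema is just `struct<>`.
    Seq.empty[(Int, String)].toDF("key", "value").coalesce(1).registerTempTable("empty")
    sqlContext.sql("INSERT INTO TABLE empty_orc SELECT key, value FROM empty")

    // Schema discovery over this directory now sees only the empty schema; with this patch
    // the ORC data source skips such files, or fails with an explicit AnalysisException when
    // no non-empty file exists, instead of silently using `struct<>`.
    sqlContext.read.format("orc").load(path)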

Author: Cheng Lian <lian@databricks.com>

Closes #7199 from liancheng/spark-8501 and squashes the following commits:

bb8cd95 [Cheng Lian] Addresses comments
a290221 [Cheng Lian] Avoids reading schema from empty ORC files
parent dfd8bac8
@@ -24,30 +24,70 @@ import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
 import org.apache.spark.Logging
 import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.hive.HiveMetastoreTypes
 import org.apache.spark.sql.types.StructType
 
 private[orc] object OrcFileOperator extends Logging {
-  def getFileReader(pathStr: String, config: Option[Configuration] = None): Reader = {
-    val conf = config.getOrElse(new Configuration)
-    val fspath = new Path(pathStr)
-    val fs = fspath.getFileSystem(conf)
-    val orcFiles = listOrcFiles(pathStr, conf)
-    logDebug(s"Creating ORC Reader from ${orcFiles.head}")
-    // TODO Need to consider all files when schema evolution is taken into account.
-    OrcFile.createReader(fs, orcFiles.head)
+  /**
+   * Retrieves an ORC file reader from a given path.  The path can point to either a directory or
+   * a single ORC file.  If it points to a directory, it picks any non-empty ORC file within that
+   * directory.
+   *
+   * The reader returned by this method is mainly used for two purposes:
+   *
+   * 1. Retrieving file metadata (schema, compression codecs, etc.)
+   * 2. Reading the actual file content (in this case, the given path should point to the target file)
+   *
+   * @note As recorded by SPARK-8501, ORC writes an empty schema (`struct<>`) to an ORC file if
+   *       the file contains zero rows.  This is OK for Hive since the schema of the table is
+   *       managed by the metastore.  But it becomes a problem when reading ORC files directly
+   *       from HDFS via Spark SQL, because we have to discover the schema from raw ORC files.
+   *       So this method always tries to find an ORC file whose schema is non-empty, and creates
+   *       the result reader from that file.  If no such file is found, it returns `None`.
+   *
+   * @todo Needs to consider all files when schema evolution is taken into account.
+   */
+  def getFileReader(basePath: String, config: Option[Configuration] = None): Option[Reader] = {
+    def isWithNonEmptySchema(path: Path, reader: Reader): Boolean = {
+      reader.getObjectInspector match {
+        case oi: StructObjectInspector if oi.getAllStructFieldRefs.size() == 0 =>
+          logInfo(
+            s"ORC file $path has empty schema, it probably contains no rows. " +
+              "Trying to read another ORC file to figure out the schema.")
+          false
+        case _ => true
+      }
+    }
+
+    val conf = config.getOrElse(new Configuration)
+    val fs = {
+      val hdfsPath = new Path(basePath)
+      hdfsPath.getFileSystem(conf)
+    }
+
+    listOrcFiles(basePath, conf).iterator.map { path =>
+      path -> OrcFile.createReader(fs, path)
+    }.collectFirst {
+      case (path, reader) if isWithNonEmptySchema(path, reader) => reader
+    }
   }
 
   def readSchema(path: String, conf: Option[Configuration]): StructType = {
-    val reader = getFileReader(path, conf)
+    val reader = getFileReader(path, conf).getOrElse {
+      throw new AnalysisException(
+        s"Failed to discover schema from ORC files stored in $path. " +
+          "Probably there are either no ORC files or only empty ORC files.")
+    }
     val readerInspector = reader.getObjectInspector.asInstanceOf[StructObjectInspector]
     val schema = readerInspector.getTypeName
     logDebug(s"Reading schema from file $path, got Hive schema string: $schema")
     HiveMetastoreTypes.toDataType(schema).asInstanceOf[StructType]
   }
 
-  def getObjectInspector(path: String, conf: Option[Configuration]): StructObjectInspector = {
-    getFileReader(path, conf).getObjectInspector.asInstanceOf[StructObjectInspector]
+  def getObjectInspector(
+      path: String, conf: Option[Configuration]): Option[StructObjectInspector] = {
+    getFileReader(path, conf).map(_.getObjectInspector.asInstanceOf[StructObjectInspector])
   }
 
   def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {
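
Because getFileReader now returns `Option[Reader]` rather than a bare `Reader`, every caller has to decide what to do when no non-empty ORC file exists. A minimal sketch of the two patterns used in the rest of this patch (the `path` and `conf` values and the `buildRowIterator` helper are placeholders, not part of the actual change):

    // Pattern 1: fail fast when a schema is required (this is what readSchema does).
    val reader = OrcFileOperator.getFileReader(path, Some(conf)).getOrElse {
      throw new AnalysisException(
        s"Failed to discover schema from ORC files stored in $path.")
    }

    // Pattern 2: degrade to an empty result when an empty split is acceptable
    // (this is what OrcTableScan does with the ObjectInspector below).
    val rowIterator = OrcFileOperator.getObjectInspector(path, Some(conf))
      .map(buildRowIterator)  // hypothetical helper that materializes rows from the inspector
      .getOrElse(Iterator.empty)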
@@ -242,26 +242,34 @@ private[orc] case class OrcTableScan(
       nonPartitionKeyAttrs: Seq[(Attribute, Int)],
       mutableRow: MutableRow): Iterator[InternalRow] = {
     val deserializer = new OrcSerde
-    val soi = OrcFileOperator.getObjectInspector(path, Some(conf))
-    val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map {
-      case (attr, ordinal) =>
-        soi.getStructFieldRef(attr.name.toLowerCase) -> ordinal
-    }.unzip
-    val unwrappers = fieldRefs.map(unwrapperFor)
-    // Map each tuple to a row object
-    iterator.map { value =>
-      val raw = deserializer.deserialize(value)
-      var i = 0
-      while (i < fieldRefs.length) {
-        val fieldValue = soi.getStructFieldData(raw, fieldRefs(i))
-        if (fieldValue == null) {
-          mutableRow.setNullAt(fieldOrdinals(i))
-        } else {
-          unwrappers(i)(fieldValue, mutableRow, fieldOrdinals(i))
-        }
-        i += 1
-      }
-      mutableRow: InternalRow
-    }
+    val maybeStructOI = OrcFileOperator.getObjectInspector(path, Some(conf))
+
+    // SPARK-8501: ORC writes an empty schema ("struct<>") to an ORC file if the file contains zero
+    // rows, and thus couldn't give a proper ObjectInspector. In this case we just return an empty
+    // partition since we know that this file is empty.
+    maybeStructOI.map { soi =>
+      val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map {
+        case (attr, ordinal) =>
+          soi.getStructFieldRef(attr.name.toLowerCase) -> ordinal
+      }.unzip
+      val unwrappers = fieldRefs.map(unwrapperFor)
+      // Map each tuple to a row object
+      iterator.map { value =>
+        val raw = deserializer.deserialize(value)
+        var i = 0
+        while (i < fieldRefs.length) {
+          val fieldValue = soi.getStructFieldData(raw, fieldRefs(i))
+          if (fieldValue == null) {
+            mutableRow.setNullAt(fieldOrdinals(i))
+          } else {
+            unwrappers(i)(fieldValue, mutableRow, fieldOrdinals(i))
+          }
+          i += 1
+        }
+        mutableRow: InternalRow
+      }
+    }.getOrElse {
+      Iterator.empty
+    }
   }
@@ -23,10 +23,7 @@ import org.apache.hadoop.hive.conf.HiveConf.ConfVars
 import org.apache.hadoop.hive.ql.io.orc.CompressionKind
 import org.scalatest.BeforeAndAfterAll
 
-import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.expressions.InternalRow
-import org.apache.spark.sql.hive.test.TestHive
 import org.apache.spark.sql.hive.test.TestHive._
 import org.apache.spark.sql.hive.test.TestHive.implicits._
 
@@ -170,7 +167,7 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
   test("Default compression options for writing to an ORC file") {
     withOrcFile((1 to 100).map(i => (i, s"val_$i"))) { file =>
       assertResult(CompressionKind.ZLIB) {
-        OrcFileOperator.getFileReader(file).getCompression
+        OrcFileOperator.getFileReader(file).get.getCompression
       }
     }
   }
@@ -183,21 +180,21 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
     conf.set(ConfVars.HIVE_ORC_DEFAULT_COMPRESS.varname, "SNAPPY")
     withOrcFile(data) { file =>
       assertResult(CompressionKind.SNAPPY) {
-        OrcFileOperator.getFileReader(file).getCompression
+        OrcFileOperator.getFileReader(file).get.getCompression
       }
     }
 
     conf.set(ConfVars.HIVE_ORC_DEFAULT_COMPRESS.varname, "NONE")
     withOrcFile(data) { file =>
       assertResult(CompressionKind.NONE) {
-        OrcFileOperator.getFileReader(file).getCompression
+        OrcFileOperator.getFileReader(file).get.getCompression
       }
     }
 
     conf.set(ConfVars.HIVE_ORC_DEFAULT_COMPRESS.varname, "LZO")
     withOrcFile(data) { file =>
       assertResult(CompressionKind.LZO) {
-        OrcFileOperator.getFileReader(file).getCompression
+        OrcFileOperator.getFileReader(file).get.getCompression
       }
     }
   }
@@ -289,4 +286,48 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
         List(Row("same", "run_5", 100)))
     }
   }
+
+  test("SPARK-8501: Avoids discovering schema from empty ORC files") {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+
+      withTable("empty_orc") {
+        withTempTable("empty", "single") {
+          sqlContext.sql(
+            s"""CREATE TABLE empty_orc(key INT, value STRING)
+               |STORED AS ORC
+               |LOCATION '$path'
+             """.stripMargin)
+
+          val emptyDF = Seq.empty[(Int, String)].toDF("key", "value").coalesce(1)
+          emptyDF.registerTempTable("empty")
+
+          // This creates 1 empty ORC file with the Hive ORC SerDe.  We are using this trick
+          // because the Spark SQL ORC data source always avoids writing empty ORC files.
+          sqlContext.sql(
+            s"""INSERT INTO TABLE empty_orc
+               |SELECT key, value FROM empty
+             """.stripMargin)
+
+          val errorMessage = intercept[AnalysisException] {
+            sqlContext.read.format("orc").load(path)
+          }.getMessage
+
+          assert(errorMessage.contains("Failed to discover schema from ORC files"))
+
+          val singleRowDF = Seq((0, "foo")).toDF("key", "value").coalesce(1)
+          singleRowDF.registerTempTable("single")
+
+          sqlContext.sql(
+            s"""INSERT INTO TABLE empty_orc
+               |SELECT key, value FROM single
+             """.stripMargin)
+
+          val df = sqlContext.read.format("orc").load(path)
+          assert(df.schema === singleRowDF.schema.asNullable)
+          checkAnswer(df, singleRowDF)
+        }
+      }
+    }
+  }
 }
@@ -43,14 +43,8 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
     orcTableDir.mkdir()
     import org.apache.spark.sql.hive.test.TestHive.implicits._
 
-    // Originally we were using a 10-row RDD for testing. However, when default parallelism is
-    // greater than 10 (e.g., running on a node with 32 cores), this RDD contains empty partitions,
-    // which result in empty ORC files. Unfortunately, ORC doesn't handle empty files properly and
-    // causes build failure on Jenkins, which happens to have 32 cores. Please refer to SPARK-8501
-    // for more details. To workaround this issue before fixing SPARK-8501, we simply increase row
-    // number in this RDD to avoid empty partitions.
     sparkContext
-      .makeRDD(1 to 100)
+      .makeRDD(1 to 10)
       .map(i => OrcData(i, s"part-$i"))
       .toDF()
       .registerTempTable(s"orc_temp_table")
@@ -76,35 +70,35 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
   }
 
   test("create temporary orc table") {
-    checkAnswer(sql("SELECT COUNT(*) FROM normal_orc_source"), Row(100))
+    checkAnswer(sql("SELECT COUNT(*) FROM normal_orc_source"), Row(10))
 
     checkAnswer(
       sql("SELECT * FROM normal_orc_source"),
-      (1 to 100).map(i => Row(i, s"part-$i")))
+      (1 to 10).map(i => Row(i, s"part-$i")))
 
     checkAnswer(
       sql("SELECT * FROM normal_orc_source where intField > 5"),
-      (6 to 100).map(i => Row(i, s"part-$i")))
+      (6 to 10).map(i => Row(i, s"part-$i")))
 
     checkAnswer(
       sql("SELECT COUNT(intField), stringField FROM normal_orc_source GROUP BY stringField"),
-      (1 to 100).map(i => Row(1, s"part-$i")))
+      (1 to 10).map(i => Row(1, s"part-$i")))
   }
 
   test("create temporary orc table as") {
-    checkAnswer(sql("SELECT COUNT(*) FROM normal_orc_as_source"), Row(100))
+    checkAnswer(sql("SELECT COUNT(*) FROM normal_orc_as_source"), Row(10))
 
     checkAnswer(
       sql("SELECT * FROM normal_orc_source"),
-      (1 to 100).map(i => Row(i, s"part-$i")))
+      (1 to 10).map(i => Row(i, s"part-$i")))
 
     checkAnswer(
       sql("SELECT * FROM normal_orc_source WHERE intField > 5"),
-      (6 to 100).map(i => Row(i, s"part-$i")))
+      (6 to 10).map(i => Row(i, s"part-$i")))
 
     checkAnswer(
       sql("SELECT COUNT(intField), stringField FROM normal_orc_source GROUP BY stringField"),
-      (1 to 100).map(i => Row(1, s"part-$i")))
+      (1 to 10).map(i => Row(1, s"part-$i")))
   }
 
   test("appending insert") {
test("appending insert") { test("appending insert") {
...@@ -112,7 +106,7 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll { ...@@ -112,7 +106,7 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
checkAnswer( checkAnswer(
sql("SELECT * FROM normal_orc_source"), sql("SELECT * FROM normal_orc_source"),
(1 to 5).map(i => Row(i, s"part-$i")) ++ (6 to 100).flatMap { i => (1 to 5).map(i => Row(i, s"part-$i")) ++ (6 to 10).flatMap { i =>
Seq.fill(2)(Row(i, s"part-$i")) Seq.fill(2)(Row(i, s"part-$i"))
}) })
} }
@@ -125,7 +119,7 @@ abstract class OrcSuite extends QueryTest with BeforeAndAfterAll {
 
     checkAnswer(
       sql("SELECT * FROM normal_orc_as_source"),
-      (6 to 100).map(i => Row(i, s"part-$i")))
+      (6 to 10).map(i => Row(i, s"part-$i")))
   }
 }