[SPARK-19611][SQL] Introduce configurable table schema inference
Add a new configuration option that allows Spark SQL to infer a case-sensitive schema from a Hive Metastore table's data files when a case-sensitive schema can't be read from the table properties.

- Add spark.sql.hive.caseSensitiveInferenceMode param to SQLConf
- Add schemaPreservesCase field to CatalogTable (set to false when schema can't successfully be read from Hive table props)
- Perform schema inference in HiveMetastoreCatalog if schemaPreservesCase is false, depending on spark.sql.hive.caseSensitiveInferenceMode
- Add alterTableSchema() method to the ExternalCatalog interface
- Add HiveSchemaInferenceSuite tests
- Refactor and move ParquetFileFormat.mergeMetastoreParquetSchema() as HiveMetastoreCatalog.mergeWithMetastoreSchema
- Move schema merging tests from ParquetSchemaSuite to HiveSchemaInferenceSuite

[JIRA for this change](https://issues.apache.org/jira/browse/SPARK-19611)

The tests in `HiveSchemaInferenceSuite` should verify that schema inference is working as expected. `ExternalCatalogSuite` has also been extended to cover the new `alterTableSchema()` API.

Author: Budde <budde@amazon.com>

Closes #17229 from budde/SPARK-19611-2.1.
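The behaviour described above can be illustrated with a small Spark-free model. This is only a sketch, not the code in this commit: `Field`, `InferenceMode`, `resolveSchema`, and the `inferFromFiles` callback are hypothetical stand-ins. The mode objects mirror the `INFER_AND_SAVE`, `INFER_ONLY`, and `NEVER_INFER` values that `spark.sql.hive.caseSensitiveInferenceMode` is understood to accept. The merge step assumes fields are matched by lower-cased name and that the metastore data type is kept.

```scala
import java.util.Locale

// Hypothetical, Spark-free model of the inference flow; not the commit's actual code.
object SchemaInferenceSketch {
  // Simplified stand-in for a schema field (not Spark's StructField).
  final case class Field(name: String, dataType: String)

  // Stand-ins for the values of spark.sql.hive.caseSensitiveInferenceMode.
  sealed trait InferenceMode
  case object InferAndSave extends InferenceMode
  case object InferOnly extends InferenceMode
  case object NeverInfer extends InferenceMode

  // For each metastore field, adopt the spelling found in the data files
  // when the two names are equal ignoring case; otherwise keep the field as-is.
  def mergeWithMetastoreSchema(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
    val byLowerName = inferred.map(f => f.name.toLowerCase(Locale.ROOT) -> f).toMap
    metastore.map { m =>
      byLowerName.get(m.name.toLowerCase(Locale.ROOT)) match {
        case Some(f) => m.copy(name = f.name)
        case None    => m
      }
    }
  }

  // Returns the schema to use and whether it should be written back to the catalog
  // (the role alterTableSchema() plays in the commit message).
  def resolveSchema(
      metastore: Seq[Field],
      schemaPreservesCase: Boolean,
      mode: InferenceMode,
      inferFromFiles: () => Seq[Field]): (Seq[Field], Boolean) =
    if (schemaPreservesCase || mode == NeverInfer) (metastore, false)
    else (mergeWithMetastoreSchema(metastore, inferFromFiles()), mode == InferAndSave)

  def main(args: Array[String]): Unit = {
    val metastore = Seq(Field("id", "bigint"), Field("eventtime", "timestamp"))
    val fromFiles = Seq(Field("id", "bigint"), Field("eventTime", "timestamp"))
    println(resolveSchema(metastore, schemaPreservesCase = false, InferAndSave, () => fromFiles))
  }
}
```

Passing the file scan as a callback keeps the potentially expensive inference from running when the schema already preserves case or inference is disabled.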
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala (14 additions, 1 deletion)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala (10 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (7 additions, 1 deletion)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala (14 additions, 1 deletion)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (0 additions, 65 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (22 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala (0 additions, 82 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala (21 additions, 2 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (90 additions, 7 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala (333 additions, 0 deletions)