-
- Downloads
[SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion
In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows. However, we should be using the output schema instead, since our scan might return a subset of the relation's columns. This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTestRelation` that prevented this bug from being caught by our old tests: > In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested. Closes #5986. Author: Josh Rosen <joshrosen@databricks.com> Author: Cheng Lian <lian@databricks.com> Author: Cheng Lian <liancheng@users.noreply.github.com> Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits: e71c866 [Josh Rosen] Re-fix bug so that the tests pass again 56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites 2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator 6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion. 5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858 8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732 9968fba [Cheng Lian] Tests the data type conversion code paths
Showing
- sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 1 addition, 1 deletion...core/src/main/scala/org/apache/spark/sql/SQLContext.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala 24 additions, 38 deletions...in/scala/org/apache/spark/sql/execution/ExistingRDD.scala
- sql/core/src/main/scala/org/apache/spark/sql/sources/DataSourceStrategy.scala 1 addition, 1 deletion...ala/org/apache/spark/sql/sources/DataSourceStrategy.scala
- sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala 5 additions, 1 deletion...ala/org/apache/spark/sql/sources/SimpleTextRelation.scala
- sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala 6 additions, 0 deletions...org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
Loading
Please register or sign in to comment