[SPARK-15632][SQL] Typed Filter should NOT change the Dataset schema
## What changes were proposed in this pull request?

This PR makes sure the typed Filter doesn't change the Dataset schema.

**Before the change:**

```
scala> val df = spark.range(0,9)
scala> df.schema
res12: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
scala> val afterFilter = df.filter(_=>true)
scala> afterFilter.schema // !!! schema is CHANGED!!! Column name is changed from id to value, nullable is changed from false to true.
res13: org.apache.spark.sql.types.StructType = StructType(StructField(value,LongType,true))
```

SerializeFromObject and DeserializeToObject are inserted to wrap the Filter, and these two can possibly change the schema of Dataset.

**After the change:**

```
scala> afterFilter.schema // schema is NOT changed.
res47: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false))
```

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes #13529 from clockfly/spark-15632.
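The behavior described above can be checked outside the REPL with a small standalone program. The following is a minimal sketch, not the unit test added by this patch: the object name `TypedFilterSchemaCheck`, the application name, and the `local[1]` master are assumptions chosen for illustration, and it presumes a Spark build that includes this fix is on the classpath. The lambda parameter is given an explicit type so that the function overload of `filter` is selected unambiguously.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone check; the names here are illustrative only.
object TypedFilterSchemaCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-filter-schema-check")
      .getOrCreate()

    try {
      // Dataset[java.lang.Long] with a single non-nullable column named "id".
      val ds = spark.range(0, 9)

      // Typed filter: the predicate is an opaque function over the element,
      // so the plan has to deserialize each row to an object to evaluate it.
      val afterFilter = ds.filter((id: java.lang.Long) => true)

      // With the fix, column name and nullability survive the typed filter.
      assert(
        afterFilter.schema == ds.schema,
        s"schema changed: ${ds.schema} -> ${afterFilter.schema}")
    } finally {
      spark.stop()
    }
  }
}
```

On a build without the fix, the assertion would be expected to fail, since the commit message reports the column being renamed from `id` to `value` and becoming nullable.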
Showing
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TypedFilterOptimizationSuite.scala (3 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (8 additions, 8 deletions)
- sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java (13 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (6 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (1 addition, 1 deletion)