-
- Downloads
[SPARK-15112][SQL] Disables EmbedSerializerInFilter for plan fragments that change schema
## What changes were proposed in this pull request? `EmbedSerializerInFilter` implicitly assumes that the plan fragment being optimized doesn't change plan schema, which is reasonable because `Dataset.filter` should never change the schema. However, due to another issue involving `DeserializeToObject` and `SerializeFromObject`, typed filter *does* change plan schema (see [SPARK-15632][1]). This breaks `EmbedSerializerInFilter` and causes corrupted data. This PR disables `EmbedSerializerInFilter` when there's a schema change to avoid data corruption. The schema change issue should be addressed in follow-up PRs. ## How was this patch tested? New test case added in `DatasetSuite`. [1]: https://issues.apache.org/jira/browse/SPARK-15632 Author: Cheng Lian <lian@databricks.com> Closes #13362 from liancheng/spark-15112-corrupted-filter.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala 20 additions, 1 deletion...a/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
- sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 15 additions, 1 deletion...re/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
Loading
Please register or sign in to comment