[SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` do not execute
## What changes were proposed in this pull request?

Any Dataset/DataFrame batch query that uses the `withWatermark` operation fails to execute, because the batch planner has no rule that explicitly handles the `EventTimeWatermark` logical plan. The right solution is simply to remove the plan node, since the watermark should not affect a batch query in any way.

Changes:
- This PR adds a new rule, `EliminateEventTimeWatermark`, that checks whether the event time watermark should be ignored. The watermark is ignored in any batch query.

Depends upon:
- [SPARK-20672](https://issues.apache.org/jira/browse/SPARK-20672). We cannot add this rule to the analyzer directly, because the streaming query is copied to `triggerLogicalPlan` in every trigger, and the rule would mistakenly be applied to `triggerLogicalPlan`.

Others:
- A typo fix in an example.

## How was this patch tested?

Added a new unit test.

Author: uncleGen <hustyugm@gmail.com>

Closes #17896 from uncleGen/SPARK-20373.
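
To make the idea behind the rule concrete, here is a minimal, self-contained sketch of "drop the watermark node when the child is not streaming". It uses a toy plan tree, not Catalyst: `Plan`, `Relation`, `Watermark`, `Filter`, and `EliminateWatermarkSketch` are hypothetical names invented for illustration, and the actual `EliminateEventTimeWatermark` rule in this patch may be written differently.

```scala
// Toy stand-ins for logical plan nodes. These are NOT Spark's real classes;
// they only model the one property the rule cares about: isStreaming.
sealed trait Plan { def isStreaming: Boolean }

// A leaf data source, either batch (isStreaming = false) or streaming.
final case class Relation(name: String, isStreaming: Boolean) extends Plan

// A watermark declared on an event-time column with an allowed delay.
final case class Watermark(eventTimeColumn: String, delay: String, child: Plan) extends Plan {
  def isStreaming: Boolean = child.isStreaming
}

// Any other unary operator, used here to show that the rewrite recurses.
final case class Filter(condition: String, child: Plan) extends Plan {
  def isStreaming: Boolean = child.isStreaming
}

object EliminateWatermarkSketch {
  // Remove Watermark nodes that sit on top of batch input; keep them for streaming input.
  def apply(plan: Plan): Plan = plan match {
    case Watermark(_, _, child) if !child.isStreaming => apply(child)
    case w @ Watermark(_, _, child)                   => w.copy(child = apply(child))
    case f @ Filter(_, child)                         => f.copy(child = apply(child))
    case r: Relation                                  => r
  }
}
```

In this sketch a batch plan such as `Filter("x > 1", Watermark("ts", "1 minute", Relation("t", isStreaming = false)))` is rewritten to the same filter directly over the relation, while a watermark over a streaming relation is returned unchanged.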
Changed files:
- docs/structured-streaming-programming-guide.md (3 additions, 0 deletions)
- examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala (2 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (10 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (2 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala (10 additions, 0 deletions)