[SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior
## What changes were proposed in this pull request?

Timeout and state data are two independent entities and should be settable independently. Therefore, within a single call of the user-defined function, it should be possible to set the timeout before initializing the state, and also after removing the state. Whether a timeout can be set should not depend on the current state, and vice versa.

However, a limitation of the current implementation is that the state cannot be null while a timeout is set. This is checked lazily, after the function call has completed.

## How was this patch tested?

- Updated existing unit tests that test the behavior of GroupState.setTimeout*** with respect to the current state
- Added new tests that verify the disallowed cases where state is undefined but timeout is set.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #17957 from tdas/SPARK-20717.
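As a rough illustration of the ordering this change is meant to allow, here is a minimal sketch of a state-update function that sets the timeout before the state has been initialized. The object name `RunningCount`, the function name `updateCount`, and the 10-second duration are hypothetical and not taken from the patch. It assumes the function is passed to `mapGroupsWithState` with `GroupStateTimeout.ProcessingTimeTimeout`; without that configuration, `setTimeoutDuration` is expected to throw.

```scala
import org.apache.spark.sql.streaming.GroupState

// Hypothetical example, not part of the patch.
object RunningCount {
  // Keeps a running count of values per key and expires idle keys.
  def updateCount(key: String, values: Iterator[Int], state: GroupState[Long]): Long = {
    if (state.hasTimedOut) {
      // Invoked with no new data because the timeout fired:
      // read the last count, then drop the state.
      val last = state.getOption.getOrElse(0L)
      state.remove()
      last
    } else {
      // The timeout is set first, even if the state does not exist yet.
      state.setTimeoutDuration("10 seconds")
      // The state is defined before the function returns, so the lazy
      // "timeout set but state undefined" check described above should pass.
      val total = state.getOption.getOrElse(0L) + values.size
      state.update(total)
      total
    }
  }
}
```

Under the limitation described above, a variant that calls `setTimeoutDuration` and then returns without ever calling `update` would presumably be rejected once the function call completes, rather than at the `setTimeoutDuration` call itself.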
Showing 4 changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala (14 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/GroupStateImpl.scala (2 additions, 14 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala (1 addition, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala (122 additions, 39 deletions)