[SPARK-13809][SQL] State store for streaming aggregations
## What changes were proposed in this pull request?

In this PR, I am implementing a new abstraction for management of streaming state data - State Store. It is a key-value store for persisting running aggregates for aggregate operations in streaming dataframes. The motivation and design is discussed here.

https://docs.google.com/document/d/1-ncawFx8JS5Zyfq1HAEGBx56RDet9wfVp_hDM8ZL254/edit#

## How was this patch tested?

- [x] Unit tests
- [x] Cluster tests

**Coverage from unit tests**

<img width="952" alt="screen shot 2016-03-21 at 3 09 40 pm" src="https://cloud.githubusercontent.com/assets/663212/13935872/fdc8ba86-ef76-11e5-93e8-9fa310472c7b.png">

## TODO

- [x] Fix updates() iterator to avoid duplicate updates for same key
- [x] Use Coordinator in ContinuousQueryManager
- [x] Plugging in hadoop conf and other confs
- [x] Unit tests
  - [x] StateStore object lifecycle and methods
  - [x] StateStoreCoordinator communication and logic
  - [x] StateStoreRDD fault-tolerance
  - [x] StateStoreRDD preferred location using StateStoreCoordinator
- [ ] Cluster tests
  - [ ] Whether preferred locations are set correctly
  - [ ] Whether recovery works correctly with distributed storage
- [x] Basic performance tests
- [x] Docs

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #11645 from tdas/state-store.
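## Illustrative sketch

To make the idea of "a key-value store for persisting running aggregates" concrete, here is a minimal, self-contained sketch of a versioned in-memory store that folds micro-batches into running counts. It is not the `StateStore` API added by this patch: `VersionedStore`, `commitBatch`, `snapshot`, and `RunningCountExample` are hypothetical names chosen for illustration, and the versioning scheme (one committed snapshot per batch) is an assumption, not something this description specifies.

```scala
import scala.collection.mutable

// Hypothetical, simplified illustration; NOT the StateStore API from this patch.
final class VersionedStore[K, V] {
  // Committed snapshots indexed by version. Version 0 is the empty state.
  private val snapshots =
    mutable.Map[Long, Map[K, V]](0L -> Map.empty[K, V])

  def latestVersion: Long = snapshots.keys.max

  /** Read-only view of the state committed at `version`. */
  def snapshot(version: Long): Map[K, V] =
    snapshots.getOrElse(
      version,
      throw new IllegalArgumentException(s"Unknown version $version"))

  /** Apply one batch of updates on top of `baseVersion`; commit as baseVersion + 1. */
  def commitBatch(baseVersion: Long)(update: mutable.Map[K, V] => Unit): Long = {
    // Work on a private copy so a failed update never touches committed state.
    val working = mutable.Map[K, V]() ++= snapshot(baseVersion)
    update(working)
    val newVersion = baseVersion + 1
    snapshots(newVersion) = working.toMap
    newVersion
  }
}

object RunningCountExample {
  /** Fold one micro-batch of words into the running per-word counts. */
  def processBatch(
      store: VersionedStore[String, Long],
      baseVersion: Long,
      words: Seq[String]): Long =
    store.commitBatch(baseVersion) { state =>
      words.foreach(w => state(w) = state.getOrElse(w, 0L) + 1L)
    }

  def main(args: Array[String]): Unit = {
    val store = new VersionedStore[String, Long]
    val v1 = processBatch(store, 0L, Seq("a", "b", "a"))
    val v2 = processBatch(store, v1, Seq("b", "c"))
    println(store.snapshot(v2).toSeq.sorted)
  }
}
```

In this sketch, each batch reads the snapshot it was based on and commits a new one, so re-running a batch against the same base version recomputes the same contents rather than double-counting. Whether the real implementation behaves this way is covered by the design document linked above, not by this example.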
Showing 11 changed files:

- sql/core/src/main/scala/org/apache/spark/sql/ContinuousQueryManager.scala (3 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala (584 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala (247 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreConf.scala (37 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinator.scala (146 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala (70 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/package.scala (75 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (13 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinatorSuite.scala (123 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala (192 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala (562 additions, 0 deletions)