[SPARK-17844] Simplify DataFrame API for defining frame boundaries in window functions
## What changes were proposed in this pull request?

While creating the example code for SPARK-10496, I realized it was pretty convoluted to define the frame boundaries for window functions when there is no partition column or ordering column. The reason is that we don't provide a way to create a WindowSpec directly with the frame boundaries. We can trivially improve this by adding rowsBetween and rangeBetween to the Window object.

As an example, to compute a cumulative sum using the natural ordering, before this PR:

```
df.select('key, sum("value").over(Window.partitionBy(lit(1)).rowsBetween(Long.MinValue, 0)))
```

After this PR:

```
df.select('key, sum("value").over(Window.rowsBetween(Long.MinValue, 0)))
```

Note that you could argue there is no point in specifying a window frame without partitionBy/orderBy, but it is strange that rowsBetween and rangeBetween are the only two APIs not available on the Window object.

This also fixes https://issues.apache.org/jira/browse/SPARK-17656 (removing _root_.scala).

## How was this patch tested?

Added test cases to compute cumulative sum in DataFrameWindowSuite for Scala/Java and tests.py for Python.

Author: Reynold Xin <rxin@databricks.com>

Closes #15412 from rxin/SPARK-17844.
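To make the frame boundaries in the example above concrete, here is a minimal pure-Python sketch of what a row-based frame such as `rowsBetween(Long.MinValue, 0)` selects for each row. It is only a model of the semantics, not Spark's implementation: the function `sum_over_rows_between` and the sentinel `UNBOUNDED_PRECEDING` are hypothetical names introduced for illustration, and the input is assumed to be a plain list already in its natural order.

```python
import sys

# Hypothetical sentinel standing in for "unbounded preceding",
# analogous to Long.MinValue in the Scala example.
UNBOUNDED_PRECEDING = -sys.maxsize


def sum_over_rows_between(values, start, end):
    """Sum each row's frame, where the frame for row i covers the
    row offsets [i + start, i + end], clipped to the available rows."""
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i + start)      # first row in the frame
        hi = min(n - 1, i + end)    # last row in the frame
        # Simplification of this sketch: an empty frame sums to 0.
        result.append(sum(values[lo:hi + 1]))
    return result


values = [1, 2, 3, 4]

# Frame from the first row up to the current row: a cumulative sum.
cumulative = sum_over_rows_between(values, UNBOUNDED_PRECEDING, 0)
print(cumulative)

# Frame from one row before to one row after the current row.
centered = sum_over_rows_between(values, -1, 1)
print(centered)
```

A start of "unbounded preceding" with an end of 0 makes every frame run from the first row to the current row, which is why the result is a running total.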
Changed files:
- python/pyspark/sql/tests.py: 9 additions, 0 deletions
- python/pyspark/sql/window.py: 48 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala: 42 additions, 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala: 6 additions, 4 deletions
- sql/core/src/main/scala/org/apache/spark/sql/expressions/udaf.scala: 2 additions, 2 deletions
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowSuite.scala: 12 additions, 0 deletions