[SPARK-17515] CollectLimit.execute() should perform per-partition limits
## What changes were proposed in this pull request?

CollectLimit.execute() incorrectly omits per-partition limits, which causes a performance regression whenever this code path is hit. That should not happen in normal operation, but it can occur in some cases (see #15068 for one example).

## How was this patch tested?

Regression test in SQLQuerySuite that asserts the number of records scanned from the input RDD.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #15070 from JoshRosen/SPARK-17515.
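To illustrate why the missing per-partition limit matters, here is a minimal sketch in plain Scala. It is not the patch itself and does not use Spark: partitions are modeled as ordinary `Seq[Int]` values, and the object `PerPartitionLimitSketch`, the case class `Outcome`, and both method names are hypothetical names introduced only for this example. The sketch counts how many input records are read when the limit is applied only once after merging, versus once per partition and then again after merging.

```scala
object PerPartitionLimitSketch {
  /** Result rows plus the number of input records that were read. */
  final case class Outcome(rows: Seq[Int], recordsScanned: Int)

  // Assumption: models the behavior described as the bug. Every record of
  // every partition is read and merged before the limit is applied once.
  def globalLimitOnly(partitions: Seq[Seq[Int]], limit: Int): Outcome = {
    var scanned = 0
    val merged = partitions.flatMap { p =>
      // The counter increments each time a record is pulled from the partition.
      p.iterator.map { r => scanned += 1; r }.toVector
    }
    Outcome(merged.take(limit), scanned)
  }

  // Assumption: models the intended behavior. Each partition is cut down to
  // at most `limit` records before merging, then the limit is applied again.
  def localThenGlobalLimit(partitions: Seq[Seq[Int]], limit: Int): Outcome = {
    var scanned = 0
    val merged = partitions.flatMap { p =>
      // Iterator.take is lazy, so at most `limit` records are pulled here.
      p.iterator.map { r => scanned += 1; r }.take(limit).toVector
    }
    Outcome(merged.take(limit), scanned)
  }

  def main(args: Array[String]): Unit = {
    // Four hypothetical partitions of 100 records each.
    val partitions: Seq[Seq[Int]] =
      (0 until 4).map(i => (i * 100) until ((i + 1) * 100))

    val slow = globalLimitOnly(partitions, limit = 10)
    val fast = localThenGlobalLimit(partitions, limit = 10)

    println(s"global limit only:       scanned ${slow.recordsScanned} records")
    println(s"local then global limit: scanned ${fast.recordsScanned} records")
    println(s"same rows returned:      ${slow.rows == fast.rows}")
  }
}
```

In this toy setup both variants return the same ten rows, but the first reads all 400 records while the second reads only 40. A regression test that asserts on the number of records scanned, as described above, would catch exactly this kind of difference.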
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala (2 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (9 additions, 0 deletions)