[SPARK-15764][SQL] Replace N^2 loop in BindReferences
BindReferences contains an O(n^2) loop which causes performance issues when operating over large schemas: to determine the ordinal of each attribute reference, we perform a linear scan over the `input` array. Because `input` can sometimes be a `List`, the call to `input(ordinal).nullable` can also be O(n).

Instead of performing a linear scan, we can convert the input into an array and build a hash map from expression ids to ordinals. The greater up-front cost of constructing the map is offset by the fact that an expression can contain multiple attribute references, so that cost is amortized across a number of lookups.

Perf. benchmarks to follow. /cc ericl

Author: Josh Rosen <joshrosen@databricks.com>

Closes #13505 from JoshRosen/bind-references-improvement.
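The idea described in the commit message can be sketched in isolation. This is a minimal illustration, not the code from this change: `Attr`, `IndexedAttrs`, and `linearIndexOf` are hypothetical names standing in for Catalyst's attribute type and for whatever helper the patch actually adds, and the expression id is simplified to a `Long`.

```scala
object BindSketch {
  // Hypothetical stand-in for a Catalyst attribute; not Spark's actual class.
  final case class Attr(exprId: Long, name: String, nullable: Boolean)

  // Baseline: one linear scan per lookup, so binding k references over
  // n attributes costs O(k * n).
  def linearIndexOf(input: Seq[Attr], exprId: Long): Int =
    input.indexWhere(_.exprId == exprId)

  // Hypothetical wrapper: pays O(n) once, then each lookup is expected O(1).
  final class IndexedAttrs(input: Seq[Attr]) {
    // Copy into an array so positional access is O(1) even if `input` is a List.
    private val attrs: Array[Attr] = input.toArray

    // Map from expression id to ordinal, built once up front.
    private val ordinalById: Map[Long, Int] = {
      val m = scala.collection.mutable.HashMap.empty[Long, Int]
      var i = 0
      while (i < attrs.length) {
        // Keep the first occurrence so results match the linear scan.
        if (!m.contains(attrs(i).exprId)) m.put(attrs(i).exprId, i)
        i += 1
      }
      m.toMap
    }

    // Returns the ordinal of the attribute with this id, or -1 if absent.
    def indexOf(exprId: Long): Int = ordinalById.getOrElse(exprId, -1)

    // Constant-time positional access, e.g. to read `nullable`.
    def apply(ordinal: Int): Attr = attrs(ordinal)
  }
}
```

The wrapper is built once per input schema and reused for every attribute reference in the expression being bound, which is where the amortization mentioned above comes from.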
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala (0 additions, 7 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala (3 additions, 3 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala (33 additions, 1 deletion)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala (2 additions, 2 deletions)