-
- Downloads
[SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of children
For example large IN clauses Large IN clauses are parsed very slowly. For example SQL below (10K items in IN) takes 45-50s. s"""SELECT * FROM Person WHERE ForeName IN ('${(1 to 10000).map("n" + _).mkString("','")}')""" This is principally due to TreeNode which repeatedly call contains on children, where children in this case is a List that is 10K long. In effect parsing for large IN clauses is O(N squared). A lazily initialised Set based on children for contains reduces parse time to around 2.5s Author: Michael Davies <Michael.BellDavies@gmail.com> Closes #6673 from MickDavies/SPARK-8077 and squashes the following commits: 38cd425 [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children d80103b [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children e6be8be [Michael Davies] SPARK-8077: Optimization for TreeNodes with large numbers of children
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala 1 addition, 1 deletion...apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 16 additions, 11 deletions.../scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
Loading
Please register or sign in to comment