-
- Downloads
[SPARK-11111] [SQL] fast null-safe join
Currently, we use CartesianProduct for join with null-safe-equal condition. ``` scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain == Physical Plan == TungstenProject [i#2,j#3,i#7,j#8] Filter (i#2 <=> i#7) CartesianProduct LocalTableScan [i#2,j#3], [[1,1]] LocalTableScan [i#7,j#8], [[1,1]] ``` Actually, we can have an equal-join condition as `coalesce(i, default) = coalesce(b.i, default)`, then an partitioned join algorithm could be used. After this PR, the plan will become: ``` >>> sqlContext.sql("select * from a join b ON a.id <=> b.id").explain() TungstenProject [id#0L,id#1L] Filter (id#0L <=> id#1L) SortMergeJoin [coalesce(id#0L,0)], [coalesce(id#1L,0)] TungstenSort [coalesce(id#0L,0) ASC], false, 0 TungstenExchange hashpartitioning(coalesce(id#0L,0),200) ConvertToUnsafe Scan PhysicalRDD[id#0L] TungstenSort [coalesce(id#1L,0) ASC], false, 0 TungstenExchange hashpartitioning(coalesce(id#1L,0),200) ConvertToUnsafe Scan PhysicalRDD[id#1L] ``` Author: Davies Liu <davies@databricks.com> Closes #9120 from davies/null_safe.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala 27 additions, 2 deletions.../org/apache/spark/sql/catalyst/expressions/literals.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala 23 additions, 12 deletions...ala/org/apache/spark/sql/catalyst/planning/patterns.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala 27 additions, 1 deletion...ark/sql/catalyst/expressions/LiteralExpressionSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 14 additions, 0 deletions...e/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala 14 additions, 0 deletions...org/apache/spark/sql/execution/joins/InnerJoinSuite.scala
Loading
Please register or sign in to comment