-
- Downloads
[SPARK-19425][SQL] Make ExtractEquiJoinKeys support UDT columns
## What changes were proposed in this pull request? DataFrame.except doesn't work for UDT columns. It is because `ExtractEquiJoinKeys` will run `Literal.default` against UDT. However, we don't handle UDT in `Literal.default` and an exception will throw like: java.lang.RuntimeException: no default for type org.apache.spark.ml.linalg.VectorUDT3bfc3ba7 at org.apache.spark.sql.catalyst.expressions.Literal$.default(literals.scala:179) at org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys$$anonfun$4.apply(patterns.scala:117) at org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys$$anonfun$4.apply(patterns.scala:110) More simple fix is just let `Literal.default` handle UDT by its sql type. So we can use more efficient join type on UDT. Besides `except`, this also fixes other similar scenarios, so in summary this fixes: * `except` on two Datasets with UDT * `intersect` on two Datasets with UDT * `Join` with the join conditions using `<=>` on UDT columns ## How was this patch tested? Jenkins tests. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #16765 from viirya/df-except-for-udt.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala 1 addition, 0 deletions.../org/apache/spark/sql/catalyst/expressions/literals.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala 3 additions, 0 deletions...ark/sql/catalyst/expressions/LiteralExpressionSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala 9 additions, 0 deletions...est/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
Please register or sign in to comment