-
- Downloads
[SPARK-14137] [SQL] Cleanup hash join
## What changes were proposed in this pull request? This PR did a few cleanup on HashedRelation and HashJoin: 1) Merge HashedRelation and UniqueHashedRelation together 2) Return an iterator from HashedRelation, so we donot need a create many UnsafeRow objects. 3) Return a copy of HashedRelation for thread-safety in BroadcastJoin, so we can re-use the UnafeRow objects. 4) Cleanup HashJoin, share most of the code between BroadcastHashJoin and ShuffleHashJoin 5) Removed UniqueLongHashedRelation, which will be replaced by LongUnsafeMap (another PR). 6) Update benchmark, before this patch, the selectivity of joins are too high. ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes #12102 from davies/cleanup_hash.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala 21 additions, 57 deletions.../apache/spark/sql/execution/joins/BroadcastHashJoin.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala 81 additions, 136 deletions...scala/org/apache/spark/sql/execution/joins/HashJoin.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala 116 additions, 148 deletions...org/apache/spark/sql/execution/joins/HashedRelation.scala
- sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala 1 addition, 31 deletions...g/apache/spark/sql/execution/joins/ShuffledHashJoin.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala 42 additions, 22 deletions...ache/spark/sql/execution/BenchmarkWholeStageCodegen.scala
- sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala 7 additions, 7 deletions...pache/spark/sql/execution/joins/HashedRelationSuite.scala
Loading
Please register or sign in to comment