Commit 9256840c authored by Davies Liu, committed by Reynold Xin

[SPARK-13661][SQL] avoid the copy in HashedRelation

## What changes were proposed in this pull request?

Avoid the copy in HashedRelation, since most HashedRelations are built from an Array[Row], and add a copy() at the call site in LeftSemiJoinHash, which builds its relation directly from an iterator. This could help reduce the memory consumption of broadcast joins.
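
To illustrate why the copy() moves to the caller, here is a minimal sketch that does not use Spark. `MutableRow`, `reusingIterator`, and `build` are hypothetical stand-ins (not Spark's `InternalRow` or `HashedRelation` API), and the sketch assumes an upstream iterator that reuses a single row object for every record.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a mutable row; this is not Spark's InternalRow.
final class MutableRow(var key: Int, var value: String) {
  def copy(): MutableRow = new MutableRow(key, value)
}

object CopyDemo {
  // Simulates an upstream operator that reuses one row object for every record.
  def reusingIterator(data: Seq[(Int, String)]): Iterator[MutableRow] = {
    val buffer = new MutableRow(0, "")
    data.iterator.map { case (k, v) =>
      buffer.key = k
      buffer.value = v
      buffer
    }
  }

  // Stores the rows exactly as given, without copying them.
  def build(rows: Iterator[MutableRow]): ArrayBuffer[MutableRow] = {
    val matchList = new ArrayBuffer[MutableRow]()
    rows.foreach(row => matchList += row)
    matchList
  }

  val data = Seq((1, "a"), (2, "b"), (3, "c"))

  // Without a copy, every stored entry aliases the same buffer object.
  val aliased: Seq[String] = build(reusingIterator(data)).map(_.value).toSeq

  // Copying at the call site gives each stored entry its own object.
  val copied: Seq[String] = build(reusingIterator(data).map(_.copy())).map(_.value).toSeq
}
```

In this sketch `aliased` ends up holding the last record three times, while `copied` keeps all three records. A caller that already passes distinct objects can skip the copy and avoid the extra allocation.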

## How was this patch tested?

Existing tests.

Author: Davies Liu <davies@databricks.com>

Closes #11666 from davies/remove_copy.
parent e76679a8
@@ -156,6 +156,11 @@ private[joins] class UniqueKeyHashedRelation(
 private[execution] object HashedRelation {
   /**
    * Create a HashedRelation from an Iterator of InternalRow.
+   *
+   * Note: The caller should make sure that these InternalRow are different objects.
    */
   def apply(
       input: Iterator[InternalRow],
       keyGenerator: Projection,
@@ -188,7 +193,7 @@ private[execution] object HashedRelation {
       keyIsUnique = false
       existingMatchList
     }
-    matchList += currentRow.copy()
+    matchList += currentRow
   }
 }
@@ -438,7 +443,7 @@ private[joins] object UnsafeHashedRelation {
     } else {
       existingMatchList
     }
-    matchList += unsafeRow.copy()
+    matchList += unsafeRow
   }
 }
@@ -622,7 +627,7 @@ private[joins] object LongHashedRelation {
       keyIsUnique = false
       existingMatchList
     }
-    matchList += unsafeRow.copy()
+    matchList += unsafeRow
   }
 }
@@ -47,7 +47,7 @@ case class LeftSemiJoinHash(
     val numOutputRows = longMetric("numOutputRows")
     right.execute().zipPartitions(left.execute()) { (buildIter, streamIter) =>
-      val hashRelation = HashedRelation(buildIter, rightKeyGenerator)
+      val hashRelation = HashedRelation(buildIter.map(_.copy()), rightKeyGenerator)
       hashSemiJoin(streamIter, hashRelation, numOutputRows)
     }
   }