Skip to content
Snippets Groups Projects
Commit 2ef4c596 authored by Davies Liu's avatar Davies Liu Committed by Davies Liu
Browse files

[SPARK-13830] prefer block manager than direct result for large result

## What changes were proposed in this pull request?

The current RPC can't handle large blocks very well, it's very slow to fetch 100M block (about 1 minute). Once switch to block manager to fetch that, it took about 10 seconds (still could be improved).

## How was this patch tested?

existing unit tests.

Author: Davies Liu <davies@databricks.com>

Closes #11659 from davies/direct_result.
parent 66d9d0ed
No related branches found
No related tags found
No related merge requests found
......@@ -97,9 +97,11 @@ private[spark] class Executor(
// Set the classloader for serializer
env.serializer.setDefaultClassLoader(replClassLoader)
// Max RPC message size. If task result is bigger than this, we use the block manager
// Max size of direct result. If task result is bigger than this, we use the block manager
// to send the result back.
private val maxRpcMessageSize = RpcUtils.maxMessageSizeBytes(conf)
private val maxDirectResultSize = Math.min(
conf.getSizeAsBytes("spark.task.maxDirectResultSize", 1L << 20),
RpcUtils.maxMessageSizeBytes(conf))
// Limit of bytes for total size of results (default is 1GB)
private val maxResultSize = Utils.getMaxResultSize(conf)
......@@ -279,6 +281,7 @@ private[spark] class Executor(
// Note: accumulator updates must be collected after TaskMetrics is updated
val accumUpdates = task.collectAccumulatorUpdates()
// TODO: do not serialize value twice
val directResult = new DirectTaskResult(valueBytes, accumUpdates)
val serializedDirectResult = ser.serialize(directResult)
val resultSize = serializedDirectResult.limit
......@@ -290,7 +293,7 @@ private[spark] class Executor(
s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
s"dropping it.")
ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
} else if (resultSize >= maxRpcMessageSize) {
} else if (resultSize > maxDirectResultSize) {
val blockId = TaskResultBlockId(taskId)
env.blockManager.putBytes(
blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment