Marcelo Vanzin
authored
The current interface used to fetch shuffle data is not very efficient for large buffers; it requires the receiver to buffer the entirety of the contents being downloaded in memory before processing the data. To use the network library to transfer large files (such as those that can be added using SparkContext addJar / addFile), this change adds a more efficient way of downloding data, by streaming the data and feeding it to a callback as data arrives. This is achieved by a custom frame decoder that replaces the current netty one; this decoder allows entering a mode where framing is skipped and data is instead provided directly to a callback. The existing netty classes (ByteToMessageDecoder and LengthFieldBasedFrameDecoder) could not be reused since their semantics do not allow for the interception approach the new decoder uses. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9206 from vanzin/SPARK-11235.