Commit 7425bec3 authored by Michael Davies, committed by Michael Armbrust

[SPARK-4386] Improve performance when writing Parquet files

Convert type of RowWriteSupport.attributes to Array.

Analysis of performance for writing very wide tables shows that time is spent predominantly in the apply method on the attributes var. The type of attributes was previously a LinearSeqOptimized collection, whose apply is O(N), which made write O(N squared) in the number of columns N.

Measurements on a 575-column table showed that this change gave a 6x improvement in write times.
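
The cost difference described above can be sketched in isolation. This is a minimal illustration and not the Spark code: Int stands in for Attribute, and the names IndexedAccessSketch, sumList, sumArray and listHops are hypothetical. It assumes the old Seq was backed by a linked list such as List, where apply(i) walks i links from the head, so one indexed pass over 575 fields walks about 0 + 1 + ... + 574 links, against 575 constant-time lookups on an Array.

```scala
// Illustrative sketch only: Int stands in for Attribute, and every name
// in this object is hypothetical rather than part of Spark.
object IndexedAccessSketch {

  // Index-based loop over a linked list: each fields(i) walks i links
  // from the head, so the whole pass is O(N^2).
  def sumList(fields: List[Int]): Long = {
    val n = fields.length
    var total = 0L
    var i = 0
    while (i < n) {
      total += fields(i)
      i += 1
    }
    total
  }

  // Same loop over an Array: each fields(i) is a constant-time lookup,
  // so the whole pass is O(N).
  def sumArray(fields: Array[Int]): Long = {
    val n = fields.length
    var total = 0L
    var i = 0
    while (i < n) {
      total += fields(i)
      i += 1
    }
    total
  }

  // Links walked by the list version in one pass: 0 + 1 + ... + (n - 1).
  def listHops(n: Int): Long = n.toLong * (n - 1) / 2

  def main(args: Array[String]): Unit = {
    val n = 575 // column count from the measurement above
    val asList = List.tabulate(n)(identity)
    val asArray = asList.toArray
    // Both loops compute the same result; only the access cost differs.
    assert(sumList(asList) == sumArray(asArray))
    println(s"list link traversals per pass: ${listHops(n)}, array lookups per pass: $n")
  }
}
```

The link count is only a rough model of the work per row. It does not by itself predict the measured speedup, which also depends on everything else the writer does.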

Author: Michael Davies <Michael.BellDavies@gmail.com>

Closes #3843 from MickDavies/SPARK-4386 and squashes the following commits:

892519d [Michael Davies] [SPARK-4386] Improve performance when writing Parquet files
parent 61a99f6a
@@ -130,7 +130,7 @@ private[parquet] object RowReadSupport {
 private[parquet] class RowWriteSupport extends WriteSupport[Row] with Logging {
   private[parquet] var writer: RecordConsumer = null
-  private[parquet] var attributes: Seq[Attribute] = null
+  private[parquet] var attributes: Array[Attribute] = null
   override def init(configuration: Configuration): WriteSupport.WriteContext = {
     val origAttributesStr: String = configuration.get(RowWriteSupport.SPARK_ROW_SCHEMA)
@@ -138,7 +138,7 @@ private[parquet] class RowWriteSupport extends WriteSupport[Row] with Logging {
     metadata.put(RowReadSupport.SPARK_METADATA_KEY, origAttributesStr)
     if (attributes == null) {
-      attributes = ParquetTypesConverter.convertFromString(origAttributesStr)
+      attributes = ParquetTypesConverter.convertFromString(origAttributesStr).toArray
     }
     log.debug(s"write support initialized for requested schema $attributes")