Commit 182f9b7a authored by Kun Xu, committed by Cheng Lian

[SPARK-9973] [SQL] Correct in-memory columnar buffer size

The `initialSize` argument of `ColumnBuilder.initialize()` should be the
number of rows rather than a number of bytes. However, `InMemoryColumnarTableScan`
passes in a byte size, which makes Spark SQL allocate far more memory than
necessary when building in-memory columnar buffers.

Author: Kun Xu <viper_kun@163.com>

Closes #8189 from viper-kun/errorSize.
parent 1db7179f
@@ -121,8 +121,7 @@ private[sql] case class InMemoryRelation(
       def next(): CachedBatch = {
         val columnBuilders = output.map { attribute =>
           val columnType = ColumnType(attribute.dataType)
-          val initialBufferSize = columnType.defaultSize * batchSize
-          ColumnBuilder(attribute.dataType, initialBufferSize, attribute.name, useCompression)
+          ColumnBuilder(attribute.dataType, batchSize, attribute.name, useCompression)
         }.toArray
         var rowCount = 0
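To see the scale of the over-allocation, consider a minimal sketch (not actual Spark code). The numbers are assumptions for illustration: a LONG column whose `defaultSize` is 8 bytes, and a `batchSize` of 10000 rows (the documented default of `spark.sql.inMemoryColumnarStorage.batchSize`). Since `ColumnBuilder.initialize()` treats its argument as a row count and sizes the buffer at roughly `initialSize * defaultSize` bytes, passing a byte size inflates the buffer by a factor of `defaultSize`:

```scala
// Hypothetical illustration of the bug, not Spark's actual ColumnBuilder.
object BufferSizeSketch {
  def main(args: Array[String]): Unit = {
    val defaultSize = 8     // assumed: bytes per value for a LONG column
    val batchSize   = 10000 // assumed: rows per in-memory batch

    // initialize() interprets its argument as a row count, so the
    // backing buffer is sized at about initialSize * defaultSize bytes.
    def allocatedBytes(initialSize: Int): Int = initialSize * defaultSize

    val buggy = allocatedBytes(defaultSize * batchSize) // byte size passed as a row count
    val fixed = allocatedBytes(batchSize)               // row count, as intended

    println(s"buggy allocation: $buggy bytes") // 640000
    println(s"fixed allocation: $fixed bytes") // 80000
    assert(buggy == fixed * defaultSize)       // defaultSize-fold over-allocation
  }
}
```

Under these assumed numbers the old code reserves about 640 KB per LONG column per batch where 80 KB would do, which is why the one-argument change in the diff is sufficient to fix the issue.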