[SPARK-17480][SQL] Improve performance by removing or caching List.length which is O(n)
## What changes were proposed in this pull request?

Scala's List.length method is O(N), and calling it in the loop condition makes the gatherCompressibilityStats function O(N^2). Eliminate the List.length calls by rewriting the loop in idiomatic Scala.

https://github.com/scala/scala/blob/2.10.x/src/library/scala/collection/LinearSeqOptimized.scala#L36

As suggested, the fix is extended to the HiveInspectors and AggregationIterator classes as well.

## How was this patch tested?

Profiled a Spark job and found that CompressibleColumnBuilder is using 39% of the CPU. Out of this 39%, CompressibleColumnBuilder->gatherCompressibilityStats is using 23%. 6.24% of the CPU is spent on List.length, which is called inside gatherCompressibilityStats.

After this change we started to save 6.24% of the CPU.

Author: Ergin Seyfe <eseyfe@fb.com>

Closes #15032 from seyfe/gatherCompressibilityStats.
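For readers unfamiliar with the cost model described above, here is a minimal sketch of the pattern. It is not the patch itself: `Counter`, `gatherIndexed`, `gatherForeach`, and `gatherCached` are hypothetical names chosen for illustration, and the real Spark classes are not reproduced here.

```scala
object ListLengthSketch {
  // Hypothetical stand-in for a per-element stats collector; not Spark's API.
  final class Counter {
    var seen: Int = 0
    def gather(value: Int): Unit = seen += 1
  }

  // Index-based loop over an immutable List. `encoders.length` walks the
  // whole list on every condition check, and `encoders(i)` walks i nodes,
  // so the loop as a whole is O(N^2).
  def gatherIndexed(encoders: List[Counter], value: Int): Unit = {
    var i = 0
    while (i < encoders.length) {
      encoders(i).gather(value)
      i += 1
    }
  }

  // Removing the length call: a single traversal, O(N).
  def gatherForeach(encoders: List[Counter], value: Int): Unit =
    encoders.foreach(_.gather(value))

  // Caching the length: when an index is really needed, copy to an Array
  // once (O(N)), read the length once, and use O(1) indexed access.
  def gatherCached(encoders: List[Counter], value: Int): Unit = {
    val arr = encoders.toArray
    val n = arr.length
    var i = 0
    while (i < n) {
      arr(i).gather(value)
      i += 1
    }
  }
}
```

All three functions visit every element exactly once per call; only the traversal cost differs. The array copy in `gatherCached` is best done once outside a hot path when the same list is processed for many rows.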
Showing changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala (4 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/CompressibleColumnBuilder.scala (1 addition, 5 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala (4 additions, 2 deletions)