[SPARK-19746][ML] Faster indexing for logistic aggregator
## What changes were proposed in this pull request?

JIRA: [SPARK-19746](https://issues.apache.org/jira/browse/SPARK-19746)

The following code is inefficient:

````scala
val localCoefficients: Vector = bcCoefficients.value
features.foreachActive { (index, value) =>
  val stdValue = value / localFeaturesStd(index)
  var j = 0
  while (j < numClasses) {
    margins(j) += localCoefficients(index * numClasses + j) * stdValue
    j += 1
  }
}
````

`localCoefficients(index * numClasses + j)` calls `Vector.apply` which creates a new Breeze vector and indexes that. Even if it is not that slow to create the object, we will generate a lot of extra garbage that may result in longer GC pauses. This is a hot inner loop, so we should optimize wherever possible.

## How was this patch tested?

I don't think there's a great way to test this patch. It's purely performance related, so unit tests should guarantee that we haven't made any unwanted changes. Empirically I observed between 10-40% speedups just running short local tests. I suspect the big differences will be seen when large data/coefficient sizes have to pause for GC more often. I welcome other ideas for testing.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #17078 from sethah/logistic_agg_indexing.
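As an illustration of the idea described above (not the patch's actual diff), here is a minimal, Spark-free sketch of the same inner loop that indexes a plain `Array[Double]` instead of going through `Vector.apply`. The object name `MarginSketch`, the method `addMargins`, and its parameter names are hypothetical. The layout `index * numClasses + j` is taken from the snippet in the description. In Spark itself one would presumably obtain the array once, outside the loop, for example from the `values` field of a `DenseVector`; that step is an assumption and is not shown here.

```scala
object MarginSketch {

  /**
   * Accumulates per-class margins for one example given in sparse form.
   *
   * Hypothetical helper: `coefficients` is a flat array in which the entry
   * for (feature i, class j) sits at i * numClasses + j, mirroring the
   * indexing used in the original snippet.
   */
  def addMargins(
      indices: Array[Int],          // active feature indices
      values: Array[Double],        // matching feature values
      coefficients: Array[Double],  // flat coefficient array, extracted once
      featuresStd: Array[Double],   // per-feature standard deviations
      numClasses: Int,
      margins: Array[Double]): Unit = {
    var k = 0
    while (k < indices.length) {
      val index = indices(k)
      val stdValue = values(k) / featuresStd(index)
      val offset = index * numClasses
      var j = 0
      while (j < numClasses) {
        // Direct array access: no intermediate vector object is allocated.
        margins(j) += coefficients(offset + j) * stdValue
        j += 1
      }
      k += 1
    }
  }
}
```

Because the loop only touches primitive arrays, each iteration allocates nothing, which is the property the description is after in a hot path.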
Showing 2 changed files:
- mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala (8 additions, 3 deletions)
- mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala (26 additions, 0 deletions)