[SPARK-19745][ML] SVCAggregator captures coefficients in its closure
## What changes were proposed in this pull request?

JIRA: [SPARK-19745](https://issues.apache.org/jira/browse/SPARK-19745)

Reorganize SVCAggregator to avoid serializing coefficients.

This patch also makes the gradient array a `lazy val`, which avoids materializing a large array on the driver before shipping the class to the executors. This improvement stems from https://github.com/apache/spark/pull/16037.

Actually, probably all ML aggregators can benefit from this. We can either:

- a.) separate the gradient improvement into another patch,
- b.) keep what's here _plus_ add the lazy evaluation to all other aggregators in this patch, or
- c.) keep it as is.

## How was this patch tested?

This is an interesting question! I don't know of a reasonable way to test this right now. Ideally, we could run an optimization, look at the shuffle write data for each task, and compare its size to what we know it should be: `numCoefficients * 8 bytes`. I'm not sure there is a good way to do that right now. We could discuss this here or in another JIRA, but I suspect it would be a significant undertaking.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #17076 from sethah/svc_agg.
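The effect of the `lazy val` change, and the `numCoefficients * 8 bytes` size check mentioned in the testing discussion, can be illustrated outside Spark with plain Java serialization. This is a minimal sketch, not the patch's code: `EagerAggregator`, `LazyAggregator`, `LazyGradientDemo`, and `serializedSize` are hypothetical names, and `ObjectOutputStream` stands in for Spark's closure/task serialization. An eagerly allocated gradient array is written out with the object, while a `lazy val` that has not yet been accessed is still `null` and costs almost nothing to ship.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for an aggregator whose gradient array is allocated
// eagerly on construction (i.e. on the driver), so it is serialized with the object.
class EagerAggregator(numCoefficients: Int) extends Serializable {
  private val gradientSumArray = new Array[Double](numCoefficients)
  def add(i: Int, v: Double): Unit = gradientSumArray(i) += v
}

// Hypothetical stand-in using a lazy val: the array is only allocated on first
// access (e.g. on an executor), so before that only a null field is serialized.
class LazyAggregator(numCoefficients: Int) extends Serializable {
  private lazy val gradientSumArray = new Array[Double](numCoefficients)
  def add(i: Int, v: Double): Unit = gradientSumArray(i) += v
}

object LazyGradientDemo {
  // Number of bytes Java serialization produces for `obj`.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    // Expected payload for n doubles: n * 8 bytes.
    println(s"expected array payload: ${n * 8} bytes")
    println(s"eager aggregator:       ${serializedSize(new EagerAggregator(n))} bytes")
    println(s"lazy aggregator:        ${serializedSize(new LazyAggregator(n))} bytes")
  }
}
```

With `n = 100000`, the eager version serializes to slightly more than 800,000 bytes (the array data plus headers), while the lazy version is only a small class descriptor. Note that once a non-`@transient` lazy val has been evaluated, it is serialized like any other field, so the saving applies only to objects shipped before first access.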
- mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala: 12 additions, 17 deletions
- mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala: 1 addition, 1 deletion
- mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala: 3 additions, 3 deletions
- mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala: 1 addition, 1 deletion
- mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala: 1 addition, 1 deletion
- mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala: 16 additions, 1 deletion