[SPARK-18710][ML] Add offset in GLM
## What changes were proposed in this pull request?

Add support for offset in GLM. This is useful for at least two reasons:

1. Account for exposure: e.g., when modeling the number of accidents, we may need to use miles driven as an offset to assess the effect of factors on frequency.
2. Test incremental effects of new variables: we can use predictions from the existing model as the offset and fit a much smaller model on only the new variables. This avoids re-estimating the large model with all variables (old + new) and can be very important for efficient large-scale analysis.

## How was this patch tested?

New test.

yanboliang srowen felixcheung sethah

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #16699 from actuaryzhang/offset.
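To illustrate the exposure use case described above, here is a minimal, self-contained sketch in plain Scala. It is not the code from this patch and does not use Spark. It assumes an intercept-only Poisson model with a log link, `log(mu_i) = beta + offset_i`, where the offset is `log(miles driven)`. The object name `PoissonOffsetSketch`, the method `fitIntercept`, and the sample numbers are all hypothetical and chosen only for illustration.

```scala
object PoissonOffsetSketch {

  /** Fit an intercept-only Poisson model with log link and a fixed offset:
    *   log(mu_i) = beta + offset_i
    * using Newton's method. The offset enters the linear predictor with a
    * coefficient fixed at 1, so it is never estimated.
    * (Hypothetical helper for illustration; not part of Spark.)
    */
  def fitIntercept(
      y: Array[Double],
      offset: Array[Double],
      maxIter: Int = 50,
      tol: Double = 1e-10): Double = {
    require(y.length == offset.length, "y and offset must have the same length")
    var beta = 0.0
    var delta = Double.MaxValue
    var iter = 0
    while (iter < maxIter && math.abs(delta) > tol) {
      // Fitted means under the current intercept, including the offset.
      val mu = offset.map(o => math.exp(beta + o))
      // Newton step: score is sum(y - mu), observed information is sum(mu).
      delta = y.zip(mu).map { case (yi, mi) => yi - mi }.sum / mu.sum
      beta += delta
      iter += 1
    }
    beta
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: accident counts and miles driven (the exposure).
    val accidents = Array(1.0, 4.0, 22.0)
    val miles = Array(1000.0, 5000.0, 20000.0)

    val beta = fitIntercept(accidents, miles.map(math.log))
    // For this model the estimate has a closed form: log(total count / total exposure).
    val closedForm = math.log(accidents.sum / miles.sum)
    println(f"Newton estimate: $beta%.6f, closed form: $closedForm%.6f")
  }
}
```

With the offset set to the log of exposure, `exp(beta)` can be read as an accident rate per mile rather than a raw count. The same mechanism covers the second use case: if the offset holds the linear predictor from an existing model, only the coefficients of the new variables need to be estimated.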
Showing
- mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala (21 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala (7 additions, 7 deletions)
- mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala (1 addition, 1 deletion)
- mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala (125 additions, 59 deletions)
- mllib/src/test/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquaresSuite.scala (20 additions, 20 deletions)
- mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala (360 additions, 274 deletions)