[SPARK-17141][ML] MinMaxScaler should remain NaN value.
## What changes were proposed in this pull request?

In the existing code, ```MinMaxScaler``` handles ```NaN``` values inconsistently:

* If a column is constant, that is ```max == min```, the ```MinMaxScalerModel``` transformation outputs ```0.5``` for every row, even when the original value is ```NaN```.
* Otherwise, the value remains ```NaN``` after transformation.

We should unify the behavior by keeping ```NaN``` values under all conditions, since we don't know how to transform a ```NaN``` value. In Python's sklearn, an exception is thrown when the dataset contains ```NaN```.

## How was this patch tested?

Unit tests.

Author: Yanbo Liang &lt;ybliang8@gmail.com&gt;

Closes #14716 from yanboliang/spark-17141.
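The unified behavior described above can be sketched as a standalone element-wise rescaling function. This is a hypothetical illustration, not Spark's actual ```MinMaxScalerModel``` code (which operates on ```Vector``` columns); the function name and default output range ```[0, 1]``` are assumptions for the sketch. The key point is the ```NaN``` check firing before the constant-column branch, so ```NaN``` is preserved in both cases:

```scala
// Hypothetical sketch of MinMaxScaler's element-wise rescaling with the fix:
// NaN inputs stay NaN, whether or not the column is constant.
object MinMaxRescale {
  // min/max are the column statistics computed at fit time;
  // outputMin/outputMax default to the standard [0, 1] output range.
  def rescale(x: Double, min: Double, max: Double,
              outputMin: Double = 0.0, outputMax: Double = 1.0): Double = {
    if (x.isNaN) {
      Double.NaN  // the fix: propagate NaN instead of emitting a number
    } else if (max == min) {
      // Constant column: every non-NaN value maps to the midpoint of the output range.
      0.5 * (outputMax + outputMin)
    } else {
      (x - min) / (max - min) * (outputMax - outputMin) + outputMin
    }
  }
}
```

Before the fix, a ```NaN``` in a constant column would fall into the ```max == min``` branch and come out as ```0.5```; with the early ```isNaN``` check, both branches are skipped and the value stays ```NaN```.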
Showing 2 changed files:
- mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala (4 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala (27 additions, 0 deletions)