Commit 0b713e04 authored by Yuhao Yang, committed by Nick Pentreath

[SPARK-13512][ML] add example and doc for MaxAbsScaler

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-13512
Add example and doc for ml.feature.MaxAbsScaler.

## How was this patch tested?
 unit tests

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11392 from hhbyyh/maxabsdoc.
parent 6ca990fb
......@@ -773,6 +773,38 @@ for more details on the API.
</div>
</div>
## MaxAbsScaler
`MaxAbsScaler` transforms a dataset of `Vector` rows, rescaling each feature to the range [-1, 1]
by dividing by the maximum absolute value of that feature. It does not shift or center the
data, so zero entries stay zero and sparsity is preserved.

`MaxAbsScaler` computes summary statistics on a dataset and produces a `MaxAbsScalerModel`. The
model can then transform each feature individually to the range [-1, 1].
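
To make the arithmetic concrete, here is a minimal plain-Java sketch of the rescaling described above. It is an illustration only, not Spark's implementation: the class name `MaxAbsScalingSketch`, its helper methods, and the sample values are hypothetical, and leaving an all-zero feature at zero is an assumption of this sketch.

```java
import java.util.Arrays;

public class MaxAbsScalingSketch {

    /** Returns the maximum absolute value of each feature (column). */
    public static double[] maxAbsPerFeature(double[][] rows) {
        double[] maxAbs = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < row.length; j++) {
                maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
            }
        }
        return maxAbs;
    }

    /**
     * Divides every value by the max abs of its feature.
     * Assumption of this sketch: a feature whose max abs is 0 is left at 0.
     */
    public static double[][] rescale(double[][] rows, double[] maxAbs) {
        double[][] scaled = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            scaled[i] = new double[rows[i].length];
            for (int j = 0; j < rows[i].length; j++) {
                scaled[i][j] = maxAbs[j] == 0.0 ? 0.0 : rows[i][j] / maxAbs[j];
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical data: three rows, three features (the last is all zeros).
        double[][] rows = {
            {1.0, -8.0, 0.0},
            {2.0, 4.0, 0.0},
            {-4.0, 2.0, 0.0}
        };
        double[] maxAbs = maxAbsPerFeature(rows);
        double[][] scaled = rescale(rows, maxAbs);
        System.out.println("maxAbs = " + Arrays.toString(maxAbs));
        for (double[] row : scaled) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With this data the per-feature maxima are 4, 8 and 0, so the first row becomes `[0.25, -1.0, 0.0]`. Nothing is subtracted from the data, so every zero in the input remains zero in the output.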
The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [-1, 1].
<div class="codetabs">
<div data-lang="scala" markdown="1">
Refer to the [MaxAbsScaler Scala docs](api/scala/index.html#org.apache.spark.ml.feature.MaxAbsScaler)
and the [MaxAbsScalerModel Scala docs](api/scala/index.html#org.apache.spark.ml.feature.MaxAbsScalerModel)
for more details on the API.
{% include_example scala/org/apache/spark/examples/ml/MaxAbsScalerExample.scala %}
</div>
<div data-lang="java" markdown="1">
Refer to the [MaxAbsScaler Java docs](api/java/org/apache/spark/ml/feature/MaxAbsScaler.html)
and the [MaxAbsScalerModel Java docs](api/java/org/apache/spark/ml/feature/MaxAbsScalerModel.html)
for more details on the API.
{% include_example java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java %}
</div>
</div>
## Bucketizer
`Bucketizer` transforms a column of continuous features to a column of feature buckets, where the buckets are specified by users. It takes a parameter:
......
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.examples.ml;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
// $example on$
import org.apache.spark.ml.feature.MaxAbsScaler;
import org.apache.spark.ml.feature.MaxAbsScalerModel;
import org.apache.spark.sql.DataFrame;
// $example off$
import org.apache.spark.sql.SQLContext;

public class JavaMaxAbsScalerExample {

  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("JavaMaxAbsScalerExample");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    SQLContext jsql = new SQLContext(jsc);

    // $example on$
    DataFrame dataFrame = jsql.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt");
    MaxAbsScaler scaler = new MaxAbsScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures");

    // Compute summary statistics and generate MaxAbsScalerModel
    MaxAbsScalerModel scalerModel = scaler.fit(dataFrame);

    // rescale each feature to range [-1, 1].
    DataFrame scaledData = scalerModel.transform(dataFrame);
    scaledData.show();
    // $example off$
    jsc.stop();
  }
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
// scalastyle:off println
package org.apache.spark.examples.ml

import org.apache.spark.{SparkConf, SparkContext}
// $example on$
import org.apache.spark.ml.feature.MaxAbsScaler
// $example off$
import org.apache.spark.sql.SQLContext

object MaxAbsScalerExample {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MaxAbsScalerExample")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // $example on$
    val dataFrame = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    val scaler = new MaxAbsScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")

    // Compute summary statistics and generate MaxAbsScalerModel
    val scalerModel = scaler.fit(dataFrame)

    // rescale each feature to range [-1, 1]
    val scaledData = scalerModel.transform(dataFrame)
    scaledData.show()
    // $example off$
    sc.stop()
  }
}
// scalastyle:on println