diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index 012fbd91e698b91958d3b896b16fca0c78b47c11..1c2e27341473b31f180053efbcf771dd83a034bb 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -31,7 +31,7 @@ E.g., a learning algorithm is an `Estimator` which trains on a dataset and produ
 
 * **[`Pipeline`](ml-guide.html#pipeline)**: A `Pipeline` chains multiple `Transformer`s and `Estimator`s together to specify an ML workflow.
 
-* **[`Param`](ml-guide.html#param)**: All `Transformer`s and `Estimator`s now share a common API for specifying parameters.
+* **[`Param`](ml-guide.html#parameters)**: All `Transformer`s and `Estimator`s now share a common API for specifying parameters.
 
 ## ML Dataset
 
@@ -134,7 +134,7 @@ Each stage's `transform()` method updates the dataset and passes it to the next
 
 Spark ML `Estimator`s and `Transformer`s use a uniform API for specifying parameters.
 A [`Param`](api/scala/index.html#org.apache.spark.ml.param.Param) is a named parameter with self-contained documentation.
-A [`ParamMap`](api/scala/index.html#org.apache.spark.ml.param.ParamMap)] is a set of (parameter, value) pairs.
+A [`ParamMap`](api/scala/index.html#org.apache.spark.ml.param.ParamMap) is a set of (parameter, value) pairs.
 
 There are two main ways to pass parameters to an algorithm:
 
@@ -148,7 +148,7 @@ This is useful if there are two algorithms with the `maxIter` parameter in a `Pi
 
 # Code Examples
 
 This section gives code examples illustrating the functionality discussed above.
-There is not yet documentation for specific algorithms in Spark ML. For more info, please refer to the [API Documentation](api/scala/index.html). Spark ML algorithms are currently wrappers for MLlib algorithms, and the [MLlib programming guide](mllib-guide.html) has details on specific algorithms.
+There is not yet documentation for specific algorithms in Spark ML. For more info, please refer to the [API Documentation](api/scala/index.html#org.apache.spark.ml.package). Spark ML algorithms are currently wrappers for MLlib algorithms, and the [MLlib programming guide](mllib-guide.html) has details on specific algorithms.
 
 ## Example: Estimator, Transformer, and Param
@@ -492,7 +492,7 @@ The `ParamMap` which produces the best evaluation metric (averaged over the `$k$
 `CrossValidator` finally fits the `Estimator` using the best `ParamMap` and the entire dataset.
 
 The following example demonstrates using `CrossValidator` to select from a grid of parameters.
-To help construct the parameter grid, we use the [`ParamGridBuilder`](api/scala/index.html#org.apache.spark.ml.tuning.ParamGridGuilder) utility.
+To help construct the parameter grid, we use the [`ParamGridBuilder`](api/scala/index.html#org.apache.spark.ml.tuning.ParamGridBuilder) utility.
 Note that cross-validation over a grid of parameters is expensive.
 E.g., in the example below, the parameter grid has 3 values for `hashingTF.numFeatures` and 2 values for `lr.regParam`, and `CrossValidator` uses 2 folds.
 This multiplies out to `$(3 \times 2) \times 2 = 12$` different models being trained.
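For context on the last hunk above: the corrected link points at `ParamGridBuilder`, the utility that builds the grid the guide's `$(3 \times 2) \times 2 = 12$` arithmetic refers to. A minimal sketch of that flow against the Spark 1.2-era `spark.ml` Java API follows; the `CrossValidatorSketch` class, its `tune` method, and the `training` argument (a `JavaSchemaRDD` of labeled documents, prepared as in `JavaCrossValidatorExample.java`) are hypothetical scaffolding, not part of this patch.

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

// Hypothetical wrapper; only the spark.ml calls below are from the guide.
public class CrossValidatorSketch {
  public static CrossValidatorModel tune(JavaSchemaRDD training) {
    // Pipeline: tokenize text, hash tokens to feature vectors, fit logistic regression.
    Tokenizer tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words");
    HashingTF hashingTF = new HashingTF()
      .setInputCol(tokenizer.getOutputCol())
      .setOutputCol("features");
    LogisticRegression lr = new LogisticRegression()
      .setMaxIter(10);
    Pipeline pipeline = new Pipeline()
      .setStages(new PipelineStage[] {tokenizer, hashingTF, lr});

    // 3 values for numFeatures x 2 values for regParam = 6 ParamMaps;
    // with 2 folds, (3 x 2) x 2 = 12 models get trained, as the guide notes.
    ParamMap[] paramGrid = new ParamGridBuilder()
      .addGrid(hashingTF.numFeatures(), new int[] {10, 100, 1000})
      .addGrid(lr.regParam(), new double[] {0.1, 0.01})
      .build();

    CrossValidator cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(2);
    // Selects the best ParamMap by the evaluator's metric, then refits on all data.
    return cv.fit(training);
  }
}
```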
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaCrossValidatorExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaCrossValidatorExample.java
index 3b156fa0482fc6f462884efb10156bfc78c4dadd..f4b4f8d8c7b2fdbf693f34b4964149e54f533476 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaCrossValidatorExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaCrossValidatorExample.java
@@ -23,7 +23,6 @@ import com.google.common.collect.Lists;
 
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.ml.Model;
 import org.apache.spark.ml.Pipeline;
 import org.apache.spark.ml.PipelineStage;
 import org.apache.spark.ml.classification.LogisticRegression;
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
index cf58f4dfaa15bb7e2a021844911955f5ea1da608..e25b271777ed4f0422a0101e18d4a07789f90136 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java
@@ -47,7 +47,7 @@ public class JavaSimpleParamsExample {
     JavaSQLContext jsql = new JavaSQLContext(jsc);
 
     // Prepare training data.
-    // We use LabeledPoint, which is a case class. Spark SQL can convert RDDs of Java Beans
+    // We use LabeledPoint, which is a JavaBean. Spark SQL can convert RDDs of JavaBeans
     // into SchemaRDDs, where it uses the bean metadata to infer the schema.
     List<LabeledPoint> localTraining = Lists.newArrayList(
       new LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
diff --git a/mllib/src/main/scala/org/apache/spark/ml/param/params.scala b/mllib/src/main/scala/org/apache/spark/ml/param/params.scala
index 4b4340af543b0574236fb5631247cacea76d7ab0..04f9cfb1bfc2fdefaf36dfb67ec84a2a7d1d0215 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/params.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/params.scala
@@ -220,7 +220,6 @@ class ParamMap private[ml] (private val map: mutable.Map[Param[Any], Any]) exten
 
   /**
    * Puts a list of param pairs (overwrites if the input params exists).
-   * Not usable from Java
    */
   @varargs
   def put(paramPairs: ParamPair[_]*): this.type = {
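A note on the `params.scala` hunk: `put(paramPairs: ParamPair[_]*)` is annotated `@varargs`, which makes the Scala varargs signature callable from Java via a generated array-taking bridge, so the removed "Not usable from Java" line was stale. A rough illustration of calling it from Java, in the style of `JavaSimpleParamsExample.java` (the wrapper class and `main` are hypothetical scaffolding):

```java
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.param.ParamMap;

public class ParamMapPutSketch {
  public static void main(String[] args) {
    LogisticRegression lr = new LogisticRegression();
    ParamMap paramMap = new ParamMap();
    paramMap.put(lr.maxIter().w(20));   // one ParamPair, built with Param.w()
    paramMap.put(lr.maxIter(), 30);     // (param, value) form; overwrites maxIter
    // Several ParamPairs in a single call: this is the varargs signature
    // whose "Not usable from Java" caveat the patch removes.
    paramMap.put(lr.regParam().w(0.1), lr.threshold().w(0.55));
    System.out.println("ParamMap: " + paramMap.toString());
  }
}
```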