[SPARK-12630][PYSPARK] [DOC] PySpark classification parameter desc to consistent format

Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the classification module.

Author: vijaykiran <mail@vijaykiran.com>
Author: Bryan Cutler <cutlerb@gmail.com>

Closes #11183 from BryanCutler/pyspark-consistent-param-classification-SPARK-12630.

Commit 42d65681 authored 9 years ago by vijaykiran, committed by Xiangrui Meng 9 years ago.
--- a/python/pyspark/mllib/classification.py
+++ b/python/pyspark/mllib/classification.py
@@ -94,16 +94,19 @@ class LogisticRegressionModel(LinearClassificationModel):
    Classification model trained using Multinomial/Binary Logistic
    Regression.
-    :param weights: Weights computed for every feature.
-    :param intercept: Intercept computed for this model. (Only used
-            in Binary Logistic Regression. In Multinomial Logistic
-            Regression, the intercepts will not be a single value,
-            so the intercepts will be part of the weights.)
-    :param numFeatures: the dimension of the features.
-    :param numClasses: the number of possible outcomes for k classes
-            classification problem in Multinomial Logistic Regression.
-            By default, it is binary logistic regression so numClasses
-            will be set to 2.
+    :param weights:
+      Weights computed for every feature.
+    :param intercept:
+      Intercept computed for this model. (Only used in Binary Logistic
+      Regression. In Multinomial Logistic Regression, the intercepts will
+      not bea single value, so the intercepts will be part of the
+      weights.)
+    :param numFeatures:
+      The dimension of the features.
+    :param numClasses:
+      The number of possible outcomes for k classes classification problem
+      in Multinomial Logistic Regression. By default, it is binary
+      logistic regression so numClasses will be set to 2.
    >>> data = [
    ...     LabeledPoint(0.0, [0.0, 1.0]),
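The `weights`, `intercept` and `numClasses` descriptions in this hunk are easier to follow next to the prediction rule they feed. What follows is a minimal pure-Python sketch, not PySpark's implementation. It assumes the binary model thresholds the sigmoid of `w·x + b` at 0.5, and that in the multinomial case the flat `weights` vector holds `numClasses - 1` blocks (class 0 acts as a pivot with margin 0), each block optionally ending with that class's intercept. That layout is our reading of MLlib, not something this commit states; `sigmoid`, `predict_binary` and `predict_multinomial` are hypothetical helpers.

```python
import math
from typing import Optional, Sequence, Union


def sigmoid(margin: float) -> float:
    """Numerically stable logistic function."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_m = math.exp(margin)
    return exp_m / (1.0 + exp_m)


def predict_binary(weights: Sequence[float], intercept: float,
                   x: Sequence[float],
                   threshold: Optional[float] = 0.5) -> Union[int, float]:
    """Binary case: the intercept is a separate scalar added to the margin."""
    margin = sum(w * xi for w, xi in zip(weights, x)) + intercept
    prob = sigmoid(margin)
    if threshold is None:
        return prob  # raw probability when no threshold is set
    return 1 if prob > threshold else 0


def predict_multinomial(weights: Sequence[float], x: Sequence[float],
                        num_classes: int) -> int:
    """Multinomial case: the intercepts live inside the flat weights vector.

    Assumed layout: num_classes - 1 blocks, one per non-pivot class; each
    block is len(x) feature weights, optionally followed by an intercept.
    """
    block_size = len(weights) // (num_classes - 1)
    has_intercept = block_size == len(x) + 1
    best_class, best_margin = 0, 0.0  # class 0 is the pivot with margin 0
    for k in range(num_classes - 1):
        block = weights[k * block_size:(k + 1) * block_size]
        # zip stops at len(x), so a trailing intercept is skipped here.
        margin = sum(w * xi for w, xi in zip(block, x))
        if has_intercept:
            margin += block[-1]
        if margin > best_margin:
            best_class, best_margin = k + 1, margin
    return best_class
```

Under this assumed layout, keeping class 0 as an implicit pivot is why a k-class model carries only k - 1 weight blocks, and why the docstring says the intercepts become part of the weights rather than a single value.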
@@ -189,8 +192,8 @@ class LogisticRegressionModel(LinearClassificationModel):
    @since('1.4.0')
    def numClasses(self):
        """
-        Number of possible outcomes for k classes classification problem in Multinomial
-        Logistic Regression.
+        Number of possible outcomes for k classes classification problem
+        in Multinomial Logistic Regression.
        """
        return self._numClasses
@@ -272,37 +275,42 @@ class LogisticRegressionWithSGD(object):
        """
        Train a logistic regression model on the given data.
-        :param data:              The training data, an RDD of
-                                  LabeledPoint.
-        :param iterations:        The number of iterations
-                                  (default: 100).
-        :param step:              The step parameter used in SGD
-                                  (default: 1.0).
-        :param miniBatchFraction: Fraction of data to be used for each
-                                  SGD iteration (default: 1.0).
-        :param initialWeights:    The initial weights (default: None).
-        :param regParam:          The regularizer parameter
-                                  (default: 0.01).
-        :param regType:           The type of regularizer used for
-                                  training our model.
-                                  :Allowed values:
-                                     - "l1" for using L1 regularization
-                                     - "l2" for using L2 regularization
-                                     - None for no regularization
-                                     (default: "l2")
-        :param intercept:         Boolean parameter which indicates the
-                                  use or not of the augmented representation
-                                  for training data (i.e. whether bias
-                                  features are activated or not,
-                                  default: False).
-        :param validateData:      Boolean parameter which indicates if
-                                  the algorithm should validate data
-                                  before training. (default: True)
-        :param convergenceTol:    A condition which decides iteration termination.
-                                  (default: 0.001)
+        :param data:
+          The training data, an RDD of LabeledPoint.
+        :param iterations:
+          The number of iterations.
+          (default: 100)
+        :param step:
+          The step parameter used in SGD.
+          (default: 1.0)
+        :param miniBatchFraction:
+          Fraction of data to be used for each SGD iteration.
+          (default: 1.0)
+        :param initialWeights:
+          The initial weights.
+          (default: None)
+        :param regParam:
+          The regularizer parameter.
+          (default: 0.01)
+        :param regType:
+          The type of regularizer used for training our model.
+          Allowed values:
+            - "l1" for using L1 regularization
+            - "l2" for using L2 regularization (default)
+            - None for no regularization
+        :param intercept:
+          Boolean parameter which indicates the use or not of the
+          augmented representation for training data (i.e., whether bias
+          features are activated or not).
+          (default: False)
+        :param validateData:
+          Boolean parameter which indicates if the algorithm should
+          validate data before training.
+          (default: True)
+        :param convergenceTol:
+          A condition which decides iteration termination.
+          (default: 0.001)
        """
        def train(rdd, i):
            return callMLlibFunc("trainLogisticRegressionModelWithSGD", rdd, int(iterations),
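To make the interplay of `step`, `miniBatchFraction`, `initialWeights`, `regParam`, `regType`, `intercept` and `convergenceTol` concrete, here is a minimal single-machine sketch of mini-batch SGD on the logistic loss, with 0/1 labels as in `LabeledPoint`. It is not the Spark code path. The decaying step `step / sqrt(i)`, the L2 shrink-then-step and L1 soft-thresholding updates, and the relative weight-change stopping rule are assumptions modeled on our understanding of MLlib's gradient descent; `sgd_logistic` is a hypothetical helper using snake_case names for the documented parameters.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def _sigmoid(m: float) -> float:
    """Numerically stable logistic function."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)


def sgd_logistic(data: Sequence[Tuple[float, Sequence[float]]],
                 iterations: int = 100, step: float = 1.0,
                 mini_batch_fraction: float = 1.0,
                 initial_weights: Optional[Sequence[float]] = None,
                 reg_param: float = 0.01, reg_type: Optional[str] = "l2",
                 intercept: bool = False, convergence_tol: float = 0.001,
                 seed: int = 42) -> Tuple[List[float], float]:
    """Hypothetical helper: mini-batch SGD on logistic loss, 0/1 labels."""
    if reg_type not in ("l1", "l2", None):
        raise ValueError("reg_type must be 'l1', 'l2' or None")
    rng = random.Random(seed)
    # Augmented representation: append a constant 1.0 bias feature.
    rows = [(y, list(x) + ([1.0] if intercept else [])) for y, x in data]
    dim = len(rows[0][1])
    w = list(initial_weights) if initial_weights is not None else [0.0] * dim
    if intercept and len(w) == dim - 1:
        w.append(0.0)  # bias weight starts at zero
    if len(w) != dim:
        raise ValueError("initial_weights has the wrong length")
    for i in range(1, iterations + 1):
        # Sample roughly mini_batch_fraction of the rows for this iteration.
        batch = [r for r in rows if rng.random() < mini_batch_fraction]
        if not batch:
            continue
        grad = [0.0] * dim
        for y, x in batch:
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(dim):
                grad[j] += err * x[j] / len(batch)
        eta = step / math.sqrt(i)  # assumed decaying step size
        prev = w
        if reg_type == "l2":
            # Shrink toward zero, then take the gradient step.
            w = [wj * (1.0 - eta * reg_param) - eta * gj
                 for wj, gj in zip(w, grad)]
        else:
            w = [wj - eta * gj for wj, gj in zip(w, grad)]
            if reg_type == "l1":
                shrink = eta * reg_param  # soft-thresholding
                w = [math.copysign(max(0.0, abs(wj) - shrink), wj) for wj in w]
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, prev)))
        norm = math.sqrt(sum(a * a for a in w))
        if diff < convergence_tol * max(norm, 1.0):
            break  # weights stopped moving: treat as converged
    if intercept:
        return w[:-1], w[-1]
    return w, 0.0
```

With `mini_batch_fraction=1.0` every iteration is full-batch and deterministic; smaller fractions trade gradient accuracy for cheaper iterations.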
@@ -323,38 +331,43 @@ class LogisticRegressionWithLBFGS(object):
        """
        Train a logistic regression model on the given data.
-        :param data:           The training data, an RDD of
-                               LabeledPoint.
-        :param iterations:     The number of iterations
-                               (default: 100).
-        :param initialWeights: The initial weights (default: None).
-        :param regParam:       The regularizer parameter
-                               (default: 0.01).
-        :param regType:        The type of regularizer used for
-                               training our model.
-                               :Allowed values:
-                                 - "l1" for using L1 regularization
-                                 - "l2" for using L2 regularization
-                                 - None for no regularization
-                                 (default: "l2")
-        :param intercept:      Boolean parameter which indicates the
-                               use or not of the augmented representation
-                               for training data (i.e. whether bias
-                               features are activated or not,
-                               default: False).
-        :param corrections:    The number of corrections used in the
-                               LBFGS update (default: 10).
-        :param tolerance:      The convergence tolerance of iterations
-                               for L-BFGS (default: 1e-4).
-        :param validateData:   Boolean parameter which indicates if the
-                               algorithm should validate data before
-                               training. (default: True)
-        :param numClasses:     The number of classes (i.e., outcomes) a
-                               label can take in Multinomial Logistic
-                               Regression (default: 2).
+        :param data:
+          The training data, an RDD of LabeledPoint.
+        :param iterations:
+          The number of iterations.
+          (default: 100)
+        :param initialWeights:
+          The initial weights.
+          (default: None)
+        :param regParam:
+          The regularizer parameter.
+          (default: 0.01)
+        :param regType:
+          The type of regularizer used for training our model.
+          Allowed values:
+            - "l1" for using L1 regularization
+            - "l2" for using L2 regularization (default)
+            - None for no regularization
+        :param intercept:
+          Boolean parameter which indicates the use or not of the
+          augmented representation for training data (i.e., whether bias
+          features are activated or not).
+          (default: False)
+        :param corrections:
+          The number of corrections used in the LBFGS update.
+          (default: 10)
+        :param tolerance:
+          The convergence tolerance of iterations for L-BFGS.
+          (default: 1e-4)
+        :param validateData:
+          Boolean parameter which indicates if the algorithm should
+          validate data before training.
+          (default: True)
+        :param numClasses:
+          The number of classes (i.e., outcomes) a label can take in
+          Multinomial Logistic Regression.
+          (default: 2)
        >>> data = [
        ...     LabeledPoint(0.0, [0.0, 1.0]),
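`corrections` is the memory size of L-BFGS: how many recent (s, y) pairs, i.e. step and gradient differences, are kept to approximate the inverse Hessian. The sketch below is the textbook two-loop recursion in plain Python, not Spark's optimizer (which, as we understand it, delegates to an external numerical library). The stopping rule (gradient norm below `tolerance`) and the backtracking Armijo line search are simplifying assumptions; `lbfgs_minimize` is a hypothetical helper that takes an objective and its gradient directly.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Vec = List[float]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _two_loop(grad: Vec, history: Deque[Tuple[Vec, Vec]]) -> Vec:
    """Two-loop recursion: approximate H^-1 * grad from stored (s, y) pairs."""
    q = list(grad)
    alphas = []
    for s, y in reversed(history):  # newest pair first
        rho = 1.0 / _dot(y, s)
        a = rho * _dot(s, q)
        alphas.append((a, rho, s, y))
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if history:  # scale the initial Hessian by gamma = s.y / y.y
        s, y = history[-1]
        gamma = _dot(s, y) / _dot(y, y)
        q = [gamma * qi for qi in q]
    for a, rho, s, y in reversed(alphas):  # oldest pair first
        b = rho * _dot(y, q)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return q


def lbfgs_minimize(f: Callable[[Vec], float], grad_f: Callable[[Vec], Vec],
                   x0: Vec, corrections: int = 10, tolerance: float = 1e-4,
                   max_iter: int = 100) -> Vec:
    """Hypothetical helper: L-BFGS keeping at most `corrections` pairs."""
    x = list(x0)
    g = grad_f(x)
    # The deque drops the oldest (s, y) pair once `corrections` are stored.
    history: Deque[Tuple[Vec, Vec]] = deque(maxlen=corrections)
    for _ in range(max_iter):
        if _dot(g, g) ** 0.5 < tolerance:
            break  # assumed stopping rule: small gradient norm
        d = [-v for v in _two_loop(g, history)]  # quasi-Newton direction
        fx, slope, t = f(x), _dot(g, d), 1.0
        # Backtracking line search with the Armijo sufficient-decrease test.
        while t > 1e-10 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if _dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            history.append((s, y))
        x, g = x_new, g_new
    return x
```

A larger `corrections` value gives a better curvature estimate at the cost of more memory and work per iteration, which is the trade-off the `corrections` parameter exposes.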
@@ -387,8 +400,10 @@ class SVMModel(LinearClassificationModel):
    """
    Model for Support Vector Machines (SVMs).
-    :param weights: Weights computed for every feature.
-    :param intercept: Intercept computed for this model.
+    :param weights:
+      Weights computed for every feature.
+    :param intercept:
+      Intercept computed for this model.
    >>> data = [
    ...     LabeledPoint(0.0, [0.0]),
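For `SVMModel` the same `weights` and `intercept` define a linear margin. A short sketch of the prediction rule follows. It assumes, as we understand PySpark's `SVMModel`, a default threshold of 0.0 on the raw margin, and that clearing the threshold returns the margin itself; `svm_predict` is a hypothetical helper.

```python
from typing import Optional, Sequence, Union


def svm_predict(weights: Sequence[float], intercept: float, x: Sequence[float],
                threshold: Optional[float] = 0.0) -> Union[int, float]:
    """Label 1 if the margin w.x + b exceeds the threshold, else 0."""
    margin = sum(w * xi for w, xi in zip(weights, x)) + intercept
    if threshold is None:
        return margin  # raw score, e.g. for ranking or ROC curves
    return 1 if margin > threshold else 0
```

Unlike the logistic model, no sigmoid is applied, so the threshold lives on the margin scale rather than the probability scale.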
@@ -490,37 +505,42 @@ class SVMWithSGD(object):
        """
        Train a support vector machine on the given data.
-        :param data:              The training data, an RDD of
-                                  LabeledPoint.
-        :param iterations:        The number of iterations
-                                  (default: 100).
-        :param step:              The step parameter used in SGD
-                                  (default: 1.0).
-        :param regParam:          The regularizer parameter
-                                  (default: 0.01).
-        :param miniBatchFraction: Fraction of data to be used for each
-                                  SGD iteration (default: 1.0).
-        :param initialWeights:    The initial weights (default: None).
-        :param regType:           The type of regularizer used for
-                                  training our model.
-                                  :Allowed values:
-                                     - "l1" for using L1 regularization
-                                     - "l2" for using L2 regularization
-                                     - None for no regularization
-                                     (default: "l2")
-        :param intercept:         Boolean parameter which indicates the
-                                  use or not of the augmented representation
-                                  for training data (i.e. whether bias
-                                  features are activated or not,
-                                  default: False).
-        :param validateData:      Boolean parameter which indicates if
-                                  the algorithm should validate data
-                                  before training. (default: True)
-        :param convergenceTol:    A condition which decides iteration termination.
-                                  (default: 0.001)
+        :param data:
+          The training data, an RDD of LabeledPoint.
+        :param iterations:
+          The number of iterations.
+          (default: 100)
+        :param step:
+          The step parameter used in SGD.
+          (default: 1.0)
+        :param regParam:
+          The regularizer parameter.
+          (default: 0.01)
+        :param miniBatchFraction:
+          Fraction of data to be used for each SGD iteration.
+          (default: 1.0)
+        :param initialWeights:
+          The initial weights.
+          (default: None)
+        :param regType:
+          The type of regularizer used for training our model.
+          Allowed values:
+            - "l1" for using L1 regularization
+            - "l2" for using L2 regularization (default)
+            - None for no regularization
+        :param intercept:
+          Boolean parameter which indicates the use or not of the
+          augmented representation for training data (i.e. whether bias
+          features are activated or not).
+          (default: False)
+        :param validateData:
+          Boolean parameter which indicates if the algorithm should
+          validate data before training.
+          (default: True)
+        :param convergenceTol:
+          A condition which decides iteration termination.
+          (default: 0.001)
        """
        def train(rdd, i):
            return callMLlibFunc("trainSVMModelWithSGD", rdd, int(iterations), float(step),
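`SVMWithSGD` exposes the same SGD parameters as logistic regression but optimizes the hinge loss instead. A sketch of the per-example loss and subgradient follows, assuming 0/1 labels are mapped to -1/+1 before the margin is computed, which is how we understand MLlib's hinge gradient. `hinge_loss_and_gradient` is a hypothetical helper; the optimizer loop that would consume it (step size, regularization, mini-batching) is omitted.

```python
from typing import List, Sequence, Tuple


def hinge_loss_and_gradient(weights: Sequence[float], x: Sequence[float],
                            label: float) -> Tuple[float, List[float]]:
    """Hinge loss max(0, 1 - y * w.x) and a subgradient, for label in {0, 1}."""
    y = 2.0 * label - 1.0  # map {0, 1} -> {-1, +1}
    dot = sum(w * xi for w, xi in zip(weights, x))
    if 1.0 > y * dot:
        # Inside the margin or misclassified: loss and gradient are nonzero.
        return 1.0 - y * dot, [-y * xi for xi in x]
    return 0.0, [0.0] * len(x)  # correct side with margin >= 1: no update
```

Because the gradient is zero for points already beyond the margin, only "hard" examples move the weights, which is the usual intuition behind SVM training with SGD.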
@@ -536,11 +556,13 @@ class NaiveBayesModel(Saveable, Loader):
    """
    Model for Naive Bayes classifiers.
-    :param labels: list of labels.
-    :param pi: log of class priors, whose dimension is C,
-            number of labels.
-    :param theta: log of class conditional probabilities, whose
-            dimension is C-by-D, where D is number of features.
+    :param labels:
+      List of labels.
+    :param pi:
+      Log of class priors, whose dimension is C, number of labels.
+    :param theta:
+      Log of class conditional probabilities, whose dimension is C-by-D,
+      where D is number of features.
    >>> data = [
    ...     LabeledPoint(0.0, [0.0, 0.0]),
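With `pi` of length C and `theta` of shape C-by-D, prediction is an argmax over log-posteriors. The sketch below assumes the multinomial model scores class c as `pi[c] + sum_j x[j] * theta[c][j]` and returns the matching entry of `labels`, which is our understanding of how PySpark's `NaiveBayesModel` combines the three fields; `nb_predict` is a hypothetical helper.

```python
from typing import Sequence


def nb_predict(labels: Sequence[float], pi: Sequence[float],
               theta: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Return the label whose log-posterior (up to a constant) is largest."""
    scores = [pi_c + sum(t * xj for t, xj in zip(theta_c, x))
              for pi_c, theta_c in zip(pi, theta)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]
```

Working in log space turns the product of per-feature probabilities into a sum, which avoids underflow when D is large.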
@@ -639,8 +661,11 @@ class NaiveBayes(object):
        it can also be used as Bernoulli NB (U{http://tinyurl.com/p7c96j6}).
        The input feature values must be nonnegative.
-        :param data: RDD of LabeledPoint.
-        :param lambda_: The smoothing parameter (default: 1.0).
+        :param data:
+          RDD of LabeledPoint.
+        :param lambda_:
+          The smoothing parameter.
+          (default: 1.0)
        """
        first = data.first()
        if not isinstance(first, LabeledPoint):
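`lambda_` is additive (Laplace) smoothing. Below is a minimal multinomial Naive Bayes trainer in plain Python that produces the `labels`, `pi` and `theta` fields described for `NaiveBayesModel`. It assumes the textbook smoothed estimates pi_c = log((n_c + λ) / (N + Cλ)) and theta_cj = log((F_cj + λ) / (F_c + Dλ)), where F_cj sums feature j over class c and F_c is the row total; we believe this matches MLlib's multinomial mode but have not confirmed it from this commit. `train_naive_bayes` is hypothetical and takes (label, features) pairs instead of an RDD.

```python
import math
from typing import Dict, List, Sequence, Tuple


def train_naive_bayes(data: Sequence[Tuple[float, Sequence[float]]],
                      lambda_: float = 1.0):
    """Return (labels, pi, theta) using additive smoothing lambda_."""
    counts: Dict[float, int] = {}
    sums: Dict[float, List[float]] = {}
    num_features = len(data[0][1])
    for label, x in data:
        if any(v < 0 for v in x):
            raise ValueError("feature values must be nonnegative")
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * num_features)
        for j, v in enumerate(x):
            acc[j] += v  # per-class feature totals F_cj
    labels = sorted(counts)
    n, c = len(data), len(labels)
    pi_denom = math.log(n + c * lambda_)
    pi = [math.log(counts[k] + lambda_) - pi_denom for k in labels]
    theta = []
    for k in labels:
        theta_denom = math.log(sum(sums[k]) + num_features * lambda_)
        theta.append([math.log(v + lambda_) - theta_denom for v in sums[k]])
    return labels, pi, theta
```

With `lambda_ > 0`, a feature never seen in a class still gets a finite log-probability, so a single unseen feature cannot zero out a class's score.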
@@ -652,9 +677,9 @@ class NaiveBayes(object):
 @inherit_doc
 class StreamingLogisticRegressionWithSGD(StreamingLinearAlgorithm):
    """
-    Train or predict a logistic regression model on streaming data. Training uses
-    Stochastic Gradient Descent to update the model based on each new batch of
-    incoming data from a DStream.
+    Train or predict a logistic regression model on streaming data.
+    Training uses Stochastic Gradient Descent to update the model based on
+    each new batch of incoming data from a DStream.
    Each batch of data is assumed to be an RDD of LabeledPoints.
    The number of data points per batch can vary, but the number