  1. Jul 15, 2016
    • [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide · 5ffd5d38
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Made DataFrame-based API primary
      * Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
      * mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
      * ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
        * **Reviewers: please check this carefully**
      * (minor) Titles for the DataFrame-based API no longer include the "- spark.ml" suffix. Titles for the RDD-based API have the "- RDD-based API" suffix.
      * Moved migration guide to ml-guide from mllib-guide
        * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
        * **Reviewers**: I did not change any of the content of the migration guides.
      
      Reorganized DataFrame-based guide:
      * ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
      * Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
        * **Reviewers**: I did not change the content of these guides, except some intro text.
      * Sidebar remains the same, but with pipeline and tuning sections added
      
      Other:
      * ml-classification-regression.html: Moved text about linear methods to new section in page
      
      ## How was this patch tested?
      
      Generated docs locally
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #14213 from jkbradley/ml-guide-2.0.
  2. Jun 11, 2016
    • [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents · ad102af1
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue fixes all broken links in the Spark 2.0 preview MLlib documents. It also contains some editorial changes.
      
      **Fix broken links**
        * mllib-data-types.md
        * mllib-decision-tree.md
        * mllib-ensembles.md
        * mllib-feature-extraction.md
        * mllib-pmml-model-export.md
        * mllib-statistics.md
      
      **Fix malformed section header and scala coding style**
        * mllib-linear-methods.md
      
      **Replace indirect forward links with direct ones**
        * ml-classification-regression.md
      
      ## How was this patch tested?
      
      Manual tests (with `cd docs; jekyll build`).
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13608 from dongjoon-hyun/SPARK-15883.
  3. Dec 10, 2015
    • [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib... · 2ecbe02d
      Timothy Hunter authored
      [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
      
      Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. This should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark).
      
      It also removes some files that I forgot to delete with #10207.
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #10234 from thunterdb/12212.
  4. Nov 13, 2015
  5. Oct 07, 2015
  6. Aug 19, 2015
  7. Mar 20, 2015
    • [SPARK-6025] [MLlib] Add helper method evaluateEachIteration to extract learning curve · 25e271d9
      MechCoder authored
      Added `evaluateEachIteration` so that users can manually extract the error after each iteration of gradient boosting. The internal optimization can be dealt with later.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4906 from MechCoder/spark-6025 and squashes the following commits:
      
      67146ab [MechCoder] Minor
      352001f [MechCoder] Minor
      6e8aa10 [MechCoder] Made the following changes Used mapPartition instead of map Refactored computeError and unpersisted broadcast variables
      bc99ac6 [MechCoder] Refactor the method and stuff
      dbda033 [MechCoder] [SPARK-6025] Add helper method evaluateEachIteration to extract learning curve
  8. Mar 03, 2015
    • [SPARK-6097][MLLIB] Support tree model save/load in PySpark/MLlib · 7e53a79c
      Xiangrui Meng authored
      Similar to `MatrixFactorizationModel`, we only need wrappers to support save/load for tree models in Python.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4854 from mengxr/SPARK-6097 and squashes the following commits:
      
      4586a4d [Xiangrui Meng] fix more typos
      8ebcac2 [Xiangrui Meng] fix python style
      91172d8 [Xiangrui Meng] fix typos
      201b3b9 [Xiangrui Meng] update user guide
      b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib
  9. Feb 27, 2015
    • [SPARK-4587] [mllib] [docs] Fixed save,load calls in ML guide examples · d17cb2ba
      Joseph K. Bradley authored
      The examples should pass the SparkContext to save/load.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
      
      83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
      2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
  10. Feb 25, 2015
    • [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT · d20559b1
      Joseph K. Bradley authored
      * Add GradientBoostedTrees Python examples to ML guide
        * I ran these in the pyspark shell, and they worked.
      * Add save/load to examples in ML guide
      * Added note to Python docs about predict and transform not working within RDD actions and transformations in some cases (see SPARK-5981)
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
      
      c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
      bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide.  Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
      6d81c3e [Joseph K. Bradley] completed python GBT examples
      9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
      c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide.  Added GBT examples to ML guide
  11. Feb 24, 2015
    • [SPARK-5436] [MLlib] Validate GradientBoostedTrees using runWithValidation · 2a0fe348
      MechCoder authored
      Training can stop early if the decrease in validation error is less than a certain tolerance, or if the error increases because the model has overfit the training data.
      
      This introduces a new method, runWithValidation, which takes a pair of RDDs: one for the training data and the other for validation.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
      
      1bb21d4 [MechCoder] Combine regression and classification tests into a single one
      e4d799b [MechCoder] Addresses indentation and doc comments
      b48a70f [MechCoder] COSMIT
      b928a19 [MechCoder] Move validation while training section under usage tips
      fad9b6e [MechCoder] Made the following changes 1. Add section to documentation 2. Return corresponding to bestValidationError 3. Allow negative tolerance.
      55e5c3b [MechCoder] One liner for prevValidateError
      3e74372 [MechCoder] TST: Add test for classification
      77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using runWithValidation
  12. Feb 18, 2015
  13. Dec 03, 2014
    • [SPARK-4580] [SPARK-4610] [mllib] [docs] Documentation for tree ensembles + DecisionTree API fix · 657a8883
      Joseph K. Bradley authored
      Major changes:
      * Added programming guide sections for tree ensembles
      * Added examples for tree ensembles
      * Updated DecisionTree programming guide with more info on parameters
      * **API change**: Standardized the tree parameter for the number of classes (for classification)
      
      Minor changes:
      * Updated decision tree documentation
      * Updated existing tree and tree ensemble examples
       * Use train/test split, and compute test error instead of training error.
       * Fixed decision_tree_runner.py to actually use the number of classes it computes from data. (small bug fix)
      
      Note: I know this is a lot of lines, but most is covered by:
      * Programming guide sections for gradient boosting and random forests.  (The changes are probably best viewed by generating the docs locally.)
      * New examples (which were copied from the programming guide)
      * The "numClasses" renaming
      
      I have run all examples and relevant unit tests.
      
      CC: mengxr manishamde codedeft
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
      
      Closes #3461 from jkbradley/ensemble-docs and squashes the following commits:
      
      70a75f3 [Joseph K. Bradley] updated forest vs boosting comparison
      d1de753 [Joseph K. Bradley] Added note about toString and toDebugString for DecisionTree to migration guide
      8e87f8f [Joseph K. Bradley] Combined GBT and RandomForest guides into one ensembles guide
      6fab846 [Joseph K. Bradley] small fixes based on review
      b9f8576 [Joseph K. Bradley] updated decision tree doc
      375204c [Joseph K. Bradley] fixed python style
      2b60b6e [Joseph K. Bradley] merged Java RandomForest examples into 1 file.  added header.  Fixed small bug in same example in the programming guide.
      706d332 [Joseph K. Bradley] updated python DT runner to print full model if it is small
      c76c823 [Joseph K. Bradley] added migration guide for mllib
      abe5ed7 [Joseph K. Bradley] added examples for random forest in Java and Python to examples folder
      07fc11d [Joseph K. Bradley] Renamed numClassesForClassification to numClasses everywhere in trees and ensembles. This is a breaking API change, but it was necessary to correct an API inconsistency in Spark 1.1 (where Python DecisionTree used numClasses but Scala used numClassesForClassification).
      cdfdfbc [Joseph K. Bradley] added examples for GBT
      6372a2b [Joseph K. Bradley] updated decision tree examples to use random split.  tested all of them.
      ad3e695 [Joseph K. Bradley] added gbt and random forest to programming guide.  still need to update their examples
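The learning curve that `evaluateEachIteration` (SPARK-6025 above) extracts can be illustrated with a small, single-machine sketch in plain Python. This is not Spark's implementation, which runs over RDDs. Here the trees are ordinary Python callables, and the function name `evaluate_each_iteration`, the `(features, label)` data layout, and the squared-error loss in the usage line are hypothetical choices made for illustration. The idea is to keep a running ensemble prediction per point, add each weighted tree in turn, and record the mean loss after every iteration:

```python
def evaluate_each_iteration(trees, weights, data, loss):
    """Hypothetical sketch: mean loss of a boosted ensemble after each iteration.

    trees   -- list of callables mapping features to a numeric prediction
    weights -- per-tree weights (same length as trees)
    data    -- list of (features, label) pairs
    loss    -- callable loss(prediction, label) -> float
    """
    n = len(data)
    predictions = [0.0] * n  # running ensemble prediction for each point
    errors = []
    for tree, weight in zip(trees, weights):
        # Add this iteration's weighted tree to every running prediction.
        for i, (features, _) in enumerate(data):
            predictions[i] += weight * tree(features)
        # Record the mean loss of the ensemble built so far.
        total = sum(loss(p, label) for p, (_, label) in zip(predictions, data))
        errors.append(total / n)
    return errors


if __name__ == "__main__":
    trees = [lambda x: 1.0, lambda x: x]
    data = [(0.0, 1.0), (2.0, 3.0)]
    squared_error = lambda p, y: (p - y) ** 2
    print(evaluate_each_iteration(trees, [1.0, 0.5], data, squared_error))  # prints [2.0, 0.5]
```

Updating the running predictions incrementally means each iteration costs one pass over the data, rather than re-evaluating every earlier tree.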
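The early-stopping rule described for `runWithValidation` (SPARK-5436 above) can be sketched in plain Python as a function over a precomputed list of per-iteration validation errors. This is a simplified, hypothetical illustration, not Spark's code: `num_trees_to_keep`, its arguments, and the default `tol` are assumptions. Training stops once an iteration improves the best validation error seen so far by less than `tol`, and the model is truncated to the iteration with the lowest validation error. A negative `tol` tolerates small increases in error before stopping, which mirrors the commit's "Allow negative tolerance" change in spirit:

```python
def num_trees_to_keep(validation_errors, tol=0.001):
    """Hypothetical sketch: how many boosting iterations to keep.

    validation_errors[m] is the validation error of the ensemble after
    m + 1 iterations. Training stops early as soon as an iteration improves
    on the best error seen so far by less than `tol`; the result is the
    iteration count with the lowest validation error seen before stopping.
    """
    if not validation_errors:
        return 0
    best_error = validation_errors[0]
    best_m = 1
    for m in range(1, len(validation_errors)):
        err = validation_errors[m]
        if best_error - err < tol:
            break  # improvement too small (or error rose too much): stop early
        # With tol >= 0 this always holds here; with tol < 0, an error increase
        # within |tol| is tolerated but does not replace the best model.
        if err < best_error:
            best_error = err
            best_m = m + 1
    return best_m


if __name__ == "__main__":
    errors = [0.5, 0.4, 0.35, 0.349, 0.36]
    print(num_trees_to_keep(errors, tol=0.01))   # prints 3
    print(num_trees_to_keep(errors, tol=-0.05))  # prints 4
```

Returning the best iteration count, rather than the count at which training stopped, corresponds to the commit note about returning the model corresponding to the best validation error.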