  1. Jul 15, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide · 5ffd5d38
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Made DataFrame-based API primary
      * Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
      * mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
      * ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
        * **Reviewers: please check this carefully**
      * (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
      * Moved migration guide to ml-guide from mllib-guide
        * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
        * **Reviewers**: I did not change any of the content of the migration guides.
      
      Reorganized DataFrame-based guide:
      * ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
      * Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
        * **Reviewers**: I did not change the content of these guides, except some intro text.
      * Sidebar remains the same, but with pipeline and tuning sections added
      
      Other:
      * ml-classification-regression.html: Moved text about linear methods to new section in page
      
      ## How was this patch tested?
      
      Generated docs locally
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #14213 from jkbradley/ml-guide-2.0.
      5ffd5d38
  2. Feb 22, 2016
  3. Feb 16, 2016
  4. Dec 10, 2015
    • Timothy Hunter's avatar
      [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation. · 2ecbe02d
      Timothy Hunter authored
      
      Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark).
      
      It also removes some files that I forgot to delete with #10207.
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #10234 from thunterdb/12212.
      2ecbe02d
  5. Nov 09, 2015
  6. Oct 15, 2015
  7. Oct 07, 2015
  8. Jul 01, 2015
    • Yuhao Yang's avatar
      [SPARK-8308] [MLLIB] add missing save load for python example · 20129133
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-8308
      
      1. add some missing save/load in python examples: LogisticRegression, LinearRegression and NaiveBayes
      2. tune down iterations for MatrixFactorization, since current number will trigger StackOverflow for default java configuration (>1M)
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #6760 from hhbyyh/docUpdate and squashes the following commits:
      
      9bd3383 [Yuhao Yang] update scala example
      8a44692 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
      077cbb8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docUpdate
      3e948dc [Yuhao Yang] add missing save load for python example
      20129133
  9. May 26, 2015
    • Mike Dusenberry's avatar
      [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation. · 0463428b
      Mike Dusenberry authored
      
      Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:
      
      36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      0463428b
  10. May 01, 2015
    • MechCoder's avatar
      [SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation · c24aeb6a
      MechCoder authored
      Adds rank, recommendUsers and recommendProducts to MatrixFactorizationModel in PySpark.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #5807 from MechCoder/spark-6257 and squashes the following commits:
      
      09629c6 [MechCoder] doc
      953b326 [MechCoder] [SPARK-6257] MLlib API missing items in Recommendation
      c24aeb6a
  11. Mar 01, 2015
    • Xiangrui Meng's avatar
      [SPARK-6053][MLLIB] support save/load in PySpark's ALS · aedbbaa3
      Xiangrui Meng authored
      A simple wrapper to save/load `MatrixFactorizationModel` in Python. jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4811 from mengxr/SPARK-5991 and squashes the following commits:
      
      f135dac [Xiangrui Meng] update save doc
      57e5200 [Xiangrui Meng] address comments
      06140a4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5991
      282ec8d [Xiangrui Meng] support save/load in PySpark's ALS
      aedbbaa3
  12. Feb 27, 2015
    • Joseph K. Bradley's avatar
      [SPARK-4587] [mllib] [docs] Fixed save,load calls in ML guide examples · d17cb2ba
      Joseph K. Bradley authored
      Should pass spark context to save/load
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
      
      83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
      2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
      d17cb2ba
  13. Feb 25, 2015
    • Joseph K. Bradley's avatar
      [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT · d20559b1
      Joseph K. Bradley authored
      * Add GradientBoostedTrees Python examples to ML guide
        * I ran these in the pyspark shell, and they worked.
      * Add save/load to examples in ML guide
      * Added note to python docs about predict,transform not working within RDD actions,transformations in some cases (See SPARK-5981)
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
      
      c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
      bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide.  Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
      6d81c3e [Joseph K. Bradley] completed python GBT examples
      9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
      c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide.  Added GBT examples to ML guide
      d20559b1
  14. Jan 27, 2015
    • Davies Liu's avatar
      [MLlib] fix python example of ALS in guide · fdaad4eb
      Davies Liu authored
      fix python example of ALS in guide, use Rating instead of np.array.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4226 from davies/fix_als_guide and squashes the following commits:
      
      1433d76 [Davies Liu] fix python example of als in guide
      fdaad4eb
  15. Oct 14, 2014
    • Sean Owen's avatar
      SPARK-1307 [DOCS] Don't use term 'standalone' to refer to a Spark Application · 18ab6bd7
      Sean Owen authored
      HT to Diana, just proposing an implementation of her suggestion, which I rather agreed with. Is there a second/third for the motion?
      
      Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #2787 from srowen/SPARK-1307 and squashes the following commits:
      
      b5b82e2 [Sean Owen] Refer to "self-contained" rather than "standalone" apps to avoid confusion with standalone deployment mode. And fix placement of reference to this in MLlib docs.
      18ab6bd7
  16. Aug 20, 2014
  17. Aug 12, 2014
    • Ameet Talwalkar's avatar
      SPARK-2830 [MLlib]: re-organize mllib documentation · c235b83e
      Ameet Talwalkar authored
      As per discussions with Xiangrui, I've reorganized and edited the mllib documentation.
      
      Author: Ameet Talwalkar <atalwalkar@gmail.com>
      
      Closes #1908 from atalwalkar/master and squashes the following commits:
      
      fe6938a [Ameet Talwalkar] made xiangruis suggested changes
      840028b [Ameet Talwalkar] made xiangruis suggested changes
      7ec366a [Ameet Talwalkar] reorganize and edit mllib documentation
      c235b83e
  18. Jul 20, 2014
    • Michael Giannakopoulos's avatar
      [SPARK-1945][MLLIB] Documentation Improvements for Spark 1.0 · db56f2df
      Michael Giannakopoulos authored
      Standalone application examples written in Java are added to the 'mllib-linear-methods.md' file.
      This commit is related to the issue [Add full Java Examples in MLlib docs](https://issues.apache.org/jira/browse/SPARK-1945).
      Also I changed the name of the sigmoid function from 'logit' to 'f'. This is because the logit function
      is the inverse of sigmoid.
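
The naming point above can be checked numerically. The following is a minimal sketch, not code from the patch: the function names `sigmoid`, `logit`, and `round_trip_error` are chosen here for illustration, and it only shows that applying the logit to the output of the sigmoid recovers the original input.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps a real number z into (0, 1)."""
    # Fine for moderate z; math.exp(-z) overflows for very negative z.
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds function: maps a probability p in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def round_trip_error(z: float) -> float:
    """Absolute difference between z and logit(sigmoid(z))."""
    return abs(logit(sigmoid(z)) - z)


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.0, 1.0, 2.5):
        print(z, sigmoid(z), round_trip_error(z))
```

Because `logit(sigmoid(z))` returns `z` up to floating-point error, calling the sigmoid itself 'logit' would mislabel it, which is consistent with the rename described in this commit.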
      
      Thanks,
      Michael
      
      Author: Michael Giannakopoulos <miccagiann@gmail.com>
      
      Closes #1311 from miccagiann/master and squashes the following commits:
      
      8ffe5ab [Michael Giannakopoulos] Update code so as to comply with code standards.
      f7ad5cc [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
      38d92c7 [Michael Giannakopoulos] Adding PCA, SVD and LBFGS examples in Java. Performing minor updates in the already committed examples to eliminate calls to the 'productElement' function wherever possible.
      cc0a089 [Michael Giannakopoulos] Modified Java examples so as to comply with coding standards.
      b1141b2 [Michael Giannakopoulos] Added Java examples for Clustering and Collaborative Filtering [mllib-clustering.md & mllib-collaborative-filtering.md].
      837f7a8 [Michael Giannakopoulos] Merge remote-tracking branch 'upstream/master'
      15f0eb4 [Michael Giannakopoulos] Java examples included in 'mllib-linear-methods.md' file.
      db56f2df
  19. Jul 13, 2014
    • Sean Owen's avatar
      SPARK-2363. Clean MLlib's sample data files · 635888cb
      Sean Owen authored
      (Just made a PR for this, mengxr was the reporter of:)
      
      MLlib has sample data under several folders:
      1) data/mllib
      2) data/
      3) mllib/data/*
      Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1394 from srowen/SPARK-2363 and squashes the following commits:
      
      54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
      635888cb
  20. May 18, 2014
    • Xiangrui Meng's avatar
      [WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0 · df0aa835
      Xiangrui Meng authored
      Some improvements to MLlib guide:
      
      1. [SPARK-1872] Update API links for unidoc.
      2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
      3. Add more Java/Python examples.
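
The fallback rule in item 2 can be sketched as follows. This is only an illustration of the logic in Python: the actual change lives in the site's global layout template, and the `display_title` helper, the dictionary representation of a page, and the sample titles are all assumptions made for this example.

```python
def display_title(page: dict) -> str:
    """Return the title to render for a page.

    Prefers the optional "displayTitle" entry and falls back to "title"
    when "displayTitle" is missing or empty.
    """
    return page.get("displayTitle") or page["title"]


if __name__ == "__main__":
    # Hypothetical page metadata, for illustration only.
    with_override = {"title": "Short title", "displayTitle": "Longer display title"}
    without_override = {"title": "Short title"}
    print(display_title(with_override))
    print(display_title(without_override))
```

With this rule, a page can keep a short `title` for navigation while showing a longer heading, and pages that do not define `displayTitle` render exactly as before.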
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #816 from mengxr/mllib-doc and squashes the following commits:
      
      ec2e407 [Xiangrui Meng] format scala example for ALS
      cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
      4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
      d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
      561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
      195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
      9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
      7dad18e [Xiangrui Meng] update java api links in NB
      3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
      35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
      e4afaa8 [Xiangrui Meng] explicitly state what might change
      df0aa835
  21. May 06, 2014
    • Sean Owen's avatar
      SPARK-1727. Correct small compile errors, typos, and markdown issues in (primarily) MLlib docs · 25ad8f93
      Sean Owen authored
      While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors, and some typos. This led to finding and fixing a few similar items in other docs.
      
      Then in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #653 from srowen/SPARK-1727 and squashes the following commits:
      
      6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
      8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
      99966a9 [Sean Owen] Update issue tracker URL in docs
      23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
      8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs
      25ad8f93
  22. Apr 22, 2014
    • Xiangrui Meng's avatar
      [SPARK-1506][MLLIB] Documentation improvements for MLlib 1.0 · 26d35f3f
      Xiangrui Meng authored
      Preview: http://54.82.240.23:4000/mllib-guide.html
      
      Table of contents:
      
      * Basics
        * Data types
        * Summary statistics
      * Classification and regression
        * linear support vector machine (SVM)
        * logistic regression
        * linear least squares, Lasso, and ridge regression
        * decision tree
        * naive Bayes
      * Collaborative Filtering
        * alternating least squares (ALS)
      * Clustering
        * k-means
      * Dimensionality reduction
        * singular value decomposition (SVD)
        * principal component analysis (PCA)
      * Optimization
        * stochastic gradient descent
        * limited-memory BFGS (L-BFGS)
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #422 from mengxr/mllib-doc and squashes the following commits:
      
      944e3a9 [Xiangrui Meng] merge master
      f9fda28 [Xiangrui Meng] minor
      9474065 [Xiangrui Meng] add alpha to ALS examples
      928e630 [Xiangrui Meng] initialization_mode -> initializationMode
      5bbff49 [Xiangrui Meng] add imports to labeled point examples
      c17440d [Xiangrui Meng] fix python nb example
      28f40dc [Xiangrui Meng] remove localhost:4000
      369a4d3 [Xiangrui Meng] Merge branch 'master' into mllib-doc
      7dc95cc [Xiangrui Meng] update linear methods
      053ad8a [Xiangrui Meng] add links to go back to the main page
      abbbf7e [Xiangrui Meng] update ALS argument names
      648283e [Xiangrui Meng] level down statistics
      14e2287 [Xiangrui Meng] add sample libsvm data and use it in guide
      8cd2441 [Xiangrui Meng] minor updates
      186ab07 [Xiangrui Meng] update section names
      6568d65 [Xiangrui Meng] update toc, level up lr and svm
      162ee12 [Xiangrui Meng] rename section names
      5c1e1b1 [Xiangrui Meng] minor
      8aeaba1 [Xiangrui Meng] wrap long lines
      6ce6a6f [Xiangrui Meng] add summary statistics to toc
      5760045 [Xiangrui Meng] claim beta
      cc604bf [Xiangrui Meng] remove classification and regression
      92747b3 [Xiangrui Meng] make section titles consistent
      e605dd6 [Xiangrui Meng] add LIBSVM loader
      f639674 [Xiangrui Meng] add python section to migration guide
      c82ffb4 [Xiangrui Meng] clean optimization
      31660eb [Xiangrui Meng] update linear algebra and stat
      0a40837 [Xiangrui Meng] first pass over linear methods
      1fc8271 [Xiangrui Meng] update toc
      906ed0a [Xiangrui Meng] add a python example to naive bayes
      5f0a700 [Xiangrui Meng] update collaborative filtering
      656d416 [Xiangrui Meng] update mllib-clustering
      86e143a [Xiangrui Meng] remove data types section from main page
      8d1a128 [Xiangrui Meng] move part of linear algebra to data types and add Java/Python examples
      d1b5cbf [Xiangrui Meng] merge master
      72e4804 [Xiangrui Meng] one pass over tree guide
      64f8995 [Xiangrui Meng] move decision tree guide to a separate file
      9fca001 [Xiangrui Meng] add first version of linear algebra guide
      53c9552 [Xiangrui Meng] update dependencies
      f316ec2 [Xiangrui Meng] add migration guide
      f399f6c [Xiangrui Meng] move linear-algebra to dimensionality-reduction
      182460f [Xiangrui Meng] add guide for naive Bayes
      137fd1d [Xiangrui Meng] re-organize toc
      a61e434 [Xiangrui Meng] update mllib's toc
      26d35f3f
  23. Apr 21, 2014
    • Matei Zaharia's avatar
      [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs · fc783847
      Matei Zaharia authored
      I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is a SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one.
      
      Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/
      
      Author: Matei Zaharia <matei@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Patrick Wendell <pwendell@gmail.com>
      
      Closes #457 from mateiz/better-docs and squashes the following commits:
      
      a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package
      5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load
      f05abc0 [Matei Zaharia] Don't include java.lang package names
      995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc
      a14a93c [Matei Zaharia] typo
      76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java
      ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs
      acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
      fc783847
  24. Feb 08, 2014
    • Martin Jaggi's avatar
      Merge pull request #552 from martinjaggi/master. Closes #552. · fabf1749
      Martin Jaggi authored
      tex formulas in the documentation
      
      using mathjax.
      and splitting the MLlib documentation by techniques
      
      see jira
      https://spark-project.atlassian.net/browse/MLLIB-19
      and
      https://github.com/shivaram/spark/compare/mathjax
      
      Author: Martin Jaggi <m.jaggi@gmail.com>
      
      == Merge branch commits ==
      
      commit 0364bfabbfc347f917216057a20c39b631842481
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Fri Feb 7 03:19:38 2014 +0100
      
          minor polishing, as suggested by @pwendell
      
      commit dcd2142c164b2f602bf472bb152ad55bae82d31a
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 18:04:26 2014 +0100
      
          enabling inline latex formulas with $.$
      
          same mathjax configuration as used in math.stackexchange.com
      
          sample usage in the linear algebra (SVD) documentation
      
      commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 17:31:29 2014 +0100
      
          split MLlib documentation by techniques
      
          and linked from the main mllib-guide.md site
      
      commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 16:59:43 2014 +0100
      
          enable mathjax formula in the .md documentation files
      
          code by @shivaram
      
      commit d73948db0d9bc36296054e79fec5b1a657b4eab4
      Author: Martin Jaggi <m.jaggi@gmail.com>
      Date:   Thu Feb 6 16:57:23 2014 +0100
      
          minor update on how to compile the documentation
      fabf1749