  1. Aug 05, 2016
    • [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs · 180fd3e0
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Improve example outputs to better reflect the functionality that is being presented.  This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output.  Explicitly set parameters when they are used as part of the example.  Fixed Java examples that failed to run because of using old-style MLlib Vectors or problem with schema.  Synced examples between different APIs.
      
      ## How was this patch tested?
      Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14308 from BryanCutler/ml-examples-improve-output-SPARK-16260.
  2. Jul 02, 2016
    • [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document · 192d1f9c
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Two test data files used by the GraphX examples live in the "graphx/data" directory.
      I moved them into the "data/" directory, because the "graphx" directory is meant for code files and the other test data files (such as the mllib and streaming test data) are already under "data/".

      I also updated the GraphX documentation where it references the moved data files.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14010 from WeichenXu123/move_graphx_data_dir.
  3. Jun 16, 2016
    • [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression · 9040d83b
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      add ml doc for ml isotonic regression
      add scala example for ml isotonic regression
      add java example for ml isotonic regression
      add python example for ml isotonic regression
      
      modify scala example for mllib isotonic regression
      modify java example for mllib isotonic regression
      modify python example for mllib isotonic regression
      
      add data/mllib/sample_isotonic_regression_libsvm_data.txt
      delete data/mllib/sample_isotonic_regression_data.txt
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
  4. May 27, 2016
    • [SPARK-15449][MLLIB][EXAMPLE] Wrong Data Format - Documentation Issue · 5d4dafe8
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      In the MLlib Naive Bayes example, the Scala and Python versions do not use libsvm data, but the Java version does.

      I changed the Scala and Python examples to use the same libsvm data as the Java example.
      
      ## How was this patch tested?
      
      Manual tests
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #13301 from wangmiao1981/example.
  5. May 11, 2016
    • [SPARK-15150][EXAMPLE][DOC] Update LDA examples · d88afabd
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Create a libsvm-format dataset for LDA: `data/mllib/sample_lda_libsvm_data.txt`
      2. Add a Python example
      3. Read the data file directly in the examples
      4. Also switch to `SparkSession` in `aft_survival_regression.py`
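
For readers unfamiliar with the libsvm text format mentioned above, the following is a minimal sketch of how one line can be read: a label followed by `index:value` pairs. The function name `parse_libsvm_line` and the sample line are hypothetical illustrations, not code or data from this commit.

```python
def parse_libsvm_line(line):
    """Parse one libsvm-format line into (label, {index: value})."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        # Indices are kept exactly as written in the file (one-based).
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line, not taken from the actual data file.
label, features = parse_libsvm_line("0 1:1.0 3:2.5 11:4.0")
print(label, features)
```

Only the non-zero entries are listed on each line, which is why a dictionary keyed by index is a natural in-memory form for this sketch.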
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/lda_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12927 from zhengruifeng/lda_pe.
    • [SPARK-14340][EXAMPLE][DOC] Update Examples and User Guide for ml.BisectingKMeans · cef73b56
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      1. Add BisectingKMeans to ml-clustering.md
      2. Add the missing Scala BisectingKMeansExample
      3. Create a new data file `data/mllib/sample_kmeans_data.txt`
      
      ## How was this patch tested?
      
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11844 from zhengruifeng/doc_bkm.
  6. Mar 03, 2016
  7. Feb 16, 2016
  8. Dec 18, 2015
  9. Jul 18, 2015
    • [MLLIB] [DOC] Seed fix in mllib naive bayes example · b9ef7ac9
      Paweł Kozikowski authored
      The previous seed resulted in an empty test data set.
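
The commit does not say how the split was performed. As a rough illustration, assuming a weighted random split in which each row is assigned independently, the sketch below shows why a very small data set can end up with no test rows for some seeds. `bernoulli_split`, the 0.6 training fraction, and the three-row data set are hypothetical; this is not Spark's `randomSplit` implementation.

```python
import random


def bernoulli_split(rows, train_fraction, seed):
    """Assign each row to train or test with an independent random draw."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test


rows = ["a", "b", "c"]  # hypothetical three-row data set
# Seeds for which this tiny data set ends up with no test rows at all.
empty_test_seeds = [s for s in range(1000)
                    if not bernoulli_split(rows, 0.6, s)[1]]
print(len(empty_test_seeds), "of 1000 seeds leave the test set empty")
```

With more rows, the chance that every row lands on the training side shrinks quickly, so enlarging the data set makes the example less sensitive to the choice of seed.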
      
      Author: Paweł Kozikowski <mupakoz@gmail.com>
      
      Closes #7477 from mupakoz/patch-1 and squashes the following commits:
      
      f5d41ee [Paweł Kozikowski] Mllib Naive Bayes example data set enlarged
  10. Jul 02, 2015
  11. May 22, 2015
    • [SPARK-7574] [ML] [DOC] User guide for OneVsRest · 509d55ab
      Ram Sriharsha authored
      Includes the Iris dataset (after shuffling and relabeling 3 -> 0 to conform to the 0 -> numClasses-1 labeling). Could not find an existing dataset in data/mllib for multiclass classification.
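
The preparation described above can be sketched roughly as follows, assuming rows are held in memory as `(label, features)` pairs. The helper `relabel_and_shuffle`, the seed, and the sample rows are hypothetical and are not the script used to build the committed file.

```python
import random


def relabel_and_shuffle(rows, old_label, new_label, seed):
    """Replace old_label with new_label, then shuffle deterministically."""
    relabeled = [(new_label if label == old_label else label, features)
                 for label, features in rows]
    random.Random(seed).shuffle(relabeled)
    return relabeled


# Hypothetical rows with labels 1..3; the feature values are made up.
rows = [(1, [5.1, 3.5]), (2, [6.4, 3.2]), (3, [6.3, 3.3])]
prepared = relabel_and_shuffle(rows, old_label=3, new_label=0, seed=0)
print(sorted(label for label, _ in prepared))
```

After relabeling, the three classes are 0, 1, and 2, which matches the 0 to numClasses-1 convention mentioned in the commit message.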
      
      Author: Ram Sriharsha <rsriharsha@hw11853.local>
      
      Closes #6296 from harsha2010/SPARK-7574 and squashes the following commits:
      
      645427c [Ram Sriharsha] cleanup
      46c41b1 [Ram Sriharsha] cleanup
      2f76295 [Ram Sriharsha] Code Review Fixes
      ebdf103 [Ram Sriharsha] Java Example
      c026613 [Ram Sriharsha] Code Review fixes
      4b7d1a6 [Ram Sriharsha] minor cleanup
      13bed9c [Ram Sriharsha] add wikipedia link
      bb9dbfa [Ram Sriharsha] Clean up naming
      6f90db1 [Ram Sriharsha] [SPARK-7574][ml][doc] User guide for OneVsRest
  12. Feb 23, 2015
    • [SPARK-5939][MLLib] make FPGrowth example app take parameters · 651a1c01
      Jacky Li authored
      Add parameter parsing to the FPGrowth example app in Scala and Java.
      Also add a sample data file in the data/mllib folder.
      
      Author: Jacky Li <jacky.likun@huawei.com>
      
      Closes #4714 from jackylk/parameter and squashes the following commits:
      
      8c478b3 [Jacky Li] fix according to comments
      3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
      f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
  13. Feb 20, 2015
    • [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release · 4a17eedb
      Joseph K. Bradley authored
      For SPARK-5867:
      * The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
      * It should also include Python examples now.
      
      For SPARK-5892:
      * Fix Python docs
      * Various other cleanups
      
      BTW, I accidentally merged this with master.  If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
      
      CC: mengxr  (ML),  davies  (Python docs)
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:
      
      f191bb0 [Joseph K. Bradley] small cleanups
      e786efa [Joseph K. Bradley] small doc corrections
      6b1ab4a [Joseph K. Bradley] fixed python lint test
      946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.  Changed spark.ml Java examples to use DataFrames API instead of sql()
      da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
      629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
      b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
      34b067f [Joseph K. Bradley] small doc correction
      da16aef [Joseph K. Bradley] Fixed python mllib docs
      8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
      695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
      a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
      b05a80d [Joseph K. Bradley] organize imports. doc cleanups
      e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
  14. Feb 15, 2015
    • [MLLIB][SPARK-5502] User guide for isotonic regression · 61eb1267
      martinzapletal authored
      User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java.
      
      Author: martinzapletal <zapletal-martin@email.cz>
      
      Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits:
      
      67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms
      80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples
      7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
      504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
  15. Feb 09, 2015
    • [SPARK-5539][MLLIB] LDA guide · 855d12ac
      Xiangrui Meng authored
      This is the LDA user guide from jkbradley with Java and Scala code example.
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4465 from mengxr/lda-guide and squashes the following commits:
      
      6dcb7d1 [Xiangrui Meng] update java example in the user guide
      76169ff [Xiangrui Meng] update java example
      36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide
      c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed).
  16. Feb 06, 2015
    • [SPARK-5013] [MLlib] Added documentation and sample data file for GaussianMixture · 9ad56ad2
      Travis Galoppo authored
      Simple description and code samples (and sample data) for GaussianMixture
      
      Author: Travis Galoppo <tjg2107@columbia.edu>
      
      Closes #4401 from tgaloppo/spark-5013 and squashes the following commits:
      
      c9ff9a5 [Travis Galoppo] Fixed link in mllib-clustering.md Added Gaussian mixture and power iteration as available clustering techniques in mllib-guide
      2368690 [Travis Galoppo] Minor fixes
      3eb41fa [Travis Galoppo] [SPARK-5013] Added documentation and sample data file for GaussianMixture
  17. Jul 13, 2014
    • SPARK-2363. Clean MLlib's sample data files · 635888cb
      Sean Owen authored
      (Just made a PR for this, mengxr was the reporter of:)
      
      MLlib has sample data under several folders:
      1) data/mllib
      2) data/
      3) mllib/data/*
      Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #1394 from srowen/SPARK-2363 and squashes the following commits:
      
      54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
  18. May 19, 2014
    • [SPARK-1874][MLLIB] Clean up MLlib sample data · bcb9dce6
      Xiangrui Meng authored
      1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
      2. Embedded instructions in the help message of those example apps.
      
      Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
      
      59f0a18 [Xiangrui Meng] add sample binary classification data
      3c2f92f [Xiangrui Meng] add linear regression data
      050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
  19. Sep 22, 2013