  1. Jan 10, 2017
    • [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change · 32286ba6
      Peng, Meng authored
      ## What changes were proposed in this pull request?
      Add FDR test case in ml/feature/ChiSqSelectorSuite.
      Improve some comments in the code.
      This is a follow-up PR for #15212.
      
      ## How was this patch tested?
      Unit tests.
      
      Author: Peng, Meng <peng.meng@intel.com>
      
      Closes #16434 from mpjlu/fdr_fwe_update.
  2. Dec 28, 2016
    • [SPARK-17645][MLLIB][ML] add feature selector method based on: False Discovery... · 79ff8536
      Peng authored
      [SPARK-17645][MLLIB][ML] add feature selector method based on: False Discovery Rate (FDR) and Family wise error rate (FWE)
      
      ## What changes were proposed in this pull request?
      
      Univariate feature selection works by selecting the best features based on univariate statistical tests.
      FDR and FWE control are popular criteria for choosing features from the results of such tests.
      The FDR method in this PR uses the Benjamini-Hochberg procedure; in 2005, the Benjamini and Hochberg paper on FDR was identified as one of the 25 most-cited statistical papers. https://en.wikipedia.org/wiki/False_discovery_rate
      In statistics, FWE is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypothesis tests.
      https://en.wikipedia.org/wiki/Family-wise_error_rate
      
      We add FDR and FWE methods to ChiSqSelector in this PR, as implemented in scikit-learn.
      http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
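
      For illustration, the two selection rules can be sketched on a plain list of p-values. This is a minimal sketch, not the code from this PR: it assumes FDR keeps every feature up to the largest rank k whose sorted p-value satisfies p(k) <= alpha * k / n (Benjamini-Hochberg), and that FWE keeps features with p < alpha / n (a Bonferroni-style bound). The object and method names are hypothetical.

      ```scala
      object MultipleTestingSketch {
        /** Benjamini-Hochberg: indices of the features kept at false discovery rate `alpha`. */
        def selectFdr(pValues: Seq[Double], alpha: Double): Seq[Int] = {
          val n = pValues.length
          // Sort p-values ascending, remembering each feature's original index.
          val sorted = pValues.zipWithIndex.sortBy(_._1)
          // Last 0-based rank whose p-value is under its threshold; -1 if none qualifies.
          val lastPassing = sorted.zipWithIndex.lastIndexWhere { case ((p, _), rank) =>
            p <= alpha * (rank + 1) / n
          }
          // Keep everything up to that rank, reported in original feature order.
          sorted.take(lastPassing + 1).map(_._2).sorted
        }

        /** Bonferroni-style family-wise error control: keep features with p < alpha / n. */
        def selectFwe(pValues: Seq[Double], alpha: Double): Seq[Int] = {
          val n = pValues.length
          pValues.zipWithIndex.collect { case (p, i) if p < alpha / n => i }
        }
      }
      ```

      With p-values 0.01, 0.02, 0.03, 0.5 and alpha = 0.05, the FDR rule keeps the first three features while the FWE rule keeps only the first, which shows that FWE control is the stricter of the two.
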
      ## How was this patch tested?
      
      Unit tests will be added soon.
      
      Author: Peng <peng.meng@intel.com>
      Author: Peng, Meng <peng.meng@intel.com>
      
      Closes #15212 from mpjlu/fdr_fwe.
  3. Nov 01, 2016
    • [SPARK-18088][ML] Various ChiSqSelector cleanups · 91c33a0c
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      - Renamed kbest to numTopFeatures
      - Renamed alpha to fpr
      - Added missing Since annotations
      - Doc cleanups
      ## How was this patch tested?
      
      Added new standardized unit tests for spark.ml.
      Improved existing unit test coverage a bit.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #15647 from jkbradley/chisqselector-follow-ups.
  4. Sep 28, 2016
  5. Aug 27, 2016
    • [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when withMean=True · e07baf14
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.
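
      The reason centering has to be requested explicitly is that subtracting a nonzero mean changes every entry, including the implicit zeros, so the result is dense. A minimal sketch in plain Scala, using a hypothetical index-to-value map in place of Spark's vector classes:

      ```scala
      object CenteringSketch {
        /** Center a sparse vector (index -> value, zeros elsewhere) by per-feature means. */
        def center(size: Int, nonZeros: Map[Int, Double], mean: Array[Double]): Array[Double] = {
          require(mean.length == size, "mean must have one entry per feature")
          // Every position is touched, so the output has to be a dense array.
          Array.tabulate(size)(i => nonZeros.getOrElse(i, 0.0) - mean(i))
        }
      }
      ```

      For example, a size-3 vector with a single nonzero entry 4.0 at index 1 and means (1.0, 2.0, 0.5) becomes (-1.0, 2.0, -0.5), with no zeros left.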
      
      ## How was this patch tested?
      
      Jenkins tests, including new cases to reflect the new behavior.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14663 from srowen/SPARK-17001.
  6. Jul 15, 2016
    • [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide · 5ffd5d38
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Made DataFrame-based API primary
      * Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
      * mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
      * ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
        * **Reviewers: please check this carefully**
      * (minor) Titles for DF API no longer include "- spark.ml" suffix.  Titles for RDD API have "- RDD-based API" suffix
      * Moved migration guide to ml-guide from mllib-guide
        * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
        * **Reviewers**: I did not change any of the content of the migration guides.
      
      Reorganized DataFrame-based guide:
      * ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
      * Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
        * **Reviewers**: I did not change the content of these guides, except some intro text.
      * Sidebar remains the same, but with pipeline and tuning sections added
      
      Other:
      * ml-classification-regression.html: Moved text about linear methods to new section in page
      
      ## How was this patch tested?
      
      Generated docs locally
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #14213 from jkbradley/ml-guide-2.0.
  7. Jun 11, 2016
    • [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents · ad102af1
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes all broken links in the Spark 2.0 preview MLlib documents. It also contains some editorial changes.
      
      **Fix broken links**
        * mllib-data-types.md
        * mllib-decision-tree.md
        * mllib-ensembles.md
        * mllib-feature-extraction.md
        * mllib-pmml-model-export.md
        * mllib-statistics.md
      
      **Fix malformed section header and scala coding style**
        * mllib-linear-methods.md
      
      **Replace indirect forward links with direct ones**
        * ml-classification-regression.md
      
      ## How was this patch tested?
      
      Manual tests (with `cd docs; jekyll build`.)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13608 from dongjoon-hyun/SPARK-15883.
  8. May 17, 2016
    • [SPARK-15182][ML] Copy MLlib doc to ML: ml.feature.tf, idf · 3308a862
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      We should now begin copying algorithm details from the spark.mllib guide to spark.ml as needed, rather than just linking back to the corresponding algorithms in the spark.mllib user guide.
      
      ## How was this patch tested?
      
      Manual review of the doc.
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      Author: Yuhao Yang <yuhao.yang@intel.com>
      
      Closes #12957 from hhbyyh/tfidfdoc.
  9. Mar 24, 2016
  10. Dec 10, 2015
    • [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib... · 2ecbe02d
      Timothy Hunter authored
      [SPARK-12212][ML][DOC] Clarifies the difference between spark.ml, spark.mllib and mllib in the documentation.
      
      Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. This should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark).
      
      It also removes some files that I forgot to delete with #10207
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #10234 from thunterdb/12212.
  11. Oct 07, 2015
  12. Sep 15, 2015
    • [SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups · b921fe4d
      Joseph K. Bradley authored
      Various ML guide cleanups.
      
      * ml-guide.md: Make it easier to access the algorithm-specific guides.
      * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically.  E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics.
      * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec”
      * Clean up Binarizer user guide a little.
      * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place.
      * spark.ml Word2Vec user guide: clean up grammar/writing
      * Chi Sq Feature Selector docs: Improve text in doc.
      
      CC: mengxr feynmanliang
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8752 from jkbradley/mlguide-fixes-1.5.
  13. Jul 31, 2015
    • [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code... · 873ab0f9
      Sean Owen authored
      [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement
      
      Use print(x) not print x for Python 3 in eval examples
      CC sethah mengxr -- just wanted to close this out before 1.5
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7822 from srowen/SPARK-9490 and squashes the following commits:
      
      01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
      bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
  14. Jun 30, 2015
  15. Jun 21, 2015
    • [SPARK-7715] [MLLIB] [ML] [DOC] Updated MLlib programming guide for release 1.4 · a1894422
      Joseph K. Bradley authored
      Reorganized docs a bit.  Added migration guides.
      
      **Q**: Do we want to say more for the 1.3 -> 1.4 migration guide for ```spark.ml```?  It would be a lot.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #6897 from jkbradley/ml-guide-1.4 and squashes the following commits:
      
      4bf26d6 [Joseph K. Bradley] tiny fix
      8085067 [Joseph K. Bradley] fixed spacing/layout issues in ml guide from previous commit in this PR
      6cd5c78 [Joseph K. Bradley] Updated MLlib programming guide for release 1.4
  16. Jun 18, 2015
    • [SPARK-7605] [MLLIB] [PYSPARK] Python API for ElementwiseProduct · 22732e1e
      MechCoder authored
      Python API for org.apache.spark.mllib.feature.ElementwiseProduct
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6346 from MechCoder/spark-7605 and squashes the following commits:
      
      79d1ef5 [MechCoder] Consistent and support list / array types
      5f81d81 [MechCoder] [SPARK-7605] [MLlib] Python API for ElementwiseProduct
  17. May 31, 2015
    • [SPARK-7949] [MLLIB] [DOC] update document with some missing save/load · 06747003
      Yuhao Yang authored
      Add save/load to the examples for:
      KMeansModel
      PowerIterationClusteringModel
      Word2VecModel
      IsotonicRegressionModel
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #6498 from hhbyyh/docSaveLoad and squashes the following commits:
      
      7f9f06d [Yuhao Yang] add missing imports
      c604cad [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into docSaveLoad
      1dd77cc [Yuhao Yang] update document with some missing save/load
  18. May 30, 2015
    • [SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example). · 1281a351
      Mike Dusenberry authored
      The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.
      
      This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.
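
      As background, Spark serializes the objects that a task's closure references before shipping them to executors, so a class captured that way has to be serializable. A minimal sketch of the requirement in plain Scala, assuming Java serialization; `TopKSelector` and `roundTrip` are hypothetical names and are not part of MLlib:

      ```scala
      import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

      object SerializationSketch {
        // Hypothetical stand-in for a selector; extending Serializable is the point of the sketch.
        class TopKSelector(val k: Int) extends Serializable

        /** Round-trip a value through Java serialization, as shipping it to another JVM would. */
        def roundTrip[T <: AnyRef](value: T): T = {
          val buffer = new ByteArrayOutputStream()
          val out = new ObjectOutputStream(buffer)
          // Throws java.io.NotSerializableException if the class is not Serializable.
          out.writeObject(value)
          out.close()
          val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
          try in.readObject().asInstanceOf[T] finally in.close()
        }
      }
      ```

      Removing `extends Serializable` from the hypothetical class makes `roundTrip` fail at `writeObject`, which is the kind of failure the documentation example ran into.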
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:
      
      9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
      d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.
    • [SPARK-7459] [MLLIB] ElementwiseProduct Java example · e3a43748
      Octavian Geagla authored
      Author: Octavian Geagla <ogeagla@gmail.com>
      
      Closes #6008 from ogeagla/elementwise-prod-doc and squashes the following commits:
      
      72e6dc0 [Octavian Geagla] [SPARK-7459] [MLLIB] Java example import.
      cf2afbd [Octavian Geagla] [SPARK-7459] [MLLIB] Update description of example.
      b66431b [Octavian Geagla] [SPARK-7459] [MLLIB] Add override annotation to java example, make scala example use same data as java.
      6b26b03 [Octavian Geagla] [SPARK-7459] [MLLIB] Fix line which is too long.
      79af020 [Octavian Geagla] [SPARK-7459] [MLLIB] Actually don't use Java 8.
      9d5b31a [Octavian Geagla] [SPARK-7459] [MLLIB] Don't use Java 8
      4f0c92f [Octavian Geagla] [SPARK-7459] [MLLIB] ElementwiseProduct Java example.
  19. May 10, 2015
    • [SPARK-5521] PCA wrapper for easy transform vectors · 8c07c75c
      Kirill A. Korinskiy authored
      I implemented a simple PCA wrapper that makes it easy to apply PCA to vectors, for example the features of a LabeledPoint or of another more complicated structure.
      
      Example of usage:
      ```
        import org.apache.spark.mllib.regression.LinearRegressionWithSGD
        import org.apache.spark.mllib.regression.LabeledPoint
        import org.apache.spark.mllib.linalg.Vectors
        import org.apache.spark.mllib.feature.PCA
      
        val data = sc.textFile("data/mllib/ridge-data/lpsa.data").map { line =>
          val parts = line.split(',')
          LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
        }.cache()
      
        val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
        val training = splits(0).cache()
        val test = splits(1)
      
        // Fit PCA on the feature vectors, keeping half of the original dimensions.
        val pca = new PCA(training.first().features.size/2).fit(data.map(_.features))
        val training_pca = training.map(p => p.copy(features = pca.transform(p.features)))
        val test_pca = test.map(p => p.copy(features = pca.transform(p.features)))
      
        val numIterations = 100
        val model = LinearRegressionWithSGD.train(training, numIterations)
        val model_pca = LinearRegressionWithSGD.train(training_pca, numIterations)
      
        val valuesAndPreds = test.map { point =>
          val score = model.predict(point.features)
          (score, point.label)
        }
      
        val valuesAndPreds_pca = test_pca.map { point =>
          val score = model_pca.predict(point.features)
          (score, point.label)
        }
      
        val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
        val MSE_pca = valuesAndPreds_pca.map{case(v, p) => math.pow((v - p), 2)}.mean()
      
        println("Mean Squared Error = " + MSE)
        println("PCA Mean Squared Error = " + MSE_pca)
      ```
      
      Author: Kirill A. Korinskiy <catap@catap.ru>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4304 from catap/pca and squashes the following commits:
      
      501bcd9 [Joseph K. Bradley] Small updates: removed k from Java-friendly PCA fit().  In PCASuite, converted results to set for comparison. Added an error message for bad k in PCA.
      9dcc02b [Kirill A. Korinskiy] [SPARK-5521] fix scala style
      1892a06 [Kirill A. Korinskiy] [SPARK-5521] PCA wrapper for easy transform vectors
  20. May 07, 2015
    • [SPARK-5726] [MLLIB] Elementwise (Hadamard) Vector Product Transformer · 658a478d
      Octavian Geagla authored
      See https://issues.apache.org/jira/browse/SPARK-5726
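
      For readers unfamiliar with the term, the Hadamard product multiplies two vectors of equal size component by component. A minimal sketch in plain Scala arrays, not the transformer added by this PR; the object name is hypothetical and `scalingVec` is used here only as an illustrative parameter name:

      ```scala
      object ElementwiseProductSketch {
        /** Multiply each component of `v` by the matching component of `scalingVec`. */
        def transform(scalingVec: Array[Double], v: Array[Double]): Array[Double] = {
          require(scalingVec.length == v.length, "vectors must have the same size")
          v.zip(scalingVec).map { case (x, w) => x * w }
        }
      }
      ```

      Scaling (1.0, 2.0, 3.0) by (0.0, 1.0, 2.0) gives (0.0, 2.0, 6.0); a zero weight drops a feature and a weight of one leaves it unchanged.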
      
      Author: Octavian Geagla <ogeagla@gmail.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4580 from ogeagla/spark-mllib-weighting and squashes the following commits:
      
      fac12ad [Octavian Geagla] [SPARK-5726] [MLLIB] Use new createTransformFunc.
      90f7e39 [Joseph K. Bradley] small cleanups
      4595165 [Octavian Geagla] [SPARK-5726] [MLLIB] Remove erroneous test case.
      ded3ac6 [Octavian Geagla] [SPARK-5726] [MLLIB] Pass style checks.
      37d4705 [Octavian Geagla] [SPARK-5726] [MLLIB] Incorporated feedback.
      1dffeee [Octavian Geagla] [SPARK-5726] [MLLIB] Pass style checks.
      e436896 [Octavian Geagla] [SPARK-5726] [MLLIB] Remove 'TF' from 'ElementwiseProductTF'
      cb520e6 [Octavian Geagla] [SPARK-5726] [MLLIB] Rename HadamardProduct to ElementwiseProduct
      4922722 [Octavian Geagla] [SPARK-5726] [MLLIB] Hadamard Vector Product Transformer
  21. Feb 23, 2015
    • [SPARK-5912] [docs] [mllib] Small fixes to ChiSqSelector docs · 59536cc8
      Joseph K. Bradley authored
      Fixes:
      * typo in Scala example
      * Removed comment "usually applied on sparse data" since that is debatable
      * small edits to text for clarity
      
      CC: avulanov  I noticed a typo post-hoc and ended up making a few small edits.  Do the changes look OK?
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4732 from jkbradley/chisqselector-docs and squashes the following commits:
      
      9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
      3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
    • [MLLIB] SPARK-5912 Programming guide for feature selection · 28ccf5ee
      Alexander Ulanov authored
      Added a description of ChiSqSelector and a few words about feature selection in general. I could add a code example, but it would not look reasonable in the absence of a feature discretizer or a dataset in the `data` folder that has redundant features.
      
      Author: Alexander Ulanov <nashb@yandex.ru>
      
      Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
      
      19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
      58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
      eb6b9fe [Alexander Ulanov] Typo
      2921a1d [Alexander Ulanov] ChiSqSelector example of use
      c845350 [Alexander Ulanov] ChiSqSelector docs
  22. Feb 01, 2015
    • [SPARK-5207] [MLLIB] StandardScalerModel mean and variance re-use · bdb0680d
      Octavian Geagla authored
      This seems complete. The duplication of tests for provided means/variances might be overkill; I would appreciate some feedback.
      
      Author: Octavian Geagla <ogeagla@gmail.com>
      
      Closes #4140 from ogeagla/SPARK-5207 and squashes the following commits:
      
      fa64dfa [Octavian Geagla] [SPARK-5207] [MLLIB] [WIP] change StandardScalerModel to take stddev instead of variance
      9078fe0 [Octavian Geagla] [SPARK-5207] [MLLIB] [WIP] Incorporate code review feedback: change arg ordering, add dev api annotations, do better null checking, add another test and some doc for this.
      997d2e0 [Octavian Geagla] [SPARK-5207] [MLLIB] [WIP] make withMean and withStd public, add constructor which uses defaults, un-refactor test class
      64408a4 [Octavian Geagla] [SPARK-5207] [MLLIB] [WIP] change StandardScalerModel constructor to not be private to mllib, added tests for newly-exposed functionality
  23. Oct 31, 2014
    • [SPARK-3838][examples][mllib][python] Word2Vec example in python · e07fb6a4
      Anant authored
      This pull request refers to issue: https://issues.apache.org/jira/browse/SPARK-3838
      
      Python example for word2vec
      mengxr
      
      Author: Anant <anant.asty@gmail.com>
      
      Closes #2952 from anantasty/SPARK-3838 and squashes the following commits:
      
      87bd723 [Anant] remove stop line
      4bd439e [Anant] Changes as per code review. Fixed error in word2vec python example, simplified example in docs.
      3d3c9ee [Anant] Added empty line after python imports
      0c90c31 [Anant] Fixed erroneous code. I was still treating each line to be a single word instead of 16 words
      ee4f5f6 [Anant] Fixes from code review comments
      c637bcf [Anant] Added word2vec python example to docs
      269f31f [Anant] added example in docs
      c015b14 [Anant] Added python example for word2vec
  24. Oct 28, 2014
    • [SPARK-3961] [MLlib] [PySpark] Python API for mllib.feature · fae095bc
      Davies Liu authored
      Added a complete Python API for mllib.feature:
      
      Normalizer
      StandardScalerModel
      StandardScaler
      HashingTF
      IDFModel
      IDF
      
      cc mengxr
      
      Author: Davies Liu <davies@databricks.com>
      Author: Davies Liu <davies.liu@gmail.com>
      
      Closes #2819 from davies/feature and squashes the following commits:
      
      4f48f48 [Davies Liu] add a note for HashingTF
      67f6d21 [Davies Liu] address comments
      b628693 [Davies Liu] rollback changes in Word2Vec
      efb4f4f [Davies Liu] Merge branch 'master' into feature
      806c7c2 [Davies Liu] address comments
      3abb8c2 [Davies Liu] address comments
      59781b9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into feature
      a405ae7 [Davies Liu] fix tests
      7a1891a [Davies Liu] fix tests
      486795f [Davies Liu] update programming guide, HashTF -> HashingTF
      8a50584 [Davies Liu] Python API for mllib.feature
  25. Oct 23, 2014
    • [SPARK-4055][MLlib] Inconsistent spelling 'MLlib' and 'MLLib' · f799700e
      Kousuke Saruta authored
      There are inconsistent spellings, 'MLlib' and 'MLLib', in some documents and source code.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #2903 from sarutak/SPARK-4055 and squashes the following commits:
      
      b031640 [Kousuke Saruta] Fixed inconsistent spelling "MLlib and MLLib"
  26. Sep 26, 2014
    • [SPARK-3614][MLLIB] Add minimumOccurence filtering to IDF · ec9df6a7
      RJ Nowling authored
      This PR for [SPARK-3614](https://issues.apache.org/jira/browse/SPARK-3614) adds functionality for filtering out terms which do not appear in at least a minimum number of documents.
      
      This is implemented using a minimumOccurence parameter (default 0).  When terms' document frequencies are less than minimumOccurence, their IDFs are set to 0, just like when the DF is 0.  As a result, the TF-IDFs for the terms are found to be 0, as if the terms were not present in the documents.
      
      This PR makes the following changes:
      * Add a minimumOccurence parameter to the IDF and DocumentFrequencyAggregator classes.
      * Create a parameter-less constructor for IDF with a default minimumOccurence value of 0 to remain backwards-compatible with the original IDF API.
      * Set the IDFs to 0 for terms whose DFs are less than minimumOccurence
      * Add tests to the Spark IDFSuite and Java JavaTfIdfSuite test suites
      * Update the MLlib Feature Extraction programming guide to describe the new feature
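
      The filtering rule can be sketched in a few lines of plain Scala. This is an illustration only: it assumes the smoothed form log((numDocs + 1) / (df + 1)) for the IDF, the object name is hypothetical, and the parameter is called `minDocFreq` here, the name the squashed commits of this PR settle on.

      ```scala
      object IdfSketch {
        /**
         * Inverse document frequencies for a corpus of `numDocs` documents.
         * Terms that appear in fewer than `minDocFreq` documents get an IDF of 0.
         */
        def idf(docFreqs: Array[Long], numDocs: Long, minDocFreq: Int = 0): Array[Double] =
          docFreqs.map { df =>
            if (df < minDocFreq) 0.0
            else math.log((numDocs + 1.0) / (df + 1.0)) // assumed smoothing
          }
      }
      ```

      Because TF-IDF is the product of the term frequency and the IDF, a term filtered this way contributes 0 to every document, as if it were absent. With the default of 0 no term is filtered, which matches the original behavior.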
      
      Author: RJ Nowling <rnowling@gmail.com>
      
      Closes #2494 from rnowling/spark-3614-idf-filter and squashes the following commits:
      
      0aa3c63 [RJ Nowling] Fix indentation
      e6523a8 [RJ Nowling] Remove unnecessary toDouble's from IDFSuite
      bfa82ec [RJ Nowling] Add space after if
      30d20b3 [RJ Nowling] Add spaces around equals signs
      9013447 [RJ Nowling] Add space before division operator
      79978fc [RJ Nowling] Remove unnecessary semi-colon
      40fd70c [RJ Nowling] Change minimumOccurence to minDocFreq in code and docs
      47850ab [RJ Nowling] Changed minimumOccurence to Int from Long
      9fb4093 [RJ Nowling] Remove unnecessary lines from IDF class docs
      1fc09d8 [RJ Nowling] Add backwards-compatible constructor to DocumentFrequencyAggregator
      1801fd2 [RJ Nowling] Fix style errors in IDF.scala
      6897252 [RJ Nowling] Preface minimumOccurence members with val to make them final and immutable
      a200bab [RJ Nowling] Remove unnecessary else statement
      4b974f5 [RJ Nowling] Remove accidentally-added import from testing
      c0cc643 [RJ Nowling] Add minimumOccurence filtering to IDF
  27. Sep 22, 2014
  28. Aug 24, 2014
    • [SPARK-2841][MLlib] Documentation for feature transformations · 572952ae
      DB Tsai authored
      Documentation for newly added feature transformations:
      1. TF-IDF
      2. StandardScaler
      3. Normalizer
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #2068 from dbtsai/transformer-documentation and squashes the following commits:
      
      109f324 [DB Tsai] address feedback
  29. Aug 20, 2014
    • [SPARK-3143][MLLIB] add tf-idf user guide · e1571874
      Xiangrui Meng authored
      Moved TF-IDF before Word2Vec because the former is more basic. I also added a link for Word2Vec. atalwalkar
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #2061 from mengxr/tfidf-doc and squashes the following commits:
      
      ca04c70 [Xiangrui Meng] address comments
      a5ea4b4 [Xiangrui Meng] add tf-idf user guide
  30. Aug 18, 2014
    • [SPARK-2842][MLlib]Word2Vec documentation · eef779b8
      Liquan Pei authored
      mengxr
      Documentation for Word2Vec
      
      Author: Liquan Pei <liquanpei@gmail.com>
      
      Closes #2003 from Ishiihara/Word2Vec-doc and squashes the following commits:
      
      4ff11d4 [Liquan Pei] minor fix
      8d7458f [Liquan Pei] code reformat
      6df0dcb [Liquan Pei] add Word2Vec documentation
  31. Aug 12, 2014
    • SPARK-2830 [MLlib]: re-organize mllib documentation · c235b83e
      Ameet Talwalkar authored
      As per discussions with Xiangrui, I've reorganized and edited the mllib documentation.
      
      Author: Ameet Talwalkar <atalwalkar@gmail.com>
      
      Closes #1908 from atalwalkar/master and squashes the following commits:
      
      fe6938a [Ameet Talwalkar] made xiangruis suggested changes
      840028b [Ameet Talwalkar] made xiangruis suggested changes
      7ec366a [Ameet Talwalkar] reorganize and edit mllib documentation