    [SPARK-4894][mllib] Added Bernoulli option to NaiveBayes model in mllib · d01a6d8c
    leahmcguire authored
    Added an optional model type parameter for NaiveBayes training. It can be either Multinomial or Bernoulli.
    
    When Bernoulli is given, Bernoulli smoothing is used for fitting and for prediction, as per: http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html.
    
    The default model type is the original Multinomial fit and predict.
    
    Added additional testing for Bernoulli and Multinomial models.
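
The Bernoulli fit and predict described above can be illustrated with a minimal, self-contained sketch. This is not the MLlib code from this commit: the function names `fit_bernoulli_nb` and `predict_bernoulli_nb`, the `lam` smoothing argument, and the list-based data layout are assumptions made for illustration. It follows the smoothing in the linked IR-book chapter, where each per-class feature probability is (count + lam) / (class size + 2 * lam), and absent features contribute log(1 - theta) at prediction time.

```python
import math


def fit_bernoulli_nb(rows, labels, lam=1.0):
    """Fit a Bernoulli naive Bayes model (illustrative sketch, not MLlib).

    rows   -- list of equal-length lists of 0/1 feature values
    labels -- list of class labels, one per row
    lam    -- additive smoothing constant (hypothetical parameter name)
    Returns {label: (log_prior, [theta_j, ...])}.
    """
    classes = sorted(set(labels))
    n_docs = len(rows)
    n_features = len(rows[0])
    model = {}
    for c in classes:
        docs = [r for r, y in zip(rows, labels) if y == c]
        n_c = len(docs)
        # Smoothed class prior.
        log_prior = math.log(n_c + lam) - math.log(n_docs + len(classes) * lam)
        # Bernoulli smoothing: the denominator adds 2 * lam because each
        # feature has two outcomes (present or absent).
        theta = [
            (sum(d[j] for d in docs) + lam) / (n_c + 2.0 * lam)
            for j in range(n_features)
        ]
        model[c] = (log_prior, theta)
    return model


def predict_bernoulli_nb(model, x):
    """Return the label with the highest log posterior for 0/1 vector x."""
    def score(c):
        log_prior, theta = model[c]
        # Present features add log(theta); absent ones add log(1 - theta).
        return log_prior + sum(
            math.log(t) if xj else math.log(1.0 - t)
            for xj, t in zip(x, theta)
        )
    return max(model, key=score)


# Tiny usage example with two classes and two binary features.
rows = [[1, 0], [1, 0], [0, 1], [0, 1]]
labels = [0, 0, 1, 1]
model = fit_bernoulli_nb(rows, labels, lam=1.0)
prediction = predict_bernoulli_nb(model, [1, 0])
```

The key difference from a multinomial fit is in the prediction step: a multinomial model scores only the features that occur, while the Bernoulli model also penalizes a class for features it expects but that are absent from the input.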
    
    Author: leahmcguire <lmcguire@salesforce.com>
    Author: Joseph K. Bradley <joseph@databricks.com>
    Author: Leah McGuire <lmcguire@salesforce.com>
    
    Closes #4087 from leahmcguire/master and squashes the following commits:
    
    f3c8994 [leahmcguire] changed checks on model type to requires
    acb69af [leahmcguire] removed enum type and replaces all modelType parameters with strings
    2224b15 [Leah McGuire] Merge pull request #2 from jkbradley/leahmcguire-master
    9ad89ca [Joseph K. Bradley] removed old code
    6a8f383 [Joseph K. Bradley] Added new model save/load format 2.0 for NaiveBayesModel after modelType parameter was added.  Updated tests.  Also updated ModelType enum-like type.
    852a727 [leahmcguire] merged with upstream master
    a22d670 [leahmcguire] changed NaiveBayesModel modelType parameter back to NaiveBayes.ModelType, made NaiveBayes.ModelType serializable, fixed getter method in NaiveBayes
    18f3219 [leahmcguire] removed private from naive bayes constructor for lambda only
    bea62af [leahmcguire] put back in constructor for NaiveBayes
    01baad7 [leahmcguire] made fixes from code review
    fb0a5c7 [leahmcguire] removed typo
    e2d925e [leahmcguire] fixed nonserializable error that was causing naivebayes test failures
    2d0c1ba [leahmcguire] fixed typo in NaiveBayes
    c298e78 [leahmcguire] fixed scala style errors
    b85b0c9 [leahmcguire] Merge remote-tracking branch 'upstream/master'
    900b586 [leahmcguire] fixed model call so that uses type argument
    ea09b28 [leahmcguire] Merge remote-tracking branch 'upstream/master'
    e016569 [leahmcguire] updated test suite with model type fix
    85f298f [leahmcguire] Merge remote-tracking branch 'upstream/master'
    dc65374 [leahmcguire] integrated model type fix
    7622b0c [leahmcguire] added comments and fixed style as per rb
    b93aaf6 [Leah McGuire] Merge pull request #1 from jkbradley/nb-model-type
    3730572 [Joseph K. Bradley] modified NB model type to be more Java-friendly
    b61b5e2 [leahmcguire] added back compatible constructor to NaiveBayesModel to fix MIMA test failure
    5a4a534 [leahmcguire] fixed scala style error in NaiveBayes
    3891bf2 [leahmcguire] synced with apache spark and resolved merge conflict
    d9477ed [leahmcguire] removed old inaccurate comment from test suite for mllib naive bayes
    76e5b0f [leahmcguire] removed unnecessary sort from test
    0313c0c [leahmcguire] fixed style error in NaiveBayes.scala
    4a3676d [leahmcguire] Updated changes re-comments. Got rid of verbose populateMatrix method. Public api now has string instead of enumeration. Docs are updated.
    ce73c63 [leahmcguire] added Bernoulli option to naive bayes model in mllib, added optional model type parameter for training. When Bernoulli is given the Bernoulli smoothing is used for fitting and for prediction http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html