Skip to content
  • Maxime Rihouey's avatar
    e3bf37fa
    Fix example of tf_idf with minDocFreq · e3bf37fa
    Maxime Rihouey authored
    ## What changes were proposed in this pull request?
    
    The python example for tf_idf with the parameter "minDocFreq" is not properly set up because the same variable is used to transform the document for both with and without the "minDocFreq" parameter.
    The IDF(minDocFreq=2) is stored in the variable "idfIgnore" but then it is the original variable "idf" used to transform the "tf" instead of the "idfIgnore".
    
    ## How was this patch tested?
    
    Before the results for "tfidf" and "tfidfIgnore" were the same:
    tfidf:
    (1048576,[1046921],[3.75828890549])
    (1048576,[1046920],[3.75828890549])
    (1048576,[1046923],[3.75828890549])
    (1048576,[892732],[3.75828890549])
    (1048576,[892733],[3.75828890549])
    (1048576,[892734],[3.75828890549])
    tfidfIgnore:
    (1048576,[1046921],[3.75828890549])
    (1048576,[1046920],[3.75828890549])
    (1048576,[1046923],[3.75828890549])
    (1048576,[892732],[3.75828890549])
    (1048576,[892733],[3.75828890549])
    (1048576,[892734],[3.75828890549])
    
    After the fix those are how they should be:
    tfidf:
    (1048576,[1046921],[3.75828890549])
    (1048576,[1046920],[3.75828890549])
    (1048576,[1046923],[3.75828890549])
    (1048576,[892732],[3.75828890549])
    (1048576,[892733],[3.75828890549])
    (1048576,[892734],[3.75828890549])
    tfidfIgnore:
    (1048576,[1046921],[0.0])
    (1048576,[1046920],[0.0])
    (1048576,[1046923],[0.0])
    (1048576,[892732],[0.0])
    (1048576,[892733],[0.0])
    (1048576,[892734],[0.0])
    
    Author: Maxime Rihouey <maxime.rihouey@gmail.com>
    
    Closes #15503 from maximerihouey/patch-1.
    e3bf37fa
    Fix example of tf_idf with minDocFreq
    Maxime Rihouey authored
    ## What changes were proposed in this pull request?
    
    The python example for tf_idf with the parameter "minDocFreq" is not properly set up because the same variable is used to transform the document for both with and without the "minDocFreq" parameter.
    The IDF(minDocFreq=2) is stored in the variable "idfIgnore" but then it is the original variable "idf" used to transform the "tf" instead of the "idfIgnore".
    
    ## How was this patch tested?
    
    Before the results for "tfidf" and "tfidfIgnore" were the same:
    tfidf:
    (1048576,[1046921],[3.75828890549])
    (1048576,[1046920],[3.75828890549])
    (1048576,[1046923],[3.75828890549])
    (1048576,[892732],[3.75828890549])
    (1048576,[892733],[3.75828890549])
    (1048576,[892734],[3.75828890549])
    tfidfIgnore:
    (1048576,[1046921],[3.75828890549])
    (1048576,[1046920],[3.75828890549])
    (1048576,[1046923],[3.75828890549])
    (1048576,[892732],[3.75828890549])
    (1048576,[892733],[3.75828890549])
    (1048576,[892734],[3.75828890549])
    
    After the fix those are how they should be:
    tfidf:
    (1048576,[1046921],[3.75828890549])
    (1048576,[1046920],[3.75828890549])
    (1048576,[1046923],[3.75828890549])
    (1048576,[892732],[3.75828890549])
    (1048576,[892733],[3.75828890549])
    (1048576,[892734],[3.75828890549])
    tfidfIgnore:
    (1048576,[1046921],[0.0])
    (1048576,[1046920],[0.0])
    (1048576,[1046923],[0.0])
    (1048576,[892732],[0.0])
    (1048576,[892733],[0.0])
    (1048576,[892734],[0.0])
    
    Author: Maxime Rihouey <maxime.rihouey@gmail.com>
    
    Closes #15503 from maximerihouey/patch-1.
Loading