Commit e3bf37fa authored by Maxime Rihouey, committed by Sean Owen

Fix example of tf_idf with minDocFreq

## What changes were proposed in this pull request?

The Python example for tf_idf with the "minDocFreq" parameter is not set up correctly: the same model is used to transform the documents both with and without "minDocFreq".
The model fitted with IDF(minDocFreq=2) is stored in the variable "idfIgnore", but the original variable "idf" is then used to transform "tf" instead of "idfIgnore".

## How was this patch tested?

Before the fix, the results for "tfidf" and "tfidfIgnore" were the same:
tfidf:
(1048576,[1046921],[3.75828890549])
(1048576,[1046920],[3.75828890549])
(1048576,[1046923],[3.75828890549])
(1048576,[892732],[3.75828890549])
(1048576,[892733],[3.75828890549])
(1048576,[892734],[3.75828890549])
tfidfIgnore:
(1048576,[1046921],[3.75828890549])
(1048576,[1046920],[3.75828890549])
(1048576,[1046923],[3.75828890549])
(1048576,[892732],[3.75828890549])
(1048576,[892733],[3.75828890549])
(1048576,[892734],[3.75828890549])

After the fix, the results are as they should be:
tfidf:
(1048576,[1046921],[3.75828890549])
(1048576,[1046920],[3.75828890549])
(1048576,[1046923],[3.75828890549])
(1048576,[892732],[3.75828890549])
(1048576,[892733],[3.75828890549])
(1048576,[892734],[3.75828890549])
tfidfIgnore:
(1048576,[1046921],[0.0])
(1048576,[1046920],[0.0])
(1048576,[1046923],[0.0])
(1048576,[892732],[0.0])
(1048576,[892733],[0.0])
(1048576,[892734],[0.0])
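
The zeros above follow from how "minDocFreq" is described in the example's comments: a term that appears in fewer than that many documents gets an IDF of 0. The pure-Python sketch below illustrates this outside Spark. It is not the Spark API: `fit_idf` and `transform` are hypothetical helpers, the six-document corpus is made up, and the smoothed formula log((m + 1) / (df + 1)) is assumed to be the one MLlib uses.

```python
import math
from collections import Counter


def fit_idf(docs, min_doc_freq=0):
    """Return {term: idf} using the assumed form log((m + 1) / (df + 1)).

    Terms found in fewer than min_doc_freq documents get an IDF of 0.0.
    """
    m = len(docs)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {
        term: math.log((m + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for term, df in doc_freq.items()
    }


def transform(docs, idf):
    """Scale raw term counts by the fitted IDF weights."""
    return [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical corpus: six documents, each repeating one unique token three times.
docs = [[tok] * 3 for tok in ("0.0", "0.1", "0.2", "9.0", "9.1", "9.2")]

idf = fit_idf(docs)
idf_ignore = fit_idf(docs, min_doc_freq=2)

tfidf = transform(docs, idf)
# The bug fixed by this commit was the equivalent of passing `idf` here.
tfidf_ignore = transform(docs, idf_ignore)

print("tfidf:", tfidf[0])
print("tfidfIgnore:", tfidf_ignore[0])
```

With this toy corpus every term occurs in exactly one document, so every weight drops to zero once `min_doc_freq=2` is applied, which matches the pattern in the "tfidfIgnore" output after the fix.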

Author: Maxime Rihouey <maxime.rihouey@gmail.com>

Closes #15503 from maximerihouey/patch-1.
parent 56b0f5f4
@@ -43,7 +43,7 @@ if __name__ == "__main__":
     # In such cases, the IDF for these terms is set to 0.
     # This feature can be used by passing the minDocFreq value to the IDF constructor.
     idfIgnore = IDF(minDocFreq=2).fit(tf)
-    tfidfIgnore = idf.transform(tf)
+    tfidfIgnore = idfIgnore.transform(tf)
     # $example off$
     print("tfidf:")