[SPARK-14050][ML] Add multiple languages support and additional methods for Stop Words Remover
## What changes were proposed in this pull request?

This PR continues the work from #11871 with the following changes:

* load English stopwords as default
* convert stopwords to list in Python
* update some tests and doc

## How was this patch tested?

Unit tests.

Closes #11871

cc: burakkose srowen

Author: Burak Köse <burakks41@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>
Author: Burak KOSE <burakks41@gmail.com>

Closes #12843 from mengxr/SPARK-14050.
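To make the listed behavior concrete, here is a minimal pure-Python sketch of per-language stop word loading with English as the default and the word list returned as a plain Python list. It is not the code from this patch: the names `HYPOTHETICAL_STOP_WORDS`, `load_default_stop_words`, and `remove_stop_words` are made up for illustration, and the tiny word lists stand in for the `stopwords/*.txt` resource files this change adds.

```python
# Illustrative sketch only; not Spark's implementation.
# Hypothetical, heavily shortened stand-ins for the per-language resource files.
HYPOTHETICAL_STOP_WORDS = {
    "english": ["a", "an", "the", "is", "of"],
    "french": ["le", "la", "les", "un", "une"],
}


def load_default_stop_words(language="english"):
    """Return the stop words for a language as a plain Python list."""
    try:
        # Copy so callers can modify the result without touching the defaults.
        return list(HYPOTHETICAL_STOP_WORDS[language])
    except KeyError:
        raise ValueError("unsupported language: %s" % language)


def remove_stop_words(tokens, stop_words=None, case_sensitive=False):
    """Drop every token that appears in stop_words (English by default)."""
    if stop_words is None:
        stop_words = load_default_stop_words()
    if case_sensitive:
        blocked = set(stop_words)
        return [t for t in tokens if t not in blocked]
    # Case-insensitive matching: compare lowercased forms on both sides.
    blocked = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in blocked]
```

Calling `remove_stop_words(["The", "cat", "is", "black"])` uses the English defaults, while passing `load_default_stop_words("french")` as the second argument switches the language. In Spark itself, the corresponding entry point is `StopWordsRemover.loadDefaultStopWords(language)`; consult the Spark API documentation for the exact behavior.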
Showing
- licenses/LICENSE-postgresql.txt: 24 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/README: 12 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/danish.txt: 94 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/dutch.txt: 101 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/english.txt: 153 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/finnish.txt: 235 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/french.txt: 155 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/german.txt: 231 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/hungarian.txt: 199 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/italian.txt: 279 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/norwegian.txt: 176 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/portuguese.txt: 203 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/russian.txt: 151 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/spanish.txt: 313 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/swedish.txt: 114 additions, 0 deletions
- mllib/src/main/resources/org/apache/spark/ml/feature/stopwords/turkish.txt: 53 additions, 0 deletions
- mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala: 37 additions, 69 deletions
- mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala: 55 additions, 2 deletions
- python/pyspark/ml/feature.py: 22 additions, 16 deletions
- python/pyspark/ml/tests.py: 7 additions, 0 deletions