Skip to content
Snippets Groups Projects
  1. Sep 07, 2016
    • Liwei Lin's avatar
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of... · 3ce3a282
      Liwei Lin authored
      [SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths
      
      ## What changes were proposed in this pull request?
      
      We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14914 from lw-lin/append_to_plus_eq_v2.
      3ce3a282
  2. Sep 04, 2016
    • Yanbo Liang's avatar
      [MINOR][ML][MLLIB] Remove work around for breeze sparse matrix. · 1b001b52
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Since we have updated breeze version to 0.12, we should remove work around for bug of breeze sparse matrix in v0.11.
      I checked all mllib code and found this is the only work around for breeze 0.11.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14953 from yanboliang/matrices.
      1b001b52
  3. Sep 01, 2016
    • Sean Owen's avatar
      [SPARK-17331][CORE][MLLIB] Avoid allocating 0-length arrays · 3893e8c5
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Avoid allocating some 0-length arrays, esp. in UTF8String, and by using Array.empty in Scala over Array[T]()
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14895 from srowen/SPARK-17331.
      3893e8c5
  4. Aug 27, 2016
    • Peng, Meng's avatar
      [ML][MLLIB] The require condition and message doesn't match in SparseMatrix. · 40168dbe
      Peng, Meng authored
      ## What changes were proposed in this pull request?
      The require condition and message doesn't match, and the condition also should be optimized.
      Small change.  Please kindly let me know if JIRA required.
      
      ## How was this patch tested?
      No additional test required.
      
      Author: Peng, Meng <peng.meng@intel.com>
      
      Closes #14824 from mpjlu/smallChangeForMatrixRequire.
      40168dbe
  5. Aug 26, 2016
    • Peng, Meng's avatar
      [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils · c0949dc9
      Peng, Meng authored
      ## What changes were proposed in this pull request?
      
      fix comparing Vector bug in TestingUtils.
      There is the same bug for Matrix comparing. How to check the length of Matrix should be discussed first.
      
      ## How was this patch tested?
      
      (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
      
      (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
      
      Author: Peng, Meng <peng.meng@intel.com>
      
      Closes #14785 from mpjlu/testUtils.
      c0949dc9
  6. Aug 19, 2016
    • Jeff Zhang's avatar
      [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for SparseVector. · 072acf5e
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      1. In scala, add negative low bound checking and put all the low/upper bound checking in one place
      2. In python, add low/upper bound checking of indices.
      
      ## How was this patch tested?
      
      unit test added
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14555 from zjffdu/SPARK-16965.
      072acf5e
  7. Jul 19, 2016
  8. Jul 16, 2016
    • Sean Owen's avatar
      [SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will help... · 5ec0d692
      Sean Owen authored
      [SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will help unidoc/genjavadoc compatibility
      
      ## What changes were proposed in this pull request?
      
      These are yet more changes that resolve problems with unidoc/genjavadoc and Java 8. It does not fully resolve the problem, but gets rid of as many errors as we can from this end.
      
      ## How was this patch tested?
      
      Jenkins build of docs
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14221 from srowen/SPARK-3359.3.
      5ec0d692
  9. Jul 11, 2016
    • Reynold Xin's avatar
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  10. Jun 06, 2016
    • Zheng RuiFeng's avatar
      [MINOR] Fix Typos 'an -> a' · fd8af397
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `an -> a`
      
      Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13515 from zhengruifeng/an_a.
      fd8af397
  11. May 27, 2016
    • DB Tsai's avatar
      [SPARK-15413][ML][MLLIB] Change `toBreeze` to `asBreeze` in Vector and Matrix · 21b2605d
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      We're using `asML` to convert the mllib vector/matrix to ml vector/matrix now. Using `as` is more correct given that this conversion actually shares the same underline data structure. As a result, in this PR, `toBreeze` will be changed to `asBreeze`. This is a private API, as a result, it will not affect any user's application.
      
      ## How was this patch tested?
      
      unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #13198 from dbtsai/minor.
      21b2605d
  12. May 19, 2016
  13. May 17, 2016
  14. Apr 30, 2016
    • Xiangrui Meng's avatar
      [SPARK-14653][ML] Remove json4s from mllib-local · 0847fe4e
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      This PR moves Vector.toJson/fromJson to ml.linalg.VectorEncoder under mllib/ to keep mllib-local's dependency minimal. The json encoding is used by Params. So we still need this feature in SPARK-14615, where we will switch to ml.linalg in spark.ml APIs.
      
      ## How was this patch tested?
      
      Copied existing unit tests over.
      
      cc; dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12802 from mengxr/SPARK-14653.
      0847fe4e
  15. Apr 28, 2016
  16. Apr 26, 2016
    • Joseph K. Bradley's avatar
      [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local · bd2c9a6d
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Before, spark.ml GaussianMixtureModel used the spark.mllib MultivariateGaussian in its public API.  This was added after 1.6, so we can modify this API without breaking APIs.
      
      This PR copies MultivariateGaussian to mllib-local in spark.ml, with a few changes:
      * Renamed fields to match numpy, scipy: mu => mean, sigma => cov
      
      This PR then uses the spark.ml MultivariateGaussian in the spark.ml GaussianMixtureModel, which involves:
      * Modifying the constructor
      * Adding a computeProbabilities method
      
      Also:
      * Added EPSILON to mllib-local for use in MultivariateGaussian
      
      ## How was this patch tested?
      
      Existing unit tests
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12593 from jkbradley/sparkml-gmm-fix.
      bd2c9a6d
  17. Apr 22, 2016
    • Joan's avatar
      [SPARK-6429] Implement hashCode and equals together · bf95b8da
      Joan authored
      ## What changes were proposed in this pull request?
      
      Implement some `hashCode` and `equals` together in order to enable the scalastyle.
      This is a first batch, I will continue to implement them but I wanted to know your thoughts.
      
      Author: Joan <joan@goyeau.com>
      
      Closes #12157 from joan38/SPARK-6429-HashCode-Equals.
      bf95b8da
  18. Apr 15, 2016
    • DB Tsai's avatar
      [SPARK-14549][ML] Copy the Vector and Matrix classes from mllib to ml in mllib-local · 96534aa4
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      This task will copy the Vector and Matrix classes from mllib to ml package in mllib-local jar. The UDTs and `since` annotation in ml vector and matrix will be removed from now. UDTs will be achieved by #SPARK-14487, and `since` will be replaced by /*  since 1.2.0 */
      
      The BLAS implementation will be copied, and some of the test utilities will be copies as well.
      
      Summary of changes:
      
      1. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/BLAS.scala
        - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/BLAS.scala
        - logDebug("gemm: alpha is equal to 0 and beta is equal to 1. Returning C.") is removed in ml version.
      2. In  mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/Matrices.scala
        - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Matrices.scala
        - `Since` was removed, and we'll use standard `/* Since /*` Java doc. Will be in another PR.
        - `UDT` related code was removed, and will use `SPARK-13944` https://github.com/apache/spark/pull/12259  to replace the annotation.
      3. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/Vectors.scala
        - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Vectors.scala
        - `Since` was removed.
        - `UDT` related code was removed.
        - In `def parseNumeric`, it was throwing `throw new SparkException(s"Cannot parse $other.")`, and now it's throwing `throw new IllegalArgumentException(s"Cannot parse $other.")`
      4. In mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Vectors.scala
        - For consistency with ML version of vector, `def parseNumeric` is now throwing `throw new IllegalArgumentException(s"Cannot parse $other.")`
      5. mllib/src/main/scala/org/apache/spark/**mllib**/util/NumericParser.scala is moved to mllib-local/src/main/scala/org/apache/spark/**ml**/util/NumericParser.scala
        - All the `throw new SparkException` were replaced by `throw new IllegalArgumentException`
      
      ## How was this patch tested?
      
      unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12317 from dbtsai/dbtsai-ml-vector.
      96534aa4
  19. Apr 14, 2016
  20. Apr 11, 2016
    • DB Tsai's avatar
      [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom · efaf7d18
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need to setup the build first. This PR will create a new jar called mllib-local with minimal dependencies.
      
      The previous PR was failing the build because of `spark-core:test` dependency, and that was reverted. In this PR, `FunSuite` with `// scalastyle:ignore funsuite` in mllib-local test was used, similar to sketch.
      
      Thanks.
      
      ## How was this patch tested?
      
      Unit tests
      
      mengxr tedyu holdenk
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12298 from dbtsai/dbtsai-mllib-local-build-fix.
      efaf7d18
  21. Apr 09, 2016
    • Xiangrui Meng's avatar
      415446cc
    • DB Tsai's avatar
      [SPARK-14462][ML][MLLIB] add the mllib-local build to maven pom · 1598d11b
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need to setup the build first. This PR will create a new jar called mllib-local with minimal dependencies. The test scope will still depend on spark-core and spark-core-test in order to use the common utilities, but the runtime will avoid any platform dependency. Couple platform independent classes will be moved to this package to demonstrate how this work.
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12241 from dbtsai/dbtsai-mllib-local-build.
      1598d11b
Loading