  1. Sep 01, 2017
[SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation · 12ab7f7e
Sean Owen authored
      
      …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure
      
      ## What changes were proposed in this pull request?
      
      This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts.
      
      In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11.
      
      It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release.
      
      - Scalatest 2.x -> 3.0.3
      - Chill 0.8.0 -> 0.8.4
      - Clapper 1.0.x -> 1.1.2
      - json4s 3.2.x -> 3.4.2
      - Jackson 2.6.x -> 2.7.9 (required by json4s)
      
      This change does _not_ fully enable a Scala 2.12 build:
      
      - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here
      - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too.
      
      What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build.
      
      ## How was this patch tested?
      
      Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #18645 from srowen/SPARK-14280.
  2. Aug 16, 2017
      [SPARK-21680][ML][MLLIB] optimize Vector compress · a0345cbe
      Peng Meng authored
      ## What changes were proposed in this pull request?
      
When using `Vector.compressed` to convert a `Vector` to a `SparseVector`, performance is very low compared with `Vector.toSparse`.
This is because `Vector.compressed` has to scan the values three times, whereas `Vector.toSparse` only needs two scans.
When the vector is long, there is a significant performance difference between the two methods.
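
As a rough illustration, here is a minimal sketch (hypothetical class names, not the Spark source) of where the extra scan comes from and how passing the precomputed nonzero count removes it:

```scala
// Sketch only: a stripped-down dense vector with the two conversion paths.
class MyDenseVector(val values: Array[Double]) {
  def numNonzeros: Int = values.count(_ != 0) // one full scan

  // Before the fix, compressed effectively did: numNonzeros (scan 1), then
  // toSparse, which recomputed numNonzeros (scan 2) before copying (scan 3).
  // Threading nnz through avoids the middle scan.
  def toSparse: MySparseVector = toSparse(numNonzeros)

  def toSparse(nnz: Int): MySparseVector = {
    val indices = new Array[Int](nnz)
    val vs = new Array[Double](nnz)
    var i = 0
    var k = 0
    while (i < values.length) { // the copy scan
      if (values(i) != 0) { indices(k) = i; vs(k) = values(i); k += 1 }
      i += 1
    }
    new MySparseVector(values.length, indices, vs)
  }

  def compressed: AnyRef = {
    val nnz = numNonzeros
    // storage heuristic: sparse wins when 1.5 * (nnz + 1) < size
    if (1.5 * (nnz + 1.0) < values.length) toSparse(nnz) else this
  }
}

class MySparseVector(val size: Int, val indices: Array[Int], val values: Array[Double])
```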
      
      ## How was this patch tested?
      
Existing unit tests.
      
      Author: Peng Meng <peng.meng@intel.com>
      
      Closes #18899 from mpjlu/optVectorCompress.
  3. Aug 15, 2017
      [SPARK-21731][BUILD] Upgrade scalastyle to 0.9. · 3f958a99
      Marcelo Vanzin authored
      This version fixes a few issues in the import order checker; it provides
      better error messages, and detects more improper ordering (thus the need
      to change a lot of files in this patch). The main fix is that it correctly
      complains about the order of packages vs. classes.
      
As part of the above, I moved some `SparkSession` imports in ML examples
inside the `$example on$` blocks; their placement didn't seem consistent
across source files to start with, and this avoids having to add more on/off
blocks around specific imports.
      
      The new scalastyle also seems to have a better header detector, so a few
      license headers had to be updated to match the expected indentation.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #18943 from vanzin/SPARK-21731.
  4. Jul 13, 2017
      [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 · 425c4ada
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      - Remove Scala 2.10 build profiles and support
      - Replace some 2.10 support in scripts with commented placeholders for 2.12 later
      - Remove deprecated API calls from 2.10 support
      - Remove usages of deprecated context bounds where possible
      - Remove Scala 2.10 workarounds like ScalaReflectionLock
      - Other minor Scala warning fixes
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17150 from srowen/SPARK-19810.
  5. May 16, 2017
  6. May 09, 2017
      [SPARK-20615][ML][TEST] SparseVector.argmax throws IndexOutOfBoundsException · be53a783
      Jon McLean authored
      ## What changes were proposed in this pull request?
      
Added a check for the number of defined values. Previously, the argmax function assumed that at least one value was defined if the vector size was greater than zero.
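
A minimal sketch of the guard, with hypothetical names (the real fix lives in `SparseVector.argmax`):

```scala
// Sketch: argmax over a sparse vector must handle the case where no values
// are explicitly stored, i.e. every entry is an implicit zero.
def argmax(size: Int, indices: Array[Int], values: Array[Double]): Int = {
  if (size == 0) {
    -1 // empty vector: no argmax exists
  } else if (values.isEmpty) {
    0 // nothing stored, so all entries are zero; the first index wins
  } else {
    var best = indices(0)
    var bestVal = values(0)
    var k = 1
    while (k < values.length) { // scan only the stored values
      if (values(k) > bestVal) { best = indices(k); bestVal = values(k) }
      k += 1
    }
    // (the real implementation also compares against an implicit zero when
    // the stored maximum is negative and the vector is not fully stored)
    best
  }
}
```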
      
      ## How was this patch tested?
      
      Tests were added to the existing VectorsSuite to cover this case.
      
      Author: Jon McLean <jon.mclean@atsid.com>
      
      Closes #17877 from jonmclean/vectorArgmaxIndexBug.
  7. Apr 24, 2017
  8. Apr 09, 2017
      [SPARK-20260][MLLIB] String interpolation required for error message · 261eaf51
      Vijay Ramesh authored
      ## What changes were proposed in this pull request?
      This error message doesn't get properly formatted because of a missing `s`.  Currently the error looks like:
      
      ```
      Caused by: java.lang.IllegalArgumentException: requirement failed: indices should be one-based and in ascending order; found current=$current, previous=$previous; line="$line"
      ```
      (note the literal `$current` instead of the interpolated value)
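
A minimal before/after illustration of the fix (values are hypothetical):

```scala
val current = 3
val previous = 5
val line = "5 2:1.0 3:2.0"

// before: no interpolator, so "$current" appears literally in the message
val broken = "found current=$current, previous=$previous; line=\"$line\""

// after: the s-interpolator substitutes the runtime values
val fixed = s"found current=$current, previous=$previous; line=\"$line\""

println(broken) // found current=$current, previous=$previous; line="$line"
println(fixed)  // found current=3, previous=5; line="5 2:1.0 3:2.0"
```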
      
      
      Author: Vijay Ramesh <vramesh@demandbase.com>
      
      Closes #17572 from vijaykramesh/master.
  9. Mar 24, 2017
      [SPARK-17471][ML] Add compressed method to ML matrices · e8810b73
      sethah authored
      ## What changes were proposed in this pull request?
      
      This patch adds a `compressed` method to ML `Matrix` class, which returns the minimal storage representation of the matrix - either sparse or dense. Because the space occupied by a sparse matrix is dependent upon its layout (i.e. column major or row major), this method must consider both cases. It may also be useful to force the layout to be column or row major beforehand, so an overload is added which takes in a `columnMajor: Boolean` parameter.
      
      The compressed implementation relies upon two new abstract methods `toDense(columnMajor: Boolean)` and `toSparse(columnMajor: Boolean)`, similar to the compressed method implemented in the `Vector` class. These methods also allow the layout of the resulting matrix to be specified via the `columnMajor` parameter. More detail on the new methods is given below.
      ## How was this patch tested?
      
      Added many new unit tests
      ## New methods (summary, not exhaustive list)
      
      **Matrix trait**
      - `private[ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix` (abstract) - converts the matrix (either sparse or dense) to dense format
      - `private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix` (abstract) -  converts the matrix (either sparse or dense) to sparse format
- `def toDense: DenseMatrix = toDenseMatrix(true)` - converts the matrix (either sparse or dense) to dense format in column major layout
- `def toSparse: SparseMatrix = toSparseMatrix(true)` - converts the matrix (either sparse or dense) to sparse format in column major layout
      - `def compressed: Matrix` - finds the minimum space representation of this matrix, considering both column and row major layouts, and converts it
      - `def compressed(columnMajor: Boolean): Matrix` - finds the minimum space representation of this matrix considering only column OR row major, and converts it
      
      **DenseMatrix class**
      - `private[ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix` - converts the dense matrix to a dense matrix, optionally changing the layout (data is NOT duplicated if the layouts are the same)
      - `private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix` - converts the dense matrix to sparse matrix, using the specified layout
      
      **SparseMatrix class**
      - `private[ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix` - converts the sparse matrix to a dense matrix, using the specified layout
- `private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix` - converts the sparse matrix to a sparse matrix. If the sparse matrix contains any explicit zeros, they are removed. If the requested layout does not match the current layout, data is copied to a new representation. If the layouts match and no explicit zeros exist, the current matrix is returned.
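
For intuition, a hedged sketch of the size arithmetic behind `compressed` (illustrative helper, not the actual implementation; byte counts assume 8-byte doubles and 4-byte ints):

```scala
def denseSize(numRows: Long, numCols: Long): Long =
  8L * numRows * numCols // one Double per entry, identical for either layout

def sparseSize(majorDim: Long, nnz: Long): Long =
  12L * nnz + 4L * (majorDim + 1) // values + minor indices + major pointers

def pickCompressed(numRows: Int, numCols: Int, nnz: Int): String =
  Seq(
    "dense"            -> denseSize(numRows, numCols),
    "sparse col-major" -> sparseSize(numCols, nnz), // CSC: colPtrs has numCols + 1 entries
    "sparse row-major" -> sparseSize(numRows, nnz)  // CSR: rowPtrs has numRows + 1 entries
  ).minBy(_._2)._1
```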
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #15628 from sethah/matrix_compress.
  10. Mar 23, 2017
      [SPARK-19636][ML] Feature parity for correlation statistics in MLlib · d27daa54
      Timothy Hunter authored
      ## What changes were proposed in this pull request?
      
This patch adds DataFrame-based support for the correlation statistics found in `org.apache.spark.mllib.stat.correlation.Statistics`, following the design doc discussed in the JIRA ticket.
      
      The current implementation is a simple wrapper around the `spark.mllib` implementation. Future optimizations can be implemented at a later stage.
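
For reference, a usage sketch of the DataFrame-based API as it appears in later Spark releases (the exact entry point in this initial commit may differ; assumes `spark.implicits._` is in scope):

```scala
import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row

val data = Seq(
  Vectors.dense(1.0, 0.5, -1.0),
  Vectors.dense(2.0, 1.0, -2.0),
  Vectors.dense(4.0, 2.5, -4.0))
val df = data.map(Tuple1.apply).toDF("features")

// Pearson is the default; other methods are selected by name.
val Row(pearson: Matrix) = Correlation.corr(df, "features").head
val Row(spearman: Matrix) = Correlation.corr(df, "features", "spearman").head
```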
      
      ## How was this patch tested?
      
      ```
      build/sbt "testOnly org.apache.spark.ml.stat.StatisticsSuite"
      ```
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #17108 from thunterdb/19636.
  11. Feb 01, 2017
[SPARK-19402][DOCS] Support LaTex inline formula correctly and fix warnings in Scala/Java APIs generation · f1a1f260
hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
      This PR proposes three things as below:
      
- Support LaTex inline formulas, `\( ... \)`, in Scala API documentation
  It seems that currently,

  ```
  \( ... \)
  ```

  is rendered as-is, for example:

  <img width="345" alt="2017-01-30 10 01 13" src="https://cloud.githubusercontent.com/assets/6477701/22423960/ab37d54a-e737-11e6-9196-4f6229c0189c.png">

  It seems extra backslashes were mistakenly added.
      
- Fix warnings in Scaladoc/Javadoc generation
  This PR fixes two types of warnings, as below:
      
        ```
        [warn] .../spark/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala:335: Could not find any member to link for "UnsupportedOperationException".
        [warn]   /**
        [warn]   ^
        ```
      
        ```
        [warn] .../spark/sql/core/src/main/scala/org/apache/spark/sql/internal/VariableSubstitution.scala:24: Variable var undefined in comment for class VariableSubstitution in class VariableSubstitution
        [warn]  * `${var}`, `${system:var}` and `${env:var}`.
        [warn]      ^
        ```
      
      - Fix Javadoc8 break
        ```
        [error] .../spark/mllib/target/java/org/apache/spark/ml/PredictionModel.java:7: error: reference not found
        [error]  *                       E.g., {link VectorUDT} for vector features.
        [error]                                       ^
        [error] .../spark/mllib/target/java/org/apache/spark/ml/PredictorParams.java:12: error: reference not found
        [error]    *                          E.g., {link VectorUDT} for vector features.
        [error]                                            ^
        [error] .../spark/mllib/target/java/org/apache/spark/ml/Predictor.java:10: error: reference not found
        [error]  *                       E.g., {link VectorUDT} for vector features.
        [error]                                       ^
        [error] .../spark/sql/hive/target/java/org/apache/spark/sql/hive/HiveAnalysis.java:5: error: reference not found
        [error]  * Note that, this rule must be run after {link PreprocessTableInsertion}.
        [error]                                                  ^
        ```
      
      ## How was this patch tested?
      
Manually via `sbt unidoc` and `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16741 from HyukjinKwon/warn-and-break.
  12. Dec 21, 2016
      [SPARK-17807][CORE] split test-tags into test-JAR · afd9bc1d
      Ryan Williams authored
Remove the spark-tags compile-scope dependency (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by splitting test-oriented tags into a spark-tags test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
  13. Dec 02, 2016
  14. Nov 29, 2016
  15. Nov 25, 2016
[SPARK-3359][BUILD][DOCS] More changes to resolve javadoc 8 errors that will help unidoc/genjavadoc compatibility · 51b1c155
hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
This PR only tries to fix things that look pretty straightforward and were fixed in previous PRs.
      
      This PR roughly fixes several things as below:
      
- Fix unrecognisable class and method links in javadoc by changing them from `[[..]]` to `` `...` ``
      
        ```
        [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/DataStreamReader.java:226: error: reference not found
        [error]    * Loads text files and returns a {link DataFrame} whose schema starts with a string column named
        ```
      
      - Fix an exception annotation and remove code backticks in `throws` annotation
      
        Currently, sbt unidoc with Java 8 complains as below:
      
        ```
        [error] .../java/org/apache/spark/sql/streaming/StreamingQuery.java:72: error: unexpected text
        [error]    * throws StreamingQueryException, if <code>this</code> query has terminated with an exception.
        ```
      
        `throws` should specify the correct class name from `StreamingQueryException,` to `StreamingQueryException` without backticks. (see [JDK-8007644](https://bugs.openjdk.java.net/browse/JDK-8007644)).
      
      - Fix `[[http..]]` to `<a href="http..."></a>`.
      
        ```diff
        -   * [[https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https Oracle
        -   * blog page]].
        +   * <a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">
        +   * Oracle blog page</a>.
        ```
      
         `[[http...]]` link markdown in scaladoc is unrecognisable in javadoc.
      
- It seems a class can't have a `return` annotation, so two such cases were removed.
      
        ```
        [error] .../java/org/apache/spark/mllib/regression/IsotonicRegression.java:27: error: invalid use of return
        [error]    * return New instance of IsotonicRegression.
        ```
      
      - Fix < to `&lt;` and > to `&gt;` according to HTML rules.
      
      - Fix `</p>` complaint
      
      - Exclude unrecognisable in javadoc, `constructor`, `todo` and `groupname`.
      
      ## How was this patch tested?
      
      Manually tested by `jekyll build` with Java 7 and 8
      
      ```
      java version "1.7.0_80"
      Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
      Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
      ```
      
      ```
      java version "1.8.0_45"
      Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
      ```
      
Note: this does not yet make sbt unidoc succeed with Java 8, but it reduces the number of errors with Java 8.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #15999 from HyukjinKwon/SPARK-3359-errors.
  16. Nov 19, 2016
[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation · d5b1d5fc
hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
It seems the following note markers are used inconsistently across the Scala/Java documentation:
      
      - `Note:`
      - `NOTE:`
      - `Note that`
      - `'''Note:'''`
      - `note`
      
This PR proposes to fix all of these to `note` for consistency.
      
      **Before**
      
      - Scala
        ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)
      
      - Java
        ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)
      
      **After**
      
      - Scala
        ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)
      
      - Java
        ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)
      
      ## How was this patch tested?
      
      The notes were found via
      
      ```bash
      grep -r "NOTE: " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// NOTE: " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \ # note that this is a regular expression. So actual matches were mostly `org/apache/spark/api/java/functions ...`
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note that " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note that " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "Note: " . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// Note: " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
      ```bash
      grep -r "'''Note:'''" . | \ # Note:|NOTE:|Note that|'''Note:'''
      grep -v "// '''Note:''' " | \  # starting with // does not appear in API documentation.
      grep -E '.scala|.java' | \ # java/scala files
      grep -v Suite | \ # exclude tests
      grep -v Test | \ # exclude tests
      grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation
      -e 'org.apache.spark.api.java.function' \
      -e 'org.apache.spark.api.r' \
      ...
      ```
      
And then fixed them one by one, comparing with the API documentation/access modifiers.
      
      After that, manually tested via `jekyll build`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #15889 from HyukjinKwon/SPARK-18437.
  17. Oct 25, 2016
      [SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet · 78d740a0
      sethah authored
      ## What changes were proposed in this pull request?
      
      1. Make a pluggable solver interface for `WeightedLeastSquares`
      2. Add a `QuasiNewton` solver to handle elastic net regularization for `WeightedLeastSquares`
      3. Add method `BLAS.dspmv` used by QN solver
      4. Add mechanism for WLS to handle singular covariance matrices by falling back to QN solver when Cholesky fails.
      
      ## How was this patch tested?
      Unit tests - see below.
      
      ## Design choices
      
      **Pluggable Normal Solver**
      
      Before, the `WeightedLeastSquares` package always used the Cholesky decomposition solver to compute the solution to the normal equations. Now, we specify the solver as a constructor argument to the `WeightedLeastSquares`. We introduce a new trait:
      
```scala
      private[ml] sealed trait NormalEquationSolver {
      
        def solve(
            bBar: Double,
            bbBar: Double,
            abBar: DenseVector,
            aaBar: DenseVector,
            aBar: DenseVector): NormalEquationSolution
      }
```
      
      We extend this trait for different variants of normal equation solvers. In the future, we can easily add others (like QR) using this interface.
      
      **Always train in the standardized space**
      
      The normal solver did not previously standardize the data, but this patch introduces a change such that we always solve the normal equations in the standardized space. We convert back to the original space in the same way that is done for distributed L-BFGS/OWL-QN. We add test cases for zero variance features/labels.
      
      **Use L-BFGS locally to solve normal equations for singular matrix**
      
      When linear regression with the normal solver is called for a singular matrix, we initially try to solve with Cholesky. We use the output of `lapack.dppsv` to determine if the matrix is singular. If it is, we fall back to using L-BFGS locally to solve the normal equations. We add test cases for this as well.
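
A minimal sketch of that fallback flow, with hypothetical names (the real `WeightedLeastSquares` internals differ in detail):

```scala
class SingularMatrixException(msg: String) extends RuntimeException(msg)

def choleskySolve(): Array[Double] =
  // stand-in for the normal-equation solve via lapack.dppsv, which
  // reports failure when the matrix is singular
  throw new SingularMatrixException("dppsv reported a singular matrix")

def quasiNewtonSolve(): Array[Double] =
  Array(0.0) // stand-in for the local L-BFGS/OWL-QN solve

// Auto solver: try Cholesky first, fall back to quasi-Newton on failure.
def solveNormalEquations(): Array[Double] =
  try choleskySolve()
  catch { case _: SingularMatrixException => quasiNewtonSolve() }
```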
      
      ## Test cases
      I found it helpful to enumerate some of the test cases and hopefully it makes review easier.
      
      **WeightedLeastSquares**
      
      1. Constant columns - Cholesky solver fails with no regularization, Auto solver falls back to QN, and QN trains successfully.
      2. Collinear features - Cholesky solver fails with no regularization, Auto solver falls back to QN, and QN trains successfully.
      3. Label is constant zero - no training is performed regardless of intercept. Coefficients are zero and intercept is zero.
      4. Label is constant - if fitIntercept, then no training is performed and intercept equals label mean. If not fitIntercept, then we train and return an answer that matches R's lm package.
      5. Test with L1 - go through various combinations of L1/L2, standardization, fitIntercept and verify that output matches glmnet.
      6. Initial intercept - verify that setting the initial intercept to label mean is correct by training model with strong L1 regularization so that all coefficients are zero and intercept converges to label mean.
      7. Test diagInvAtWA - since we are standardizing features now during training, we should test that the inverse is computed to match R.
      
      **LinearRegression**
      1. For all existing L1 test cases, test the "normal" solver too.
      2. Check that using the normal solver now handles singular matrices.
      3. Check that using the normal solver with L1 produces an objective history in the model summary, but does not produce the inverse of AtA.
      
      **BLAS**
      1. Test new method `dspmv`.
      
      ## Performance Testing
      This patch will speed up linear regression with L1/elasticnet penalties when the feature size is < 4096. I have not conducted performance tests at scale, only observed by testing locally that there is a speed improvement.
      
      We should decide if this PR needs to be blocked before performance testing is conducted.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #15394 from sethah/SPARK-17748.
  18. Oct 21, 2016
      [SPARK-17331][FOLLOWUP][ML][CORE] Avoid allocating 0-length arrays · a8ea4da8
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `Array[T]()` -> `Array.empty[T]` to avoid allocating 0-length arrays.
      Use regex `find . -name '*.scala' | xargs -i bash -c 'egrep "Array\[[A-Za-z]+\]\(\)" -n {} && echo {}'` to find modification candidates.
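
A small sketch of the intent (hypothetical helper; whether `Array.empty[T]` itself caches depends on the Scala version, but the companion's typed vals such as `Array.emptyDoubleArray` are explicitly cached instances):

```scala
// Returning a cached empty array avoids a fresh 0-length allocation per call.
def activeValues(hasAny: Boolean): Array[Double] =
  if (hasAny) Array(1.0) else Array.emptyDoubleArray
```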
      
      cc srowen
      
      ## How was this patch tested?
      existing tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15564 from zhengruifeng/avoid_0_length_array.
  19. Sep 29, 2016
      [SPARK-17721][MLLIB][ML] Fix for multiplying transposed SparseMatrix with SparseVector · 29396e7d
      Bjarne Fruergaard authored
      ## What changes were proposed in this pull request?
      
      * changes the implementation of gemv with transposed SparseMatrix and SparseVector both in mllib-local and mllib (identical)
      * adds a test that was failing before this change, but succeeds with these changes.
      
The problem in the previous implementation was that it only incremented `i`, which enumerates the column indices within a row of the SparseMatrix, when the row index of the vector matched the column index of the SparseMatrix. When a particular row of the SparseMatrix has non-zero values at column indices lower than the corresponding non-zero row indices of the SparseVector, the non-zero values of the SparseVector are enumerated without ever matching the column index at position `i`, and the remaining column indices i+1,...,indEnd-1 are never attempted. The test cases in this PR illustrate this issue.
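
A hedged sketch of the corrected inner loop: the dot product of one sparse column (sorted indices `aIdx`, values `aVal`) with a sparse vector (sorted indices `xIdx`, values `xVal`). The key is a two-pointer merge that advances whichever side is behind, instead of advancing `i` only on a match:

```scala
def sparseDot(aIdx: Array[Int], aVal: Array[Double],
              xIdx: Array[Int], xVal: Array[Double]): Double = {
  var i = 0 // position in the matrix column
  var k = 0 // position in the vector
  var sum = 0.0
  while (i < aIdx.length && k < xIdx.length) {
    if (aIdx(i) == xIdx(k)) {
      sum += aVal(i) * xVal(k); i += 1; k += 1
    } else if (aIdx(i) < xIdx(k)) {
      i += 1 // column index is behind: advance it (the case described above)
    } else {
      k += 1 // vector index is behind: advance it
    }
  }
  sum
}
```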
      
      ## How was this patch tested?
      
      I have run the specific `gemv` tests in both mllib-local and mllib. I am currently still running `./dev/run-tests`.
      
      ## ___
      As per instructions, I hereby state that this is my original work and that I license the work to the project (Apache Spark) under the project's open source license.
      
Mentioning dbtsai, viirya and brkyvz, whom I can see have worked on or authored these parts before.
      
      Author: Bjarne Fruergaard <bwahlgreen@gmail.com>
      
      Closes #15296 from bwahlgreen/bugfix-spark-17721.
  20. Sep 07, 2016
[SPARK-17359][SQL][MLLIB] Use ArrayBuffer.+=(A) instead of ArrayBuffer.append(A) in performance critical paths · 3ce3a282
Liwei Lin authored
      
      ## What changes were proposed in this pull request?
      
      We should generally use `ArrayBuffer.+=(A)` rather than `ArrayBuffer.append(A)`, because `append(A)` would involve extra boxing / unboxing.
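
Concretely, in Scala 2.11/2.12 `append` is declared with a varargs signature, while `+=` takes the element directly:

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer.empty[Int]
buf.append(1) // append(elems: A*): the argument goes through a varargs wrapper
buf += 2      // +=(elem: A): the element is passed directly, no wrapper
```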
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #14914 from lw-lin/append_to_plus_eq_v2.
  21. Sep 04, 2016
      [MINOR][ML][MLLIB] Remove work around for breeze sparse matrix. · 1b001b52
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
Since we have updated the breeze version to 0.12, we should remove the workaround for the breeze sparse matrix bug in v0.11.
I checked all mllib code and found this is the only workaround for breeze 0.11.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14953 from yanboliang/matrices.
  22. Sep 01, 2016
      [SPARK-17331][CORE][MLLIB] Avoid allocating 0-length arrays · 3893e8c5
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
Avoid allocating some 0-length arrays, especially in UTF8String, by using `Array.empty[T]` in Scala instead of `Array[T]()`.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14895 from srowen/SPARK-17331.
  23. Aug 27, 2016
      [ML][MLLIB] The require condition and message doesn't match in SparseMatrix. · 40168dbe
      Peng, Meng authored
      ## What changes were proposed in this pull request?
The require condition and message don't match, and the condition should also be optimized.
Small change. Please kindly let me know if a JIRA is required.
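
A hypothetical illustration of this class of bug (not the actual Spark code): the condition should test exactly what the message describes, and the message should report the observed values.

```scala
val numCols = 3
val colPtrs = Array(0, 1, 2, 4)
// condition and message agree, and the message shows both sides
require(colPtrs.length == numCols + 1,
  s"colPtrs length (${colPtrs.length}) must equal numCols + 1 (${numCols + 1}).")
```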
      
      ## How was this patch tested?
      No additional test required.
      
      Author: Peng, Meng <peng.meng@intel.com>
      
      Closes #14824 from mpjlu/smallChangeForMatrixRequire.
  24. Aug 26, 2016
      [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils · c0949dc9
      Peng, Meng authored
      ## What changes were proposed in this pull request?
      
Fix a bug in comparing Vectors in TestingUtils.
The same bug exists for Matrix comparison; how to check the dimensions of a Matrix should be discussed first.
      
      ## How was this patch tested?
      
      
      Author: Peng, Meng <peng.meng@intel.com>
      
      Closes #14785 from mpjlu/testUtils.
  25. Aug 19, 2016
      [SPARK-16965][MLLIB][PYSPARK] Fix bound checking for SparseVector. · 072acf5e
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
1. In Scala, add negative lower-bound checking, and put all the lower/upper bound checks in one place
2. In Python, add lower/upper bound checking of indices
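
A minimal sketch of consolidated bound checking on the Scala side (hypothetical helper; the real checks live in the SparseVector validation):

```scala
def validateIndices(size: Int, indices: Array[Int]): Unit = {
  if (indices.nonEmpty) {
    require(indices.head >= 0,
      s"Found negative index: ${indices.head}.")
    require(indices.last < size,
      s"Index ${indices.last} out of bounds for vector of size $size.")
    // strictly increasing order implies every intermediate index is in range
    var i = 1
    while (i < indices.length) {
      require(indices(i) > indices(i - 1),
        s"Indices must be strictly increasing: found ${indices(i - 1)} followed by ${indices(i)}.")
      i += 1
    }
  }
}
```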
      
      ## How was this patch tested?
      
      unit test added
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14555 from zjffdu/SPARK-16965.
  26. Jul 19, 2016
  27. Jul 16, 2016
[SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will help unidoc/genjavadoc compatibility · 5ec0d692
Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      These are yet more changes that resolve problems with unidoc/genjavadoc and Java 8. It does not fully resolve the problem, but gets rid of as many errors as we can from this end.
      
      ## How was this patch tested?
      
      Jenkins build of docs
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14221 from srowen/SPARK-3359.3.
  28. Jul 11, 2016
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
  29. Jun 06, 2016
      [MINOR] Fix Typos 'an -> a' · fd8af397
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      `an -> a`
      
      Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #13515 from zhengruifeng/an_a.
  30. May 27, 2016
      [SPARK-15413][ML][MLLIB] Change `toBreeze` to `asBreeze` in Vector and Matrix · 21b2605d
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
We're using `asML` to convert the mllib vector/matrix to the ml vector/matrix now. Using `as` is more correct given that this conversion actually shares the same underlying data structure. As a result, in this PR, `toBreeze` will be changed to `asBreeze`. This is a private API; as a result, it will not affect any user's application.
      
      ## How was this patch tested?
      
      unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #13198 from dbtsai/minor.
  31. May 19, 2016
  32. May 17, 2016
  33. Apr 30, 2016
      [SPARK-14653][ML] Remove json4s from mllib-local · 0847fe4e
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
This PR moves Vector.toJson/fromJson to ml.linalg.VectorEncoder under mllib/ to keep mllib-local's dependencies minimal. The JSON encoding is used by Params, so we still need this feature in SPARK-14615, where we will switch to ml.linalg in spark.ml APIs.
      
      ## How was this patch tested?
      
      Copied existing unit tests over.
      
cc dbtsai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12802 from mengxr/SPARK-14653.
  34. Apr 28, 2016
  35. Apr 26, 2016
      [SPARK-14732][ML] spark.ml GaussianMixture should use MultivariateGaussian in mllib-local · bd2c9a6d
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Before, spark.ml GaussianMixtureModel used the spark.mllib MultivariateGaussian in its public API.  This was added after 1.6, so we can modify this API without breaking APIs.
      
      This PR copies MultivariateGaussian to mllib-local in spark.ml, with a few changes:
      * Renamed fields to match numpy, scipy: mu => mean, sigma => cov
      
      This PR then uses the spark.ml MultivariateGaussian in the spark.ml GaussianMixtureModel, which involves:
      * Modifying the constructor
      * Adding a computeProbabilities method
      
      Also:
      * Added EPSILON to mllib-local for use in MultivariateGaussian
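
Usage sketch of the renamed API as it exists in later releases (package and method details hedged):

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.ml.stat.distribution.MultivariateGaussian

// mllib used (mu, sigma); the ml copy follows numpy/scipy naming.
val g = new MultivariateGaussian(
  Vectors.dense(0.0, 0.0),                        // mean
  Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0)) // cov
)
val density = g.pdf(Vectors.dense(0.5, -0.5))
```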
      
      ## How was this patch tested?
      
      Existing unit tests
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12593 from jkbradley/sparkml-gmm-fix.
  36. Apr 22, 2016
      [SPARK-6429] Implement hashCode and equals together · bf95b8da
      Joan authored
      ## What changes were proposed in this pull request?
      
Implement some `hashCode` and `equals` methods together in order to enable the scalastyle rule.
This is a first batch; I will continue to implement them, but I wanted to know your thoughts.
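
The contract being enforced: equal objects must have equal hash codes, so the two methods should be defined together over the same fields. An illustrative class (not from the patch):

```scala
class Point(val x: Int, val y: Int) {
  override def equals(other: Any): Boolean = other match {
    case p: Point => x == p.x && y == p.y
    case _        => false
  }
  // derived from the same fields as equals, keeping the contract intact
  override def hashCode(): Int = 31 * x + y
}
```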
      
      Author: Joan <joan@goyeau.com>
      
      Closes #12157 from joan38/SPARK-6429-HashCode-Equals.
  37. Apr 15, 2016
      [SPARK-14549][ML] Copy the Vector and Matrix classes from mllib to ml in mllib-local · 96534aa4
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
This task will copy the Vector and Matrix classes from mllib to the ml package in the mllib-local jar. The UDTs and `Since` annotations in the ml vector and matrix will be removed for now. UDTs will be handled by SPARK-14487, and `Since` will be replaced by `/* since 1.2.0 */` comments.

The BLAS implementation will be copied, and some of the test utilities will be copied as well.
      
      Summary of changes:
      
      1. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/BLAS.scala
        - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/BLAS.scala
        - logDebug("gemm: alpha is equal to 0 and beta is equal to 1. Returning C.") is removed in ml version.
      2. In  mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/Matrices.scala
        - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Matrices.scala
  - `Since` was removed, and we'll use standard `/* Since */` Javadoc. Will be in another PR.
  - `UDT`-related code was removed; `SPARK-13944` (https://github.com/apache/spark/pull/12259) will replace the annotation.
      3. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/Vectors.scala
        - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Vectors.scala
        - `Since` was removed.
        - `UDT` related code was removed.
        - In `def parseNumeric`, it was throwing `throw new SparkException(s"Cannot parse $other.")`, and now it's throwing `throw new IllegalArgumentException(s"Cannot parse $other.")`
      4. In mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Vectors.scala
        - For consistency with ML version of vector, `def parseNumeric` is now throwing `throw new IllegalArgumentException(s"Cannot parse $other.")`
      5. mllib/src/main/scala/org/apache/spark/**mllib**/util/NumericParser.scala is moved to mllib-local/src/main/scala/org/apache/spark/**ml**/util/NumericParser.scala
        - All the `throw new SparkException` were replaced by `throw new IllegalArgumentException`
      
      ## How was this patch tested?
      
      unit tests
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12317 from dbtsai/dbtsai-ml-vector.
  38. Apr 14, 2016