There was an error fetching the commit references. Please try again later.

Unverified Commit 15eb86c2 authored 8 years ago by sethah Committed by DB Tsai 8 years ago

[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in LogisticRegression training

## What changes were proposed in this pull request?

This is a follow up to some of the discussion [here](https://github.com/apache/spark/pull/15593

). During LogisticRegression training, we store the coefficients combined with intercepts as a flat vector, but a more natural abstraction is a matrix. Here, we refactor the code to use matrix where possible, which makes the code more readable and greatly simplifies the indexing.

Note: We do not use a Breeze matrix for the cost function as was mentioned in the linked PR. This is because LBFGS/OWLQN require an implicit `MutableInnerProductModule[DenseMatrix[Double], Double]` which is not natively defined in Breeze. We would need to extend Breeze in Spark to define it ourselves. Also, we do not modify the `regParamL1Fun` because OWLQN in Breeze requires a `MutableEnumeratedCoordinateField[(Int, Int), DenseVector[Double]]` (since we still use a dense vector for coefficients). Here again we would have to extend Breeze inside Spark.

## How was this patch tested?

This is internal code refactoring - the current unit tests passing show us that the change did not break anything. No added functionality in this patch.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #15893 from sethah/logreg_refactor.

(cherry picked from commit 856e0042)
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>

parent 15ad3a31

No related branches found

No related tags found

No related merge requests found

Hide whitespace changes

Inline Side-by-side

Showing with 53 additions and 62 deletions

Please register or to comment