Commit 257cde7c authored by lewuathe, committed by Xiangrui Meng

[SPARK-6421][MLLIB] _regression_train_wrapper does not test initialWeights correctly

Weight parameters must be initialized correctly even when a numpy array is passed as the initial weights.

Author: lewuathe <lewuathe@me.com>

Closes #5101 from Lewuathe/SPARK-6421 and squashes the following commits:

7795201 [lewuathe] Fix lint-python errors
21d4fe3 [lewuathe] Fix init logic of weights
parent 11e02595
@@ -163,7 +163,8 @@ def _regression_train_wrapper(train_func, modelClass, data, initial_weights):
     first = data.first()
     if not isinstance(first, LabeledPoint):
         raise ValueError("data should be an RDD of LabeledPoint, but got %s" % first)
-    initial_weights = initial_weights or [0.0] * len(data.first().features)
+    if initial_weights is None:
+        initial_weights = [0.0] * len(data.first().features)
     weights, intercept = train_func(data, _convert_to_vector(initial_weights))
     return modelClass(weights, intercept)
@@ -323,6 +323,13 @@ class ListTests(PySparkTestCase):
         self.assertTrue(gbt_model.predict(features[2]) <= 0)
         self.assertTrue(gbt_model.predict(features[3]) > 0)
+
+        try:
+            LinearRegressionWithSGD.train(rdd, initialWeights=array([1.0, 1.0]))
+            LassoWithSGD.train(rdd, initialWeights=array([1.0, 1.0]))
+            RidgeRegressionWithSGD.train(rdd, initialWeights=array([1.0, 1.0]))
+        except ValueError:
+            self.fail()
 
 
 class StatTests(PySparkTestCase):
     # SPARK-4023
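For reviewers: a minimal sketch of why the `or` idiom in the first hunk breaks for array inputs and why the `is None` check does not. A numpy array with more than one element raises `ValueError` when truth-tested, and `x or default` truth-tests `x` before anything else. To keep the snippet dependency-free, `StrictArray` below is a hypothetical stand-in that mimics that behavior; it is not part of numpy or pyspark. The helpers `init_with_or` and `init_with_none_check` are likewise illustrative names, not functions from this commit.

```python
class StrictArray(list):
    """Hypothetical stand-in for a multi-element numpy array.

    Truth-testing raises ValueError, as numpy does for such arrays.
    """

    def __bool__(self):
        raise ValueError(
            "The truth value of an array with more than one element is ambiguous"
        )


def init_with_or(initial_weights, num_features):
    # Pre-fix pattern: `or` truth-tests the left operand first.
    return initial_weights or [0.0] * num_features


def init_with_none_check(initial_weights, num_features):
    # Post-fix pattern: only a missing argument triggers the default.
    if initial_weights is None:
        initial_weights = [0.0] * num_features
    return initial_weights


weights = StrictArray([1.0, 1.0])

# The old idiom fails before training even starts.
try:
    init_with_or(weights, 2)
    or_failed = False
except ValueError:
    or_failed = True

# The new check passes array-like weights through untouched
# and still supplies zeros when nothing was given.
kept = init_with_none_check(weights, 2)
defaulted = init_with_none_check(None, 2)
```

This is also why the new test only asserts that no `ValueError` escapes from `train` when `initialWeights` is an array.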