Commit 6ed7e2cd authored by Evan Sparks, committed by Reynold Xin

Use numpy directly for matrix multiply.

Using matrix multiply to compute XtX and XtY yields a 5-20x speedup depending on problem size.

For example, the following command takes 19s locally after this change versus 5m21s before it (a speedup of more than 16x):
bin/pyspark examples/src/main/python/als.py local[8] 1000 1000 50 10 10
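
The speedup comes from replacing a Python-level loop of rank-one updates with a single matrix product. The following is a minimal sketch of that equivalence, assuming plain NumPy arrays (rather than the `matrix` objects used in the example script) and made-up problem sizes. The names `normal_eq_loop` and `normal_eq_matmul` are hypothetical and do not appear in als.py.

```python
import numpy as np


def normal_eq_loop(mat, y):
    """Accumulate X^T X and X^T y one row at a time (the pre-change approach)."""
    uu, ff = mat.shape
    XtX = np.zeros((ff, ff))
    Xty = np.zeros((ff, 1))
    for j in range(uu):
        v = mat[j:j + 1, :]      # 1 x ff row, kept two-dimensional
        XtX += v.T @ v           # rank-one update
        Xty += v.T * y[j]        # column scaled by the j-th rating
    return XtX, Xty


def normal_eq_matmul(mat, y):
    """Compute the same quantities with two matrix products."""
    XtX = mat.T @ mat
    Xty = mat.T @ y.reshape(-1, 1)
    return XtX, Xty


# Illustrative sizes only; not the sizes used in the timing above.
rng = np.random.default_rng(0)
mat = rng.random((200, 10))
y = rng.random(200)

loop_XtX, loop_Xty = normal_eq_loop(mat, y)
fast_XtX, fast_Xty = normal_eq_matmul(mat, y)
print(np.allclose(loop_XtX, fast_XtX), np.allclose(loop_Xty, fast_Xty))
```

Both functions do the same arithmetic. The second hands the whole accumulation to NumPy's compiled matrix multiply instead of running one interpreter iteration per row.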

Author: Evan Sparks <evan.sparks@gmail.com>

Closes #687 from etrain/patch-1 and squashes the following commits:

e094dbc [Evan Sparks] Touching only diagonals on update.
d1ab9b6 [Evan Sparks] Use numpy directly for matrix multiply.
parent 108c4c16
@@ -36,14 +36,13 @@ def rmse(R, ms, us):
 def update(i, vec, mat, ratings):
     uu = mat.shape[0]
     ff = mat.shape[1]
-    XtX = matrix(np.zeros((ff, ff)))
-    Xty = np.zeros((ff, 1))
 
-    for j in range(uu):
-        v = mat[j, :]
-        XtX += v.T * v
-        Xty += v.T * ratings[i, j]
-    XtX += np.eye(ff, ff) * LAMBDA * uu
+    XtX = mat.T * mat
+    Xty = mat.T * ratings[i, :].T
+
+    for j in range(ff):
+        XtX[j,j] += LAMBDA * uu
+
     return np.linalg.solve(XtX, Xty)
 
 if __name__ == "__main__":
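
The second squashed commit adds the regularization term to the diagonal entries only, rather than building a dense `np.eye(ff, ff)` matrix and adding all of it. Below is a rough sketch of the resulting solve, assuming plain NumPy arrays instead of `matrix` objects. The function name `update_ndarray`, the `LAMBDA` value, and the array sizes are all assumptions for illustration, and the unused `vec` argument is dropped.

```python
import numpy as np

LAMBDA = 0.01  # regularization strength; value assumed for illustration


def update_ndarray(i, mat, ratings, lam=LAMBDA):
    """Solve (X^T X + lam * uu * I) w = X^T r_i for row i of `ratings`."""
    uu, ff = mat.shape
    XtX = mat.T @ mat
    Xty = mat.T @ ratings[i, :].reshape(-1, 1)
    # Touch only the ff diagonal entries instead of adding a dense ff x ff matrix.
    XtX[np.diag_indices(ff)] += lam * uu
    return np.linalg.solve(XtX, Xty)


rng = np.random.default_rng(1)
mat = rng.random((50, 4))        # 50 rows, 4 latent features
ratings = rng.random((3, 50))    # 3 rows of 50 ratings each

w = update_ndarray(0, mat, ratings)

# Check the result against the regularized normal equations written out in full.
lhs = (mat.T @ mat + LAMBDA * 50 * np.eye(4)) @ w
rhs = mat.T @ ratings[0, :].reshape(-1, 1)
print(np.allclose(lhs, rhs))
```

The explicit `for j in range(ff)` loop in the diff does the same thing as the `np.diag_indices` update here. Either way only `ff` entries are modified.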