Commit 738c1074 authored by MechCoder, committed by Xiangrui Meng

[SPARK-8823] [MLLIB] [PYSPARK] Optimizations for SparseVector dot products

Follow up for https://github.com/apache/spark/pull/5946

Currently, the dot product of two SparseVectors iterates over their indices and values in a pure-Python loop. This loop can be vectorized with NumPy.
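The claim can be illustrated outside Spark with plain NumPy arrays. The sketch below is hypothetical: `sparse_dot_loop` and `sparse_dot_vectorized` are illustrative names, not PySpark API. The first mirrors the removed two-pointer loop. The second mirrors the added masking approach, but it calls `np.isin`, which replaces `np.in1d` (deprecated as of NumPy 2.0). Both assume that each index array is sorted and free of duplicates, as in a SparseVector.

```python
import numpy as np


def sparse_dot_loop(ind_a, val_a, ind_b, val_b):
    """Two-pointer merge over sorted index arrays (the pre-commit approach)."""
    result = 0.0
    i, j = 0, 0
    while i < len(ind_a) and j < len(ind_b):
        if ind_a[i] == ind_b[j]:
            # Shared index: accumulate the product and advance both pointers.
            result += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif ind_a[i] < ind_b[j]:
            i += 1
        else:
            j += 1
    return result


def sparse_dot_vectorized(ind_a, val_a, ind_b, val_b):
    """Boolean masks select the shared indices; NumPy runs the loops in C."""
    mask_a = np.isin(ind_a, ind_b, assume_unique=True)
    if not mask_a.any():
        return 0.0  # no shared indices
    mask_b = np.isin(ind_b, ind_a, assume_unique=True)
    # Both index arrays are sorted, so the selected values line up pairwise.
    return float(np.dot(val_a[mask_a], val_b[mask_b]))


ind_a = np.array([0, 3, 5, 9])
val_a = np.array([1.0, 2.0, 3.0, 4.0])
ind_b = np.array([3, 4, 9])
val_b = np.array([10.0, 20.0, 30.0])

print(sparse_dot_loop(ind_a, val_a, ind_b, val_b))        # 140.0
print(sparse_dot_vectorized(ind_a, val_a, ind_b, val_b))  # 140.0
```

Both masks preserve the original ascending order. As a result, the k-th selected value of one vector pairs with the k-th selected value of the other. With unsorted indices, this pairing would generally be wrong.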

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #7222 from MechCoder/sparse_optim and squashes the following commits:

dcb51d3 [MechCoder] [SPARK-8823] [MLlib] [PySpark] Optimizations for SparseVector dot product
parent 1dbc4a15
```diff
@@ -590,18 +590,14 @@ class SparseVector(Vector):
             return np.dot(other.array[self.indices], self.values)
         elif isinstance(other, SparseVector):
-            result = 0.0
-            i, j = 0, 0
-            while i < len(self.indices) and j < len(other.indices):
-                if self.indices[i] == other.indices[j]:
-                    result += self.values[i] * other.values[j]
-                    i += 1
-                    j += 1
-                elif self.indices[i] < other.indices[j]:
-                    i += 1
-                else:
-                    j += 1
-            return result
+            # Find out common indices.
+            self_cmind = np.in1d(self.indices, other.indices, assume_unique=True)
+            self_values = self.values[self_cmind]
+            if self_values.size == 0:
+                return 0.0
+            else:
+                other_cmind = np.in1d(other.indices, self.indices, assume_unique=True)
+                return np.dot(self_values, other.values[other_cmind])
         else:
             return self.dot(_convert_to_vector(other))
```
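One alternative, which this commit does not use, is `np.intersect1d` with `return_indices=True` (NumPy 1.15 and later). A single call returns the positions of the shared indices in both arrays, replacing the two `np.in1d` membership tests. The following is a minimal sketch that assumes sorted, unique index arrays, as in a SparseVector. `sparse_dot_intersect` is a hypothetical helper name, not part of PySpark.

```python
import numpy as np


def sparse_dot_intersect(ind_a, val_a, ind_b, val_b):
    """Dot product of two sparse vectors given as sorted, unique index arrays.

    Hypothetical helper: uses np.intersect1d(return_indices=True) instead of
    the two np.in1d masks used in the commit.
    """
    _, pos_a, pos_b = np.intersect1d(
        ind_a, ind_b, assume_unique=True, return_indices=True
    )
    # The explicit empty check mirrors the commit; np.dot on two empty
    # float arrays would also return 0.0.
    if pos_a.size == 0:
        return 0.0
    return float(np.dot(val_a[pos_a], val_b[pos_b]))


ind_a = np.array([0, 3, 5, 9])
val_a = np.array([1.0, 2.0, 3.0, 4.0])
ind_b = np.array([3, 4, 9])
val_b = np.array([10.0, 20.0, 30.0])

print(sparse_dot_intersect(ind_a, val_a, ind_b, val_b))  # 140.0
```

Because `pos_a` and `pos_b` are explicit positions rather than masks, the pairing of values does not depend on both arrays sharing the same order. However, `assume_unique=True` still requires duplicate-free indices.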