Commit a3c7b418 authored by José Antonio, committed by Sean Owen

[MLLIB] org.apache.spark.mllib.util.SVMDataGenerator generates ArrayIndexOutOfBoundsException

org.apache.spark.mllib.util.SVMDataGenerator generates an ArrayIndexOutOfBoundsException. I have found the bug and tested the solution.

## What changes were proposed in this pull request?

Just adjust the size of an array in line 58 so that it does not cause an ArrayIndexOutOfBoundsException in line 66.

## How was this patch tested?

Manual tests. I have recompiled the entire project with the fix; it builds successfully, and I have run the code, also with good results.

line 66: val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1
crashes because trueWeights has length "nfeatures + 1" while "x" has length "nfeatures", and they should have the same length.

To fix this, just make trueWeights the same length as x.
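
For illustration, here is a minimal standalone reproduction of the crash. This is a sketch, not code from the patch: it assumes the netlib-java BLAS binding that spark-mllib pulls in, and the object name DdotMismatchDemo and the literal sizes are made up for the example.

```scala
import com.github.fommil.netlib.BLAS.{getInstance => blas}

// Hypothetical demo, not part of the patch.
object DdotMismatchDemo {
  def main(args: Array[String]): Unit = {
    val nfeatures = 3
    val x = Array.fill[Double](nfeatures)(1.0)

    // Buggy shape from the old line 58: one element more than x.
    val trueWeights = Array.fill[Double](nfeatures + 1)(1.0)

    // ddot is asked to read trueWeights.length (= nfeatures + 1) elements
    // from BOTH arrays, so it steps past the end of x and the pure-Java
    // BLAS fallback throws java.lang.ArrayIndexOutOfBoundsException.
    val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1)
    println(yD) // never reached
  }
}
```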

I have recompiled the project with the change and it is working now:
[spark-1.6.1]$ spark-submit --master local[*] --class org.apache.spark.mllib.util.SVMDataGenerator mllib/target/spark-mllib_2.11-1.6.1.jar local /home/user/test

It now generates the data successfully in the specified folder.

Author: José Antonio <joseanmunoz@gmail.com>

Closes #13895 from j4munoz/patch-2.
parent a7d29499
@@ -55,7 +55,7 @@ object SVMDataGenerator {
     val sc = new SparkContext(sparkMaster, "SVMGenerator")
 
     val globalRnd = new Random(94720)
-    val trueWeights = Array.fill[Double](nfeatures + 1)(globalRnd.nextGaussian())
+    val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())
 
     val data: RDD[LabeledPoint] = sc.parallelize(0 until nexamples, parts).map { idx =>
       val rnd = new Random(42 + idx)
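
With the shortened array, the dot product on line 66 stays in bounds. A self-contained sketch of the corrected interplay follows; the fill expression for x is an assumption here, since the hunk ends before the line that builds it.

```scala
import scala.util.Random
import com.github.fommil.netlib.BLAS.{getInstance => blas}

val nfeatures = 10
val globalRnd = new Random(94720)
// Fixed length: nfeatures, matching x below.
val trueWeights = Array.fill[Double](nfeatures)(globalRnd.nextGaussian())

val rnd = new Random(42)
val x = Array.fill[Double](nfeatures)(rnd.nextDouble() * 2.0 - 1.0) // assumed fill
// trueWeights.length == x.length == nfeatures, so ddot reads in bounds.
val yD = blas.ddot(trueWeights.length, x, 1, trueWeights, 1) + rnd.nextGaussian() * 0.1
```

Run as a Scala script or in the REPL with spark-mllib (or netlib-java) on the classpath.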