Commit e9746f87 authored by Jagadeesan, committed by Yanbo Liang

[SPARK-18133][EXAMPLES][ML] Python ML Pipeline Example has syntax errors]

## What changes were proposed in this pull request?

In Python 3 there is only one integer type (`int`), which largely behaves like the `long` type in Python 2. Python 3 does not accept the "L" suffix on integer literals, so the suffix has been removed from all examples.
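As an illustration (not part of the patch itself), a minimal sketch of the Python 2 vs. Python 3 integer-literal difference that motivated this change — in Python 3, `int` is unbounded (it subsumes the old `long`), and the trailing-"L" spelling is rejected at parse time:

```python
# In Python 3 there is a single arbitrary-precision integer type, `int`.
assert isinstance(4, int)
assert 2 ** 100 == 1267650600228229401496703205376  # plain int, no overflow

# The Python 2 literal `4L` is a SyntaxError in Python 3, caught at parse time:
try:
    compile("x = 4L", "<example>", "exec")
except SyntaxError:
    print("Python 3 rejects the 'L' suffix")
else:
    raise AssertionError("expected a SyntaxError")
```

This is why the fix only touches literals such as `4L` and keyword arguments such as `seed=11L`; the values themselves are unchanged.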

## How was this patch tested?

Unit tests.


Author: Jagadeesan <as2@us.ibm.com>

Closes #15660 from jagadeesanas2/SPARK-18133.
parent 569788a5
@@ -84,10 +84,10 @@ if __name__ == "__main__":
     # Prepare test documents, which are unlabeled.
     test = spark.createDataFrame([
-        (4L, "spark i j k"),
-        (5L, "l m n"),
-        (6L, "mapreduce spark"),
-        (7L, "apache hadoop")
+        (4, "spark i j k"),
+        (5, "l m n"),
+        (6, "mapreduce spark"),
+        (7, "apache hadoop")
     ], ["id", "text"])
     # Make predictions on test documents. cvModel uses the best model found (lrModel).
@@ -38,7 +38,7 @@ if __name__ == "__main__":
     # loads data
     dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
-    gmm = GaussianMixture().setK(2).setSeed(538009335L)
+    gmm = GaussianMixture().setK(2).setSeed(538009335)
     model = gmm.fit(dataset)
     print("Gaussians shown as a DataFrame: ")
@@ -35,10 +35,10 @@ if __name__ == "__main__":
     # $example on$
     # Prepare training documents from a list of (id, text, label) tuples.
     training = spark.createDataFrame([
-        (0L, "a b c d e spark", 1.0),
-        (1L, "b d", 0.0),
-        (2L, "spark f g h", 1.0),
-        (3L, "hadoop mapreduce", 0.0)
+        (0, "a b c d e spark", 1.0),
+        (1, "b d", 0.0),
+        (2, "spark f g h", 1.0),
+        (3, "hadoop mapreduce", 0.0)
     ], ["id", "text", "label"])
     # Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
@@ -52,10 +52,10 @@ if __name__ == "__main__":
     # Prepare test documents, which are unlabeled (id, text) tuples.
     test = spark.createDataFrame([
-        (4L, "spark i j k"),
-        (5L, "l m n"),
-        (6L, "spark hadoop spark"),
-        (7L, "apache hadoop")
+        (4, "spark i j k"),
+        (5, "l m n"),
+        (6, "spark hadoop spark"),
+        (7, "apache hadoop")
     ], ["id", "text"])
     # Make predictions on test documents and print columns of interest.
@@ -39,7 +39,7 @@ if __name__ == "__main__":
         .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
     # Split data into training (60%) and test (40%)
-    training, test = data.randomSplit([0.6, 0.4], seed=11L)
+    training, test = data.randomSplit([0.6, 0.4], seed=11)
     training.cache()
     # Run training algorithm to build the model
@@ -32,7 +32,7 @@ if __name__ == "__main__":
     data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_multiclass_classification_data.txt")
     # Split data into training (60%) and test (40%)
-    training, test = data.randomSplit([0.6, 0.4], seed=11L)
+    training, test = data.randomSplit([0.6, 0.4], seed=11)
     training.cache()
     # Run training algorithm to build the model