examples/src/main/python/sql/hive.py · b0e5840d4b37d7b73e300671795185bba37effb0 · cs525-sp18-g07 / spark

Jan 10, 2017

[SPARK-19134][EXAMPLE] Fix several sql, mllib and status api examples not working · b0e5840d

hyukjinkwon authored Jan 10, 2017

## What changes were proposed in this pull request?

**binary_classification_metrics_example.py**

LibSVM datasource loads `ml.linalg.SparseVector` whereas the example requires it to be `mllib.linalg.SparseVector`.  For the equivalent Scala exmaple, `BinaryClassificationMetricsExample.scala` seems fine.

```
./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
```

```
  File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
    .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
  File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
    self.features = _convert_to_vector(features)
  File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
    raise TypeError("Cannot convert type %s into Vector" % type(l))
TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
```

**status_api_demo.py** (this one does not work on Python 3.4.6)

It's `queue` in Python 3+.

```
PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module>
    import Queue
ImportError: No module named 'Queue'
```

**bisecting_k_means_example.py**

`BisectingKMeansModel` does not implement `save` and `load` in Python.

```bash
./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module>
    model.save(sc, path)
AttributeError: 'BisectingKMeansModel' object has no attribute 'save'
```

**elementwise_product_example.py**

It calls `collect` from the vector.

```bash
./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module>
    for each in transformedData2.collect():
  File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__
    return getattr(self.array, item)
AttributeError: 'numpy.ndarray' object has no attribute 'collect'
```

**These three tests look throwing an exception for a relative path set in `spark.sql.warehouse.dir`.**

**hive.py**

```
./bin/spark-submit examples/src/main/python/sql/hive.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module>
    spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
  File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);'
```

**SparkHiveExample.scala**

```
./bin/run-example sql.hive.SparkHiveExample
```

```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
```

**JavaSparkHiveExample.java**

```
./bin/run-example sql.hive.JavaSparkHiveExample
```

```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
```

## How was this patch tested?

Manually via

```
./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
```

```
PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
```

```
./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
```

```
./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
```

```
./bin/spark-submit examples/src/main/python/sql/hive.py
```

```
./bin/run-example sql.hive.JavaSparkHiveExample
```

```
./bin/run-example sql.hive.SparkHiveExample
```

These were found via

```bash
find ./examples/src/main/python -name "*.py" -exec spark-submit {} \;
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16515 from HyukjinKwon/minor-example-fix.

b0e5840d

[SPARK-19134][EXAMPLE] Fix several sql, mllib and status api examples not working

hyukjinkwon authored Jan 10, 2017

## What changes were proposed in this pull request?

**binary_classification_metrics_example.py**

LibSVM datasource loads `ml.linalg.SparseVector` whereas the example requires it to be `mllib.linalg.SparseVector`.  For the equivalent Scala exmaple, `BinaryClassificationMetricsExample.scala` seems fine.

```
./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
```

```
  File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
    .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
  File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
    self.features = _convert_to_vector(features)
  File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
    raise TypeError("Cannot convert type %s into Vector" % type(l))
TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
```

**status_api_demo.py** (this one does not work on Python 3.4.6)

It's `queue` in Python 3+.

```
PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module>
    import Queue
ImportError: No module named 'Queue'
```

**bisecting_k_means_example.py**

`BisectingKMeansModel` does not implement `save` and `load` in Python.

```bash
./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module>
    model.save(sc, path)
AttributeError: 'BisectingKMeansModel' object has no attribute 'save'
```

**elementwise_product_example.py**

It calls `collect` from the vector.

```bash
./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module>
    for each in transformedData2.collect():
  File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__
    return getattr(self.array, item)
AttributeError: 'numpy.ndarray' object has no attribute 'collect'
```

**These three tests look throwing an exception for a relative path set in `spark.sql.warehouse.dir`.**

**hive.py**

```
./bin/spark-submit examples/src/main/python/sql/hive.py
```

```
Traceback (most recent call last):
  File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module>
    spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
  File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);'
```

**SparkHiveExample.scala**

```
./bin/run-example sql.hive.SparkHiveExample
```

```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
```

**JavaSparkHiveExample.java**

```
./bin/run-example sql.hive.JavaSparkHiveExample
```

```
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
```

## How was this patch tested?

Manually via

```
./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
```

```
PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
```

```
./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
```

```
./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
```

```
./bin/spark-submit examples/src/main/python/sql/hive.py
```

```
./bin/run-example sql.hive.JavaSparkHiveExample
```

```
./bin/run-example sql.hive.SparkHiveExample
```

These were found via

```bash
find ./examples/src/main/python -name "*.py" -exec spark-submit {} \;
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16515 from HyukjinKwon/minor-example-fix.