  1. Jan 25, 2017
    • [SPARK-16046][DOCS] Aggregations in the Spark SQL programming guide · 3fdce814
      aokolnychyi authored
      ## What changes were proposed in this pull request?
      
      - A separate subsection for Aggregations under “Getting Started” in the Spark SQL programming guide. It mentions which aggregate functions are predefined and how users can create their own.
      - Examples of using the `UserDefinedAggregateFunction` abstract class for untyped aggregations in Java and Scala.
      - Examples of using the `Aggregator` abstract class for type-safe aggregations in Java and Scala.
      - Python is not covered.
      - The PR might not resolve the ticket since I do not know what exactly was planned by the author.
      
      In total, there are four new standalone examples that can be executed via `spark-submit` or `run-example`. The updated Spark SQL programming guide references these examples and does not contain hard-coded snippets.
      
      ## How was this patch tested?
      
      The patch was tested locally by building the docs. The examples were run as well.
      
      ![image](https://cloud.githubusercontent.com/assets/6235869/21292915/04d9d084-c515-11e6-811a-999d598dffba.png)
      
      Author: aokolnychyi <okolnychyyanton@gmail.com>
      
      Closes #16329 from aokolnychyi/SPARK-16046.
  2. Jan 18, 2017
  3. Jan 12, 2017
    • [SPARK-19158][SPARKR][EXAMPLES] Fix ml.R example fails due to lack of e1071 package. · 2c586f50
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      The ```ml.R``` example depends on the ```e1071``` package and fails if that package is not available in the user's environment. I think the example should not depend on third-party packages, so I updated it to remove the dependency.
      
      ## How was this patch tested?
      Manual test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16548 from yanboliang/spark-19158.
  4. Jan 10, 2017
    • [SPARK-19134][EXAMPLE] Fix several sql, mllib and status api examples not working · b0e5840d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      **binary_classification_metrics_example.py**
      
      The LibSVM data source loads `ml.linalg.SparseVector`, whereas the example requires `mllib.linalg.SparseVector`. The equivalent Scala example, `BinaryClassificationMetricsExample.scala`, seems fine.
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
      ```
      
      ```
        File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
          .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
        File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
          self.features = _convert_to_vector(features)
        File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
          raise TypeError("Cannot convert type %s into Vector" % type(l))
      TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
      ```
      
      **status_api_demo.py** (this one does not work on Python 3.4.6)
      
      The `Queue` module is named `queue` in Python 3+.
      
      ```
      PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module>
          import Queue
      ImportError: No module named 'Queue'
      ```
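
One common way to support both interpreters is a guarded import. This is a minimal sketch of that pattern, not necessarily the change made in this PR; the item name `job-0` is only illustrative.

```python
# Python 3 renamed the Python 2 `Queue` module to `queue`.
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2 fallback

# The class name `Queue` is the same under both module names.
q = queue.Queue()
q.put("job-0")
item = q.get()
```

After the guarded import, the rest of the script can refer to `queue.Queue` regardless of the interpreter version.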
      
      **bisecting_k_means_example.py**
      
      `BisectingKMeansModel` does not implement `save` and `load` in Python.
      
      ```bash
      ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module>
          model.save(sc, path)
      AttributeError: 'BisectingKMeansModel' object has no attribute 'save'
      ```
      
      **elementwise_product_example.py**
      
      It calls `collect` on a vector, which has no such method.
      
      ```bash
      ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module>
          for each in transformedData2.collect():
        File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__
          return getattr(self.array, item)
      AttributeError: 'numpy.ndarray' object has no attribute 'collect'
      ```
      
      **The following three examples appear to throw an exception because a relative path is set in `spark.sql.warehouse.dir`.**
      
      **hive.py**
      
      ```
      ./bin/spark-submit examples/src/main/python/sql/hive.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module>
          spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
        File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
        File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
        File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
      pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);'
      ```
      
      **SparkHiveExample.scala**
      
      ```
      ./bin/run-example sql.hive.SparkHiveExample
      ```
      
      ```
      Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
      	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
      ```
      
      **JavaSparkHiveExample.java**
      
      ```
      ./bin/run-example sql.hive.JavaSparkHiveExample
      ```
      
      ```
      Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
      	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
      ```
      
      ## How was this patch tested?
      
      Manually via
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
      ```
      
      ```
      PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
      ```
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
      ```
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
      ```
      
      ```
      ./bin/spark-submit examples/src/main/python/sql/hive.py
      ```
      
      ```
      ./bin/run-example sql.hive.JavaSparkHiveExample
      ```
      
      ```
      ./bin/run-example sql.hive.SparkHiveExample
      ```
      
      These were found via
      
      ```bash
      find ./examples/src/main/python -name "*.py" -exec spark-submit {} \;
      ```
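
The same sweep can be sketched in Python when per-example bookkeeping is wanted. This is an illustrative equivalent of the `find` command above, not part of the PR; the helper names `find_python_examples` and `build_commands` and the demo file names are hypothetical.

```python
import os
import tempfile


def find_python_examples(root):
    """Return sorted paths of all *.py files under root (like `find -name "*.py"`)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def build_commands(paths, launcher="spark-submit"):
    """Build one argv list per example; running them is left to the caller."""
    return [[launcher, path] for path in paths]


# Demonstrate on a throwaway directory tree with placeholder files.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "mllib"))
for rel in ("pi.py", os.path.join("mllib", "kmeans.py"), "README.md"):
    with open(os.path.join(demo_root, rel), "w") as handle:
        handle.write("# placeholder\n")

commands = build_commands(find_python_examples(demo_root))
```

Each argv list could then be passed to `subprocess.run` to record which examples exit with a non-zero status.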
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16515 from HyukjinKwon/minor-example-fix.
  5. Jan 05, 2017
  6. Jan 04, 2017
    • [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo · a1e40b1f
      Niranjan Padmanabhan authored
      ## What changes were proposed in this pull request?
      There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
      
      ## How was this patch tested?
      N/A since only docs or comments were updated.
      
      Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
      
      Closes #16455 from neurons/np.structure_streaming_doc.
  7. Jan 03, 2017
    • [MINOR] Add missing sc.stop() to end of examples · e5c307c5
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      
      Add `finally` clause for `sc.stop()` in the `test("register and deregister Spark listener from SparkContext")`.
      
      ## How was this patch tested?
      Pass the build and unit tests.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #16426 from weiqingy/testIssue.
  8. Jan 02, 2017
    • [SPARK-19002][BUILD][PYTHON] Check pep8 against all Python scripts · 46b21260
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to check pep8 against all other Python scripts and fix the errors as below:
      
      ```bash
      ./dev/create-release/generate-contributors.py
      ./dev/create-release/releaseutils.py
      ./dev/create-release/translate-contributors.py
      ./dev/lint-python
      ./python/docs/epytext.py
      ./examples/src/main/python/mllib/decision_tree_classification_example.py
      ./examples/src/main/python/mllib/decision_tree_regression_example.py
      ./examples/src/main/python/mllib/gradient_boosting_classification_example.py
      ./examples/src/main/python/mllib/gradient_boosting_regression_example.py
      ./examples/src/main/python/mllib/linear_regression_with_sgd_example.py
      ./examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
      ./examples/src/main/python/mllib/naive_bayes_example.py
      ./examples/src/main/python/mllib/random_forest_classification_example.py
      ./examples/src/main/python/mllib/random_forest_regression_example.py
      ./examples/src/main/python/mllib/svm_with_sgd_example.py
      ./examples/src/main/python/streaming/network_wordjoinsentiments.py
      ./sql/hive/src/test/resources/data/scripts/cat.py
      ./sql/hive/src/test/resources/data/scripts/cat_error.py
      ./sql/hive/src/test/resources/data/scripts/doubleescapedtab.py
      ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py
      ./sql/hive/src/test/resources/data/scripts/escapedcarriagereturn.py
      ./sql/hive/src/test/resources/data/scripts/escapednewline.py
      ./sql/hive/src/test/resources/data/scripts/escapedtab.py
      ./sql/hive/src/test/resources/data/scripts/input20_script.py
      ./sql/hive/src/test/resources/data/scripts/newline.py
      ```
      
      ## How was this patch tested?
      
      - `./python/docs/epytext.py`
      
        ```bash
        cd ./python/docs && make html
        ```
      
      - pep8 check (Python 2.7 / Python 3.3.6)
      
        ```
        ./dev/lint-python
        ```
      
      - `./dev/merge_spark_pr.py` (Python 2.7 only / Python 3.3.6 not working)
      
        ```bash
        python -m doctest -v ./dev/merge_spark_pr.py
        ```
      
      - `./dev/create-release/releaseutils.py` `./dev/create-release/generate-contributors.py` `./dev/create-release/translate-contributors.py` (Python 2.7 only / Python 3.3.6 not working)
      
        ```bash
        python generate-contributors.py
        python translate-contributors.py
        ```
      
      - Examples (Python 2.7 / Python 3.3.6)
      
        ```bash
        ./bin/spark-submit examples/src/main/python/mllib/decision_tree_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/decision_tree_regression_example.py
        ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_regression_example.p
        ./bin/spark-submit examples/src/main/python/mllib/random_forest_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/random_forest_regression_example.py
        ```
      
      - Examples (Python 2.7 only / Python 3.3.6 not working)
        ```
        ./bin/spark-submit examples/src/main/python/mllib/linear_regression_with_sgd_example.py
        ./bin/spark-submit examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
        ./bin/spark-submit examples/src/main/python/mllib/naive_bayes_example.py
        ./bin/spark-submit examples/src/main/python/mllib/svm_with_sgd_example.py
        ```
      
      - `sql/hive/src/test/resources/data/scripts/*.py` (Python 2.7 / Python 3.3.6 within suggested changes)
      
        Manually tested only changed ones.
      
      - `./dev/github_jira_sync.py` (Python 2.7 only / Python 3.3.6 not working)
      
        Manually tested this after disabling actually adding comments and links.
      
      And also via Jenkins tests.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16405 from HyukjinKwon/minor-pep8.
  9. Dec 08, 2016
    • [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide · 9bf8f3cd
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Add all R examples for ML wrappers which were added during the 2.1 release cycle.
      * Split the whole ```ml.R``` example file into an individual example for each algorithm, which makes it convenient for users to rerun them.
      * Add corresponding examples to ML user guide.
      * Update ML section of SparkR user guide.
      
      Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using R ```formula``` to specify ```featuresCol``` and ```labelCol```.
      
      ## How was this patch tested?
      Run all examples manually.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16148 from yanboliang/spark-18325.
  10. Dec 07, 2016
    • [SPARK-18633][ML][EXAMPLE] Add multiclass logistic regression summary python example and document · aad11209
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      A logistic regression summary was added to the Python API. We need to add an example and documentation for the summary.
      
      The newly added example is consistent with Scala and Java examples.
      
      ## How was this patch tested?
      
      Manual tests: ran the example with spark-submit; copied and pasted the code into pyspark; built the documentation and checked it.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16064 from wangmiao1981/py.
  11. Dec 03, 2016
    • [SPARK-18081][ML][DOCS] Add user guide for Locality Sensitive Hashing(LSH) · 34777184
      Yunni authored
      ## What changes were proposed in this pull request?
      The user guide for LSH is added to ml-features.md, with several scala/java examples in spark-examples.
      
      ## How was this patch tested?
      Doc has been generated through Jekyll, and checked through manual inspection.
      
      Author: Yunni <Euler57721@gmail.com>
      Author: Yun Ni <yunn@uber.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      Author: Yun Ni <Euler57721@gmail.com>
      
      Closes #15795 from Yunni/SPARK-18081-lsh-guide.
  12. Dec 02, 2016
  13. Nov 16, 2016
    • [SPARK-18420][BUILD] Fix the errors caused by lint check in Java · 7569cf6c
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
      Small fix: fix the errors reported by the Java lint check.
      
      - Clear unused objects and `UnusedImports`.
      - Add comments around the method `finalize` of `NioBufferedFileInputStream` to turn off checkstyle.
      - Split the line that is longer than 100 characters into two lines.
      
      ## How was this patch tested?
      Travis CI.
      ```
      $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
      $ dev/lint-java
      ```
      Before:
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[21,8] (imports) UnusedImports: Unused import - org.apache.commons.crypto.cipher.CryptoCipherFactory.
      [ERROR] src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java:[516,5] (modifier) RedundantModifier: Redundant 'public' modifier.
      [ERROR] src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java:[133] (coding) NoFinalizer: Avoid using finalizer method.
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeMapData.java:[71] (sizes) LineLength: Line is longer than 100 characters (found 113).
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java:[112] (sizes) LineLength: Line is longer than 100 characters (found 110).
      [ERROR] src/test/java/org/apache/spark/sql/catalyst/expressions/HiveHasherSuite.java:[31,17] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
      [ERROR]src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.ml.linalg.Vectors.
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[51] (regexp) RegexpSingleline: No trailing whitespace allowed.
      ```
      
      After:
      ```
      $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
      $ dev/lint-java
      Using `mvn` from path: /home/travis/build/ConeyLiu/spark/build/apache-maven-3.3.9/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Xianyang Liu <xyliu0530@icloud.com>
      
      Closes #15865 from ConeyLiu/master.
    • [SPARK-18410][STREAMING] Add structured kafka example · e6145772
      uncleGen authored
      ## What changes were proposed in this pull request?
      
      This PR provides structured kafka wordcount examples
      
      ## How was this patch tested?
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #15849 from uncleGen/SPARK-18410.
  14. Nov 15, 2016
    • [SPARK-18427][DOC] Update docs of mllib.KMeans · 33be4da5
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Remove `runs` from the docs of mllib.KMeans
      2. Add notes for `k` according to comments in the sources
      ## How was this patch tested?
      existing tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15873 from zhengruifeng/update_doc_mllib_kmeans.
  15. Nov 10, 2016
  16. Nov 08, 2016
  17. Nov 02, 2016
  18. Oct 28, 2016
    • [SPARK-18133][EXAMPLES][ML] Python ML Pipeline Example has syntax e… · e9746f87
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
      In Python 3, there is only one integer type (i.e., int), which mostly behaves like the long type in Python 2. Since Python 3 does not accept the "L" suffix, it was removed from all examples.
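
The difference can be checked directly from the interpreter. The following is a small sketch, with a hypothetical helper `parses`, that asks the running interpreter whether a literal is valid instead of hard-coding the answer.

```python
import sys


def parses(source):
    """Return True if `source` is a valid expression for the running interpreter."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


suffix_ok = parses("1L")  # accepted by Python 2 only
plain_ok = parses("1")    # accepted by both Python 2 and Python 3
big = 2 ** 64             # plain ints grow as needed, so no suffix is required
```

Dropping the suffix keeps the examples valid on both major versions, since a plain integer literal is promoted automatically when it grows large.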
      
      ## How was this patch tested?
      
      Unit tests.
      
      …rrors]
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #15660 from jagadeesanas2/SPARK-18133.
  19. Oct 26, 2016
    • [SPARK-14300][DOCS][MLLIB] Scala MLlib examples code merge and clean up · dcdda197
      Xin Ren authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-14300
      
      Duplicated code was found in scala/examples/mllib. The files below are all deleted in this PR:
      
      - DenseGaussianMixture.scala
      - StreamingLinearRegression.scala
      
      ## delete reasons:
      
      #### delete: mllib/DenseGaussianMixture.scala
      
      - duplicate of mllib/GaussianMixtureExample
      
      #### delete: mllib/StreamingLinearRegression.scala
      
      - duplicate of mllib/StreamingLinearRegressionExample
      
      When merging and cleaning up this code, be sure not to disturb the existing example on and off blocks.
      
      ## How was this patch tested?
      
      Test with `SKIP_API=1 jekyll` manually to make sure that works well.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #12195 from keypointt/SPARK-14300.
  20. Oct 24, 2016
    • [SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to local FS but... · 4ecbe1b9
      Sean Owen authored
      [SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path
      
      ## What changes were proposed in this pull request?
      
      Always resolve spark.sql.warehouse.dir as a local path, relative to the working directory rather than the home directory.
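
The intended behaviour can be illustrated outside Spark. This is a Python sketch of the idea only (Spark's own implementation is in Scala and is not shown here); the function name `resolve_warehouse_dir` is hypothetical.

```python
from pathlib import Path


def resolve_warehouse_dir(value, working_dir=None):
    """Turn a possibly relative directory into an absolute local `file:` URI."""
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    path = Path(value)
    if not path.is_absolute():
        # Relative values are anchored at the working directory, not the home directory.
        path = base / path
    return path.resolve().as_uri()


uri = resolve_warehouse_dir("spark-warehouse")
```

An absolute URI of this form avoids errors such as `Relative path in absolute URI: file:./spark-warehouse`, which arise when a relative path is placed directly after the `file:` scheme.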
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15382 from srowen/SPARK-17810.
  21. Oct 17, 2016
    • Fix example of tf_idf with minDocFreq · e3bf37fa
      Maxime Rihouey authored
      ## What changes were proposed in this pull request?
      
      The Python example for tf_idf with the "minDocFreq" parameter is not set up properly: the same variable is used to transform the documents both with and without "minDocFreq".
      The IDF(minDocFreq=2) is stored in the variable "idfIgnore", but the original variable "idf" is then used to transform "tf" instead of "idfIgnore".
      
      ## How was this patch tested?
      
      Before, the results for "tfidf" and "tfidfIgnore" were the same:
      tfidf:
      (1048576,[1046921],[3.75828890549])
      (1048576,[1046920],[3.75828890549])
      (1048576,[1046923],[3.75828890549])
      (1048576,[892732],[3.75828890549])
      (1048576,[892733],[3.75828890549])
      (1048576,[892734],[3.75828890549])
      tfidfIgnore:
      (1048576,[1046921],[3.75828890549])
      (1048576,[1046920],[3.75828890549])
      (1048576,[1046923],[3.75828890549])
      (1048576,[892732],[3.75828890549])
      (1048576,[892733],[3.75828890549])
      (1048576,[892734],[3.75828890549])
      
      After the fix, the results are as they should be:
      tfidf:
      (1048576,[1046921],[3.75828890549])
      (1048576,[1046920],[3.75828890549])
      (1048576,[1046923],[3.75828890549])
      (1048576,[892732],[3.75828890549])
      (1048576,[892733],[3.75828890549])
      (1048576,[892734],[3.75828890549])
      tfidfIgnore:
      (1048576,[1046921],[0.0])
      (1048576,[1046920],[0.0])
      (1048576,[1046923],[0.0])
      (1048576,[892732],[0.0])
      (1048576,[892733],[0.0])
      (1048576,[892734],[0.0])
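
The bug pattern can be reproduced without Spark. The sketch below is a plain-Python stand-in, assuming the smoothed formula log((m + 1) / (df + 1)) documented for MLlib's `IDF`, with terms below `minDocFreq` weighted 0; the functions `fit_idf` and `transform` and the toy documents are hypothetical.

```python
import math


def fit_idf(docs, min_doc_freq=0):
    """Map each term to log((m + 1) / (df + 1)), or 0.0 when df < min_doc_freq."""
    m = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {
        term: math.log((m + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for term, df in doc_freq.items()
    }


def transform(idf_model, doc):
    """Weight each term's count in `doc` by the fitted IDF value."""
    counts = {}
    for term in doc:
        counts[term] = counts.get(term, 0) + 1
    return {term: count * idf_model.get(term, 0.0) for term, count in counts.items()}


docs = [["spark", "rare"], ["spark", "sql"], ["spark", "sql"]]
idf = fit_idf(docs)                          # plays the role of `idf`
idf_ignore = fit_idf(docs, min_doc_freq=2)   # plays the role of `idfIgnore`

tfidf = transform(idf, docs[0])
tfidf_buggy = transform(idf, docs[0])          # bug: reuses `idf`, so nothing changes
tfidf_ignore = transform(idf_ignore, docs[0])  # fix: uses `idf_ignore`
```

With the wrong model the two results are identical, which matches the "before" output; with the right model the term that appears in only one document drops to 0.0.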
      
      Author: Maxime Rihouey <maxime.rihouey@gmail.com>
      
      Closes #15503 from maximerihouey/patch-1.
  22. Oct 10, 2016
    • [SPARK-17338][SQL] add global temp view · 23ddff4b
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      A global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system-preserved database `global_temp` (configurable via SparkConf), and we must use the qualified name to refer to a global temp view, e.g. SELECT * FROM global_temp.view1.
      
      changes for `SessionCatalog`:
      
      1. add a new field `globalTempViews: GlobalTempViewManager`, to access the shared global temp views, and the global temp db name.
      2. `createDatabase` will fail if users try to create `global_temp`, which is system preserved.
      3. `setCurrentDatabase` will fail if users try to set `global_temp`, which is system preserved.
      4. add `createGlobalTempView`, which is used in `CreateViewCommand` to create global temp views.
      5. add `dropGlobalTempView`, which is used in `CatalogImpl` to drop global temp view.
      6. add `alterTempViewDefinition`, which is used in `AlterViewAsCommand` to update the view definition for local/global temp views.
      7. `renameTable`/`dropTable`/`isTemporaryTable`/`lookupRelation`/`getTempViewOrPermanentTableMetadata`/`refreshTable` will handle global temp views.
      
      changes for SQL commands:
      
      1. `CreateViewCommand`/`AlterViewAsCommand` is updated to support global temp views
      2. `ShowTablesCommand` outputs a new column `database`, which is used to distinguish global and local temp views.
      3. other commands can also handle global temp views if they call `SessionCatalog` APIs which accept global temp views, e.g. `DropTableCommand`, `AlterTableRenameCommand`, `ShowColumnsCommand`, etc.
      
      changes for other public API
      
      1. add a new method `dropGlobalTempView` in `Catalog`
      2. `Catalog.findTable` can find global temp view
      3. add a new method `createGlobalTempView` in `Dataset`
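
The scoping rules described above can be modelled in a few lines. This is a toy, Spark-free sketch of the semantics only; `ToyApplication`, `ToySession`, and their methods are hypothetical names and do not correspond to Spark classes.

```python
GLOBAL_TEMP_DB = "global_temp"  # reserved database name from the description above


class ToyApplication:
    """Holds the views shared by every session of one application."""

    def __init__(self):
        self.global_views = {}


class ToySession:
    """One session: local temp views are private, global ones live on the app."""

    def __init__(self, app):
        self.app = app
        self.local_views = {}

    def create_temp_view(self, name, rows):
        self.local_views[name] = rows

    def create_global_temp_view(self, name, rows):
        self.app.global_views[name] = rows

    def table(self, qualified_name):
        """Resolve `global_temp.<view>` globally and bare names locally."""
        db, _, name = qualified_name.rpartition(".")
        if db == GLOBAL_TEMP_DB:
            return self.app.global_views[name]
        if db:
            raise KeyError(qualified_name)
        return self.local_views[name]


app = ToyApplication()
first, second = ToySession(app), ToySession(app)
first.create_temp_view("local1", [1])
first.create_global_temp_view("view1", [1, 2, 3])
shared = second.table("global_temp.view1")
```

In this model a second session sees `global_temp.view1` but not `local1`, and the unqualified name `view1` does not resolve, mirroring the requirement to use the qualified name.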
      
      ## How was this patch tested?
      
      new tests in `SQLViewSuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #14897 from cloud-fan/global-temp-view.
  23. Oct 05, 2016
    • [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression · 9df54f53
      sethah authored
      ## What changes were proposed in this pull request?
      Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds new examples to show multiclass training.
      
      ## How was this patch tested?
      Ran locally using spark-submit, run-example, and copy/paste from user guide into shells. Generated docs and verified correct output.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #15349 from sethah/SPARK-17239.
  24. Sep 26, 2016
    • [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc · 50b89d05
      Justin Pihony authored
      ## What changes were proposed in this pull request?
      
      This change modifies the implementation of DataFrameWriter.save such that it works with jdbc, and the call to jdbc merely delegates to save.
      
      ## How was this patch tested?
      
      This was tested via unit tests in the JDBCWriteSuite, of which I added one new test to cover this scenario.
      
      ## Additional details
      
      rxin This seems to have been most recently touched by you and was also commented on in the JIRA.
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Justin Pihony <justin.pihony@gmail.com>
      Author: Justin Pihony <justin.pihony@typesafe.com>
      
      Closes #12601 from JustinPihony/jdbc_reconciliation.
  25. Sep 12, 2016
  26. Sep 03, 2016
  27. Aug 27, 2016
    • [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when withMean=True · e07baf14
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.
      
      ## How was this patch tested?
      
      Jenkins tests, including new cases to reflect the new behavior.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14663 from srowen/SPARK-17001.
  28. Aug 24, 2016
    • [MINOR][BUILD] Fix Java CheckStyle Error · 673a80d2
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      As Spark 2.0.1 will be released soon (mentioned in the spark dev mailing list), besides the critical bugs, it's better to fix the code style errors before the release.
      
      Before:
      ```
      ./dev/lint-java
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
      [ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      ```
      After:
      ```
      ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      ## How was this patch tested?
      Manual.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14768 from Sherry302/fixjavastyle.
  29. Aug 20, 2016
    • [SPARKR][EXAMPLE] change example APP name · 3e5fdeb3
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      For the R SQL example, the appName is "MyApp", while in the Scala, Java and Python examples the appName is "x Spark SQL basic example".
      
      I made the R example consistent with other examples.
      
      ## How was this patch tested?
      
      Manual test
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14703 from wangmiao1981/example.
  30. Aug 11, 2016
    • [SPARK-16886][EXAMPLES][DOC] Fix some examples to be consistent and indentation in documentation · 7186e8c3
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Originally this PR was based on #14491, but I realised that fixing the examples is more sensible than fixing the comments.
      
      This PR fixes the three things below:
      
      - Fix two wrong examples in `structured-streaming-programming-guide.md`. Loading via `read.load(..)` without `as` will be `Dataset<Row>` not `Dataset<String>` in Java.
      
      - Fix indentation across `structured-streaming-programming-guide.md`. Python has 4 spaces and Scala and Java have double spaces. These are inconsistent across the examples.
      
      - Fix `StructuredNetworkWordCountWindowed` and `StructuredNetworkWordCount` in Java and Scala to initially load `DataFrame` and `Dataset<Row>`, to be consistent with the comments and some examples in `structured-streaming-programming-guide.md` and to match the Scala and Java examples to the Python one (which loads it as `DataFrame` initially).
      
      ## How was this patch tested?
      
      N/A
      
      Closes https://github.com/apache/spark/pull/14491
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Ganesh Chand <ganeshchand@Ganeshs-MacBook-Pro-2.local>
      
      Closes #14564 from HyukjinKwon/SPARK-16886.
      7186e8c3
  31. Aug 08, 2016
    • Weiqing Yang's avatar
      [SPARK-16945] Fix Java Lint errors · e10ca8de
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      This PR fixes the following minor Java linter errors:
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10] (modifier) RedundantModifier: Redundant 'final' modifier.
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10] (modifier) RedundantModifier: Redundant 'final' modifier.
      
      ## How was this patch tested?
      Manual test.
      dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14532 from Sherry302/master.
      e10ca8de
  32. Aug 05, 2016
    • Bryan Cutler's avatar
      [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs · 180fd3e0
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Improve example outputs to better reflect the functionality being presented.  This mostly consisted of modifying what was printed at the end of each example, such as calling show() with truncate=False, but sometimes required minor tweaks to the example data to get relevant output.  Explicitly set parameters when they are used as part of the example.  Fixed Java examples that failed to run because they used old-style MLlib Vectors or had problems with the schema.  Synced examples between the different APIs.
      
      ## How was this patch tested?
      Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14308 from BryanCutler/ml-examples-improve-output-SPARK-16260.
      180fd3e0
  33. Aug 02, 2016
    • sandy's avatar
      [SPARK-16816] Modify Java example which is also reflected in documentation example · cbdff493
      sandy authored
      ## What changes were proposed in this pull request?
      
      Modify the Java example that is also reflected in the documentation.
      
      ## How was this patch tested?
      
      run test cases.
      
      Author: sandy <phalodi@gmail.com>
      
      Closes #14436 from phalodi/SPARK-16816.
      cbdff493
    • Xusen Yin's avatar
      [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector... · dd8514fa
      Xusen Yin authored
      [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
      
      ## What changes were proposed in this pull request?
      
      mllib.LDAExample uses an ML pipeline and the MLlib LDA algorithm. The former transforms the original data into MLVector format, while the latter expects MLlibVector format.
      
      ## How was this patch tested?
      
      Test manually.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #14212 from yinxusen/SPARK-16558.
      dd8514fa
    • Cheng Lian's avatar
      [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings · 10e1c0e6
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR makes various minor updates to examples of all language bindings to make sure they are consistent with each other. Some typos and missing parts (JDBC example in Scala/Java/Python) are also fixed.
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14368 from liancheng/revise-examples.
      10e1c0e6
  34. Jul 30, 2016
    • Bryan Cutler's avatar
      [SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to exception · a6290e51
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Some Java examples use mllib.linalg.Vectors instead of ml.linalg.Vectors, which causes an exception when they are run.  Other Java examples specify incorrect data types in the schema, which also causes an exception.
      
      ## How was this patch tested?
      Ran corrected examples locally
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14405 from BryanCutler/java-examples-ml.Vectors-fix-SPARK-16800.
      a6290e51
    • Sean Owen's avatar
      [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions whose... · 0dc4310b
      Sean Owen authored
      [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions whose side effects are required
      
      ## What changes were proposed in this pull request?
      
      Use foreach/for instead of map where the operation only needs its body executed for side effects and does not actually define a transformation.
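
      The commit itself targets Scala code. As a rough analogy only, the sketch below shows the same pitfall in Python 3, where `map()` is lazy: a side-effecting body passed to `map()` does not run until something consumes the result, while a plain `for` loop runs it immediately and states the intent. The names `record` and `seen` are hypothetical and exist only for this illustration.

```python
# Analogy in Python 3: map() returns a lazy iterator, so a side-effecting
# function passed to it is not executed until the iterator is consumed.
seen = []


def record(x):
    """Side-effecting function: appends to a list and returns nothing."""
    seen.append(x)


# Building the map object does not call record() at all.
pending = map(record, [1, 2, 3])
after_map = list(seen)  # snapshot taken before anything consumed `pending`

# A for loop expresses "run this body for its effect" and executes eagerly.
for x in [1, 2, 3]:
    record(x)
after_loop = list(seen)  # snapshot taken after the loop ran
```

      Here `after_map` is still empty, while `after_loop` holds the three recorded values.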
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14332 from srowen/SPARK-16694.
      0dc4310b
  35. Jul 23, 2016
    • Cheng Lian's avatar
      [SPARK-16380][EXAMPLES] Update SQL examples and programming guide for Python language binding · 53b2456d
      Cheng Lian authored
      This PR is based on PR #14098 authored by wangmiao1981.
      
      ## What changes were proposed in this pull request?
      
      This PR replaces the original Python Spark SQL example file with the following three files:
      
      - `sql/basic.py`
      
        Demonstrates basic Spark SQL features.
      
      - `sql/datasource.py`
      
        Demonstrates various Spark SQL data sources.
      
      - `sql/hive.py`
      
        Demonstrates Spark SQL Hive interaction.
      
      This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.
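
      The idea of pulling guide snippets out of runnable example files can be sketched as follows. This is not the actual `include_example` plugin, only a minimal illustration in Python; the marker format (`$example on:<label>$` / `$example off:<label>$`), the function `extract_snippet`, and the sample file contents are assumptions made for this sketch.

```python
def extract_snippet(source: str, label: str) -> str:
    """Return the lines between the assumed on/off markers for `label`."""
    on_marker = f"$example on:{label}$"
    off_marker = f"$example off:{label}$"
    collected = []
    inside = False
    for line in source.splitlines():
        if on_marker in line:
            inside = True   # start collecting after the opening marker
            continue
        if off_marker in line:
            inside = False  # stop collecting at the closing marker
            continue
        if inside:
            collected.append(line)
    return "\n".join(collected)


# Hypothetical example file, held as a string so nothing here needs Spark.
example_file = """\
from pyspark.sql import SparkSession

# $example on:init_session$
spark = SparkSession.builder.appName("demo").getOrCreate()
# $example off:init_session$

spark.stop()
"""

snippet = extract_snippet(example_file, "init_session")
```

      Keeping the snippet inside a file that is actually run means the text shown in the guide cannot drift away from working code, which is the motivation for removing the hard-coded snippets.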
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14317 from liancheng/py-examples-update.
      53b2456d
  36. Jul 19, 2016