  1. May 03, 2017
    • [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers for SVD and PCA (v2) · db2fb84b
      MechCoder authored
      Add PCA and SVD to PySpark's wrappers for `RowMatrix` and `IndexedRowMatrix` (SVD only).
      
      Based on #7963, updated.
      
      ## How was this patch tested?
      
      New doc tests and unit tests. Ran all examples locally.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17621 from MLnick/SPARK-6227-pyspark-svd-pca.
      db2fb84b
  2. Apr 29, 2017
    • [SPARK-19791][ML] Add doc and example for fpgrowth · add9d1bb
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      Add a new section for fpm.
      Add examples for FPGrowth in Scala and Java.
      
      updated: Rewrite transform to be more compact.
      
      ## How was this patch tested?
      
      local doc generation.
      
      Author: Yuhao Yang <yuhao.yang@intel.com>
      
      Closes #17130 from hhbyyh/fpmdoc.
      add9d1bb
  3. Apr 24, 2017
  4. Apr 18, 2017
    • [SPARK-20208][R][DOCS] Document R fpGrowth support · 702d85af
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Document fpGrowth in:
      
      - vignettes
      - programming guide
      - code example
      
      ## How was this patch tested?
      
      Manual tests.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17557 from zero323/SPARK-20208.
      702d85af
    • [SPARK-20377][SS] Fix JavaStructuredSessionization example · 74aa0df8
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      Extra accessors in the Java bean class caused incorrect encoder generation, which corrupted the state when using timeouts.
      
      ## How was this patch tested?
      manually ran the example
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #17676 from tdas/SPARK-20377.
      74aa0df8
  5. Apr 12, 2017
    • [MINOR][DOCS] JSON APIs related documentation fixes · bca4259f
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes corrections related to JSON APIs as below:
      
      - Rendering links in Python documentation
      - Replacing `RDD` with `Dataset` in the programming guide
      - Adding missing description about JSON Lines consistently in `DataFrameReader.json` in Python API
      - De-duplicating a little bit of `DataFrameReader.json` in Scala/Java API
      
      ## How was this patch tested?
      
      Manually built the documentation via `jekyll build`. Corresponding snapshots will be left on the code.
      
      Note that currently there are Javadoc8 breaks in several places. These are proposed to be handled in https://github.com/apache/spark/pull/17477. So, this PR does not fix those.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17602 from HyukjinKwon/minor-json-documentation.
      bca4259f
  6. Apr 10, 2017
    • [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish... · a26e3ed5
      Sean Owen authored
      [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
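
The effect described above can be sketched outside the JVM. Python's `str.lower()` is locale-independent, so the Turkish casing rules are simulated here with a hypothetical helper (`turkish_lower`); the option name `IGNORECASE` is made up for illustration and is not taken from the patch.

```python
# Illustration only: simulate Turkish-locale lowercasing, where "I" maps to
# dotless "ı" and "İ" maps to "i". Both helper names are hypothetical.
TURKISH_LOWER = {ord("I"): "ı", ord("İ"): "i"}


def turkish_lower(text: str) -> str:
    """Lowercase `text` the way a Turkish-locale toLowerCase would."""
    return text.translate(TURKISH_LOWER).lower()


def root_lower(text: str) -> str:
    """Locale-neutral lowercasing, analogous to toLowerCase(Locale.ROOT)."""
    return text.lower()


option = "IGNORECASE"
print(root_lower(option) == "ignorecase")     # True
print(turkish_lower(option) == "ignorecase")  # False: yields "ıgnorecase"
```

A case-insensitive lookup keyed on `"ignorecase"` would silently miss under the simulated Turkish rules, which is the kind of variation a fixed locale avoids.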
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
      a26e3ed5
  7. Apr 09, 2017
  8. Apr 07, 2017
    • [SPARK-20258][DOC][SPARKR] Fix SparkR logistic regression example in... · 1ad73f0a
      actuaryzhang authored
      [SPARK-20258][DOC][SPARKR] Fix SparkR logistic regression example in programming guide (did not converge)
      
      ## What changes were proposed in this pull request?
      
      The SparkR logistic regression example in the programming guide did not converge (for IRWLS). All estimates are essentially zero:
      
      ```
      training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
      df_list2 <- randomSplit(training2, c(7,3), 2)
      binomialDF <- df_list2[[1]]
      binomialTestDF <- df_list2[[2]]
      binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial")
      
      17/04/07 11:42:03 WARN WeightedLeastSquares: Cholesky solver failed due to singular covariance matrix. Retrying with Quasi-Newton solver.
      
      > summary(binomialGLM)
      
      Coefficients:
                       Estimate
      (Intercept)    9.0255e+00
      features_0     0.0000e+00
      features_1     0.0000e+00
      features_2     0.0000e+00
      features_3     0.0000e+00
      features_4     0.0000e+00
      features_5     0.0000e+00
      features_6     0.0000e+00
      features_7     0.0000e+00
      ```
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17571 from actuaryzhang/programGuide2.
      1ad73f0a
    • [SPARK-20026][DOC][SPARKR] Add Tweedie example for SparkR in programming guide · 870b9d9a
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      Add Tweedie example for SparkR in programming guide.
      The doc was already updated in #17103.
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17553 from actuaryzhang/programGuide.
      870b9d9a
  9. Apr 06, 2017
  10. Apr 05, 2017
  11. Apr 03, 2017
    • [SPARK-19969][ML] Imputer doc and example · 4d28e843
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      Add docs and examples for spark.ml.feature.Imputer. Currently, Scala and Java examples are included. A Python example will be added after https://github.com/apache/spark/pull/17316
      
      ## How was this patch tested?
      
      local doc generation and example execution
      
      Author: Yuhao Yang <yuhao.yang@intel.com>
      
      Closes #17324 from hhbyyh/imputerdoc.
      4d28e843
  12. Mar 30, 2017
    • [DOCS] Docs-only improvements · 0197262a
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Use recommended values for row boundaries in Window's scaladoc, i.e. `Window.unboundedPreceding`, `Window.unboundedFollowing`, and `Window.currentRow` (that were introduced in 2.1.0).
      
      ## How was this patch tested?
      
      Local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #17417 from jaceklaskowski/window-expression-scaladoc.
      0197262a
  13. Mar 29, 2017
    • [MINOR][SPARKR] Add run command comment in examples · 471de5db
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      Two examples in the R folder are missing the run commands.
      
      This PR adds the missing comments, consistent with the other examples.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #17474 from wangmiao1981/stat.
      471de5db
  14. Mar 23, 2017
    • [SPARK-10849][SQL] Adds option to the JDBC data source write for user to... · c7911807
      sureshthalamati authored
      [SPARK-10849][SQL] Adds option to the JDBC data source write for user to specify database column type for the create table
      
      ## What changes were proposed in this pull request?
      Currently, the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. If users want to specify a different database data type for only some of the columns, no option is available. In scenarios where the default mapping does not work, users are forced to create tables on the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns.
      
      The solution is to allow users to specify the database column data types for the create table as a JDBC data source option (`createTableColumnTypes`) on write. Data type information can be specified in the same format as the table schema DDL format (e.g. `name CHAR(64), comments VARCHAR(1024)`).
      
      Not all supported target database types can be specified; the data types also have to be valid Spark SQL data types. For example, users cannot specify the target database CLOB data type. This will be supported in a follow-up PR.
      
      Example:
      ```Scala
      df.write
        .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)")
        .jdbc(url, "TEST.DBCOLTYPETEST", properties)
      ```
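
To make the option's string format concrete, here is a minimal Python sketch that splits such a value into (column, type) pairs. `split_column_types` is a hypothetical helper written for this note, not Spark's parser; it only assumes the comma-separated `column TYPE` layout shown in the example above.

```python
def split_column_types(spec):
    """Split 'name TYPE, other TYPE(p, s)' into (column, type) pairs."""
    pairs, depth, current = [], 0, []
    for ch in spec + ",":  # trailing comma flushes the last entry
        if ch == "," and depth == 0:
            entry = "".join(current).strip()
            if entry:
                column, _, sql_type = entry.partition(" ")
                pairs.append((column, sql_type.strip()))
            current = []
            continue
        # Track parentheses so commas inside TYPE(p, s) are not split points.
        depth += (ch == "(") - (ch == ")")
        current.append(ch)
    return pairs


print(split_column_types("name CHAR(64), comments VARCHAR(1024)"))
# [('name', 'CHAR(64)'), ('comments', 'VARCHAR(1024)')]
```

Columns that do not appear in the string would, per the description, keep the default type mapping.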
      ## How was this patch tested?
      Added new test cases to the JDBCWriteSuite
      
      Author: sureshthalamati <suresh.thalamati@gmail.com>
      
      Closes #16209 from sureshthalamati/jdbc_custom_dbtype_option_json-spark-10849.
      c7911807
  15. Mar 02, 2017
    • [SPARK-19345][ML][DOC] Add doc for "coldStartStrategy" usage in ALS · 9cca3dbf
      Nick Pentreath authored
      [SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489) added the ability to skip `NaN` predictions during `ALSModel.transform`. This PR adds documentation for the `coldStartStrategy` param to the ALS user guide, and adds code to the examples to illustrate usage.
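
Why skipping `NaN` predictions matters can be shown with a small pure-Python sketch. The (rating, prediction) pairs are invented for illustration, and the filtering below only mimics the effect of dropping such rows; it does not call Spark.

```python
import math

# Hypothetical (rating, prediction) pairs; NaN stands for a user or item
# that was not seen during training, i.e. a cold-start prediction.
scored = [(4.0, 3.5), (2.0, 2.5), (5.0, float("nan"))]


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


print(rmse(scored))  # nan: a single NaN prediction poisons the metric

# Dropping NaN predictions first keeps the metric finite.
kept = [(a, p) for a, p in scored if not math.isnan(p)]
print(rmse(kept))  # 0.5
```

This is the situation that arises when evaluating on a held-out split containing users or items absent from the training split.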
      
      ## How was this patch tested?
      
      Doc and example change only. Built the HTML docs locally and verified that the example code builds and runs in the shell for Scala/Python.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #17102 from MLnick/SPARK-19345-coldstart-doc.
      9cca3dbf
    • [MINOR][ML] Fix comments in LSH Examples and Python API · 3bd8ddf7
      Yun Ni authored
      ## What changes were proposed in this pull request?
      Remove `org.apache.spark.examples.` in
      Add a slash in one of the Python docs.
      
      ## How was this patch tested?
      Run examples using the commands in the comments.
      
      Author: Yun Ni <yunn@uber.com>
      
      Closes #17104 from Yunni/yunn_minor.
      3bd8ddf7
  16. Mar 01, 2017
    • [SPARK-19460][SPARKR] Update dataset used in R documentation, examples to... · 89cd3845
      wm624@hotmail.com authored
      [SPARK-19460][SPARKR] Update dataset used in R documentation, examples to reduce warning noise and confusions
      
      ## What changes were proposed in this pull request?
      
      Replace the `iris` dataset with `Titanic` or another dataset in examples and documentation.
      
      ## How was this patch tested?
      
      Manual and existing test
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #17032 from wangmiao1981/example.
      89cd3845
  17. Feb 27, 2017
    • [SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated usage of json(RDD[String]) API · 8a5a5850
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to replace the deprecated `json(RDD[String])` usage with `json(Dataset[String])`.
      
      The deprecated usage currently produces many warnings.
      
      ## How was this patch tested?
      
      Fixed tests.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17071 from HyukjinKwon/SPARK-15615-followup.
      8a5a5850
    • [MINOR][BUILD] Fix lint-java breaks in Java · 4ba9c6c4
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to fix the lint-breaks as below:
      
      ```
      [ERROR] src/test/java/org/apache/spark/network/TransportResponseHandlerSuite.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.network.buffer.ManagedBuffer.
      [ERROR] src/main/java/org/apache/spark/unsafe/types/UTF8String.java:[156,10] (modifier) ModifierOrder: 'Nonnull' annotation modifier does not precede non-annotation modifiers.
      [ERROR] src/main/java/org/apache/spark/SparkFirehoseListener.java:[122] (sizes) LineLength: Line is longer than 100 characters (found 105).
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[164,78] (coding) OneStatementPerLine: Only one statement per line allowed.
      [ERROR] src/test/java/test/org/apache/spark/JavaAPISuite.java:[1157] (sizes) LineLength: Line is longer than 100 characters (found 121).
      [ERROR] src/test/java/org/apache/spark/streaming/JavaMapWithStateSuite.java:[149] (sizes) LineLength: Line is longer than 100 characters (found 113).
      [ERROR] src/test/java/test/org/apache/spark/streaming/Java8APISuite.java:[146] (sizes) LineLength: Line is longer than 100 characters (found 122).
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[32,8] (imports) UnusedImports: Unused import - org.apache.spark.streaming.Time.
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[611] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[1317] (sizes) LineLength: Line is longer than 100 characters (found 102).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java:[91] (sizes) LineLength: Line is longer than 100 characters (found 102).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[113] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[164] (sizes) LineLength: Line is longer than 100 characters (found 110).
      [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[212] (sizes) LineLength: Line is longer than 100 characters (found 114).
      [ERROR] src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java:[36] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java:[26,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
      [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[20,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils.
      [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[94] (sizes) LineLength: Line is longer than 100 characters (found 103).
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[30,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.api.java.UDF1.
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[72] (sizes) LineLength: Line is longer than 100 characters (found 104).
      [ERROR] src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java:[121] (sizes) LineLength: Line is longer than 100 characters (found 101).
      [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[28,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaRDD.
      [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaSparkContext.
      ```
      
      ## How was this patch tested?
      
      Manually via
      
      ```bash
      ./dev/lint-java
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17072 from HyukjinKwon/java-lint.
      4ba9c6c4
  18. Feb 25, 2017
    • [MINOR][DOCS] Fixes two problems in the SQL programing guide page · 061bcfb8
      Boaz Mohar authored
      ## What changes were proposed in this pull request?
      
      Removed duplicated lines in the SQL Python example and fixed a typo.
      
      ## How was this patch tested?
      
      Searched for other typos in the page to minimize PRs.
      
      Author: Boaz Mohar <boazmohar@gmail.com>
      
      Closes #17066 from boazmohar/doc-fix.
      061bcfb8
  19. Feb 21, 2017
  20. Feb 19, 2017
  21. Feb 17, 2017
  22. Feb 16, 2017
    • [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support · 0e240549
      Sean Owen authored
      - Move external/java8-tests tests into core, streaming, sql and remove
      - Remove MaxPermGen and related options
      - Fix some reflection / TODOs around Java 8+ methods
      - Update doc references to 1.7/1.8 differences
      - Remove Java 7/8 related build profiles
      - Update some plugins for better Java 8 compatibility
      - Fix a few Java-related warnings
      
      For the future:
      
      - Update Java 8 examples to fully use Java 8
      - Update Java tests to use lambdas for simplicity
      - Update Java internal implementations to use lambdas
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16871 from srowen/SPARK-19493.
      Unverified
      0e240549
  23. Feb 15, 2017
    • [SPARK-18080][ML][PYTHON] Python API & Examples for Locality Sensitive Hashing · 08c1972a
      Yun Ni authored
      ## What changes were proposed in this pull request?
      This pull request includes the Python API and examples for LSH. The API changes were based on yanboliang's PR #15768, with conflicts and API changes on the Scala API resolved. The examples are consistent with the Scala examples of MinHashLSH and BucketedRandomProjectionLSH.
      
      ## How was this patch tested?
      API and examples are tested using spark-submit:
      `bin/spark-submit examples/src/main/python/ml/min_hash_lsh.py`
      `bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh.py`
      
      User guide changes are generated and manually inspected:
      `SKIP_API=1 jekyll build`
      
      Author: Yun Ni <yunn@uber.com>
      Author: Yanbo Liang <ybliang8@gmail.com>
      Author: Yunni <Euler57721@gmail.com>
      
      Closes #16715 from Yunni/spark-18080.
      08c1972a
  24. Feb 13, 2017
    • [SPARK-19574][ML][DOCUMENTATION] Fix Liquid Exception: Start indices amount is... · 855a1b75
      Xiao Li authored
      [SPARK-19574][ML][DOCUMENTATION] Fix Liquid Exception: Start indices amount is not equal to end indices amount
      
      ### What changes were proposed in this pull request?
      ```
      Liquid Exception: Start indices amount is not equal to end indices amount, see /Users/xiao/IdeaProjects/sparkDelivery/docs/../examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java. in ml-features.md
      ```
      
      The build has been broken since https://github.com/apache/spark/pull/16789 was merged.
      
      This PR fixes it.
      
      ## How was this patch tested?
      Manual
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #16908 from gatorsmile/docMLFix.
      Unverified
      855a1b75
  25. Feb 07, 2017
    • [SPARK-19444][ML][DOCUMENTATION] Fix imports not being present in documentation · aee2bd2c
      Aseem Bansal authored
      ## What changes were proposed in this pull request?
      
      SPARK-19444 imports not being present in documentation
      
      ## How was this patch tested?
      
      Manual
      
      ## Disclaimer
      
      Contribution is original work and I license the work to the project under the project’s open source license
      
      Author: Aseem Bansal <anshbansal@users.noreply.github.com>
      
      Closes #16789 from anshbansal/patch-1.
      Unverified
      aee2bd2c
  26. Feb 03, 2017
  27. Jan 25, 2017
    • [SPARK-16046][DOCS] Aggregations in the Spark SQL programming guide · 3fdce814
      aokolnychyi authored
      ## What changes were proposed in this pull request?
      
      - A separate subsection for Aggregations under “Getting Started” in the Spark SQL programming guide. It mentions which aggregate functions are predefined and how users can create their own.
      - Examples of using the `UserDefinedAggregateFunction` abstract class for untyped aggregations in Java and Scala.
      - Examples of using the `Aggregator` abstract class for type-safe aggregations in Java and Scala.
      - Python is not covered.
      - The PR might not resolve the ticket since I do not know what exactly was planned by the author.
      
      In total, there are four new standalone examples that can be executed via `spark-submit` or `run-example`. The updated Spark SQL programming guide references these examples and does not contain hard-coded snippets.
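
Since Python is not covered by the PR, the following is only a pure-Python sketch of the shape a type-safe aggregation takes: a zero value, a per-row reduce step, a merge of partial buffers, and a finish step. The method names mirror those of Scala's `Aggregator`; the class `AverageAggregator` and the salary figures are hypothetical and do not come from the added examples.

```python
from functools import reduce


class AverageAggregator:
    """Pure-Python sketch of the zero/reduce/merge/finish aggregation shape."""

    def zero(self):
        return (0.0, 0)  # (running sum, count)

    def reduce(self, buffer, value):
        total, count = buffer
        return (total + value, count + 1)

    def merge(self, left, right):
        return (left[0] + right[0], left[1] + right[1])

    def finish(self, buffer):
        total, count = buffer
        return total / count if count else float("nan")


agg = AverageAggregator()
# Two hypothetical partitions of salaries, aggregated independently.
partitions = [[3000.0, 4500.0], [3500.0, 4000.0]]
partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
print(agg.finish(reduce(agg.merge, partials)))  # 3750.0
```

Keeping `merge` separate from `reduce` is what lets each partition be aggregated on its own before the partial buffers are combined.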
      
      ## How was this patch tested?
      
      The patch was tested locally by building the docs. The examples were run as well.
      
      ![image](https://cloud.githubusercontent.com/assets/6235869/21292915/04d9d084-c515-11e6-811a-999d598dffba.png)
      
      Author: aokolnychyi <okolnychyyanton@gmail.com>
      
      Closes #16329 from aokolnychyi/SPARK-16046.
      3fdce814
  28. Jan 18, 2017
  29. Jan 12, 2017
    • [SPARK-19158][SPARKR][EXAMPLES] Fix ml.R example fails due to lack of e1071 package. · 2c586f50
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      The ```ml.R``` example depends on the ```e1071``` package; if it is not available in the user's environment, the example fails. I think the example should not depend on third-party packages, so I updated it to remove the dependency.
      
      ## How was this patch tested?
      Manual test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16548 from yanboliang/spark-19158.
      2c586f50
  30. Jan 10, 2017
    • [SPARK-19134][EXAMPLE] Fix several sql, mllib and status api examples not working · b0e5840d
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      **binary_classification_metrics_example.py**
      
      LibSVM datasource loads `ml.linalg.SparseVector` whereas the example requires it to be `mllib.linalg.SparseVector`.  For the equivalent Scala example, `BinaryClassificationMetricsExample.scala` seems fine.
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
      ```
      
      ```
        File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
          .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
        File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
          self.features = _convert_to_vector(features)
        File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
          raise TypeError("Cannot convert type %s into Vector" % type(l))
      TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
      ```
      
      **status_api_demo.py** (this one does not work on Python 3.4.6)
      
      It's `queue` in Python 3+.
      
      ```
      PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module>
          import Queue
      ImportError: No module named 'Queue'
      ```
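
A common way to cope with the renamed module is a guarded import. This is a minimal sketch of that pattern, not necessarily the exact change made to `status_api_demo.py`; the queued message is invented for illustration.

```python
# Import the queue module under one name on both Python 2 and Python 3.
try:
    import Queue as queue  # Python 2 module name
except ImportError:
    import queue  # Python 3 module name

results = queue.Queue()
results.put("job finished")
print(results.get())  # job finished
```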
      
      **bisecting_k_means_example.py**
      
      `BisectingKMeansModel` does not implement `save` and `load` in Python.
      
      ```bash
      ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module>
          model.save(sc, path)
      AttributeError: 'BisectingKMeansModel' object has no attribute 'save'
      ```
      
      **elementwise_product_example.py**
      
      It calls `collect` from the vector.
      
      ```bash
      ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module>
          for each in transformedData2.collect():
        File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__
          return getattr(self.array, item)
      AttributeError: 'numpy.ndarray' object has no attribute 'collect'
      ```
      
      **These three examples appear to throw an exception because a relative path is set in `spark.sql.warehouse.dir`.**
      
      **hive.py**
      
      ```
      ./bin/spark-submit examples/src/main/python/sql/hive.py
      ```
      
      ```
      Traceback (most recent call last):
        File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module>
          spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
        File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
        File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
        File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
      pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);'
      ```
      
      **SparkHiveExample.scala**
      
      ```
      ./bin/run-example sql.hive.SparkHiveExample
      ```
      
      ```
      Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
      	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
      ```
      
      **JavaSparkHiveExample.java**
      
      ```
      ./bin/run-example sql.hive.JavaSparkHiveExample
      ```
      
      ```
      Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
      	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
      ```
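
The three failures above share one cause: a relative location cannot form a valid `file:` URI. The sketch below reproduces that constraint with the Python standard library and shows one way to obtain an absolute location. Passing the result as `spark.sql.warehouse.dir` is an assumption about how it would be used, not a confirmed description of the fix in this PR.

```python
from os.path import abspath
from pathlib import Path

# "spark-warehouse" is the directory name from the error messages above.
relative = Path("spark-warehouse")

try:
    relative.as_uri()  # a relative path cannot be turned into a file URI
except ValueError as err:
    print("relative path rejected:", err)

# Resolving against the current working directory gives an absolute location.
warehouse_location = abspath("spark-warehouse")
print(Path(warehouse_location).as_uri())
```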
      
      ## How was this patch tested?
      
      Manually via
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
      ```
      
      ```
      PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
      ```
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
      ```
      
      ```
      ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
      ```
      
      ```
      ./bin/spark-submit examples/src/main/python/sql/hive.py
      ```
      
      ```
      ./bin/run-example sql.hive.JavaSparkHiveExample
      ```
      
      ```
      ./bin/run-example sql.hive.SparkHiveExample
      ```
      
      These were found via
      
      ```bash
      find ./examples/src/main/python -name "*.py" -exec spark-submit {} \;
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16515 from HyukjinKwon/minor-example-fix.
      b0e5840d
  31. Jan 05, 2017
  32. Jan 04, 2017
    • [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo · a1e40b1f
      Niranjan Padmanabhan authored
      ## What changes were proposed in this pull request?
      There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
      
      ## How was this patch tested?
      N/A since only docs or comments were updated.
      
      Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
      
      Closes #16455 from neurons/np.structure_streaming_doc.
      Unverified
      a1e40b1f
  33. Jan 03, 2017
    • [MINOR] Add missing sc.stop() to end of examples · e5c307c5
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      
      Add `finally` clause for `sc.stop()` in the `test("register and deregister Spark listener from SparkContext")`.
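
The value of the `finally` clause can be sketched in a few lines of Python. `FakeContext` is a hypothetical stand-in for a SparkContext that only records whether `stop()` was called, and `run_with_cleanup` is an illustrative helper rather than code from the test suite.

```python
class FakeContext:
    """Hypothetical stand-in for a SparkContext; it only tracks stop()."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_with_cleanup(sc, body):
    """Run `body(sc)` and stop the context even if the body raises."""
    try:
        body(sc)
    finally:
        sc.stop()  # always reached, so no context leaks into later tests


def failing_body(sc):
    raise AssertionError("simulated test failure")


sc = FakeContext()
try:
    run_with_cleanup(sc, failing_body)
except AssertionError:
    pass  # the failure still propagates to the caller
print(sc.stopped)  # True
```

Without the `finally`, a failing assertion would skip the cleanup and leave the context running for whatever test comes next.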
      
      ## How was this patch tested?
      Pass the build and unit tests.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #16426 from weiqingy/testIssue.
      Unverified
      e5c307c5
  34. Jan 02, 2017
    • [SPARK-19002][BUILD][PYTHON] Check pep8 against all Python scripts · 46b21260
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to check pep8 against all other Python scripts and fix the errors as below:
      
      ```bash
      ./dev/create-release/generate-contributors.py
      ./dev/create-release/releaseutils.py
      ./dev/create-release/translate-contributors.py
      ./dev/lint-python
      ./python/docs/epytext.py
      ./examples/src/main/python/mllib/decision_tree_classification_example.py
      ./examples/src/main/python/mllib/decision_tree_regression_example.py
      ./examples/src/main/python/mllib/gradient_boosting_classification_example.py
      ./examples/src/main/python/mllib/gradient_boosting_regression_example.py
      ./examples/src/main/python/mllib/linear_regression_with_sgd_example.py
      ./examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
      ./examples/src/main/python/mllib/naive_bayes_example.py
      ./examples/src/main/python/mllib/random_forest_classification_example.py
      ./examples/src/main/python/mllib/random_forest_regression_example.py
      ./examples/src/main/python/mllib/svm_with_sgd_example.py
      ./examples/src/main/python/streaming/network_wordjoinsentiments.py
      ./sql/hive/src/test/resources/data/scripts/cat.py
      ./sql/hive/src/test/resources/data/scripts/cat_error.py
      ./sql/hive/src/test/resources/data/scripts/doubleescapedtab.py
      ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py
      ./sql/hive/src/test/resources/data/scripts/escapedcarriagereturn.py
      ./sql/hive/src/test/resources/data/scripts/escapednewline.py
      ./sql/hive/src/test/resources/data/scripts/escapedtab.py
      ./sql/hive/src/test/resources/data/scripts/input20_script.py
      ./sql/hive/src/test/resources/data/scripts/newline.py
      ```
      
      ## How was this patch tested?
      
      - `./python/docs/epytext.py`
      
        ```bash
        cd ./python/docs && make html
        ```
      
      - pep8 check (Python 2.7 / Python 3.3.6)
      
        ```
        ./dev/lint-python
        ```
      
      - `./dev/merge_spark_pr.py` (Python 2.7 only / Python 3.3.6 not working)
      
        ```bash
        python -m doctest -v ./dev/merge_spark_pr.py
        ```
      
      - `./dev/create-release/releaseutils.py` `./dev/create-release/generate-contributors.py` `./dev/create-release/translate-contributors.py` (Python 2.7 only / Python 3.3.6 not working)
      
        ```bash
        python generate-contributors.py
        python translate-contributors.py
        ```
      
      - Examples (Python 2.7 / Python 3.3.6)
      
        ```bash
        ./bin/spark-submit examples/src/main/python/mllib/decision_tree_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/decision_tree_regression_example.py
        ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_regression_example.py
        ./bin/spark-submit examples/src/main/python/mllib/random_forest_classification_example.py
        ./bin/spark-submit examples/src/main/python/mllib/random_forest_regression_example.py
        ```
      
      - Examples (Python 2.7 only / Python 3.3.6 not working)
        ```
        ./bin/spark-submit examples/src/main/python/mllib/linear_regression_with_sgd_example.py
        ./bin/spark-submit examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
        ./bin/spark-submit examples/src/main/python/mllib/naive_bayes_example.py
        ./bin/spark-submit examples/src/main/python/mllib/svm_with_sgd_example.py
        ```
      
      - `sql/hive/src/test/resources/data/scripts/*.py` (Python 2.7 / Python 3.3.6 within suggested changes)
      
        Manually tested only changed ones.
      
      - `./dev/github_jira_sync.py` (Python 2.7 only / Python 3.3.6 not working)
      
        Manually tested this after disabling the code that actually adds comments and links.
      
      And also via Jenkins tests.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #16405 from HyukjinKwon/minor-pep8.
      Unverified
      46b21260
  35. Dec 08, 2016
    • [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide · 9bf8f3cd
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Add all R examples for ML wrappers which were added during the 2.1 release cycle.
      * Split the whole ```ml.R``` example file into an individual example for each algorithm, which will be convenient for users to rerun them.
      * Add corresponding examples to ML user guide.
      * Update ML section of SparkR user guide.
      
      Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using R ```formula``` to specify ```featuresCol``` and ```labelCol```.
      
      ## How was this patch tested?
      Run all examples manually.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16148 from yanboliang/spark-18325.
      9bf8f3cd