  1. May 12, 2016
  2. May 11, 2016
    • cody koeninger's avatar
      [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact · 89e67d66
      cody koeninger authored
      ## What changes were proposed in this pull request?
      Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions
      
      ## How was this patch tested?
      Unit tests
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #12946 from koeninger/SPARK-15085.
      89e67d66
    • Dongjoon Hyun's avatar
      [SPARK-14933][HOTFIX] Replace `sqlContext` with `spark`. · e1576478
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This fixes compile errors.
      
      ## How was this patch tested?
      
      Pass the Jenkins tests.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13053 from dongjoon-hyun/hotfix_sqlquerysuite.
      e1576478
    • Zheng RuiFeng's avatar
      [SPARK-15150][EXAMPLE][DOC] Update LDA examples · d88afabd
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Create a libsvm-format dataset for LDA: `data/mllib/sample_lda_libsvm_data.txt`
      2. Add a Python example
      3. Read the data file directly in the examples
      4. Also switch `aft_survival_regression.py` to `SparkSession`
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/lda_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12927 from zhengruifeng/lda_pe.
      d88afabd
    • Zheng RuiFeng's avatar
      [SPARK-15149][EXAMPLE][DOC] update kmeans example · 8beae591
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      The Python example for ml.kmeans already exists but is not included in the user guide.
      1. Small changes such as `example_on` and `example_off`
      2. Add it to the user guide
      3. Update the examples to read the data file directly
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12925 from zhengruifeng/km_pe.
      8beae591
    • Zheng RuiFeng's avatar
      [SPARK-14340][EXAMPLE][DOC] Update Examples and User Guide for ml.BisectingKMeans · cef73b56
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      1. Add BisectingKMeans to ml-clustering.md
      2. Add the missing Scala BisectingKMeansExample
      3. Create a new data file `data/mllib/sample_kmeans_data.txt`
      
      ## How was this patch tested?
      
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #11844 from zhengruifeng/doc_bkm.
      cef73b56
    • Zheng RuiFeng's avatar
      [SPARK-15141][EXAMPLE][DOC] Update OneVsRest Examples · ad1a8466
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Add a Python example for OneVsRest
      2. Remove args-parsing
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12920 from zhengruifeng/ovr_pe.
      ad1a8466
  3. May 09, 2016
    • hyukjinkwon's avatar
      [MINOR][DOCS] Remove remaining sqlContext in documentation at examples · 2992a215
      hyukjinkwon authored
      This PR removes `sqlContext` from the examples. Actual usages were all replaced in https://github.com/apache/spark/pull/12809, but some references remained in comments.
      
      Manual style checking.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #13006 from HyukjinKwon/minor-docs.
      2992a215
    • Yanbo Liang's avatar
      [MINOR] [SPARKR] Update data-manipulation.R to use native csv reader · ee3b1715
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Since Spark now supports a native csv reader, it is no longer necessary to use the third-party ```spark-csv``` in ```examples/src/main/r/data-manipulation.R```. This change also removes all ```spark-csv``` usage in SparkR.
      * Running R applications through ```sparkR``` is not supported as of Spark 2.0, so the example is now run with ```./bin/spark-submit```.
      
      ## How was this patch tested?
      Offline test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13005 from yanboliang/r-df-examples.
      ee3b1715
  4. May 07, 2016
    • Nick Pentreath's avatar
      [MINOR][ML][PYSPARK] ALS example cleanup · b0cafdb6
      Nick Pentreath authored
      Cleans up ALS examples by removing unnecessary casts to double for `rating` and `prediction` columns, since `RegressionEvaluator` now supports `Double` & `Float` input types.
      
      ## How was this patch tested?
      
      Manual compile and run with `run-example ml.ALSExample` and `spark-submit examples/src/main/python/ml/als_example.py`.
      
      Author: Nick Pentreath <nickp@za.ibm.com>
      
      Closes #12892 from MLnick/als-examples-cleanup.
      b0cafdb6
  5. May 06, 2016
  6. May 05, 2016
    • Dongjoon Hyun's avatar
      [SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and update binary_classification_metrics_example.py · 2c170dd3
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      This issue addresses the comments in SPARK-15031 and also fixes Java linter errors.
      - Use multiline format in SparkSession builder patterns.
      - Update `binary_classification_metrics_example.py` to use `SparkSession`.
      - Fix Java linter errors (in SPARK-13745, SPARK-15031, and others so far)
      
      ## How was this patch tested?
      
      Passed the Jenkins tests and ran `dev/lint-java` manually.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12911 from dongjoon-hyun/SPARK-15134.
      2c170dd3
    • Sandeep Singh's avatar
      [SPARK-15072][SQL][REPL][EXAMPLES] Remove SparkSession.withHiveSupport · ed6f3f8a
      Sandeep Singh authored
      ## What changes were proposed in this pull request?
      Remove the `withHiveSupport` method of `SparkSession`; use `enableHiveSupport` instead.
      
      ## How was this patch tested?
      ran tests locally
      
      Author: Sandeep Singh <sandeep@techaddict.me>
      
      Closes #12851 from techaddict/SPARK-15072.
      ed6f3f8a
  7. May 04, 2016
    • Dongjoon Hyun's avatar
      [SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example. · cdce4e62
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with newly added `SparkSession`.
      
      - Use **SparkSession Builder Pattern** in 154(Scala 55, Java 52, Python 47) files.
      - Add `getConf` in Python SparkContext class: `python/pyspark/context.py`
      - Replace **SQLContext Singleton Pattern** with **SparkSession Singleton Pattern**:
        - `SqlNetworkWordCount.scala`
        - `JavaSqlNetworkWordCount.java`
        - `sql_network_wordcount.py`
      
      Now, `SQLContext` is used only in the R examples and the following two Python examples. The Python examples are left untouched in this PR because they already fail due to an unknown issue.
      - `simple_params_example.py`
      - `aft_survival_regression.py`
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12809 from dongjoon-hyun/SPARK-15031.
      cdce4e62
    • Zheng RuiFeng's avatar
      [MINOR] Add python3 compatibility in python examples · 4530250f
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add python3 compatibility in python examples
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12868 from zhengruifeng/fix_gmm_py.
      4530250f
  8. May 03, 2016
  9. Apr 30, 2016
    • Dongjoon Hyun's avatar
      [MINOR][EXAMPLE] Use SparkSession instead of SQLContext in RDDRelation.scala · f86f7176
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      `SQLContext` is now kept only for backward compatibility, so the Spark 2.0 examples should use `SparkSession` instead.
      
      ## How was this patch tested?
      
      This is only an example change. After building, run `bin/run-example org.apache.spark.examples.sql.RDDRelation`.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12808 from dongjoon-hyun/rddrelation.
      f86f7176
  10. Apr 27, 2016
    • wm624@hotmail.com's avatar
      [SPARK-14937][ML][DOCUMENT] spark.ml LogisticRegression sqlCtx in scala is inconsistent with java and python · c74fd1e5
      wm624@hotmail.com authored
      
      ## What changes were proposed in this pull request?
      In spark.ml document, the LogisticRegression scala example uses sqlCtx. It is inconsistent with java and python examples which use sqlContext. In addition, a user can't copy & paste to run the example in spark-shell as sqlCtx doesn't exist in spark-shell while sqlContext exists.
      
      Change the Scala example file that the spark.ml documentation refers to.
      
      ## How was this patch tested?
      
      Compile the example scala file and it passes compilation.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #12717 from wangmiao1981/doc.
      c74fd1e5
  11. Apr 26, 2016
    • Josh Rosen's avatar
      [SPARK-14925][BUILD] Re-introduce 'unused' dependency so that published POMs are flattened · 75879ac3
      Josh Rosen authored
      Spark's published POMs are supposed to be flattened and not contain variable substitution (see SPARK-3812), but the dummy dependency that was required for this was accidentally removed. We should re-introduce this dependency in order to fix an issue where the un-flattened POMs cause the wrong dependencies to be included in Scala 2.10 published POMs.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #12706 from JoshRosen/SPARK-14925-published-poms-should-be-flattened.
      75879ac3
    • Zheng RuiFeng's avatar
      [SPARK-14514][DOC] Add python example for VectorSlicer · e88476c8
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add the missing python example for VectorSlicer
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12282 from zhengruifeng/vecslicer_pe.
      e88476c8
    • Azeem Jiva's avatar
      [SPARK-14756][CORE] Use parseLong instead of valueOf · de6e6334
      Azeem Jiva authored
      ## What changes were proposed in this pull request?
      
      Use `Long.parseLong`, which returns a primitive.
      Use a series of `append()` calls, which avoids creating an extra `StringBuilder`.
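
      As a minimal sketch of the two points above (not the code from the patch; `ParseLongSketch` and its method names are hypothetical and exist only for illustration), the difference might look like this when called from Scala:

      ```scala
      object ParseLongSketch {
        // Long.parseLong returns a primitive long, so no java.lang.Long wrapper is needed.
        def parsePrimitive(s: String): Long = java.lang.Long.parseLong(s)

        // Long.valueOf returns a boxed java.lang.Long, which must be unboxed before arithmetic.
        def parseBoxed(s: String): java.lang.Long = java.lang.Long.valueOf(s)

        // Chained append() calls write each piece into one builder,
        // instead of concatenating strings first and appending the result.
        def render(key: String, value: Long): String =
          new java.lang.StringBuilder().append(key).append('=').append(value).toString
      }
      ```

      Both parsing methods yield the same numeric value; the sketch only illustrates the return types and the chained-append style.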
      
      ## How was this patch tested?
      
      Unit tests
      
      Author: Azeem Jiva <azeemj@gmail.com>
      
      Closes #12520 from javawithjiva/minor.
      de6e6334
  12. Apr 25, 2016
    • Andrew Or's avatar
      [SPARK-14721][SQL] Remove HiveContext (part 2) · 3c5e65c3
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class.
      
      Note: A couple of things will break after this patch. These will be fixed separately.
      - the python HiveContext
      - all the documentation / comments referencing HiveContext
      - there will be no more HiveContext in the REPL (fixed by #12589)
      
      ## How was this patch tested?
      
      No change in functionality.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12585 from andrewor14/delete-hive-context.
      3c5e65c3
    • Marcelo Vanzin's avatar
      [SPARK-14744][EXAMPLES] Clean up examples packaging, remove outdated examples. · a680562a
      Marcelo Vanzin authored
      First, make all dependencies in the examples module provided, and explicitly
      list a couple of ones that somehow are promoted to compile by maven. This
      means that to run streaming examples, the streaming connector package needs
      to be provided to run-examples using --packages or --jars, just like regular
      apps.
      
      Also, remove a couple of outdated examples. HBase has had Spark bindings for
      a while and is even including them in the HBase distribution in the next
      version, making the examples obsolete. The same applies to Cassandra, which
      seems to have a proper Spark binding library already.
      
      I just tested the build, which passes, and ran SparkPi. The examples jars
      directory now has only two jars:
      
      ```
      $ ls -1 examples/target/scala-2.11/jars/
      scopt_2.11-3.3.0.jar
      spark-examples_2.11-2.0.0-SNAPSHOT.jar
      ```
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #12544 from vanzin/SPARK-14744.
      a680562a
    • Dongjoon Hyun's avatar
      [SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date · 6ab4d9e0
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.
      
      - Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed. However, `lapply` is private so far. The corrected example will be added later.
      - Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
      - Fix datatypes in `sparkr.md`.
      - Update a data result in `sparkr.md`.
      - Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
      - Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
      - Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
      - Other minor syntax fixes and a typo.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12649 from dongjoon-hyun/SPARK-14883.
      6ab4d9e0
  13. Apr 23, 2016
    • Sean Owen's avatar
      [SPARK-14873][CORE] Java sampleByKey methods take ju.Map but with Scala Double values; results in type Object · be0d5d3b
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Java `sampleByKey` methods should accept `Map` with `java.lang.Double` values
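
      To illustrate the idea without depending on Spark, the sketch below shows a Scala method that accepts a `java.util.Map` with `java.lang.Double` values and unboxes them internally. It is an assumption-based illustration, not the Spark implementation: `SampleByKeySketch` and `toScalaFractions` are hypothetical names.

      ```scala
      import java.{lang => jl, util => ju}

      object SampleByKeySketch {
        // Accepting jl.Double in the signature lets Java callers pass a Map<K, Double>.
        // The values are unboxed to Scala Double only inside the method.
        def toScalaFractions[K](fractions: ju.Map[K, jl.Double]): Map[K, Double] = {
          val builder = Map.newBuilder[K, Double]
          val entries = fractions.entrySet().iterator()
          while (entries.hasNext) {
            val entry = entries.next()
            builder += (entry.getKey -> entry.getValue.doubleValue)
          }
          builder.result()
        }
      }
      ```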
      
      ## How was this patch tested?
      
      Existing (updated) Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #12637 from srowen/SPARK-14873.
      be0d5d3b
  14. Apr 21, 2016
    • Sean Owen's avatar
      [SPARK-8393][STREAMING] JavaStreamingContext#awaitTermination() throws non-declared InterruptedException · 8bd05c9d
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      `JavaStreamingContext.awaitTermination` methods should be declared as `throws[InterruptedException]` so that this exception can be handled in Java code. Note this is not just a doc change, but an API change, since now (in Java) the method has a checked exception to handle. All await-like methods in Java APIs behave this way, so seems worthwhile for 2.0.
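
      A rough sketch of what such a declaration looks like in Scala follows. `HypotheticalContext` is an invented stand-in, not `JavaStreamingContext`; only the `@throws` annotation is the point, since it makes the compiler record the checked exception so that Java callers can catch it or declare it.

      ```scala
      import java.util.concurrent.CountDownLatch

      class HypotheticalContext {
        private val stopped = new CountDownLatch(1)

        // Releases any thread blocked in awaitTermination().
        def stop(): Unit = stopped.countDown()

        // The annotation adds InterruptedException to the method's declared exceptions,
        // which is what Java code sees as a `throws` clause.
        @throws[InterruptedException]
        def awaitTermination(): Unit = stopped.await()
      }
      ```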
      
      ## How was this patch tested?
      
      Jenkins tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #12418 from srowen/SPARK-8393.
      8bd05c9d
  15. Apr 20, 2016
    • Yuhao Yang's avatar
      [SPARK-14635][ML] Documentation and Examples for TF-IDF only refer to HashingTF · ed9d8038
      Yuhao Yang authored
      ## What changes were proposed in this pull request?
      
      Currently, the docs for TF-IDF only refer to using HashingTF with IDF. However, CountVectorizer can also be used. We should probably amend the user guide and examples to show this.
      
      ## How was this patch tested?
      
      unit tests and doc generation
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #12454 from hhbyyh/tfdoc.
      ed9d8038
  16. Apr 18, 2016
  17. Apr 16, 2016
    • hyukjinkwon's avatar
      [MINOR] Remove inappropriate type notation and extra anonymous closure within functional transformations · 9f678e97
      hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
      This PR removes
      
      - Inappropriate type notations
          For example, from
          ```scala
          words.foreachRDD { (rdd: RDD[String], time: Time) =>
          ...
          ```
          to
          ```scala
          words.foreachRDD { (rdd, time) =>
          ...
          ```
      
      - Extra anonymous closure within functional transformations.
          For example,
          ```scala
          .map(item => {
            ...
          })
          ```
      
          which can be written simply as:
      
          ```scala
          .map { item =>
            ...
          }
          ```
      
      and corrects some obvious style nits.
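
      Both rewrites can be tried on plain Scala collections. The sketch below is only an illustration under assumptions: `ClosureStyleSketch` and its sample data are hypothetical, and it uses `Seq` rather than the streaming API shown in the PR.

      ```scala
      object ClosureStyleSketch {
        val words: Seq[String] = Seq("spark", "kafka")

        // Before: explicit parameter type and a block wrapped in parentheses.
        val before: Seq[Int] = words.map((word: String) => {
          word.length
        })

        // After: the parameter type is inferred and braces replace the parentheses.
        val after: Seq[Int] = words.map { word =>
          word.length
        }
      }
      ```

      The two forms compile to the same behavior; the change is purely stylistic.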
      
      ## How was this patch tested?
      
      This was tested by temporarily adding rules to `scalastyle-config.xml`, although they did not catch every case.
      
      The rules applied were below:
      
      - For the first correction,
      
      ```xml
      <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
          <parameters><parameter name="regex">(?m)\.[a-zA-Z_][a-zA-Z0-9]*\(\s*[^,]+s*=>\s*\{[^\}]+\}\s*\)</parameter></parameters>
      </check>
      ```
      
      ```xml
      <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
          <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]*\s*[\{|\(]([^\n>,]+=>)?\s*\{([^()]|(?R))*\}^[,]</parameter></parameters>
      </check>
      ```
      
      - For the second correction
      ```xml
      <check customId="TypeNotation" level="error" class="org.scalastyle.file.RegexChecker" enabled="true">
          <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]*\s*[\{|\(]\s*\([^):]*:R))*\}^[,]</parameter></parameters>
      </check>
      ```
      
      **Those rules were not added**
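
      For a sense of what the first `NoExtraClosure` regex catches, it can be tried directly against sample source lines. This is a small sketch outside of scalastyle: `NoExtraClosureSketch` and `flags` are hypothetical names, and the pattern is copied verbatim from the first rule above.

      ```scala
      object NoExtraClosureSketch {
        // First regex from the rules above, copied verbatim into a raw string.
        private val pattern =
          """(?m)\.[a-zA-Z_][a-zA-Z0-9]*\(\s*[^,]+s*=>\s*\{[^\}]+\}\s*\)""".r

        // Returns true when the source contains a closure wrapped in both parentheses and braces.
        def flags(source: String): Boolean = pattern.findFirstIn(source).isDefined
      }
      ```

      With this sketch, `rdd.map(item => { item + 1 })` is flagged, while `rdd.map { item => item + 1 }` is not.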
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12413 from HyukjinKwon/SPARK-style.
      9f678e97
  18. Apr 14, 2016
    • hyukjinkwon's avatar
      [MINOR][SQL] Remove extra anonymous closure within functional transformations · 6fc3dc88
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR removes extra anonymous closure within functional transformations.
      
      For example,
      
      ```scala
      .map(item => {
        ...
      })
      ```
      
      which can be written simply as:
      
      ```scala
      .map { item =>
        ...
      }
      ```
      
      ## How was this patch tested?
      
      Related unit tests and `sbt scalastyle`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #12382 from HyukjinKwon/minor-extra-closers.
      6fc3dc88
  19. Apr 13, 2016
  20. Apr 12, 2016
  21. Apr 11, 2016
    • Joseph K. Bradley's avatar
      [MINOR][ML] Fixed MLlib build warnings · e9e1adc0
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Fixes to eliminate warnings during package and doc builds.
      
      ## How was this patch tested?
      
      Existing unit tests
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12263 from jkbradley/warning-cleanups.
      e9e1adc0
    • Xiangrui Meng's avatar
      [SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs · 1c751fcf
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      This PR updates MLlib APIs to accept `Dataset[_]` as input where `DataFrame` was the input type. This PR doesn't change the output type. In Java, `Dataset[_]` maps to `Dataset<?>`, which includes `Dataset<Row>`. Some implementations were changed in order to return `DataFrame`. Tests and examples were updated. Note that this is a breaking change for subclasses of Transformer/Estimator.
      
      Lol, we don't have to rename the input argument, which has been `dataset` since Spark 1.2.
      
      TODOs:
      - [x] update MiMaExcludes (seems all covered by explicit filters from SPARK-13920)
      - [x] Python
      - [x] add a new test to accept Dataset[LabeledPoint]
      - [x] remove unused imports of Dataset
      
      ## How was this patch tested?
      
      Existing unit tests with some modifications.
      
      cc: rxin jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12274 from mengxr/SPARK-14500.
      1c751fcf
  22. Apr 10, 2016
    • Örjan Lundberg's avatar
      Update KMeansExample.scala · b5c78562
      Örjan Lundberg authored
      ## What changes were proposed in this pull request?
      The example does not work without the `DataFrame` import.
      
      ## How was this patch tested?
      
      Example doc only.
      
      Author: Örjan Lundberg <orjan.lundberg@gmail.com>
      
      Closes #12277 from oluies/patch-1.
      b5c78562