  1. Jul 14, 2016
    • [SPARK-16403][EXAMPLES] Cleanup to remove unused imports, consistent style, minor fixes · e3f8a033
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Cleanup of examples, mostly from PySpark-ML, to fix minor issues: unused imports, style consistency, a duplicated pipeline_example, use of the future print function, and a spelling error.
      
      * The "Pipeline Example" is duplicated by "Simple Text Classification Pipeline" in Scala, Python, and Java.
      
      * "Estimator Transformer Param Example" is duplicated by "Simple Params Example" in Scala, Python and Java
      
      * Synced random_forest_classifier_example.py with Scala by adding the IndexToString label converter
      
      * Synced train_validation_split.py (in Scala ModelSelectionViaTrainValidationExample) by adjusting data split, adding grid for intercept.
      
      * RegexTokenizer was doing nothing in tokenizer_example.py and JavaTokenizerExample.java; both are now synced with the Scala version
      
      ## How was this patch tested?
      Local tests and running the modified examples.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14081 from BryanCutler/examples-cleanup-SPARK-16403.
      e3f8a033
  2. Jun 29, 2016
    • [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples · 21385d02
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Some appNames in ML examples are incorrect, mostly in PySpark but one in Scala.  This corrects the names.
      
      ## How was this patch tested?
      Style, local tests
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13949 from BryanCutler/pyspark-example-appNames-fix-SPARK-16261.
      21385d02
  3. May 11, 2016
    • [SPARK-15150][EXAMPLE][DOC] Update LDA examples · d88afabd
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Create a libsvm-type dataset for LDA: `data/mllib/sample_lda_libsvm_data.txt`
      2. Add a Python example
      3. Read the data file directly in the examples
      4. Also, change to `SparkSession` in `aft_survival_regression.py`
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/lda_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12927 from zhengruifeng/lda_pe.
      d88afabd
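For readers unfamiliar with the libsvm-type layout mentioned in this commit, the sketch below parses one line of the general LibSVM text format: a leading label followed by `index:value` pairs. It is not code from the patch; the function name `parse_libsvm_line` and the sample line are hypothetical and are not copied from `data/mllib/sample_lda_libsvm_data.txt`.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM-format line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        # Each feature is written as "index:value"; indices are kept as written.
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line, made up for illustration only.
sample = "0 1:1 2:2 3:6 6:2 10:3"
label, features = parse_libsvm_line(sample)
print(label, features)
```

Features that are absent from a line are implicitly zero, which is why the format suits sparse term-count vectors of the kind LDA consumes.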
    • [SPARK-15149][EXAMPLE][DOC] update kmeans example · 8beae591
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      A Python example for ml.kmeans already exists but is not included in the user guide.
      1. Small changes like `example_on` and `example_off`
      2. Add it to the user guide
      3. Update the examples to read the data file directly
      
      ## How was this patch tested?
      manual tests
      `./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12925 from zhengruifeng/km_pe.
      8beae591
  4. May 05, 2016
    • [SPARK-15134][EXAMPLE] Indent SparkSession builder patterns and update binary_classification_metrics_example.py · 2c170dd3
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      This issue addresses the comments in SPARK-15031 and also fixes java-linter errors.
      - Use multiline format in SparkSession builder patterns.
      - Update `binary_classification_metrics_example.py` to use `SparkSession`.
      - Fix Java linter errors (from SPARK-13745, SPARK-15031, and others so far)
      
      ## How was this patch tested?
      
      By passing the Jenkins tests and running `dev/lint-java` manually.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12911 from dongjoon-hyun/SPARK-15134.
      2c170dd3
  5. May 04, 2016
    • [SPARK-15031][EXAMPLE] Use SparkSession in Scala/Python/Java example. · cdce4e62
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR aims to update Scala/Python/Java examples by replacing `SQLContext` with newly added `SparkSession`.
      
      - Use **SparkSession Builder Pattern** in 154 files (Scala 55, Java 52, Python 47).
      - Add `getConf` in Python SparkContext class: `python/pyspark/context.py`
      - Replace **SQLContext Singleton Pattern** with **SparkSession Singleton Pattern**:
        - `SqlNetworkWordCount.scala`
        - `JavaSqlNetworkWordCount.java`
        - `sql_network_wordcount.py`
      
      Now, `SQLContext`s are used only in the R examples and the following two Python examples. The Python examples are left untouched in this PR since they already fail because of some unknown issue.
      - `simple_params_example.py`
      - `aft_survival_regression.py`
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12809 from dongjoon-hyun/SPARK-15031.
      cdce4e62
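As a rough illustration of the builder pattern this commit describes, the sketch below obtains a `SparkSession` in PySpark with one builder call per line. It is not taken from the patch: the helper name `build_session` is hypothetical, and the lazy import is only an assumption made so that the snippet loads on machines without PySpark.

```python
def build_session(app_name):
    """Return a SparkSession for app_name, creating one only if none exists."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    # One builder call per line, wrapped in parentheses for readability.
    return (
        SparkSession.builder
        .appName(app_name)
        .getOrCreate()
    )
```

A caller would write something like `spark = build_session("KMeansExample")` (the app name here is made up) and call `spark.stop()` when finished. Because `getOrCreate()` returns an existing session when there is one, the same call can also serve where the examples previously used a singleton `SQLContext`.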
  6. Mar 11, 2016
  7. Aug 02, 2015
    • [SPARK-9149] [ML] [EXAMPLES] Add an example of spark.ml KMeans · 244016a9
      Yu ISHIKAWA authored
      [SPARK-9149] Add an example of spark.ml KMeans - ASF JIRA https://issues.apache.org/jira/browse/SPARK-9149
      
      jkbradley Should we support other data formats, such as TSV or CSV? I have implemented these examples to support only space-separated files, the same as the example for `spark.mllib`'s `KMeans`.
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7697 from yu-iskw/SPARK-9149 and squashes the following commits:
      
      7137bad [Yu ISHIKAWA] Fix the typo
      56b9da2 [Yu ISHIKAWA] Fix the place of the wrong import statement
      554e574 [Yu ISHIKAWA] Change the way to format input data in KMeansExample
      e7a948a [Yu ISHIKAWA] Import spark.ml.clustering.KMeans
      1901e0c [Yu ISHIKAWA] Change how to initialize an array for a DataFrame schema
      d8043f5 [Yu ISHIKAWA] Return a value directly
      d81bf55 [Yu ISHIKAWA] Fix a typo and its access specifiers
      3e0862d [Yu ISHIKAWA] Make KMeansExample more simple
      51ce9c1 [Yu ISHIKAWA] Make JavaKMeansExample more simple
      a5a01e0 [Yu ISHIKAWA] Fix a Javadoc about the command to execute the example
      b09ec13 [Yu ISHIKAWA] [SPARK-9149][ML][Examples] Add an example of spark.ml KMeans
      244016a9
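The data-format question raised in this commit can be made concrete with a small sketch. The function below turns one text line into a feature vector and takes the separator as a parameter, so space-separated input is the default while CSV or TSV would be a one-argument change. This is plain Python for illustration, not the code in `KMeansExample`; the name `parse_point` and the sample values are hypothetical.

```python
def parse_point(line, sep=None):
    """Split one text line into a list of floats.

    sep=None splits on any run of whitespace (the space-separated case);
    pass "," for CSV or a tab character for TSV.
    """
    return [float(tok) for tok in line.strip().split(sep) if tok.strip()]


# Space-separated input, the only format the commit says is supported.
print(parse_point("0.0 0.1 0.2"))
# The same function reading a comma-separated line.
print(parse_point("9.0,9.1,9.2", sep=","))
```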