  1. Oct 17, 2016
    • Maxime Rihouey's avatar
      Fix example of tf_idf with minDocFreq · e3bf37fa
      Maxime Rihouey authored
      ## What changes were proposed in this pull request?
      
      The Python example for tf_idf with the "minDocFreq" parameter is set up incorrectly: the same IDF model is used to transform the documents both with and without "minDocFreq".
      The IDF(minDocFreq=2) model is stored in the variable "idfIgnore", but the original variable "idf" is then used to transform "tf" instead of "idfIgnore".
      
      ## How was this patch tested?
      
      Before the fix, the results for "tfidf" and "tfidfIgnore" were the same:
      tfidf:
      (1048576,[1046921],[3.75828890549])
      (1048576,[1046920],[3.75828890549])
      (1048576,[1046923],[3.75828890549])
      (1048576,[892732],[3.75828890549])
      (1048576,[892733],[3.75828890549])
      (1048576,[892734],[3.75828890549])
      tfidfIgnore:
      (1048576,[1046921],[3.75828890549])
      (1048576,[1046920],[3.75828890549])
      (1048576,[1046923],[3.75828890549])
      (1048576,[892732],[3.75828890549])
      (1048576,[892733],[3.75828890549])
      (1048576,[892734],[3.75828890549])
      
      After the fix, the results are as expected:
      tfidf:
      (1048576,[1046921],[3.75828890549])
      (1048576,[1046920],[3.75828890549])
      (1048576,[1046923],[3.75828890549])
      (1048576,[892732],[3.75828890549])
      (1048576,[892733],[3.75828890549])
      (1048576,[892734],[3.75828890549])
      tfidfIgnore:
      (1048576,[1046921],[0.0])
      (1048576,[1046920],[0.0])
      (1048576,[1046923],[0.0])
      (1048576,[892732],[0.0])
      (1048576,[892733],[0.0])
      (1048576,[892734],[0.0])
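
The difference between the two results can be reproduced without Spark. The following is a minimal pure-Python sketch, not the PySpark example itself: `fit_idf` and `transform` are hypothetical helpers, and the weight `log((m + 1) / (df + 1))` with a zero weight below `min_doc_freq` is an assumption meant to mirror the behavior described for MLlib's IDF.

```python
import math


def fit_idf(docs, min_doc_freq=0):
    """Return {term: idf weight} for a list of tokenized documents."""
    m = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # Terms seen in fewer than min_doc_freq documents get a weight of 0.0.
    return {
        term: math.log((m + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for term, df in doc_freq.items()
    }


def transform(idf, doc):
    """Scale the raw term counts of one document by the fitted IDF weights."""
    counts = {}
    for term in doc:
        counts[term] = counts.get(term, 0) + 1
    return {term: n * idf.get(term, 0.0) for term, n in counts.items()}


docs = [["spark"], ["hadoop"], ["flink"]]  # every term occurs in one document
idf = fit_idf(docs)
idf_ignore = fit_idf(docs, min_doc_freq=2)

tfidf = [transform(idf, d) for d in docs]
buggy_ignore = [transform(idf, d) for d in docs]         # bug: reuses `idf`
tfidf_ignore = [transform(idf_ignore, d) for d in docs]  # fix: uses `idf_ignore`
```

With the buggy line, both results are identical; with the fixed line, every term that occurs in fewer than two documents is weighted 0.0.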
      
      Author: Maxime Rihouey <maxime.rihouey@gmail.com>
      
      Closes #15503 from maximerihouey/patch-1.
      e3bf37fa
  2. Oct 10, 2016
    • Wenchen Fan's avatar
      [SPARK-17338][SQL] add global temp view · 23ddff4b
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      A global temporary view is a cross-session temporary view, meaning it is shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it is automatically dropped when the application terminates. It is tied to a system-preserved database `global_temp` (configurable via SparkConf), and the qualified name must be used to refer to a global temp view, e.g. `SELECT * FROM global_temp.view1`.
      
      changes for `SessionCatalog`:
      
      1. add a new field `globalTempViews: GlobalTempViewManager` to access the shared global temp views and the global temp db name.
      2. `createDatabase` will fail if users try to create `global_temp`, which is system preserved.
      3. `setCurrentDatabase` will fail if users try to set `global_temp`, which is system preserved.
      4. add `createGlobalTempView`, which is used in `CreateViewCommand` to create global temp views.
      5. add `dropGlobalTempView`, which is used in `CatalogImpl` to drop global temp view.
      6. add `alterTempViewDefinition`, which is used in `AlterViewAsCommand` to update the view definition for local/global temp views.
      7. `renameTable`/`dropTable`/`isTemporaryTable`/`lookupRelation`/`getTempViewOrPermanentTableMetadata`/`refreshTable` will handle global temp views.
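
The sharing and name-resolution rules described here can be sketched in plain Python. This is an illustrative model only, not Spark's implementation: the class and method names loosely follow the commit message, and `lookup`, `create`, `get` and `drop` are hypothetical helpers introduced for the sketch.

```python
class GlobalTempViewManager:
    """Holds the views shared by every session of one application."""

    def __init__(self, database="global_temp"):
        self.database = database
        self._views = {}

    def create(self, name, plan):
        self._views[name] = plan

    def get(self, name):
        return self._views[name]  # KeyError if the view does not exist

    def drop(self, name):
        return self._views.pop(name, None) is not None


class SessionCatalog:
    """Per-session catalog that delegates global temp views to the manager."""

    def __init__(self, global_temp_views):
        self.global_temp_views = global_temp_views
        self.temp_views = {}  # session-local temp views
        self.databases = {"default"}
        self.current_database = "default"

    def _reject_reserved(self, name):
        if name == self.global_temp_views.database:
            raise ValueError(f"{name} is a system preserved database")

    def create_database(self, name):
        self._reject_reserved(name)
        self.databases.add(name)

    def set_current_database(self, name):
        self._reject_reserved(name)
        if name not in self.databases:
            raise KeyError(name)
        self.current_database = name

    def lookup(self, name, database=None):
        # Global temp views resolve only through the qualified name.
        if database == self.global_temp_views.database:
            return self.global_temp_views.get(name)
        return self.temp_views[name]


shared = GlobalTempViewManager()
session_a = SessionCatalog(shared)
session_b = SessionCatalog(shared)
session_a.global_temp_views.create("view1", "SELECT 1")
```

A view created through `session_a` is visible from `session_b` as `global_temp.view1`, while an unqualified lookup of `view1` fails in both sessions.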
      
      changes for SQL commands:
      
      1. `CreateViewCommand`/`AlterViewAsCommand` is updated to support global temp views
      2. `ShowTablesCommand` outputs a new column `database`, which is used to distinguish global and local temp views.
      3. other commands can also handle global temp views if they call `SessionCatalog` APIs that accept global temp views, e.g. `DropTableCommand`, `AlterTableRenameCommand`, `ShowColumnsCommand`, etc.
      
      changes for other public APIs:
      
      1. add a new method `dropGlobalTempView` in `Catalog`
      2. `Catalog.findTable` can find global temp view
      3. add a new method `createGlobalTempView` in `Dataset`
      
      ## How was this patch tested?
      
      new tests in `SQLViewSuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #14897 from cloud-fan/global-temp-view.
      23ddff4b
  3. Oct 05, 2016
    • sethah's avatar
      [SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression · 9df54f53
      sethah authored
      ## What changes were proposed in this pull request?
      Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds new examples to show multiclass training.
      
      ## How was this patch tested?
      Ran locally using spark-submit, run-example, and copy/paste from user guide into shells. Generated docs and verified correct output.
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #15349 from sethah/SPARK-17239.
      9df54f53
  4. Sep 26, 2016
    • Justin Pihony's avatar
      [SPARK-14525][SQL] Make DataFrameWriter.save work for jdbc · 50b89d05
      Justin Pihony authored
      ## What changes were proposed in this pull request?
      
      This change modifies the implementation of DataFrameWriter.save such that it works with jdbc, and the call to jdbc merely delegates to save.
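
The delegation pattern described here can be sketched with a toy builder. This is a hypothetical `Writer` class that records calls instead of performing I/O; it is not Spark's `DataFrameWriter`, and the option keys `url` and `dbtable` as well as the connection string are assumptions made for the illustration.

```python
class Writer:
    """Toy writer with a builder-style API; it records saves instead of doing I/O."""

    def __init__(self):
        self._source = None
        self._options = {}
        self.saved = []

    def format(self, source):
        self._source = source
        return self

    def option(self, key, value):
        self._options[key] = value
        return self

    def save(self):
        if self._source == "jdbc":
            missing = [k for k in ("url", "dbtable") if k not in self._options]
            if missing:
                raise ValueError(f"missing jdbc options: {missing}")
        self.saved.append((self._source, dict(self._options)))

    def jdbc(self, url, table, properties=None):
        # Convenience entry point: fill in the options, then delegate to save().
        self._options.update(properties or {})
        self.option("url", url).option("dbtable", table)
        self.format("jdbc").save()


via_save = Writer()
via_save.format("jdbc").option("url", "jdbc:h2:mem:test").option("dbtable", "people").save()

via_jdbc = Writer()
via_jdbc.jdbc("jdbc:h2:mem:test", "people")
```

Both call paths end in the same `save()` logic, so they record the same result.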
      
      ## How was this patch tested?
      
      This was tested via unit tests in the JDBCWriteSuite, of which I added one new test to cover this scenario.
      
      ## Additional details
      
      rxin This seems to have been most recently touched by you and was also commented on in the JIRA.
      
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Justin Pihony <justin.pihony@gmail.com>
      Author: Justin Pihony <justin.pihony@typesafe.com>
      
      Closes #12601 from JustinPihony/jdbc_reconciliation.
      50b89d05
  5. Sep 12, 2016
  6. Sep 03, 2016
  7. Aug 27, 2016
    • Sean Owen's avatar
      [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when withMean=True · e07baf14
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Allow centering / mean scaling of sparse vectors in StandardScaler, if requested. This is for compatibility with `VectorAssembler` in common usages.
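
Why centering matters for sparse input can be seen in a small sketch. The function below is a hypothetical pure-Python helper, not the `StandardScaler` code; it assumes a sparse vector given as an `{index: value}` dict and assumes that features with zero standard deviation map to 0.0.

```python
def standardize(size, entries, mean, std, with_mean=True):
    """Standardize a sparse vector given as {index: value}; returns a dense list."""
    out = []
    for i in range(size):
        value = entries.get(i, 0.0)  # implicit zeros of the sparse vector
        if with_mean:
            value -= mean[i]  # centering turns implicit zeros into -mean[i]
        out.append(value / std[i] if std[i] != 0.0 else 0.0)
    return out


sparse = {1: 2.0}  # vector of size 4 with a single non-zero entry
mean = [0.5, 1.0, 0.0, 0.25]
std = [1.0, 2.0, 1.0, 0.5]

centered = standardize(4, sparse, mean, std)
scaled_only = standardize(4, sparse, mean, std, with_mean=False)
```

Without centering, the zeros stay zero and the result could remain sparse; with centering, most entries become non-zero, so the output is effectively dense.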
      
      ## How was this patch tested?
      
      Jenkins tests, including new cases to reflect the new behavior.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14663 from srowen/SPARK-17001.
      e07baf14
  8. Aug 24, 2016
    • Weiqing Yang's avatar
      [MINOR][BUILD] Fix Java CheckStyle Error · 673a80d2
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      Spark 2.0.1 will be released soon (as mentioned on the Spark dev mailing list), so besides the critical bugs, it is better to fix the code style errors before the release.
      
      Before:
      ```
      ./dev/lint-java
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[525] (sizes) LineLength: Line is longer than 100 characters (found 119).
      [ERROR] src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      ```
      After:
      ```
      ./dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
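
The `LineLength` rule reported above amounts to a simple per-line check. The sketch below is a hypothetical stand-alone approximation, not Checkstyle itself; the 100-character limit comes from the error messages, and the message format is only imitated.

```python
def long_lines(text, limit=100):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


sample = "short line\n" + "x" * 119 + "\n" + "y" * 100
violations = long_lines(sample)
messages = [
    f"[{number}] LineLength: Line is longer than 100 characters (found {length})."
    for number, length in violations
]
```

A line of exactly 100 characters passes; only the 119-character line is reported.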
      ## How was this patch tested?
      Manual.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14768 from Sherry302/fixjavastyle.
      673a80d2
  9. Aug 20, 2016
    • wm624@hotmail.com's avatar
      [SPARKR][EXAMPLE] change example APP name · 3e5fdeb3
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      For the R SQL example, the appName is "MyApp", while in the Scala, Java, and Python examples the appName is "x Spark SQL basic example".
      
      I made the R example consistent with other examples.
      
      ## How was this patch tested?
      
      Manual test
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14703 from wangmiao1981/example.
      3e5fdeb3
  10. Aug 11, 2016
    • hyukjinkwon's avatar
      [SPARK-16886][EXAMPLES][DOC] Fix some examples to be consistent and indentation in documentation · 7186e8c3
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Originally this PR was based on #14491, but I realised that fixing the examples is more sensible than fixing the comments.
      
      This PR fixes the three things below:
      
      - Fix two wrong examples in `structured-streaming-programming-guide.md`. Loading via `read.load(..)` without `as` yields `Dataset<Row>`, not `Dataset<String>`, in Java.
      
      - Fix indentation across `structured-streaming-programming-guide.md`. Python uses 4 spaces and Scala and Java use 2 spaces, but the examples did not follow this consistently.
      
      - Fix `StructuredNetworkWordCountWindowed` and `StructuredNetworkWordCount` in Java and Scala to initially load `DataFrame` and `Dataset<Row>`, to be consistent with the comments and some examples in `structured-streaming-programming-guide.md` and to match the Python example (which loads it as a `DataFrame` initially).
      
      ## How was this patch tested?
      
      N/A
      
      Closes https://github.com/apache/spark/pull/14491
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Ganesh Chand <ganeshchand@Ganeshs-MacBook-Pro-2.local>
      
      Closes #14564 from HyukjinKwon/SPARK-16886.
      7186e8c3
  11. Aug 08, 2016
    • Weiqing Yang's avatar
      [SPARK-16945] Fix Java Lint errors · e10ca8de
      Weiqing Yang authored
      ## What changes were proposed in this pull request?
      This PR fixes the following minor Java linter errors:
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[42,10] (modifier) RedundantModifier: Redundant 'final' modifier.
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java:[97,10] (modifier) RedundantModifier: Redundant 'final' modifier.
      
      ## How was this patch tested?
      Manual test.
      dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #14532 from Sherry302/master.
      e10ca8de
  12. Aug 05, 2016
    • Bryan Cutler's avatar
      [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs · 180fd3e0
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Improve example outputs to better reflect the functionality that is being presented.  This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output.  Explicitly set parameters when they are used as part of the example.  Fixed Java examples that failed to run because of using old-style MLlib Vectors or problem with schema.  Synced examples between different APIs.
      
      ## How was this patch tested?
      Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14308 from BryanCutler/ml-examples-improve-output-SPARK-16260.
      180fd3e0
  13. Aug 02, 2016
    • sandy's avatar
      [SPARK-16816] Modify Java example which is also reflected in documentation example · cbdff493
      sandy authored
      ## What changes were proposed in this pull request?
      
      Modify the Java example that is also reflected in the documentation.
      
      ## How was this patch tested?
      
      run test cases.
      
      Author: sandy <phalodi@gmail.com>
      
      Closes #14436 from phalodi/SPARK-16816.
      cbdff493
    • Xusen Yin's avatar
      [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector... · dd8514fa
      Xusen Yin authored
      [SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
      
      ## What changes were proposed in this pull request?
      
      mllib.LDAExample uses ML pipeline and MLlib LDA algorithm. The former transforms original data into MLVector format, while the latter uses MLlibVector format.
      
      ## How was this patch tested?
      
      Test manually.
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #14212 from yinxusen/SPARK-16558.
      dd8514fa
    • Cheng Lian's avatar
      [SPARK-16734][EXAMPLES][SQL] Revise examples of all language bindings · 10e1c0e6
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR makes various minor updates to examples of all language bindings to make sure they are consistent with each other. Some typos and missing parts (JDBC example in Scala/Java/Python) are also fixed.
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14368 from liancheng/revise-examples.
      10e1c0e6
  14. Jul 30, 2016
    • Bryan Cutler's avatar
      [SPARK-16800][EXAMPLES][ML] Fix Java examples that fail to run due to exception · a6290e51
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      Some Java examples use mllib.linalg.Vectors instead of ml.linalg.Vectors, which causes an exception when run.  Some Java examples also specify incorrect data types in the schema, which causes an exception as well.
      
      ## How was this patch tested?
      Ran corrected examples locally
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14405 from BryanCutler/java-examples-ml.Vectors-fix-SPARK-16800.
      a6290e51
    • Sean Owen's avatar
      [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions whose... · 0dc4310b
      Sean Owen authored
      [SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions whose side effects are required
      
      ## What changes were proposed in this pull request?
      
      Use foreach/for instead of map where the operation requires execution of the body rather than defining a transformation.
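
The distinction can be illustrated with a Python analogue. The commit itself concerns Scala code; the sketch below only assumes that Python's built-in `map` is lazy, which makes the same pitfall easy to show.

```python
items = [1, 2, 3]

seen_via_map = []
pending = map(seen_via_map.append, items)  # lazy: no element is processed yet
ran_before_consumption = list(seen_via_map)

seen_via_loop = []
for item in items:  # explicit loop: the body runs right away
    seen_via_loop.append(item)
```

The side effect attached to `map` does not happen until something consumes `pending`, whereas the loop states the intent (run the body for each element) and executes immediately.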
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #14332 from srowen/SPARK-16694.
      0dc4310b
  15. Jul 23, 2016
    • Cheng Lian's avatar
      [SPARK-16380][EXAMPLES] Update SQL examples and programming guide for Python language binding · 53b2456d
      Cheng Lian authored
      This PR is based on PR #14098 authored by wangmiao1981.
      
      ## What changes were proposed in this pull request?
      
      This PR replaces the original Python Spark SQL example file with the following three files:
      
      - `sql/basic.py`
      
        Demonstrates basic Spark SQL features.
      
      - `sql/datasource.py`
      
        Demonstrates various Spark SQL data sources.
      
      - `sql/hive.py`
      
        Demonstrates Spark SQL Hive interaction.
      
      This PR also removes hard-coded Python example snippets in the SQL programming guide by extracting snippets from the above files using the `include_example` Liquid template tag.
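
The snippet extraction can be approximated in a few lines. This is a hypothetical sketch, not the actual Liquid plugin: `extract_example` is an invented helper, the marker comments `$example on$` / `$example off$` are assumed, and the sample file content is made up for the illustration.

```python
def extract_example(source, on="$example on$", off="$example off$"):
    """Return the lines that sit between the on/off marker comments."""
    kept, inside = [], False
    for line in source.splitlines():
        if on in line:
            inside = True
        elif off in line:
            inside = False
        elif inside:
            kept.append(line)
    return "\n".join(kept)


example_file = """\
from pyspark.sql import SparkSession
# $example on$
df = spark.read.json("people.json")
df.show()
# $example off$
spark.stop()
"""
snippet = extract_example(example_file)
```

Only the lines between the markers end up in the guide, so setup and teardown code can stay in the runnable file without cluttering the documentation.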
      
      ## How was this patch tested?
      
      Manually tested.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14317 from liancheng/py-examples-update.
      53b2456d
  16. Jul 19, 2016
    • Xin Ren's avatar
      [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition... · 21a6dd2a
      Xin Ren authored
      [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent
      
      https://issues.apache.org/jira/browse/SPARK-16535
      
      ## What changes were proposed in this pull request?
      
      While scanning through the pom.xml files of the sub-projects, I found the warning below (see the attached screenshot):
      ```
      Definition of groupId is redundant, because it's inherited from the parent
      ```
      ![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png)
      
      I removed some of the lines with the groupId definition, and the build on my local machine is still OK.
      ```
      <groupId>org.apache.spark</groupId>
      ```
      I also found that `<maven.version>3.3.9</maven.version>` is used in Spark 2.x, and Maven 3 supports versionless parent elements: "Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1)."
      
      ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762
      
      ## How was this patch tested?
      
      I've tested by re-building the project, and build succeeded.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #14189 from keypointt/SPARK-16535.
      21a6dd2a
    • Dongjoon Hyun's avatar
      [MINOR][BUILD] Fix Java Linter `LineLength` errors · 556a9437
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes four Java linter `LineLength` errors. They are only `LineLength` errors, but we had better remove all Java linter errors before the release.
      
      ## How was this patch tested?
      
      After passing Jenkins, ran `./dev/lint-java`.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #14255 from dongjoon-hyun/minor_java_linter.
      556a9437
    • Cheng Lian's avatar
      [SPARK-16303][DOCS][EXAMPLES] Minor Scala/Java example update · 1426a080
      Cheng Lian authored
      ## What changes were proposed in this pull request?
      
      This PR moves the last remaining hard-coded Scala example snippet from the SQL programming guide into `SparkSqlExample.scala`. It also renames all Scala/Java example files so that every "Sql" in the file names becomes "SQL".
      
      ## How was this patch tested?
      
      Manually verified the generated HTML page.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #14245 from liancheng/minor-scala-example-update.
      1426a080
    • Zheng RuiFeng's avatar
      [MINOR] Remove unused arg in als.py · e5fbb182
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      The second arg in method `update()` is never used, so I removed it.
      
      ## How was this patch tested?
      local run with `./bin/spark-submit examples/src/main/python/als.py`
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #14247 from zhengruifeng/als_refine.
      e5fbb182
  17. Jul 18, 2016
  18. Jul 14, 2016
    • Bryan Cutler's avatar
      [SPARK-16403][EXAMPLES] Cleanup to remove unused imports, consistent style, minor fixes · e3f8a033
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Cleanup of examples, mostly from PySpark-ML, to fix minor issues: unused imports, style consistency, a duplicated pipeline_example, use of the future print function, and a spelling error.
      
      * The "Pipeline Example" is duplicated by "Simple Text Classification Pipeline" in Scala, Python, and Java.
      
      * "Estimator Transformer Param Example" is duplicated by "Simple Params Example" in Scala, Python and Java
      
      * Synced random_forest_classifier_example.py with Scala by adding an IndexToString label converter
      
      * Synced train_validation_split.py (in Scala ModelSelectionViaTrainValidationExample) by adjusting data split, adding grid for intercept.
      
      * RegexTokenizer was doing nothing in tokenizer_example.py and JavaTokenizerExample.java, synced with Scala version
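
For the tokenizer item, a small sketch shows input on which a regex-based tokenizer visibly differs from a whitespace tokenizer. These are plain Python stand-ins, not the Spark `Tokenizer` and `RegexTokenizer` classes; the sample sentence and the `\W+` pattern are assumptions chosen for the illustration.

```python
import re


def whitespace_tokenize(text):
    """Lowercase and split on whitespace only."""
    return text.lower().split()


def regex_tokenize(text, pattern=r"\W+"):
    """Lowercase and split on every run of non-word characters."""
    return [token for token in re.split(pattern, text.lower()) if token]


sentence = "Logistic,regression,models,are,neat"
plain = whitespace_tokenize(sentence)
by_regex = regex_tokenize(sentence)
```

On comma-separated text the whitespace tokenizer returns a single token, while the regex tokenizer returns five, which makes the example output meaningful.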
      
      ## How was this patch tested?
      local tests and run modified examples
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #14081 from BryanCutler/examples-cleanup-SPARK-16403.
      e3f8a033
  19. Jul 13, 2016
  20. Jul 11, 2016
  21. Jul 04, 2016
    • wm624@hotmail.com's avatar
      [SPARK-16260][ML][EXAMPLE] PySpark ML Example Improvements and Cleanup · a539b724
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      1). Remove the unused import in the Scala example;
      
      2). Move the Spark session import outside the `example off` marker;
      
      3). Change the parameter settings to match the Scala examples;
      
      4). Change comments to be consistent;
      
      5). Make sure that the Scala and Python examples use the same data set;
      
      I did one pass and fixed the above issues. Some examples are still missing in Python and might be added later.
      
      TODO: Some examples have comments on how to run them, but many do not. We can add them later.
      
      ## How was this patch tested?
      
      Manually test them
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14021 from wangmiao1981/ann.
      a539b724
  22. Jul 02, 2016
    • WeichenXu's avatar
      [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming... · 0bd7cd18
      WeichenXu authored
      [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them
      
      ## What changes were proposed in this pull request?
      
      I extracted 6 example programs from the GraphX programming guide and replaced them with `include_example` tags.
      
      The 6 example programs are:
      - AggregateMessagesExample.scala
      - SSSPExample.scala
      - TriangleCountingExample.scala
      - ConnectedComponentsExample.scala
      - ComprehensiveExample.scala
      - PageRankExample.scala
      
      All the example code can run using
      `bin/run-example graphx.EXAMPLE_NAME`
      
      ## How was this patch tested?
      
      Manual.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #14015 from WeichenXu123/graphx_example_plugin.
      0bd7cd18
  23. Jun 30, 2016
  24. Jun 29, 2016
    • Bryan Cutler's avatar
      [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples · 21385d02
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Some appNames in ML examples are incorrect, mostly in PySpark but one in Scala.  This corrects the names.
      
      ## How was this patch tested?
      Style, local tests
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13949 from BryanCutler/pyspark-example-appNames-fix-SPARK-16261.
      21385d02
  25. Jun 28, 2016
    • James Thomas's avatar
      [SPARK-16114][SQL] structured streaming network word count examples · 3554713a
      James Thomas authored
      ## What changes were proposed in this pull request?
      
      Network word count example for structured streaming
      
      ## How was this patch tested?
      
      Run locally
      
      Author: James Thomas <jamesjoethomas@gmail.com>
      Author: James Thomas <jamesthomas@Jamess-MacBook-Pro.local>
      
      Closes #13816 from jjthomas/master.
      3554713a
  26. Jun 27, 2016
  27. Jun 24, 2016
  28. Jun 20, 2016
  29. Jun 17, 2016
    • GayathriMurali's avatar
      [SPARK-15129][R][DOC] R API changes in ML · af2a4b08
      GayathriMurali authored
      ## What changes were proposed in this pull request?
      
      Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs
      
      Author: GayathriMurali <gayathri.m@intel.com>
      
      Closes #13285 from GayathriMurali/SPARK-15129.
      af2a4b08