  1. Jun 22, 2017
  2. Jun 21, 2017
  3. Jun 20, 2017
    • [SPARK-20929][ML] LinearSVC should use its own threshold param · cc67bd57
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      LinearSVC should use its own threshold param, rather than the shared one, since its threshold is applied to rawPrediction rather than to probability. This PR changes the param in the Scala, Python and R APIs.
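
      This plain-Scala sketch shows why the threshold has to accept any Double. It is not Spark's implementation. The decision is made on the raw margin `dot(w, x) + b`, which can be any real number, unlike a probability in [0, 1]. The helper names `margin` and `predict` are hypothetical. The sketch assumes the rule "predict 1.0 when the margin exceeds the threshold", with a default threshold of 0.0.

      ```scala
      object LinearSvcThresholdSketch {
        // Raw margin dot(w, x) + b of a linear SVM: any real number, unlike a probability.
        def margin(weights: Array[Double], intercept: Double, features: Array[Double]): Double =
          weights.zip(features).map { case (w, x) => w * x }.sum + intercept

        // Hypothetical decision rule: predict class 1.0 when the raw margin exceeds the threshold.
        def predict(rawMargin: Double, threshold: Double = 0.0): Double =
          if (rawMargin > threshold) 1.0 else 0.0

        def main(args: Array[String]): Unit = {
          val m = margin(Array(2.0, -1.0), 0.5, Array(3.0, 1.0)) // 2*3 - 1*1 + 0.5 = 5.5
          println(predict(m))        // default threshold 0.0
          println(predict(m, 10.0))  // a threshold above 1.0 is still meaningful for margins
          println(predict(m, -20.0)) // and so is a negative one
        }
      }
      ```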
      
      ## How was this patch tested?
      
      New unit test to make sure the threshold can be set to any Double value.
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #18151 from jkbradley/ml-2.2-linearsvc-cleanup.
  4. Jun 19, 2017
  5. Jun 18, 2017
    • [SPARK-20892][SPARKR] Add SQL trunc function to SparkR · 110ce1f2
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      
      Add the SQL `trunc` function to SparkR.
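
      For reference, here is a plain-Scala sketch of the truncation semantics. It uses `java.time` rather than Spark. It assumes `trunc(date, fmt)` accepts only the year units ('year', 'yyyy', 'yy') and the month units ('month', 'mon', 'mm'), and returns NULL for anything else. The object name `TruncSketch` is hypothetical, and newer Spark releases may accept more units.

      ```scala
      import java.time.LocalDate

      object TruncSketch {
        // Models trunc(date, fmt) for year and month units; other formats yield None (SQL NULL).
        def trunc(date: LocalDate, fmt: String): Option[LocalDate] =
          fmt.toLowerCase match {
            case "year" | "yyyy" | "yy" => Some(date.withDayOfYear(1))
            case "month" | "mon" | "mm" => Some(date.withDayOfMonth(1))
            case _                      => None
          }

        def main(args: Array[String]): Unit = {
          val d = LocalDate.of(2017, 6, 18)
          println(trunc(d, "year"))   // Some(2017-01-01)
          println(trunc(d, "mm"))     // Some(2017-06-01)
          println(trunc(d, "decade")) // None
        }
      }
      ```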
      
      ## How was this patch tested?
      Standard tests.
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #18291 from actuaryzhang/sparkRTrunc2.
    • [SPARK-21128][R] Remove both "spark-warehouse" and "metastore_db" before listing files in R tests · 05f83c53
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes listing the files in the test _after_ removing both "spark-warehouse" and "metastore_db", so that the next run of the R tests passes. Without this, rerunning the tests fails, which is a bit annoying.
      
      ## How was this patch tested?
      
      Manually ran the R tests multiple times via `./R/run-tests.sh`.
      
      **Before**
      
      Second run:
      
      ```
      SparkSQL functions: Spark package found in SPARK_HOME: .../spark
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ....................................................................................................1234.......................
      
      Failed -------------------------------------------------------------------------
      1. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3384)
      length(list1) not equal to length(list2).
      1/1 mismatches
      [1] 25 - 23 == 2
      
      2. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3384)
      sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
      10/25 mismatches
      x[16]: "metastore_db"
      y[16]: "pkg"
      
      x[17]: "pkg"
      y[17]: "R"
      
      x[18]: "R"
      y[18]: "README.md"
      
      x[19]: "README.md"
      y[19]: "run-tests.sh"
      
      x[20]: "run-tests.sh"
      y[20]: "SparkR_2.2.0.tar.gz"
      
      x[21]: "metastore_db"
      y[21]: "pkg"
      
      x[22]: "pkg"
      y[22]: "R"
      
      x[23]: "R"
      y[23]: "README.md"
      
      x[24]: "README.md"
      y[24]: "run-tests.sh"
      
      x[25]: "run-tests.sh"
      y[25]: "SparkR_2.2.0.tar.gz"
      
      3. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3388)
      length(list1) not equal to length(list2).
      1/1 mismatches
      [1] 25 - 23 == 2
      
      4. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3388)
      sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
      10/25 mismatches
      x[16]: "metastore_db"
      y[16]: "pkg"
      
      x[17]: "pkg"
      y[17]: "R"
      
      x[18]: "R"
      y[18]: "README.md"
      
      x[19]: "README.md"
      y[19]: "run-tests.sh"
      
      x[20]: "run-tests.sh"
      y[20]: "SparkR_2.2.0.tar.gz"
      
      x[21]: "metastore_db"
      y[21]: "pkg"
      
      x[22]: "pkg"
      y[22]: "R"
      
      x[23]: "R"
      y[23]: "README.md"
      
      x[24]: "README.md"
      y[24]: "run-tests.sh"
      
      x[25]: "run-tests.sh"
      y[25]: "SparkR_2.2.0.tar.gz"
      
      DONE ===========================================================================
      ```
      
      **After**
      
      Second run:
      
      ```
      SparkSQL functions: Spark package found in SPARK_HOME: .../spark
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18335 from HyukjinKwon/SPARK-21128.
  6. Jun 16, 2017
    • [MINOR][DOCS] Improve Running R Tests docs · 45824fb6
      Yuming Wang authored
      ## What changes were proposed in this pull request?
      
      Update the dependency packages listed in the Running R Tests docs to:
      ```bash
      R -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'e1071', 'survival'), repos='http://cran.us.r-project.org')"
      ```
      
      ## How was this patch tested?
      manual tests
      
      Author: Yuming Wang <wgyumg@gmail.com>
      
      Closes #18271 from wangyum/building-spark.
  7. Jun 15, 2017
    • [SPARK-20980][SQL] Rename `wholeFile` to `multiLine` for both CSV and JSON · 20514281
      Xiao Li authored
      ### What changes were proposed in this pull request?
      The current option name `wholeFile` is misleading for CSV users: it does not mean one record per file, because a single file can contain multiple records. This PR therefore renames the option to `multiLine`.
      
      ### How was this patch tested?
      N/A
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #18202 from gatorsmile/renameCVSOption.
  8. Jun 11, 2017
    • [SPARK-20877][SPARKR][FOLLOWUP] clean up after test move · 9f4ff955
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Clean up after the big test move.
      
      ## How was this patch tested?
      
      unit tests, jenkins
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18267 from felixcheung/rtestset2.
    • [SPARK-20877][SPARKR] refactor tests to basic tests only for CRAN · dc4c3518
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Move all existing tests to a non-installed directory so that they are never run when the SparkR package is installed.
      
      For a follow-up PR:
      - remove all skip_on_cran() calls in tests
      - clean up test timer
      - improve or change the basic tests that do run on CRAN (if anyone has suggestions)
      
      It looks like `R CMD build pkg` still puts `pkg/tests` (i.e. the full tests) into the source package, but `R CMD INSTALL` on such a source package does not install these tests (and so `R CMD check` does not run them).
      
      ## How was this patch tested?
      
      - [x] unit tests, Jenkins
      - [x] AppVeyor
      - [x] make a source package, install it, `R CMD check` it - verify the full tests are not installed or run
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18264 from felixcheung/rtestset.
  9. Jun 09, 2017
    • [SPARK-21042][SQL] Document Dataset.union is resolution by position · b78e3849
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      Document that Dataset.union resolves columns by position, not by name, since this has been a point of confusion for many users.
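
      Here is a toy, plain-Scala illustration of the difference. The `Table` type and both helpers are hypothetical, not Spark APIs. With a positional union, a second table whose columns are in a different order silently places values under the wrong column names.

      ```scala
      object UnionByPositionSketch {
        // A toy table: column names plus rows whose values follow the column order.
        final case class Table(columns: Seq[String], rows: Seq[Seq[Any]])

        // Positional union, as documented for Dataset.union: rows are appended as-is
        // and the second table's column names are ignored.
        def unionByPosition(a: Table, b: Table): Table = {
          require(a.columns.size == b.columns.size, "tables must have the same number of columns")
          Table(a.columns, a.rows ++ b.rows)
        }

        // Hypothetical name-based union, for contrast: reorder b's values to match a's columns.
        def unionByColumnName(a: Table, b: Table): Table = {
          val indexOf = b.columns.zipWithIndex.toMap
          Table(a.columns, a.rows ++ b.rows.map(row => a.columns.map(c => row(indexOf(c)))))
        }

        def main(args: Array[String]): Unit = {
          val a = Table(Seq("name", "age"), Seq(Seq("alice", 30)))
          val b = Table(Seq("age", "name"), Seq(Seq(25, "bob")))
          println(unionByPosition(a, b).rows)   // 25 ends up under "name", "bob" under "age"
          println(unionByColumnName(a, b).rows) // values realigned by column name
        }
      }
      ```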
      
      ## How was this patch tested?
      N/A - doc only change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #18256 from rxin/SPARK-21042.
  10. May 31, 2017
    • [SPARK-20877][SPARKR][WIP] add timestamps to test runs · 382fefd1
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add timestamps to the test runs to investigate how long they take.
      
      ## How was this patch tested?
      
      Jenkins, AppVeyor
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18104 from felixcheung/rtimetest.
  11. May 26, 2017
    • [SPARK-20849][DOC][SPARKR] Document R DecisionTree · a97c4970
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      1. Add an example for SparkR `decisionTree`.
      2. Document it in the user guide.
      
      ## How was this patch tested?
      local submit
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #18067 from zhengruifeng/dt_example.
  12. May 23, 2017
    • [MINOR][SPARKR][ML] Joint coefficients with intercept for SparkR linear SVM summary. · ad09e4ca
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Combine the coefficients with the intercept in the SparkR linear SVM summary.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #18035 from yanboliang/svm-r.
    • [SPARK-20727] Skip tests that use Hadoop utils on CRAN Windows · d06610f9
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This change skips tests that use the Hadoop libraries while running
      on CRAN check with Windows as the operating system. This is to handle
      cases where the Hadoop winutils binaries are missing on the target
      system. The skipped tests consist of
      1. Tests that save, load a model in MLlib
      2. Tests that save, load CSV, JSON and Parquet files in SQL
      3. Hive tests
      
      ## How was this patch tested?
      
      Tested by running on a local Windows VM with HADOOP_HOME unset. Also tested with https://win-builder.r-project.org
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #17966 from shivaram/sparkr-windows-cran.
  13. May 22, 2017
  14. May 19, 2017
  15. May 14, 2017
  16. May 12, 2017
    • [SPARK-20639][SQL] Add single argument support for to_timestamp in SQL with documentation improvement · 720708cc
      hyukjinkwon authored
      
      ## What changes were proposed in this pull request?
      
      This PR proposes three things as below:
      
      - By default, use the casting rules for timestamps in `to_timestamp` (previously, the default format was `yyyy-MM-dd HH:mm:ss`).

      - Support a single-argument form of `to_timestamp`, similar to the APIs in other languages.

        For example, the following works
      
        ```
        import org.apache.spark.sql.functions._
        Seq("2016-12-31 00:12:00.00").toDF("a").select(to_timestamp(col("a"))).show()
        ```
      
        prints
      
        ```
        +----------------------------------------+
        |to_timestamp(`a`, 'yyyy-MM-dd HH:mm:ss')|
        +----------------------------------------+
        |                     2016-12-31 00:12:00|
        +----------------------------------------+
        ```
      
        whereas this does not work in SQL.
      
        **Before**
      
        ```
        spark-sql> SELECT to_timestamp('2016-12-31 00:12:00');
        Error in query: Invalid number of arguments for function to_timestamp; line 1 pos 7
        ```
      
        **After**
      
        ```
        spark-sql> SELECT to_timestamp('2016-12-31 00:12:00');
        2016-12-31 00:12:00
        ```
      
      - Improve the related documentation of the SQL function descriptions and the other API descriptions accordingly.
      
        **Before**
      
        ```
        spark-sql> DESCRIBE FUNCTION extended to_date;
        ...
        Usage: to_date(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.
        Extended Usage:
            Examples:
              > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
               2016-12-31
        ```
      
        ```
        spark-sql> DESCRIBE FUNCTION extended to_timestamp;
        ...
        Usage: to_timestamp(timestamp, fmt) - Parses the `left` expression with the `format` expression to a timestamp. Returns null with invalid input.
        Extended Usage:
            Examples:
              > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
               2016-12-31 00:00:00.0
        ```
      
        **After**
      
        ```
        spark-sql> DESCRIBE FUNCTION extended to_date;
        ...
        Usage:
            to_date(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to
              a date. Returns null with invalid input. By default, it follows casting rules to a date if
              the `fmt` is omitted.
      
        Extended Usage:
            Examples:
              > SELECT to_date('2009-07-30 04:17:52');
               2009-07-30
              > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
               2016-12-31
        ```
      
        ```
        spark-sql> DESCRIBE FUNCTION extended to_timestamp;
        ...
         Usage:
            to_timestamp(timestamp[, fmt]) - Parses the `timestamp` expression with the `fmt` expression to
              a timestamp. Returns null with invalid input. By default, it follows casting rules to
              a timestamp if the `fmt` is omitted.
      
        Extended Usage:
            Examples:
              > SELECT to_timestamp('2016-12-31 00:12:00');
               2016-12-31 00:12:00
              > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
               2016-12-31 00:00:00
        ```
      
      ## How was this patch tested?
      
      Added tests in `datetime.sql`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17901 from HyukjinKwon/to_timestamp_arg.
    • [SPARK-20704][SPARKR] change CRAN test to run single thread · 888b84ab
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      - [x] need to test by running R CMD check --as-cran
      - [x] sanity check vignettes
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17945 from felixcheung/rchangesforpackage.
  17. May 09, 2017
  18. May 08, 2017
  19. May 07, 2017
    • [SPARK-20550][SPARKR] R wrapper for Dataset.alias · 1f73d358
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add SparkR wrapper for `Dataset.alias`.
      - Adjust roxygen annotations for `functions.alias` (including example usage).
      
      ## How was this patch tested?
      
      Unit tests, `check_cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17825 from zero323/SPARK-20550.
    • [SPARK-20543][SPARKR][FOLLOWUP] Don't skip tests on AppVeyor · 7087e011
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      add environment
      
      ## How was this patch tested?
      
      wait for appveyor run
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17878 from felixcheung/appveyorrcran.
  20. May 04, 2017
    • [SPARK-20571][SPARKR][SS] Flaky Structured Streaming tests · 57b64703
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Make the tests more reliable by waiting until the data has been processed.
      Increasing the timeout value might help, but ultimately the flakiness caused by processing delays on Jenkins is hard to account for. This isn't an actual supported public API.
      
      ## How was this patch tested?
      unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17857 from felixcheung/rsstestrelia.
    • [SPARK-20544][SPARKR] R wrapper for input_file_name · f21897fc
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds a wrapper for `o.a.s.sql.functions.input_file_name`.
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17818 from zero323/SPARK-20544.
    • [SPARK-20585][SPARKR] R generic hint support · 9c36aa27
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds support for generic hints on `SparkDataFrame`
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17851 from zero323/SPARK-20585.
    • [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example · b8302ccd
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      Add
      - R vignettes
      - R programming guide
      - SS programming guide
      - R example
      
      Also disable spark.als in vignettes for now since it's failing (SPARK-20402)
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17814 from felixcheung/rdocss.
  21. May 03, 2017
    • [SPARK-20543][SPARKR] skip tests when running on CRAN · fc472bdd
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      General rule on whether to skip a test.
      Skip if it is:
      - an RDD test
      - a test that could run long or is complicated (streaming, HiveContext)
      - a test of error conditions
      - a test that is unlikely to change or break
      
      ## How was this patch tested?
      
      unit tests, `R CMD check --as-cran`, `R CMD check`
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17817 from felixcheung/rskiptest.
  22. May 02, 2017
  23. May 01, 2017
    • [SPARK-20532][SPARKR] Implement grouping and grouping_id · 90d77e97
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds R wrappers for:
      
      - `o.a.s.sql.functions.grouping` as `is_grouping` (to avoid masking `base::grouping`)
      - `o.a.s.sql.functions.grouping_id`
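
      Here is a hedged sketch of what `grouping_id` computes. It assumes Spark's documented definition `(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)`, where `grouping(c)` is 1 when column `c` is aggregated away in the current grouping set. The plain-Scala helpers are hypothetical and model only the bit arithmetic, not the SparkR functions.

      ```scala
      object GroupingIdSketch {
        // grouping(c): 1 if column c is aggregated away in the current grouping set, else 0.
        def grouping(groupingSet: Set[String], column: String): Int =
          if (groupingSet.contains(column)) 0 else 1

        // grouping_id = (grouping(c1) << (n-1)) + ... + grouping(cn), built here bit by bit.
        def groupingId(groupBy: Seq[String], groupingSet: Set[String]): Int =
          groupBy.foldLeft(0)((acc, c) => (acc << 1) | grouping(groupingSet, c))

        def main(args: Array[String]): Unit = {
          val cols = Seq("year", "country")
          // The four grouping sets produced by a cube over (year, country).
          val sets = Seq(Set("year", "country"), Set("year"), Set("country"), Set.empty[String])
          for (s <- sets) println(s"$s -> grouping_id ${groupingId(cols, s)}")
        }
      }
      ```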
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17807 from zero323/SPARK-20532.
    • [SPARK-20541][SPARKR][SS] support awaitTermination without timeout · a355b667
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add a variant without the timeout parameter. This is needed to submit a job that runs until it is stopped, and is needed for 2.2.
      
      ## How was this patch tested?
      
      manually, unit test
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17815 from felixcheung/rssawaitinfinite.
    • [SPARK-20490][SPARKR] Add R wrappers for eqNullSafe and ! / not · 80e9cf1b
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add the null-safe equality operator `%<=>%` (same as `o.a.s.sql.Column.eqNullSafe`, `o.a.s.sql.Column.<=>`).
      - Add the boolean negation operator `!` and the function `not`.
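
      The semantics of these operators can be sketched in plain Scala, modelling SQL NULL as `None`. The helper names are hypothetical, not the SparkR functions. Ordinary `=` yields NULL when either side is NULL. `<=>` always yields true or false and treats two NULLs as equal. Negating NULL yields NULL.

      ```scala
      object NullSafeEqualitySketch {
        // SQL NULL is modelled as None throughout.

        // Ordinary `=`: NULL if either operand is NULL.
        def sqlEquals(a: Option[Int], b: Option[Int]): Option[Boolean] =
          for (x <- a; y <- b) yield x == y

        // Null-safe `<=>`: never NULL, and two NULLs compare equal.
        def eqNullSafe(a: Option[Int], b: Option[Int]): Boolean = a == b

        // Boolean negation `!` / `not`: NOT NULL is still NULL.
        def not(a: Option[Boolean]): Option[Boolean] = a.map(v => !v)

        def main(args: Array[String]): Unit = {
          val cases: Seq[(Option[Int], Option[Int])] = Seq((Some(1), Some(1)), (Some(1), None), (None, None))
          for ((a, b) <- cases)
            println(s"$a = $b -> ${sqlEquals(a, b)}; $a <=> $b -> ${eqNullSafe(a, b)}; not(=) -> ${not(sqlEquals(a, b))}")
        }
      }
      ```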
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17783 from zero323/SPARK-20490.
  24. Apr 30, 2017
    • [SPARK-20535][SPARKR] R wrappers for explode_outer and posexplode_outer · ae3df4e9
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Add R wrappers for
      
      - `o.a.s.sql.functions.explode_outer`
      - `o.a.s.sql.functions.posexplode_outer`
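
      Here is a plain-Scala sketch of how `explode` differs from the `_outer` variants. The `Row` alias and the helpers are hypothetical, not Spark APIs. `explode` drops rows whose array is NULL or empty, while `explode_outer` keeps them with a NULL element. `posexplode_outer` additionally emits a 0-based position.

      ```scala
      object ExplodeOuterSketch {
        // A row holds an id and a possibly-NULL array column (None models NULL).
        type Row = (Int, Option[Seq[String]])

        // explode: one output row per element; NULL or empty arrays produce no rows.
        def explode(rows: Seq[Row]): Seq[(Int, String)] =
          for ((id, arr) <- rows; x <- arr.getOrElse(Seq.empty[String])) yield (id, x)

        // explode_outer: like explode, but a NULL or empty array yields one row with a NULL element.
        def explodeOuter(rows: Seq[Row]): Seq[(Int, Option[String])] =
          rows.flatMap { case (id, arr) =>
            arr.filter(_.nonEmpty) match {
              case Some(xs) => xs.map(x => (id, Option(x)))
              case None     => Seq((id, Option.empty[String]))
            }
          }

        // posexplode_outer: also emits each element's 0-based position (NULL when there is none).
        def posexplodeOuter(rows: Seq[Row]): Seq[(Int, Option[Int], Option[String])] =
          rows.flatMap { case (id, arr) =>
            arr.filter(_.nonEmpty) match {
              case Some(xs) => xs.zipWithIndex.map { case (x, i) => (id, Option(i), Option(x)) }
              case None     => Seq((id, Option.empty[Int], Option.empty[String]))
            }
          }

        def main(args: Array[String]): Unit = {
          val rows: Seq[Row] = Seq((1, Some(Seq("a", "b"))), (2, Some(Seq.empty[String])), (3, None))
          println(explode(rows))
          println(explodeOuter(rows))
          println(posexplodeOuter(rows))
        }
      }
      ```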
      
      ## How was this patch tested?
      
      Additional unit tests, manual testing.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17809 from zero323/SPARK-20535.