  1. May 01, 2017
    • [SPARK-20532][SPARKR] Implement grouping and grouping_id · 90d77e97
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds R wrappers for:
      
      - `o.a.s.sql.functions.grouping` as `o.a.s.sql.functions.is_grouping` (to avoid shadowing `base::grouping`)
      - `o.a.s.sql.functions.grouping_id`
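
As a rough illustration of what these two functions report, the sketch below models them in plain Python rather than through SparkR. It assumes the usual SQL definition: `grouping` is 1 for a column that has been aggregated away in a given grouping set and 0 otherwise, and `grouping_id` packs those bits into one integer with the first grouping column as the most significant bit. The helper and column names are hypothetical.

```python
def grouping(column, grouping_set):
    """Return 1 if `column` is aggregated away in this grouping set, else 0."""
    return 0 if column in grouping_set else 1


def grouping_id(columns, grouping_set):
    """Pack the grouping bits of `columns` into one integer (first column = highest bit)."""
    result = 0
    for column in columns:
        result = (result << 1) | grouping(column, grouping_set)
    return result


columns = ["dept", "year"]
# Grouping sets that a rollup over (dept, year) would produce.
for grouping_set in [("dept", "year"), ("dept",), ()]:
    bits = [grouping(c, grouping_set) for c in columns]
    print(grouping_set, bits, grouping_id(columns, grouping_set))
```

Rows of the grand total (empty grouping set) get the largest id, which makes it easy to tell subtotal rows apart from rows whose grouping column is genuinely null.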
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17807 from zero323/SPARK-20532.
      90d77e97
    • [SPARK-20541][SPARKR][SS] support awaitTermination without timeout · a355b667
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add support for calling `awaitTermination` without a timeout parameter. This is needed to submit a job that runs until stopped.
      Needed for 2.2.
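
The difference between waiting with and without a timeout can be sketched in plain Python. This is only a model of the blocking behaviour, not SparkR's implementation; `FakeQuery` and its methods are hypothetical stand-ins for a streaming query handle.

```python
import threading


class FakeQuery:
    """Hypothetical stand-in for a streaming query handle."""

    def __init__(self):
        self._stopped = threading.Event()

    def stop(self):
        self._stopped.set()

    def await_termination(self, timeout=None):
        # With timeout=None this blocks until stop() is called;
        # otherwise it returns False if the timeout elapses first.
        return self._stopped.wait(timeout)


query = FakeQuery()
timed_out_result = query.await_termination(timeout=0.05)  # query is still running

threading.Timer(0.1, query.stop).start()  # something stops the query later
final_result = query.await_termination()  # no timeout: wait until stopped
print(timed_out_result, final_result)
```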
      
      ## How was this patch tested?
      
      manually, unit test
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17815 from felixcheung/rssawaitinfinite.
      a355b667
    • [SPARK-20490][SPARKR] Add R wrappers for eqNullSafe and ! / not · 80e9cf1b
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add null-safe equality operator `%<=>%` (same as `o.a.s.sql.Column.eqNullSafe`, `o.a.s.sql.Column.<=>`)
      - Add boolean negation operator `!` and function `not`.
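
For readers unfamiliar with null-safe equality, the following plain-Python model contrasts it with ordinary SQL equality, using `None` for SQL null. It is a sketch of the three-valued semantics only; the function names are hypothetical and nothing here calls Spark.

```python
def eq(a, b):
    """SQL-style `=`: unknown (None) if either side is null."""
    if a is None or b is None:
        return None
    return a == b


def eq_null_safe(a, b):
    """SQL-style `<=>`: always True or False, and null equals null."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_not(value):
    """SQL-style NOT: null stays null."""
    return None if value is None else not value


for a, b in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(a, b, eq(a, b), eq_null_safe(a, b), sql_not(eq(a, b)))
```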
      
      ## How was this patch tested?
      
      Existing unit tests, additional unit tests, `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17783 from zero323/SPARK-20490.
      80e9cf1b
  2. Apr 30, 2017
    • [SPARK-20535][SPARKR] R wrappers for explode_outer and posexplode_outer · ae3df4e9
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Add R wrappers for:
      
      - `o.a.s.sql.functions.explode_outer`
      - `o.a.s.sql.functions.posexplode_outer`
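
The `_outer` variants differ from plain `explode` only in how they treat null or empty collections. The sketch below models that difference on Python lists. It assumes the documented behaviour that the outer variants emit a single null row instead of dropping the input row; the data and function bodies are illustrative, not Spark code.

```python
def explode(rows):
    """Yield (id, item) per array element; rows with a null or empty array vanish."""
    for row_id, items in rows:
        for item in items or []:
            yield (row_id, item)


def explode_outer(rows):
    """Like explode, but keep null/empty rows as a single (id, None) row."""
    for row_id, items in rows:
        if not items:
            yield (row_id, None)
        else:
            for item in items:
                yield (row_id, item)


def posexplode_outer(rows):
    """Like explode_outer, but also emit each element's position."""
    for row_id, items in rows:
        if not items:
            yield (row_id, None, None)
        else:
            for pos, item in enumerate(items):
                yield (row_id, pos, item)


rows = [(1, ["a", "b"]), (2, []), (3, None)]
print(list(explode(rows)))
print(list(explode_outer(rows)))
print(list(posexplode_outer(rows)))
```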
      
      ## How was this patch tested?
      
      Additional unit tests, manual testing.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17809 from zero323/SPARK-20535.
      ae3df4e9
  3. Apr 29, 2017
    • [SPARK-20493][R] De-duplicate parse logics for DDL-like type strings in R · 70f1bcd7
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      `SQLUtils.getSQLDataType` is currently used to parse the type string in `structField`. It looks like this can be replaced with `CatalystSqlParser.parseDataType`.
      
      They look like similar DDL-like type definitions, as below:
      
      ```scala
      scala> Seq(Tuple1(Tuple1("a"))).toDF.show()
      ```
      ```
      +---+
      | _1|
      +---+
      |[a]|
      +---+
      ```
      
      ```scala
      scala> Seq(Tuple1(Tuple1("a"))).toDF.select($"_1".cast("struct<_1:string>")).show()
      ```
      ```
      +---+
      | _1|
      +---+
      |[a]|
      +---+
      ```
      
      Such type strings look identical to the ones used in R, as below:
      
      ```R
      > write.df(sql("SELECT named_struct('_1', 'a') as struct"), "/tmp/aa", "parquet")
      > collect(read.df("/tmp/aa", "parquet", structType(structField("struct", "struct<_1:string>"))))
        struct
      1      a
      ```
      
      The R version is stricter because the types are checked ahead of time on the R side via regular expressions.
      
      The actual logic differs slightly, but since the type is checked beforehand on the R side, replacing it should not (I think) introduce any behaviour changes. To make sure of this, dedicated tests were added in SPARK-20105. (It looks like `structField` is the only place that calls this method.)
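
To make the shape of these DDL-like type strings concrete, here is a toy recursive parser in Python. It is neither `CatalystSqlParser.parseDataType` nor the regular-expression check on the R side; it handles only primitives, `array<...>` and `struct<name:type,...>`, and every function name in it is hypothetical.

```python
def parse_type(text):
    """Parse a DDL-like type string into a nested Python structure."""
    node, rest = _parse(text)
    if rest.strip():
        raise ValueError(f"unexpected trailing input: {rest!r}")
    return node


def _parse(text):
    text = text.lstrip()
    if text.startswith("array<"):
        element, rest = _parse(text[len("array<"):])
        return ("array", element), _expect(rest, ">")
    if text.startswith("struct<"):
        fields, rest = [], text[len("struct<"):]
        while True:
            name, _, rest = rest.partition(":")
            field_type, rest = _parse(rest)
            fields.append((name.strip(), field_type))
            if rest.startswith(","):
                rest = rest[1:]
            else:
                return ("struct", fields), _expect(rest, ">")
    # Primitive type name: read up to the next delimiter.
    end = next((i for i, ch in enumerate(text) if ch in ",>"), len(text))
    name = text[:end].strip()
    if not name:
        raise ValueError("missing type name")
    return name, text[end:]


def _expect(text, token):
    """Consume `token` from the front of `text` or fail."""
    if not text.startswith(token):
        raise ValueError(f"expected {token!r} before {text!r}")
    return text[len(token):]


print(parse_type("struct<_1:string>"))
print(parse_type("array<struct<name:string>>"))
```

Having one parser for both the JVM and R paths means a string accepted in one place is interpreted the same way in the other, which is the point of the de-duplication.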
      
      ## How was this patch tested?
      
      Existing tests - https://github.com/apache/spark/blob/master/R/pkg/inst/tests/testthat/test_sparkSQL.R#L143-L194 should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17785 from HyukjinKwon/SPARK-20493.
      70f1bcd7
  4. Apr 27, 2017
    • [DOCS][MINOR] Add missing since to SparkR repeat_string note. · b58cf77c
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Replace
      
          note repeat_string 2.3.0
      
      with
      
          note repeat_string since 2.3.0
      
      ## How was this patch tested?
      
      `create-docs.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17779 from zero323/REPEAT-NOTE.
      b58cf77c
    • [SPARK-20425][SQL] Support a vertical display mode for Dataset.show · b4724db1
      Takeshi Yamamuro authored
      ## What changes were proposed in this pull request?
      This PR adds a new display mode for `Dataset.show` that prints output rows vertically (one line per column value). In the current master, readability is poor when printing a Dataset with many columns, for example:
      
      ```
      scala> val df = spark.range(100).selectExpr((0 until 100).map(i => s"rand() AS c$i"): _*)
      scala> df.show(2, 0)
      +------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+--------------------+-------------------+------------------+-------------------+--------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-----------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+
      |c0                |c1                |c2                |c3                 |c4                |c5                |c6                 |c7                |c8                |c9                |c10               |c11                |c12               |c13               |c14               |c15                |c16                |c17                |c18               |c19               |c20                |c21               |c22                |c23               |c24                |c25                |c26                |c27                 |c28                |c29               |c30                |c31                 |c32               |c33               |c34                |c35                |c36                |c37               |c38               |c39                |c40               |c41               |c42                |c43                |c44                |c45               |c46                 |c47                 |c48                |c49                |c50                |c51                |c52                |c53                |c54                 |c55                |c56                |c57                |c58                |c59               |c60               |c61                |c62                |c63               |c64                |c65               |c66               |c67              |c68                |c69                |c70               |c71                |c72               |c73                |c74                |c75                |c76               |c77                |c78               |c79                |c80                |c81                |c82                |c83                |c84                |c85                |c86                |c87               |c88                |c89                |c90               |c91               |c92               |c93                |c94               |c95                |c96               |c97                |c98                |c99                |
      +------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+--------------------+-------------------+------------------+-------------------+--------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-----------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+
      |0.6306087152476858|0.9174349686288383|0.5511324165035159|0.3320844128641819 |0.7738486877101489|0.2154915886962553|0.4754997600674299 |0.922780639280355 |0.7136894772661909|0.2277580838165979|0.5926874459847249|0.40311408392226633|0.467830264333843 |0.8330466896984213|0.1893258482389527|0.6320849515511165 |0.7530911056912044 |0.06700254871955424|0.370528597355559 |0.2755437445193154|0.23704391110980128|0.8067400174905822|0.13597793616251852|0.1708888820162453|0.01672725007605702|0.983118121881555  |0.25040195628629924|0.060537253723083384|0.20000530582637488|0.3400572407133511|0.9375689433322597 |0.057039316954370256|0.8053269714347623|0.5247817572228813|0.28419308820527944|0.9798908885194533 |0.31805988175678146|0.7034448027077574|0.5400575751346084|0.25336322371116216|0.9361634546853429|0.6118681368289798|0.6295081549153907 |0.13417468943957422|0.41617137072255794|0.7267230869252035|0.023792726137561115|0.5776157058356362  |0.04884204913195467|0.26728716103441275|0.646680370807925  |0.9782712690657244 |0.16434031314818154|0.20985522381321275|0.24739842475440077 |0.26335189682977334|0.19604841662422068|0.10742950487300651|0.20283136488091502|0.3100312319723688|0.886959006630645 |0.25157102269776244|0.34428775168410786|0.3500506818575777|0.3781142441912052 |0.8560316444386715|0.4737104888956839|0.735903101602148|0.02236617130529006|0.8769074095835873 |0.2001426662503153|0.5534032319238532 |0.7289496620397098|0.41955191309992157|0.9337700133660436 |0.34059094378451005|0.6419144759403556|0.08167496930341167|0.9947099478497635|0.48010888605366586|0.22314796858167918|0.17786598882331306|0.7351521162297135 |0.5422057170020095 |0.9521927872726792 |0.7459825486368227 |0.40907708791990627|0.8903819313311575|0.7251413746923618 |0.2977174938745204 |0.9515209660203555|0.9375968604766713|0.5087851740042524|0.4255237544908751 |0.8023768698664653|0.48003189618006703|0.1775841829745185|0.09050775629268382|0.6743909291138167 |0.2498415755876865 |
      |0.6866473844170801|0.4774360641212433|0.631696201340726 |0.33979113021468343|0.5663049010847052|0.7280190472258865|0.41370958502324806|0.9977433873622218|0.7671957338989901|0.2788708556233931|0.3355106391656496|0.88478952319287   |0.0333974166999893|0.6061744715862606|0.9617779139652359|0.22484954822341863|0.12770906021550898|0.5577789629508672 |0.2877649024640704|0.5566577406549361|0.9334933255278052 |0.9166720585157266|0.9689249324600591 |0.6367502457478598|0.7993572745928459 |0.23213222324218108|0.11928284054154137|0.6173493362456599  |0.0505122058694798 |0.9050228629552983|0.17112767911121707|0.47395598348370005 |0.5820498657823081|0.6241124650645072|0.18587258258036776|0.14987593554122225|0.3079446253653946 |0.9414228822867968|0.8362276265462365|0.9155655305576353 |0.5121559807153562|0.8963362656525707|0.22765970274318037|0.8177039187132797 |0.8190326635933787 |0.5256005177032199|0.8167598457269669  |0.030936807130934496|0.6733006585281015 |0.4208049626816347 |0.24603085738518538|0.22719198954208153|0.1622280557565281 |0.22217325159218038|0.014684419513742553|0.08987111517447499|0.2157764759142622 |0.8223414104088321 |0.4868624404491777 |0.4016191733088167|0.6169281906889263|0.15603611040433385|0.18289285085714913|0.9538408988218972|0.15037154865295121|0.5364516961987454|0.8077254873163031|0.712600478545675|0.7277477241003857 |0.19822912960348305|0.8305051199208777|0.18631911396566114|0.8909532487898342|0.3470409226992506 |0.35306974180587636|0.9107058868891469 |0.3321327206004986|0.48952332459050607|0.3630403307479373|0.5400046826340376 |0.5387377194310529 |0.42860539421837585|0.23214101630985995|0.21438968839794847|0.15370603160082352|0.04355605642700022|0.6096006707067466 |0.6933354157094292|0.06302172470859002|0.03174631856164001|0.664243581650643 |0.7833239547446621|0.696884598352864 |0.34626385933237736|0.9263495598791336|0.404818892816584  |0.2085585394755507|0.6150004897990109 |0.05391193524302473|0.28188484028329097|
      +------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+------------------+------------------+-------------------+------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+--------------------+-------------------+------------------+-------------------+--------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+------------------+------------------+-------------------+-------------------+-------------------+------------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------------+------------------+-------------------+-------------------+------------------+-------------------+------------------+------------------+-----------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+------------------+-------------------+-------------------+------------------+------------------+------------------+-------------------+------------------+-------------------+------------------+-------------------+-------------------+-------------------+
      only showing top 2 rows
      ```
      
      `psql`, the CLI for PostgreSQL, supports a vertical display mode for this case, like:
      http://stackoverflow.com/questions/9604723/alternate-output-format-for-psql
      
      ```
      -RECORD 0-------------------
       c0  | 0.6306087152476858
       c1  | 0.9174349686288383
       c2  | 0.5511324165035159
      ...
       c98 | 0.6743909291138167
       c99 | 0.2498415755876865
      -RECORD 1-------------------
       c0  | 0.6866473844170801
       c1  | 0.4774360641212433
       c2  | 0.631696201340726
      ...
       c98 | 0.05391193524302473
       c99 | 0.28188484028329097
      only showing top 2 rows
      ```
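
The layout can be reproduced with a few lines of plain Python. This is a sketch that loosely follows the output above, not the code added to `Dataset.show`; the function name and the fixed header width of 28 characters are assumptions made for illustration.

```python
def show_vertical(columns, rows, width=28):
    """Render rows one column value per line, loosely following the output above."""
    name_width = max(len(name) for name in columns)
    lines = []
    for index, row in enumerate(rows):
        header = f"-RECORD {index}"
        lines.append(header.ljust(width, "-"))
        for name, value in zip(columns, row):
            lines.append(f" {name.ljust(name_width)} | {value}")
    return "\n".join(lines)


columns = ["c0", "c1", "c98"]
rows = [
    (0.6306087152476858, 0.9174349686288383, 0.6743909291138167),
    (0.6866473844170801, 0.4774360641212433, 0.05391193524302473),
]
print(show_vertical(columns, rows))
```

Because each value gets its own line, the width of the output depends on the longest column name and value rather than on the number of columns.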
      
      ## How was this patch tested?
      Added tests in `DataFrameSuite`.
      
      Author: Takeshi Yamamuro <yamamuro@apache.org>
      
      Closes #17733 from maropu/SPARK-20425.
      b4724db1
  5. Apr 26, 2017
    • [MINOR][ML] Fix some PySpark & SparkR flaky tests · dbb06c68
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don't think checking intermediate results during iteration makes sense, and these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue in #17746 when upgrading breeze to 0.13.1.
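
The reasoning can be seen on a toy optimisation problem. The sketch below is plain gradient descent in Python on a hypothetical quadratic, not one of the affected tests: two optimiser configurations disagree after two iterations but agree once they have converged, which is why assertions on the converged value survive changes in the underlying optimiser.

```python
def minimize(max_iter, step, tol=1e-12):
    """Gradient descent on f(x) = (x - 3)^2 starting from x = 0."""
    x = 0.0
    for _ in range(max_iter):
        gradient = 2.0 * (x - 3.0)
        if abs(gradient) < tol:
            break
        x -= step * gradient
    return x


# Intermediate results depend on optimiser details (here, the step size).
early_a = minimize(max_iter=2, step=0.1)
early_b = minimize(max_iter=2, step=0.2)

# Converged results do not.
converged_a = minimize(max_iter=10_000, step=0.1)
converged_b = minimize(max_iter=10_000, step=0.2)

print(early_a, early_b)
print(converged_a, converged_b)
```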
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #17757 from yanboliang/flaky-test.
      dbb06c68
    • [SPARK-20437][R] R wrappers for rollup and cube · df58a95a
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add `rollup` and `cube` methods and corresponding generics.
      - Add short description to the vignette.
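
As background on what the two methods compute, the following Python sketch lists the grouping sets each one aggregates over, assuming the standard SQL definitions of `ROLLUP` and `CUBE`. The helper names and columns are hypothetical; the sketch does not call SparkR.

```python
from itertools import combinations


def rollup_sets(columns):
    """Grouping sets for rollup: every prefix of `columns`, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Grouping sets for cube: every subset of `columns`, largest first."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


print(rollup_sets(["dept", "year"]))
print(cube_sets(["dept", "year"]))
```

For n columns, rollup yields n + 1 grouping sets and cube yields 2^n, so cube grows much faster as columns are added.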
      
      ## How was this patch tested?
      
      - Existing unit tests.
      - Additional unit tests covering new features.
      - `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17728 from zero323/SPARK-20437.
      df58a95a
  6. Apr 25, 2017
    • [SPARK-20449][ML] Upgrade breeze version to 0.13.1 · 67eef47a
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Upgrade breeze version to 0.13.1, which fixes some critical bugs in L-BFGS-B.
      
      ## How was this patch tested?
      Existing unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #17746 from yanboliang/spark-20449.
      67eef47a
  7. Apr 24, 2017
    • [SPARK-20438][R] SparkR wrappers for split and repeat · 8a272ddc
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Add wrappers for `o.a.s.sql.functions`:
      
      - `split` as `split_string`
      - `repeat` as `repeat_string`
      
      ## How was this patch tested?
      
      Existing tests, additional unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17729 from zero323/SPARK-20438.
      8a272ddc
  8. Apr 21, 2017
    • [SPARK-20371][R] Add wrappers for collect_list and collect_set · fd648bff
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds wrappers for `collect_list` and `collect_set`.
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17672 from zero323/SPARK-20371.
      fd648bff
  9. Apr 19, 2017
    • [SPARK-20375][R] R wrappers for array and map · 46c57497
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds wrappers for `o.a.s.sql.functions.array` and `o.a.s.sql.functions.map`
      
      ## How was this patch tested?
      
      Unit tests, `check-cran.sh`
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17674 from zero323/SPARK-20375.
      46c57497
    • [SPARK-20397][SPARKR][SS] Fix flaky test: test_streaming.R.Terminated by error · 4fea7848
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Checking a source parameter is asynchronous. When the query is created, it's not guaranteed that the source has been created. This PR just increases the timeout of `awaitTermination` to ensure the parsing error is thrown.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #17687 from zsxwing/SPARK-20397.
      4fea7848
  10. Apr 18, 2017
    • [SPARK-20208][R][DOCS] Document R fpGrowth support · 702d85af
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Document fpGrowth in:
      
      - vignettes
      - programming guide
      - code example
      
      ## How was this patch tested?
      
      Manual tests.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17557 from zero323/SPARK-20208.
      702d85af
  11. Apr 17, 2017
    • [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json.array in from_json function in R · 24f09b39
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      The name `as.json.array` was suggested in the first place in the PR for SPARK-19828, but we could not use it because the lint check emitted an error for multiple dots in variable names.
      
      After SPARK-20278, we are now able to use `multiple.dots.in.names`. `asJsonArray` in the `from_json` function can still be changed because 2.2 is not released yet.
      
      So, this PR proposes to rename `asJsonArray` to `as.json.array`.
      
      ## How was this patch tested?
      
      Jenkins tests, local tests with `./R/run-tests.sh` and manual `./dev/lint-r`. Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17653 from HyukjinKwon/SPARK-19828-followup.
      24f09b39
  12. Apr 16, 2017
    • [SPARK-20278][R] Disable 'multiple_dots_linter' lint rule that is against project's code style · 86d251c5
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Currently, variable names with multiple dot-separated words are not allowed in R. For example,
      
      ```diff
       setMethod("from_json", signature(x = "Column", schema = "structType"),
      -          function(x, schema, asJsonArray = FALSE, ...) {
      +          function(x, schema, as.json.array = FALSE, ...) {
                   if (asJsonArray) {
                     jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                            "createArrayType",
      ```
      
      produces an error as below:
      
      ```
      R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
                function(x, schema, as.json.array = FALSE, ...) {
                                    ^~~~~~~~~~~~~
      ```
      
      This seems to go against https://google.github.io/styleguide/Rguide.xml#identifiers, which says:
      
      > The preferred form for variable names is all lower case letters and words separated with dots
      
      This appears to be because lintr (https://github.com/jimhester/lintr) by default follows http://r-pkgs.had.co.nz/style.html, as written in its README.md. A few cases do not seem to follow Google's guide, described there as "a few tweaks".
      
      Per [SPARK-6813](https://issues.apache.org/jira/browse/SPARK-6813), we follow Google's R Style Guide with a few exceptions: https://google.github.io/styleguide/Rguide.xml. This was also merged into Spark's website: https://github.com/apache/spark-website/pull/43
      
      Also, it looks like we have no restriction on function names. This rule also appears to affect function names, as written in the README.md:
      
      > `multiple_dots_linter`: check that function and variable names are separated by `_` rather than `.`.
      
      ## How was this patch tested?
      
      Manually tested `./dev/lint-r` with the manual change below in `R/functions.R`:
      
      ```diff
       setMethod("from_json", signature(x = "Column", schema = "structType"),
      -          function(x, schema, asJsonArray = FALSE, ...) {
      +          function(x, schema, as.json.array = FALSE, ...) {
                   if (asJsonArray) {
                     jschema <- callJStatic("org.apache.spark.sql.types.DataTypes",
                                            "createArrayType",
      ```
      
      **Before**
      
      ```
      R/functions.R:2462:31: style: Words within variable and function names should be separated by '_' rather than '.'.
                function(x, schema, as.json.array = FALSE, ...) {
                                    ^~~~~~~~~~~~~
      ```
      
      **After**
      
      ```
      lintr checks passed.
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17590 from HyukjinKwon/disable-dot-in-name.
      86d251c5
  13. Apr 12, 2017
  14. Apr 07, 2017
  15. Apr 06, 2017
  16. Apr 05, 2017
    • [SPARKR][DOC] update doc for fpgrowth · c1b8b667
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      minor update
      
      zero323
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17526 from felixcheung/rfpgrowthfollowup.
      c1b8b667
  17. Apr 04, 2017
    • [MINOR][R] Reorder `Collate` fields in DESCRIPTION file · 0e2ee820
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      It seems the CRAN check script corrects `R/pkg/DESCRIPTION` so that it follows the order of the `Collate` field.
      
      This PR proposes to fix the position of `catalog.R` so that running the script no longer produces a small diff in this file every time.
      
      ## How was this patch tested?
      
      Manually via `./R/check-cran.sh`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17528 from HyukjinKwon/minor-reorder-description.
      0e2ee820
    • [SPARK-19825][R][ML] spark.ml R API for FPGrowth · b34f7665
      zero323 authored
      ## What changes were proposed in this pull request?
      
      Adds SparkR API for FPGrowth: [SPARK-19825](https://issues.apache.org/jira/browse/SPARK-19825):
      
      - `spark.fpGrowth`: model training.
      - `freqItemsets` and `associationRules` methods with new corresponding generics.
      - Scala helper: `org.apache.spark.ml.r.FPGrowthWrapper`
      - unit tests.
      
      ## How was this patch tested?
      
      Feature specific unit tests.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17170 from zero323/SPARK-19825.
      b34f7665
  18. Apr 02, 2017
  19. Mar 27, 2017
  20. Mar 26, 2017
  21. Mar 21, 2017
  22. Mar 20, 2017
    • [SPARK-19949][SQL] unify bad record handling in CSV and JSON · 68d65fae
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Currently JSON and CSV have exactly the same logic for handling bad records. This PR abstracts it and moves it to an upper level to reduce code duplication.
      
      The overall idea is to make the JSON and CSV parsers throw a BadRecordException; the upper level, FailureSafeParser, then handles bad records according to the parse mode.
      
      Behavior changes:
      1. In PERMISSIVE mode, if the number of tokens doesn't match the schema, the CSV parser previously treated it as a legal record and parsed as many tokens as possible. After this PR, it is treated as an illegal record and the raw record string is put in a special column, but as many tokens as possible are still parsed.
      2. All logging is removed, as it is not very useful in practice.
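
The layering described above can be sketched in Python. This is an illustration of the pattern, not Spark's `FailureSafeParser`: the toy CSV parser, the function names, the `DROPMALFORMED` and `FAILFAST` mode names and the `_corrupt_record` column name are assumptions that do not come from this description, which names only the PERMISSIVE mode and "a special column".

```python
class BadRecordException(Exception):
    """Raised by a format-specific parser; carries the raw record and any partial result."""

    def __init__(self, record, partial_result=None):
        super().__init__(f"malformed record: {record!r}")
        self.record = record
        self.partial_result = partial_result


def parse_csv_line(line, schema):
    """Toy CSV parser: a token-count mismatch is reported as a bad record."""
    tokens = line.split(",")
    row = {name: (tokens[i] if i < len(tokens) else None) for i, name in enumerate(schema)}
    if len(tokens) != len(schema):
        raise BadRecordException(line, partial_result=row)
    return row


def failure_safe_parse(lines, schema, raw_parser, mode="PERMISSIVE",
                       corrupt_column="_corrupt_record"):
    """Apply one mode-dependent policy on top of any parser that raises BadRecordException."""
    for line in lines:
        try:
            row = dict(raw_parser(line, schema))
            row[corrupt_column] = None
        except BadRecordException as error:
            if mode == "PERMISSIVE":
                # Keep whatever was parsed and store the raw record alongside it.
                row = dict(error.partial_result or {name: None for name in schema})
                row[corrupt_column] = error.record
            elif mode == "DROPMALFORMED":
                continue
            elif mode == "FAILFAST":
                raise
            else:
                raise ValueError(f"unknown parse mode: {mode}")
        yield row


schema = ["a", "b"]
lines = ["1,2", "3", "4,5,6"]
permissive = list(failure_safe_parse(lines, schema, parse_csv_line))
dropped = list(failure_safe_parse(lines, schema, parse_csv_line, mode="DROPMALFORMED"))
print(permissive)
print(dropped)
```

A JSON parser would plug into `failure_safe_parse` the same way, as long as it raises the same exception type, which is what removes the duplicated mode handling.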
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Wenchen Fan <cloud0fan@gmail.com>
      
      Closes #17315 from cloud-fan/bad-record2.
      68d65fae
    • [SPARK-20020][SPARKR][FOLLOWUP] DataFrame checkpoint API fix version tag · f14f81e9
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      doc only change
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17356 from felixcheung/rdfcheckpoint2.
      f14f81e9
    • [SPARK-20020][SPARKR] DataFrame checkpoint API · c4059772
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add checkpoint, setCheckpointDir API to R
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17351 from felixcheung/rdfcheckpoint.
      c4059772
    • [SPARK-19849][SQL] Support ArrayType in to_json to produce JSON array · 0cdcf911
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to support an array of struct type in `to_json` as below:
      
      ```scala
      import org.apache.spark.sql.functions._
      
      val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
      df.select(to_json($"a").as("json")).show()
      ```
      
      ```
      +----------+
      |      json|
      +----------+
      |[{"_1":1}]|
      +----------+
      ```
      
      Currently, it throws an exception as below (a newline manually inserted for readability):
      
      ```
      org.apache.spark.sql.AnalysisException: cannot resolve 'structtojson(`array`)' due to data type
      mismatch: structtojson requires that the expression is a struct expression.;;
      ```
      
      This allows the roundtrip with `from_json` as below:
      
      ```scala
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.types._
      
      val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
      val df = Seq("""[{"a":1}, {"a":2}]""").toDF("json").select(from_json($"json", schema).as("array"))
      df.show()
      
      // Read back.
      df.select(to_json($"array").as("json")).show()
      ```
      
      ```
      +----------+
      |     array|
      +----------+
      |[[1], [2]]|
      +----------+
      
      +-----------------+
      |             json|
      +-----------------+
      |[{"a":1},{"a":2}]|
      +-----------------+
      ```
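
The same round trip can be modelled with Python's standard `json` module, with lists of dicts standing in for an array-of-struct column. This is only an analogy for the behaviour shown above and does not use Spark's `to_json` or `from_json`.

```python
import json

# Rows of an array-of-struct column, modelled as lists of dicts.
array_column = [[{"a": 1}, {"a": 2}]]

# to_json: one compact JSON array string per row.
json_column = [json.dumps(value, separators=(",", ":")) for value in array_column]
print(json_column[0])

# from_json: parse back into the original structure.
round_trip = [json.loads(text) for text in json_column]
print(round_trip == array_column)
```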
      
      Also, this PR proposes to rename `StructToJson` to `StructsToJson` and `JsonToStruct` to `JsonToStructs`.
      
      ## How was this patch tested?
      
      Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite` for Scala, doctest for Python and test in `test_sparkSQL.R` for R.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17192 from HyukjinKwon/SPARK-19849.
      0cdcf911
  23. Mar 19, 2017
    • [SPARK-18817][SPARKR][SQL] change derby log output to temp dir · 422aa67d
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Passes the R `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM and sets the system property for the Derby home dir so that derby.log is written there.
      
      ## How was this patch tested?
      
      Manually, unit tests
      
      With this, the Derby files are relocated under /tmp:
      ```
      # ls /tmp/RtmpG2M0cB/
      derby.log
      ```
      They are removed automatically when the R session ends.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16330 from felixcheung/rderby.
      422aa67d
    • [MINOR][R] Reorder `Collate` fields in DESCRIPTION file · 60262bc9
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      It seems the CRAN check script corrects `R/pkg/DESCRIPTION` so that it follows the order of the `Collate` field.
      
      This PR proposes to fix the order so that running the script does not produce a diff in this file.
      
      ## How was this patch tested?
      
      Manually via `./R/check-cran.sh`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17349 from HyukjinKwon/minor-cran.
      60262bc9
  24. Mar 18, 2017
  25. Mar 14, 2017
    • [SPARK-19828][R] Support array type in from_json in R · d1f6c64c
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Since array types cannot be defined directly in R, this PR proposes to support them as type strings, like those used in `structField`, as below:
      
      ```R
      jsonArr <- "[{\"name\":\"Bob\"}, {\"name\":\"Alice\"}]"
      df <- as.DataFrame(list(list("people" = jsonArr)))
      collect(select(df, alias(from_json(df$people, "array<struct<name:string>>"), "arrcol")))
      ```
      
      prints
      
      ```R
            arrcol
      1 Bob, Alice
      ```
      
      ## How was this patch tested?
      
      Unit tests in `test_sparkSQL.R`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17178 from HyukjinKwon/SPARK-19828.
      d1f6c64c
    • [SPARK-19391][SPARKR][ML] Tweedie GLM API for SparkR · f6314eab
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      Port Tweedie GLM #16344 to SparkR
      
      felixcheung yanboliang
      
      ## How was this patch tested?
      new test in SparkR
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16729 from actuaryzhang/sparkRTweedie.
      f6314eab