  1. Aug 27, 2015
  2. Aug 26, 2015
    • Yu ISHIKAWA's avatar
      [MINOR] [SPARKR] Fix some validation problems in SparkR · 773ca037
      Yu ISHIKAWA authored
      Getting rid of some validation problems in SparkR
      https://github.com/apache/spark/pull/7883
      
      cc shivaram
      
      ```
      inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
        expect_equal(class(x), "character")
                                           ^~
      inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.
      
      ^~
      R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.
      
      ^~~~~~~~~~~~~~
      R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
                      df <- data.frame(row.names = 1 : nrow)
                                                            ^~~~~~~~~~~~~~~~
      R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.
      
      ^~~~~~~~~~~~~~
      R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
                          df[[names[colIndex]]] <- vec
                                                      ^~~~~~~~~~~~~~~~~~
      ```
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8474 from yu-iskw/minor-fix-sparkr.
      773ca037
    • Shivaram Venkataraman's avatar
      [SPARK-10308] [SPARKR] Add %in% to the exported namespace · ad7f0f16
      Shivaram Venkataraman authored
      I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.
      
      cc yu-iskw
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8473 from shivaram/in-namespace.
      ad7f0f16
    • felixcheung's avatar
      [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) · 75d4773a
      felixcheung authored
      Add support for
      ```
         df[df$name == "Smith", c(1,2)]
         df[df$age %in% c(19, 30), 1:2]
      ```
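      
      For comparison, the same subsetting expressions behave as follows on a local base R `data.frame`. This is only a sketch of the R semantics that the new `[` method mirrors; the sample data is made up, and the code does not use SparkR itself.
      
      ```r
      # Hypothetical sample data, used only for illustration.
      df <- data.frame(name = c("Smith", "Jones", "Lee"),
                       age = c(19, 45, 30),
                       city = c("Oslo", "Lima", "Pune"),
                       stringsAsFactors = FALSE)
      
      # Row filter on a logical condition, column selection by position.
      bySmith <- df[df$name == "Smith", c(1, 2)]
      
      # Row filter with %in%, column selection with a range.
      byAge <- df[df$age %in% c(19, 30), 1:2]
      
      print(bySmith)
      print(byAge)
      ```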
      
      shivaram
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #8394 from felixcheung/rsubset.
      75d4773a
  3. Aug 25, 2015
  4. Aug 24, 2015
  5. Aug 19, 2015
  6. Aug 18, 2015
  7. Aug 17, 2015
    • Yu ISHIKAWA's avatar
      [SPARK-9871] [SPARKR] Add expression functions into SparkR which have a variable parameter · 26e76058
      Yu ISHIKAWA authored
      ### Summary
      
      - Add `lit` function
      - Add `concat`, `greatest`, `least` functions
      
      I think we need to improve the `collect` function before we can implement the `struct` function, because `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list produced by `struct` still has `jobj` classes. It would be better to solve this problem in a separate issue.
      
      ### JIRA
      [[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8194 from yu-iskw/SPARK-9856.
      26e76058
  8. Aug 16, 2015
    • Sun Rui's avatar
      [SPARK-8844] [SPARKR] head/collect is broken in SparkR. · 5f9ce738
      Sun Rui authored
      This is a WIP patch for SPARK-8844, posted to collect reviews.
      
      This bug occurs when reading an empty DataFrame. In readCol(),
            lapply(1:numRows, function(x) {
      does not take into consideration the case where numRows = 0.
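      
      The pitfall can be reproduced in plain R without Spark. The sketch below uses a stand-in `numRows` variable and shows why `1:numRows` iterates twice for an empty input, while `seq_len(numRows)` does not iterate at all. It illustrates the general idiom and is not necessarily the exact change made in this patch.
      
      ```r
      numRows <- 0L
      
      # 1:0 counts down, so it yields c(1, 0) rather than an empty vector.
      bad <- 1:numRows
      
      # seq_len(0) is a zero-length integer vector.
      good <- seq_len(numRows)
      
      visitedBad <- lapply(bad, function(x) x)
      visitedGood <- lapply(good, function(x) x)
      
      print(length(visitedBad))   # 2
      print(length(visitedGood))  # 0
      ```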
      
      Will add unit test case.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7419 from sun-rui/SPARK-8844.
      5f9ce738
  9. Aug 12, 2015
    • Burak Yavuz's avatar
      [SPARK-9916] [BUILD] [SPARKR] removed left-over sparkr.zip copy/create commands from codebase · 2fb4901b
      Burak Yavuz authored
      sparkr.zip is now built by SparkSubmit on a need-to-build basis.
      
      cc shivaram
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #8147 from brkyvz/make-dist-fix.
      2fb4901b
    • Yu ISHIKAWA's avatar
      [SPARK-9855] [SPARKR] Add expression functions into SparkR whose params are simple · f4bc01f1
      Yu ISHIKAWA authored
      I added many expression functions to SparkR. This PR includes only functions whose parameters are `(Column)` or `(Column, Column)`. I think we also need to improve how these functions are tested, but that would be better handled in a separate issue.
      
      ## Diff Summary
      
      - Add lots of functions in `functions.R` and their generics in `generic.R`
      - Add aliases for `ceiling` and `sign`
      - Move expression functions from `column.R` to `functions.R`
      - Modify `rdname` from `column` to `functions`
      
      I haven't added support for the `not` function, because its name collides with a function in the `testthat` package and I haven't found a way to define it.
      
      ## New Supported Functions
      
      ```
      approxCountDistinct
      ascii
      base64
      bin
      bitwiseNOT
      ceil (alias: ceiling)
      crc32
      dayofmonth
      dayofyear
      explode
      factorial
      hex
      hour
      initcap
      isNaN
      last_day
      length
      log2
      ltrim
      md5
      minute
      month
      negate
      quarter
      reverse
      round
      rtrim
      second
      sha1
      signum (alias: sign)
      size
      soundex
      to_date
      trim
      unbase64
      unhex
      weekofyear
      year
      
      datediff
      levenshtein
      months_between
      nanvl
      pmod
      ```
      
      ## JIRA
      [[SPARK-9855] Add expression functions into SparkR whose params are simple - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9855)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8123 from yu-iskw/SPARK-9855.
      f4bc01f1
  10. Aug 11, 2015
  11. Aug 07, 2015
    • Reynold Xin's avatar
      [SPARK-9700] Pick default page size more intelligently. · 4309262e
      Reynold Xin authored
      Previously, we used 64MB as the default page size, which was far too big for many Spark applications (especially on a single node).
      
      This patch changes it so that the default page size, if unset by the user, is determined by the number of cores available and the total execution memory available.
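      
      A rough sketch of how such a default could be derived is shown below. The function name, the 1MB lower bound, the safety factor of 8, and the rounding to a power of two are all assumptions made for illustration; only the 64MB figure (used here as the upper bound) and the two inputs, core count and total execution memory, come from the description above.
      
      ```r
      # Round up to the nearest power of two (assumed rounding rule).
      nextPowerOf2 <- function(n) 2^ceiling(log2(n))
      
      # Hypothetical helper: all default values are illustrative assumptions.
      defaultPageSize <- function(executionMemoryBytes, cores,
                                  minPage = 1 * 1024^2,
                                  maxPage = 64 * 1024^2,
                                  safetyFactor = 8) {
        # Share memory across cores, leaving headroom via the safety factor.
        perCore <- executionMemoryBytes / cores / safetyFactor
        size <- nextPowerOf2(perCore)
        # Clamp the result to the [minPage, maxPage] range.
        min(maxPage, max(minPage, size))
      }
      
      mb <- 1024^2
      small <- defaultPageSize(8 * mb, cores = 8)
      medium <- defaultPageSize(500 * mb, cores = 4)
      large <- defaultPageSize(64 * 1024 * mb, cores = 2)
      print(c(small, medium, large) / mb)
      ```
      
      With these assumed constants, a small single-node setup gets a page far below 64MB, while a large-memory executor is still capped at 64MB.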
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8012 from rxin/pagesize and squashes the following commits:
      
      16f4756 [Reynold Xin] Fixed failing test.
      5afd570 [Reynold Xin] private...
      0d5fb98 [Reynold Xin] Update default value.
      674a6cd [Reynold Xin] Address review feedback.
      dc00e05 [Reynold Xin] Merge with master.
      73ebdb6 [Reynold Xin] [SPARK-9700] Pick default page size more intelligently.
      4309262e
  12. Aug 04, 2015
    • Burak Yavuz's avatar
      [SPARK-8313] R Spark packages support · c9a4c36d
      Burak Yavuz authored
      shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit.
      
      For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #7139 from brkyvz/r-submit and squashes the following commits:
      
      0de384f [Burak Yavuz] remove unused imports 2
      d253708 [Burak Yavuz] removed unused imports
      6603d0d [Burak Yavuz] addressed comments
      4258ffe [Burak Yavuz] merged master
      ddfcc06 [Burak Yavuz] added zipping test
      3a1be7d [Burak Yavuz] don't zip
      77995df [Burak Yavuz] fix URI
      ac45527 [Burak Yavuz] added zipping of all libs
      e6bf7b0 [Burak Yavuz] add println ignores
      1bc5554 [Burak Yavuz] add assumes for tests
      9778e03 [Burak Yavuz] addressed comments
      b42b300 [Burak Yavuz] merged master
      ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
      d867756 [Burak Yavuz] add apache header
      eff5ba1 [Burak Yavuz] ready for review
      8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit
      e5b5a06 [Burak Yavuz] added doc
      bb751ce [Burak Yavuz] fix null bug
      0226768 [Burak Yavuz] fixed issues
      8810beb [Burak Yavuz] R packages support
      c9a4c36d
  13. Jul 31, 2015
    • Hossein's avatar
      [SPARK-9318] [SPARK-9320] [SPARKR] Aliases for merge and summary functions on DataFrames · 712f5b7a
      Hossein authored
      This PR adds synonyms for `merge` and `summary` in the SparkR DataFrame API.
      
      cc shivaram
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #7806 from falaki/SPARK-9320 and squashes the following commits:
      
      72600f7 [Hossein] Updated docs
      92a6e75 [Hossein] Fixed merge generic signature issue
      4c2b051 [Hossein] Fixing naming with mllib summary
      0f3a64c [Hossein] Added ... to generic for merge
      30fbaf8 [Hossein] Merged master
      ae1a4cf [Hossein] Merge branch 'master' into SPARK-9320
      e8eb86f [Hossein] Add a generic for merge
      fc01f2d [Hossein] Added unit test
      8d92012 [Hossein] Added merge as an alias for join
      5b8bedc [Hossein] Added unit test
      632693d [Hossein] Added summary as an alias for describe for DataFrame
      712f5b7a
    • Hossein's avatar
      [SPARK-9324] [SPARK-9322] [SPARK-9321] [SPARKR] Some aliases for R-like functions in DataFrames · 710c2b5d
      Hossein authored
      Adds the following aliases:
      * unique (distinct)
      * rbind (unionAll): accepts many DataFrames
      * nrow (count)
      * ncol
      * dim
      * names (columns): along with the replacement function to change names
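      
      For reference, these are the base R behaviors that the aliases are named after, shown on local `data.frame` objects. This is a sketch with made-up data; it runs without Spark and does not exercise the SparkR methods themselves.
      
      ```r
      # Hypothetical sample data, used only for illustration.
      a <- data.frame(id = c(1, 2, 2), v = c("x", "y", "y"),
                      stringsAsFactors = FALSE)
      b <- data.frame(id = 3, v = "z", stringsAsFactors = FALSE)
      
      deduped <- unique(a)       # drops the repeated row
      stacked <- rbind(a, b, b)  # accepts more than two inputs, keeps duplicates
      
      print(nrow(deduped))  # 2
      print(ncol(stacked))  # 2
      print(dim(stacked))   # 5 2
      
      # names() has a replacement form that renames the columns in place.
      names(stacked) <- c("key", "value")
      print(names(stacked))
      ```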
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #7764 from falaki/sparkR-alias and squashes the following commits:
      
      56016f5 [Hossein] Updated R documentation
      5e4a4d0 [Hossein] Removed extra code
      f51cbef [Hossein] Merge branch 'master' into sparkR-alias
      c1b88bd [Hossein] Moved setGeneric and other comments applied
      d9307f8 [Hossein] Added tests
      b5aa988 [Hossein] Added dim, ncol, nrow, names, rbind, and unique functions to DataFrames
      710c2b5d
    • Shivaram Venkataraman's avatar
      [SPARK-9510] [SPARKR] Remaining SparkR style fixes · 82f47b81
      Shivaram Venkataraman authored
      With the change in this patch, I get no more warnings from `./dev/lint-r` on my machine.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #7834 from shivaram/sparkr-style-fixes and squashes the following commits:
      
      716cd8e [Shivaram Venkataraman] Remaining SparkR style fixes
      82f47b81
    • Yu ISHIKAWA's avatar
      [SPARK-9053] [SPARKR] Fix spaces around parens, infix operators etc. · fc0e57e5
      Yu ISHIKAWA authored
      ### JIRA
      [[SPARK-9053] Fix spaces around parens, infix operators etc. - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9053)
      
      ### The Result of `lint-r`
      [The result of lint-r at the revision:a4c83cb1](https://gist.github.com/yu-iskw/d253d7f8ef351f86443d)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7584 from yu-iskw/SPARK-9053 and squashes the following commits:
      
      613170f [Yu ISHIKAWA] Ignore a warning about a space before a left parenthesis
      ede61e1 [Yu ISHIKAWA] Ignores two warnings about a space before a left parenthesis. TODO: After updating `lintr`, we will remove the ignores
      de3e0db [Yu ISHIKAWA] Add '## nolint start' & '## nolint end' statement to ignore infix space warnings
      e233ea8 [Yu ISHIKAWA] [SPARK-9053][SparkR] Fix spaces around parens, infix operators etc.
      fc0e57e5
  14. Jul 30, 2015
    • Hossein's avatar
      [SPARK-8742] [SPARKR] Improve SparkR error messages for DataFrame API · 157840d1
      Hossein authored
      This patch improves SparkR error message reporting, especially with the DataFrame API. When there is a user error (e.g., a malformed SQL query), the message of the cause is sent back through the RPC, and the R client reads it and returns it to the user.
      
      cc shivaram
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #7742 from falaki/SPARK-8742 and squashes the following commits:
      
      4f643c9 [Hossein] Not logging exceptions in RBackendHandler
      4a8005c [Hossein] Returning stack trace of causing exception from RBackendHandler
      5cf17f0 [Hossein] Adding unit test for error messages from SQLContext
      2af75d5 [Hossein] Reading error message in case of failure and stopping with that message
      f479c99 [Hossein] Writing exception cause message in JVM
      157840d1
    • Eric Liang's avatar
      [SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormula · e7905a93
      Eric Liang authored
      Preview:
      
      ```
      > summary(m)
                  features coefficients
      1        (Intercept)    1.6765001
      2       Sepal_Length    0.3498801
      3 Species.versicolor   -0.9833885
      4  Species.virginica   -1.0075104
      
      ```
      
      Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit
      
      cc mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7771 from ericl/summary and squashes the following commits:
      
      ccd54c3 [Eric Liang] second pass
      a5ca93b [Eric Liang] comments
      2772111 [Eric Liang] clean up
      70483ef [Eric Liang] fix test
      7c247d4 [Eric Liang] Merge branch 'master' into summary
      3c55024 [Eric Liang] working
      8c539aa [Eric Liang] first pass
      e7905a93
    • Josh Rosen's avatar
      [SPARK-8850] [SQL] Enable Unsafe mode by default · 520ec0ff
      Josh Rosen authored
      This pull request enables Unsafe mode by default in Spark SQL. In order to do this, we had to fix a number of small issues:
      
      **List of fixed blockers**:
      
      - [x] Make some default buffer sizes configurable so that HiveCompatibilitySuite can run properly (#7741).
      - [x] Memory leak on grouped aggregation of empty input (fixed by #7560)
      - [x] Update planner to also check whether codegen is enabled before planning unsafe operators.
      - [x] Investigate failing HiveThriftBinaryServerSuite test.  This turns out to be caused by a ClassCastException that occurs when Exchange tries to apply an interpreted RowOrdering to an UnsafeRow when range partitioning an RDD.  This could be fixed by #7408, but a shorter-term fix is to just skip the Unsafe exchange path when RangePartitioner is used.
      - [x] Memory leak exceptions masking exceptions that actually caused tasks to fail (will be fixed by #7603).
      - [x]  ~~https://issues.apache.org/jira/browse/SPARK-9162, to implement code generation for ScalaUDF.  This is necessary for `UDFSuite` to pass.  For now, I've just ignored this test in order to try to find other problems while we wait for a fix.~~ This is no longer necessary as of #7682.
      - [x] Memory leaks from Limit after UnsafeExternalSort cause the memory leak detector to fail tests. This is a huge problem in the HiveCompatibilitySuite (fixed by f4ac642a4e5b2a7931c5e04e086bb10e263b1db6).
      - [x] Tests in `AggregationQuerySuite` are failing due to NaN-handling issues in UnsafeRow, which were fixed in #7736.
      - [x] `org.apache.spark.sql.ColumnExpressionSuite.rand` needs to be updated so that the planner check also matches `TungstenProject`.
      - [x] After having lowered the buffer sizes to 4MB so that most of HiveCompatibilitySuite runs:
        - [x] Wrong answer in `join_1to1` (fixed by #7680)
        - [x] Wrong answer in `join_nulls` (fixed by #7680)
        - [x] Managed memory OOM / leak in `lateral_view`
        - [x] Seems to hang indefinitely in `partcols1`.  This might be a deadlock in script transformation or a bug in error-handling code? The hang was fixed by #7710.
        - [x] Error while freeing memory in `partcols1`: will be fixed by #7734.
      - [x] After fixing the `partcols1` hang, it appears that a number of later tests have issues as well.
      - [x] Fix thread-safety bug in codegen fallback expression evaluation (#7759).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7564 from JoshRosen/unsafe-by-default and squashes the following commits:
      
      83c0c56 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-by-default
      f4cc859 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-by-default
      963f567 [Josh Rosen] Reduce buffer size for R tests
      d6986de [Josh Rosen] Lower page size in PySpark tests
      013b9da [Josh Rosen] Also match TungstenProject in checkNumProjects
      5d0b2d3 [Josh Rosen] Add task completion callback to avoid leak in limit after sort
      ea250da [Josh Rosen] Disable unsafe Exchange path when RangePartitioning is used
      715517b [Josh Rosen] Enable Unsafe by default
      520ec0ff
    • Yuu ISHIKAWA's avatar
      [SPARK-9248] [SPARKR] Closing curly-braces should always be on their own line · 7492a33f
      Yuu ISHIKAWA authored
      ### JIRA
      [[SPARK-9248] Closing curly-braces should always be on their own line - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9248)
      
      ## The result of `dev/lint-r`
      [The result of `dev/lint-r` for SPARK-9248 at the revision:6175d6cf](https://gist.github.com/yu-iskw/96cadcea4ce664c41f81)
      
      Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7795 from yu-iskw/SPARK-9248 and squashes the following commits:
      
      c8eccd3 [Yuu ISHIKAWA] [SPARK-9248][SparkR] Closing curly-braces should always be on their own line
      7492a33f
  15. Jul 28, 2015
    • Eric Liang's avatar
      [SPARK-9391] [ML] Support minus, dot, and intercept operators in SparkR RFormula · 8d5bb528
      Eric Liang authored
      Adds '.', '-', and intercept parsing to RFormula. Also splits RFormulaParser into a separate file.
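      
      The three operators have the same meaning in base R formulas, which can be inspected locally with `terms()`. The sketch below uses a made-up data frame and base R only; it shows the semantics being parsed, not the RFormula implementation.
      
      ```r
      # Hypothetical sample data, used only for illustration.
      df <- data.frame(y = c(1, 2, 3), a = c(4, 5, 6),
                       b = c(7, 8, 9), c = c(0, 1, 0))
      
      # '.' expands to every column except the response; '-' removes a term.
      tm <- terms(y ~ . - c, data = df)
      print(attr(tm, "term.labels"))  # "a" "b"
      
      # The intercept is included by default and removed with '- 1'.
      withIntercept <- attr(terms(y ~ a), "intercept")
      noIntercept <- attr(terms(y ~ a - 1), "intercept")
      print(c(withIntercept, noIntercept))  # 1 0
      ```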
      
      Umbrella design doc here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7707 from ericl/string-features-2 and squashes the following commits:
      
      8588625 [Eric Liang] exclude complex types for .
      8106ffe [Eric Liang] comments
      a9350bb [Eric Liang] s/var/val
      9c50d4d [Eric Liang] Merge branch 'string-features' into string-features-2
      581afb2 [Eric Liang] Merge branch 'master' into string-features
      08ae539 [Eric Liang] Merge branch 'string-features' into string-features-2
      f99131a [Eric Liang] comments
      cecec43 [Eric Liang] Merge branch 'string-features' into string-features-2
      0bf3c26 [Eric Liang] update docs
      4592df2 [Eric Liang] intercept supports
      7412a2e [Eric Liang] Fri Jul 24 14:56:51 PDT 2015
      3cf848e [Eric Liang] fix the parser
      0556c2b [Eric Liang] Merge branch 'string-features' into string-features-2
      c302a2c [Eric Liang] fix tests
      9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features
      e713da3 [Eric Liang] comments
      cd231a9 [Eric Liang] Wed Jul 22 17:18:44 PDT 2015
      4d79193 [Eric Liang] revert to seq + distinct
      169a085 [Eric Liang] tweak functional test
      a230a47 [Eric Liang] Merge branch 'master' into string-features
      72bd6f3 [Eric Liang] fix merge
      d841cec [Eric Liang] Merge branch 'master' into string-features
      5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015
      b01c7c5 [Eric Liang] add test
      8a637db [Eric Liang] encoder wip
      a1d03f4 [Eric Liang] refactor into estimator
      8d5bb528
    • trestletech's avatar
      Use vector-friendly comparison for packages argument. · 61432340
      trestletech authored
      Otherwise, `sparkR.init()` with multiple `sparkPackages` results in this warning:
      
      ```
      Warning message:
      In if (packages != "") { :
        the condition has length > 1 and only the first element will be used
      ```
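      
      The warning arises because `packages != ""` is evaluated element-wise, so `if` receives one logical value per package. Newer R versions raise an error here instead of a warning. The sketch below shows one vector-safe way to write such a check; the helper name and package coordinates are hypothetical, and this is not necessarily the exact expression used in the patch.
      
      ```r
      # Hypothetical package coordinates, used only for illustration.
      packages <- c("com.example:pkg-a:1.0", "com.example:pkg-b:2.0")
      
      # The comparison is element-wise, so it returns one logical per package.
      elementwise <- packages != ""
      print(length(elementwise))  # 2
      
      # A vector-safe check collapses the result to a single TRUE/FALSE.
      hasPackages <- function(p) any(nzchar(p))
      
      packagesArg <- ""
      if (hasPackages(packages)) {
        # Join the coordinates into one comma-separated argument value.
        packagesArg <- paste(packages, collapse = ",")
      }
      print(packagesArg)
      ```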
      
      Author: trestletech <jeff.allen@trestletechnology.net>
      
      Closes #7701 from trestletech/compare-packages and squashes the following commits:
      
      72c8b36 [trestletech] Correct function name.
      c52db0e [trestletech] Added test for multiple packages.
      3aab1a7 [trestletech] Use vector-friendly comparison for packages argument.
      61432340
  16. Jul 27, 2015
    • Eric Liang's avatar
      [SPARK-9230] [ML] Support StringType features in RFormula · 8ddfa52c
      Eric Liang authored
      This adds StringType feature support via OneHotEncoder. As part of this task it was necessary to change RFormula to an Estimator, so that factor levels could be determined from the training dataset.
      
      Not sure if I am using uids correctly here, would be good to get reviewer help on that.
      cc mengxr
      
      Umbrella design doc: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit#
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7574 from ericl/string-features and squashes the following commits:
      
      f99131a [Eric Liang] comments
      0bf3c26 [Eric Liang] update docs
      c302a2c [Eric Liang] fix tests
      9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features
      e713da3 [Eric Liang] comments
      4d79193 [Eric Liang] revert to seq + distinct
      169a085 [Eric Liang] tweak functional test
      a230a47 [Eric Liang] Merge branch 'master' into string-features
      72bd6f3 [Eric Liang] fix merge
      d841cec [Eric Liang] Merge branch 'master' into string-features
      5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015
      b01c7c5 [Eric Liang] add test
      8a637db [Eric Liang] encoder wip
      a1d03f4 [Eric Liang] refactor into estimator
      8ddfa52c
  17. Jul 24, 2015
  18. Jul 23, 2015
  19. Jul 22, 2015
    • Xiangrui Meng's avatar
      [SPARK-8364] [SPARKR] Add crosstab to SparkR DataFrames · 2f5cbd86
      Xiangrui Meng authored
      Add `crosstab` to SparkR DataFrames, which takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type of `crosstab` is data.frame rather than table, to be compatible with Scala/Python.
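      
      The difference between the two return types can be seen in base R alone. The sketch below builds a contingency table from made-up local data with `table` and then converts it to a data.frame; it illustrates the shape of a cross-tabulation and does not call the SparkR `crosstab` method.
      
      ```r
      # Hypothetical sample data, used only for illustration.
      df <- data.frame(a = c("x", "x", "y"), b = c("p", "q", "p"),
                       stringsAsFactors = FALSE)
      
      # Base R returns an object of class "table".
      counts <- table(df$a, df$b)
      print(class(counts))  # "table"
      
      # Converting keeps the 2-D layout: one row per value of a,
      # one column per value of b.
      asFrame <- as.data.frame.matrix(counts)
      print(class(asFrame))  # "data.frame"
      print(asFrame)
      ```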
      
      I couldn't run the R tests successfully on my local machine; many unit tests failed. So let's try Jenkins.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7318 from mengxr/SPARK-8364 and squashes the following commits:
      
      d75e894 [Xiangrui Meng] fix tests
      53f6ddd [Xiangrui Meng] fix tests
      f1348d6 [Xiangrui Meng] update test
      47cb088 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8364
      5621262 [Xiangrui Meng] first version without test
      2f5cbd86
  20. Jul 20, 2015
    • Eric Liang's avatar
      [SPARK-9201] [ML] Initial integration of MLlib + SparkR using RFormula · 1cbdd899
      Eric Liang authored
      This exposes the SparkR:::glm() and SparkR:::predict() APIs. It was necessary to change RFormula to silently drop the label column if it was missing from the input dataset, which is kind of a hack but necessary to integrate with the Pipeline API.
      
      The umbrella design doc for MLlib + SparkR integration can be viewed here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #7483 from ericl/spark-8774 and squashes the following commits:
      
      3dfac0c [Eric Liang] update
      17ef516 [Eric Liang] more comments
      1753a0f [Eric Liang] make glm generic
      b0f50f8 [Eric Liang] equivalence test
      550d56d [Eric Liang] export methods
      c015697 [Eric Liang] second pass
      117949a [Eric Liang] comments
      5afbc67 [Eric Liang] test label columns
      6b7f15f [Eric Liang] Fri Jul 17 14:20:22 PDT 2015
      3a63ae5 [Eric Liang] Fri Jul 17 13:41:52 PDT 2015
      ce61367 [Eric Liang] Fri Jul 17 13:41:17 PDT 2015
      0299c59 [Eric Liang] Fri Jul 17 13:40:32 PDT 2015
      e37603f [Eric Liang] Fri Jul 17 12:15:03 PDT 2015
      d417d0c [Eric Liang] Merge remote-tracking branch 'upstream/master' into spark-8774
      29a2ce7 [Eric Liang] Merge branch 'spark-8774-1' into spark-8774
      d1959d2 [Eric Liang] clarify comment
      2db68aa [Eric Liang] second round of comments
      dc3c943 [Eric Liang] address comments
      5765ec6 [Eric Liang] fix style checks
      1f361b0 [Eric Liang] doc
      d33211b [Eric Liang] r support
      fb0826b [Eric Liang] [SPARK-8774] Add R model formula with basic support as a transformer
      1cbdd899
    • Yu ISHIKAWA's avatar
      [SPARK-9052] [SPARKR] Fix comments after curly braces · 2bdf9914
      Yu ISHIKAWA authored
      [[SPARK-9052] Fix comments after curly braces - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9052)
      
      This is the full result of lintr at the revision:01155162.
      [[SPARK-9052] the result of lint-r at the revision:01155162](https://gist.github.com/yu-iskw/e7246041b173a3f29482)
      
      This is the difference between the results before and after.
      https://gist.github.com/yu-iskw/e7246041b173a3f29482/revisions
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7440 from yu-iskw/SPARK-9052 and squashes the following commits:
      
      015d738 [Yu ISHIKAWA] Fix the indentations and move the placement of comma
      5cc30fe [Yu ISHIKAWA] Fix the indentation in a condition
      4ead0e5 [Yu ISHIKAWA] [SPARK-9052][SparkR] Fix comments after curly braces
      2bdf9914
  21. Jul 17, 2015
  22. Jul 16, 2015
  23. Jul 15, 2015
    • Liang-Chi Hsieh's avatar
      [SPARK-8840] [SPARKR] Add float coercion on SparkR · 6f690259
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8840
      
      Currently the type coercion rules don't include float type. This PR simply adds it.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #7280 from viirya/add_r_float_coercion and squashes the following commits:
      
      c86dc0e [Liang-Chi Hsieh] For comments.
      dbf0c1b [Liang-Chi Hsieh] Implicitly convert Double to Float based on provided schema.
      733015a [Liang-Chi Hsieh] Add test case for DataFrame with float type.
      30c2a40 [Liang-Chi Hsieh] Update test case.
      52b5294 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_r_float_coercion
      6f9159d [Liang-Chi Hsieh] Add another test case.
      8db3244 [Liang-Chi Hsieh] schema also needs to support float. add test case.
      0dcc992 [Liang-Chi Hsieh] Add float coercion on SparkR.
      6f690259
    • Sun Rui's avatar
      [SPARK-8808] [SPARKR] Fix assignments in SparkR. · f650a005
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7395 from sun-rui/SPARK-8808 and squashes the following commits:
      
      ce603bc [Sun Rui] Use '<-' instead of '='.
      88590b1 [Sun Rui] Use '<-' instead of '='.
      f650a005