  1. Oct 21, 2015
    • [SPARK-11197][SQL] run SQL on files directly · f8c6bec6
      Davies Liu authored
      This PR introduces a new feature for running SQL directly on files without creating a table, for example:
      
      ```
      select id from json.`path/to/json/files` as j
      ```
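
      A hedged sketch of how the same query might be issued from SparkR's `sql()` entry point; the path is a placeholder and an initialized 1.x `sqlContext` is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # Query JSON files directly by path, without registering a temporary table
      # first; the backtick-quoted path follows the syntax shown above.
      ids <- sql(sqlContext, "SELECT id FROM json.`path/to/json/files`")
      head(ids)
      ```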
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9173 from davies/source.
      f8c6bec6
  2. Oct 20, 2015
  3. Oct 19, 2015
  4. Oct 14, 2015
  5. Oct 13, 2015
    • [SPARK-10913] [SPARKR] attach() function support · f7f28ee7
      Adrian Zhuang authored
      Brings the changed code up to date.
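
      A hedged sketch of the kind of usage this enables; the dataset is illustrative and an initialized SparkR session is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # attach() puts the DataFrame's columns on the R search path, so a column
      # such as eruptions can be referenced directly instead of as df$eruptions.
      attach(df)
      head(select(df, eruptions))
      ```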
      
      Author: Adrian Zhuang <adrian555@users.noreply.github.com>
      Author: adrian555 <wzhuang@us.ibm.com>
      
      Closes #9031 from adrian555/attach2.
      f7f28ee7
    • [SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame · 1e0aba90
      Narine Kokhlikyan authored
      as.DataFrame is a more R-style signature.
      Also, I'd like to know if we could make the context (e.g. sqlContext) global, so that we do not have to specify it as an argument each time we create a DataFrame.
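
      A hedged sketch of the synonym in use, assuming an initialized 1.x `sqlContext`:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # as.DataFrame is an R-flavored alias for createDataFrame.
      df1 <- createDataFrame(sqlContext, faithful)
      df2 <- as.DataFrame(sqlContext, faithful)
      head(df2)
      ```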
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8952 from NarineK/sparkrasDataFrame.
      1e0aba90
    • [SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrame · 5e3868ba
      Sun Rui authored
      Two points in this PR:
      
      1.    The original thought was that a named R list would be assumed to be a struct in SerDe. But this is problematic because some R functions implicitly generate named lists that are not intended to be structs when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct" (see the sketch after this list).
      
      2.    SerDe lives in the Spark Core module, while data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow because, in the Maven build, the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe that lets SQLUtils in the Spark SQL module register its functions for serialization and deserialization of StructType.
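
      A minimal sketch of the marking convention from point 1, using a plain named R list on the client side (not the actual SerDe code):

      ```
      # A named R list is no longer treated as a struct automatically; the SerDe
      # client marks it explicitly by changing its S3 class from "list" to "struct".
      s <- list(a = 1L, b = "two")
      class(s) <- "struct"

      # A list left with class "list" is serialized as an ordinary list, even if named.
      plain <- list(a = 1L, b = "two")
      ```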
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8794 from sun-rui/SPARK-10051.
      5e3868ba
  6. Oct 10, 2015
    • [SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions. · 864de3bf
      Sun Rui authored
      1.  Add a "col" function into DataFrame (see the sketch after this list).
      2.  Move the current "col" function in Column.R to functions.R and convert it to an S4 function.
      3.  Add an S4 "column" function in functions.R.
      4.  Convert the "column" function in Column.R to an S4 function. This is for private use.
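
      A hedged sketch of the user-facing `col` function from point 1; an initialized SparkR session is assumed and the column name is illustrative:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # col() builds a Column from a name string, usable in expressions
      # the same way as df$eruptions.
      head(select(df, col("eruptions") * 60))
      ```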
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8864 from sun-rui/SPARK-10079.
      864de3bf
  7. Oct 09, 2015
  8. Oct 08, 2015
    • [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrame · e8f90d9d
      Narine Kokhlikyan authored
      The sort function can be used as an alternative to arrange(...).
      As arguments it accepts x (a DataFrame), decreasing (TRUE/FALSE, or a vector of orderings for the columns), and the list of columns, represented as string names.

      For example:
      sort(df, TRUE, "col1", "col2", "col3", "col5")  # sort these columns in the same order

      sort(df, decreasing=TRUE, "col1")
      sort(df, decreasing=c(TRUE,FALSE), "col1", "col2")
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8920 from NarineK/sparkrsort.
      e8f90d9d
  9. Oct 07, 2015
  10. Oct 04, 2015
  11. Sep 30, 2015
  12. Sep 25, 2015
    • [SPARK-10760] [SPARKR] SparkR glm: the documentation in examples - family argument is missing · 6fcee906
      Narine Kokhlikyan authored
      Hi everyone,
      
      Since the family argument is required for the glm function, the execution of:
      
      model <- glm(Sepal_Length ~ Sepal_Width, df)
      
      is failing.
      
      I've fixed the documentation by adding the family argument and also added summary(model), which shows the coefficients for the model.
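
      A hedged sketch of the corrected example; the gaussian family string and the exact glm signature of this SparkR release are assumptions:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, iris)
      # The family argument is required, so it is supplied explicitly now.
      model <- glm(Sepal_Length ~ Sepal_Width, data = df, family = "gaussian")
      # summary(model) shows the fitted coefficients.
      summary(model)
      ```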
      
      Thanks,
      Narine
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8870 from NarineK/sparkrml.
      6fcee906
    • [SPARK-9681] [ML] Support R feature interactions in RFormula · 92233881
      Eric Liang authored
      This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`).
      
      To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.
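
      A hedged sketch of an interaction term in a SparkR formula that this change targets; the columns and family are illustrative:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, iris)
      # The `:` operator adds an interaction feature between two terms.
      model <- glm(Sepal_Length ~ Sepal_Width + Sepal_Width:Petal_Length,
                   data = df, family = "gaussian")
      summary(model)
      ```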
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #8830 from ericl/interaction-2.
      92233881
  13. Sep 16, 2015
  14. Sep 15, 2015
  15. Sep 12, 2015
    • [SPARK-6548] Adding stddev to DataFrame functions · f4a22808
      JihongMa authored
      Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change.
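
      A minimal plain-R sketch of a one-pass (Welford-style) online variance update, to illustrate the kind of algorithm referred to; it is not the Spark implementation:

      ```
      # One pass over the data, tracking count, mean, and M2 (sum of squared deviations).
      online_var <- function(xs) {
        n <- 0; m <- 0; m2 <- 0
        for (x in xs) {
          n <- n + 1
          delta <- x - m
          m <- m + delta / n
          m2 <- m2 + delta * (x - m)
        }
        list(variance = m2 / (n - 1), stddev = sqrt(m2 / (n - 1)))
      }

      online_var(c(2, 4, 4, 4, 5, 5, 7, 9))  # sample stddev of about 2.14
      ```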
      
      Author: JihongMa <linlin200605@gmail.com>
      Author: Jihong MA <linlin200605@gmail.com>
      Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
      Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
      
      Closes #6297 from JihongMA/SPARK-SQL.
      f4a22808
  16. Sep 10, 2015
    • [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame. · 45e3be5c
      Sun Rui authored
      This PR:
      1.  Enhances reflection in RBackend, automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.

      2.  Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection (see the sketch after this list).

      3.  Supports ArrayType in createDataFrame().
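
      A hedged sketch of the collect() behavior from point 2, using a SQL-generated array column; an initialized 1.x `sqlContext` is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # A DataFrame with a single ArrayType column.
      df <- sql(sqlContext, "SELECT array(1, 2, 3) AS scores")

      # After collect(), the ArrayType value is expected to arrive as an R list.
      local <- collect(df)
      local$scores[[1]]
      ```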
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8458 from sun-rui/SPARK-10049.
      45e3be5c
  17. Sep 04, 2015
  18. Sep 03, 2015
    • [SPARK-8951] [SPARKR] support Unicode characters in collect() · af0e3125
      CHOIJAEHONG authored
      Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters.
      I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R.
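
      A hedged sketch of the scenario being fixed, with Korean text as an example of CJK characters; an initialized SparkR session is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # Before this fix, collecting a string field containing CJK characters
      # could fail or garble the output on the R side.
      df <- createDataFrame(sqlContext, data.frame(greeting = "안녕하세요",
                                                   stringsAsFactors = FALSE))
      collect(df)$greeting
      ```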
      
      Author: CHOIJAEHONG <redrock07@naver.com>
      
      Closes #7494 from CHOIJAEHONG1/SPARK-8951.
      af0e3125
  19. Aug 28, 2015
  20. Aug 27, 2015
  21. Aug 26, 2015
    • [MINOR] [SPARKR] Fix some validation problems in SparkR · 773ca037
      Yu ISHIKAWA authored
      Getting rid of some validation problems in SparkR
      https://github.com/apache/spark/pull/7883
      
      cc shivaram
      
      ```
      inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
        expect_equal(class(x), "character")
                                           ^~
      inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.
      
      ^~
      R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.
      
      ^~~~~~~~~~~~~~
      R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
                      df <- data.frame(row.names = 1 : nrow)
                                                            ^~~~~~~~~~~~~~~~
      R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.
      
      ^~~~~~~~~~~~~~
      R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
                          df[[names[colIndex]]] <- vec
                                                      ^~~~~~~~~~~~~~~~~~
      ```
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8474 from yu-iskw/minor-fix-sparkr.
      773ca037
    • [SPARK-10308] [SPARKR] Add %in% to the exported namespace · ad7f0f16
      Shivaram Venkataraman authored
      I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.
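
      A hedged sketch of the operator now that it is exported; the column and values are illustrative:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # %in% on a Column builds an "is in" filter expression.
      head(filter(df, df$waiting %in% c(50, 60, 70)))
      ```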
      
      cc yu-iskw
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8473 from shivaram/in-namespace.
      ad7f0f16
    • [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) · 75d4773a
      felixcheung authored
      Add support for
      ```
         df[df$name == "Smith", c(1,2)]
         df[df$age %in% c(19, 30), 1:2]
      ```
      
      shivaram
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #8394 from felixcheung/rsubset.
      75d4773a
  22. Aug 25, 2015
  23. Aug 24, 2015
  24. Aug 19, 2015
  25. Aug 18, 2015
  26. Aug 17, 2015
    • [SPARK-9871] [SPARKR] Add expression functions into SparkR which have a variable parameter · 26e76058
      Yu ISHIKAWA authored
      ### Summary
      
      - Add `lit` function
      - Add `concat`, `greatest`, `least` functions
      
      I think we need to improve the `collect` function in order to implement a `struct` function, since `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list corresponding to a `struct` still has `jobj` classes, so it would be better to solve this problem in another issue.
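
      A hedged sketch of the newly added functions; the columns and the cast to string are illustrative and an initialized SparkR session is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # lit() wraps a literal as a Column; concat(), greatest(), and least()
      # accept a variable number of Column arguments.
      head(select(df,
                  concat(lit("wait="), cast(df$waiting, "string")),
                  greatest(df$eruptions, df$waiting),
                  least(df$eruptions, df$waiting)))
      ```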
      
      ### JIRA
      [[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8194 from yu-iskw/SPARK-9856.
      26e76058
  27. Aug 16, 2015
    • [SPARK-8844] [SPARKR] head/collect is broken in SparkR. · 5f9ce738
      Sun Rui authored
      This is a WIP patch for SPARK-8844, for collecting reviews.
      
      This bug is about reading an empty DataFrame. In readCol(),
            lapply(1:numRows, function(x) {
      does not take into account the case where numRows = 0.
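
      A minimal plain-R sketch of why `1:numRows` misbehaves when numRows = 0, and the usual seq_len() guard (not the actual readCol() code):

      ```
      numRows <- 0

      # 1:0 counts down, so the lapply body still runs twice instead of zero times.
      1:numRows                                         # c(1, 0)
      length(lapply(1:numRows, function(x) x))          # 2

      # seq_len(0) is empty, so the body is skipped for an empty DataFrame.
      seq_len(numRows)                                  # integer(0)
      length(lapply(seq_len(numRows), function(x) x))   # 0
      ```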
      
      Will add a unit test case.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7419 from sun-rui/SPARK-8844.
      5f9ce738