      [SPARK-11244][SPARKR] sparkR.stop() should remove SQLContext · 94e2064f
      Forest Fang authored
      SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise, even when the SparkContext is reinitialized, `sparkRSQL.init` returns a stale copy of the object and complains:
      
      ```r
      sc <- sparkR.init("local")
      sqlContext <- sparkRSQL.init(sc)
      sparkR.stop()
      sc <- sparkR.init("local")
      sqlContext <- sparkRSQL.init(sc)
      sqlContext
      ```
      producing
      ```r
      Error in callJMethod(x, "getClass") :
        Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
      ```
      
      I have added the check and removal only when SparkContext itself is initialized. I have also added a corresponding test for this fix. Let me know if you want me to move the test to the SQL test suite instead.
      
      p.s. I tried lint-r but ended up with a lot of errors on existing code.
      
      Author: Forest Fang <forest.fang@outlook.com>
      
      Closes #9205 from saurfang/sparkR.stop.
  12. Oct 21, 2015
      [SPARK-11197][SQL] run SQL on files directly · f8c6bec6
      Davies Liu authored
      This PR introduces a new feature to run SQL directly on files without creating a table, for example:
      
      ```sql
      select id from json.`path/to/json/files` as j
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9173 from davies/source.
  16. Oct 13, 2015
      [SPARK-10913] [SPARKR] attach() function support · f7f28ee7
      Adrian Zhuang authored
      Brings the attach() change up to date.
      
      Author: Adrian Zhuang <adrian555@users.noreply.github.com>
      Author: adrian555 <wzhuang@us.ibm.com>
      
      Closes #9031 from adrian555/attach2.
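      A sketch of what the new attach() might look like in use — assuming a running SparkR session with a sqlContext; the dataset and column names are only illustrative:

      ```r
      df <- createDataFrame(sqlContext, faithful)
      # attach() puts the DataFrame's columns on the R search path,
      # so a column can be referenced without the df$ prefix
      attach(df)
      head(select(df, eruptions))  # 'eruptions' resolves to the column of df
      ```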
      [SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame · 1e0aba90
      Narine Kokhlikyan authored
      as.DataFrame is a more R-style signature.
      Also, I'd like to know if we could make the context (e.g. sqlContext) global, so that we do not have to specify it as an argument each time we create a DataFrame.
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8952 from NarineK/sparkrasDataFrame.
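      With this change the two spellings below should be interchangeable; a minimal sketch, assuming a live sqlContext:

      ```r
      # existing name
      df1 <- createDataFrame(sqlContext, faithful)
      # new R-style synonym added by this PR
      df2 <- as.DataFrame(sqlContext, faithful)
      ```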
      [SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrame · 5e3868ba
      Sun Rui authored
      Two points in this PR:
      
      1.    The original assumption was that a named R list is treated as a struct in SerDe. But this is problematic because some R functions implicitly generate named lists that are not intended to be structs when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct".
      
      2.    SerDe is in the Spark Core module, and data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow because, in the Maven build, the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe that allows SQLUtils in the Spark SQL module to register its functions for serialization and deserialization of StructType.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8794 from sun-rui/SPARK-10051.
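      Point 1 above can be illustrated in plain R — "struct" here is just the marker class the PR describes, not a SparkR API call:

      ```r
      # a named R list: by default SerDe treats this as a plain list
      l <- list(name = "Alice", age = 30L)
      # explicitly mark it as a struct so SerDe serializes it as a StructType row
      class(l) <- "struct"
      ```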
  17. Oct 10, 2015
      [SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions. · 864de3bf
      Sun Rui authored
      1.  Add a "col" function into DataFrame.
      2.  Move the current "col" function in Column.R to functions.R and convert it to an S4 function.
      3.  Add an S4 "column" function in functions.R.
      4.  Convert the "column" function in Column.R to an S4 function. This is for private use.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8864 from sun-rui/SPARK-10079.
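      A hedged sketch of the now-public col() — assuming a SparkR session; column() stays private, and the data/column names are illustrative:

      ```r
      df <- createDataFrame(sqlContext, faithful)
      # col() builds a Column from its name, usable wherever a Column is expected
      head(select(df, col("eruptions")))
      ```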
  19. Oct 08, 2015
      [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrame · e8f90d9d
      Narine Kokhlikyan authored
      The sort function can be used as an alternative to arrange(...).
      As arguments it accepts x (a DataFrame), decreasing (TRUE/FALSE, or a vector of per-column orderings), and the columns to sort by, given as string names.
      
      For example:
      sort(df, TRUE, "col1", "col2", "col3", "col5") # sort several columns in the same order
      
      sort(df, decreasing=TRUE, "col1")
      sort(df, decreasing=c(TRUE,FALSE), "col1", "col2")
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8920 from NarineK/sparkrsort.
  23. Sep 25, 2015
      [SPARK-10760] [SPARKR] SparkR glm: the documentation in examples - family argument is missing · 6fcee906
      Narine Kokhlikyan authored
      Hi everyone,
      
      Since the family argument is required for the glm function, the execution of:
      
      model <- glm(Sepal_Length ~ Sepal_Width, df)
      
      fails.
      
      I've fixed the documentation by adding the family argument, and also added a summary(model) call, which shows the coefficients of the model.
      
      Thanks,
      Narine
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8870 from NarineK/sparkrml.
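      The corrected example would look roughly like this (a sketch assuming a SparkR session; "gaussian" is one of the supported families):

      ```r
      # SparkR replaces '.' in the iris column names with '_' on conversion
      df <- createDataFrame(sqlContext, iris)
      # family is required, so pass it explicitly
      model <- glm(Sepal_Length ~ Sepal_Width, data = df, family = "gaussian")
      summary(model)  # prints the fitted coefficients
      ```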
      [SPARK-9681] [ML] Support R feature interactions in RFormula · 92233881
      Eric Liang authored
      This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`).
      
      To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #8830 from ericl/interaction-2.
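      In SparkR terms, the `:` operator can now appear in a model formula; a hypothetical sketch, assuming a SparkR session:

      ```r
      df <- createDataFrame(sqlContext, iris)
      # ':' denotes a feature interaction, as in base R formulas
      model <- glm(Sepal_Width ~ Sepal_Length : Petal_Length, data = df,
                   family = "gaussian")
      ```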
  26. Sep 12, 2015
      [SPARK-6548] Adding stddev to DataFrame functions · f4a22808
      JihongMa authored
      Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change.
      
      Author: JihongMa <linlin200605@gmail.com>
      Author: Jihong MA <linlin200605@gmail.com>
      Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
      Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
      
      Closes #6297 from JihongMA/SPARK-SQL.
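      Since this lands as a DataFrame/SQL aggregate, a hedged sketch of calling it from SQL (assuming a sqlContext; the table name is illustrative):

      ```r
      df <- createDataFrame(sqlContext, faithful)
      registerTempTable(df, "faithful")
      # stddev() is the new aggregate added by this PR
      head(sql(sqlContext, "SELECT stddev(eruptions) FROM faithful"))
      ```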
  27. Sep 10, 2015
      [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame. · 45e3be5c
      Sun Rui authored
      This PR:
      1.  Enhances reflection in RBackend, automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.
      
      2.  Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection.
      
      3.  Supports ArrayType in createDataFrame().
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8458 from sun-rui/SPARK-10049.
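      A sketch of collecting an ArrayType column after this change (assuming a sqlContext; the query is only illustrative):

      ```r
      df <- sql(sqlContext, "SELECT array(1, 2, 3) AS xs")
      local <- collect(df)
      # the ArrayType value arrives on the R side as a list
      local$xs[[1]]
      ```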
  29. Sep 03, 2015
      [SPARK-8951] [SPARKR] support Unicode characters in collect() · af0e3125
      CHOIJAEHONG authored
      Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters.
      I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R.
      
      Author: CHOIJAEHONG <redrock07@naver.com>
      
      Closes #7494 from CHOIJAEHONG1/SPARK-8951.
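      A quick round-trip check of the fix — a sketch assuming a SparkR session; the string is arbitrary CJK text:

      ```r
      df <- createDataFrame(sqlContext,
                            data.frame(msg = "안녕하세요", stringsAsFactors = FALSE))
      # before this fix, collect() failed to return CJK strings correctly
      collect(df)$msg
      ```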