  1. Dec 07, 2015
    • [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. · 39d677c8
      Sun Rui authored
      This PR:
      1. Suppress all known warnings.
      2. Clean up test cases and fix some errors in them.
      3. Fix errors in HiveContext-related test cases; these were not actually run previously, due to a bug in creating TestHiveContext.
      4. Support 'testthat' package version 0.11.0, which prefers that test cases live under 'tests/testthat'.
      5. Make sure the default Hadoop file system is local when running test cases.
      6. Turn warnings into errors (see the sketch below).
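      For items 1 and 6, a minimal sketch of the standard R mechanisms involved (illustrative; not necessarily the exact code in this PR):
      ```
      # Promote any unexpected warning to an error so the test run fails loudly.
      options(warn = 2)
      # Known, accepted warnings are silenced explicitly instead.
      suppressWarnings(as.numeric("not-a-number"))  # returns NA without failing
      ```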
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10030 from sun-rui/SPARK-12034.
  2. Dec 06, 2015
    • [SPARK-12044][SPARKR] Fix usage of isnan, isNaN · b6e8e63a
      Yanbo Liang authored
      1. Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related variable functions: ```isNaN, isNull, isNotNull``` (see the sketch below).
      2. Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0.
      ~~3. Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.~~
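      A hedged sketch of the two forms in use (assumes an initialized `sqlContext`):
      ```
      df <- createDataFrame(sqlContext, data.frame(x = c(1, NaN, 3)))
      # Column predicate method, alongside isNull/isNotNull:
      head(filter(df, isNaN(df$x)))
      # Function form, matching Spark SQL's isnan():
      head(select(df, isnan(df$x)))
      ```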
      
      cc shivaram sun-rui felixcheung
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10037 from yanboliang/spark-12044.
  3. Nov 27, 2015
    • [SPARK-12025][SPARKR] Rename some window rank function names for SparkR · ba02f6cb
      Yanbo Liang authored
      Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` on the SparkR side.
      There are two reasons to make this change:
      * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645).
      * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.
      
      It's better to fix this issue before the 1.6 release; otherwise we will make a breaking API change.
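      A hedged example of the renamed functions in action. Window functions in 1.x need a Hive-enabled context, so this assumes `sqlContext <- sparkRHive.init(sc)`:
      ```
      df <- createDataFrame(sqlContext, faithful)
      registerTempTable(df, "faithful")
      # Formerly rowNumber()/denseRank(); now snake_case, matching Spark SQL:
      head(sql(sqlContext,
               "SELECT waiting, row_number() OVER (ORDER BY waiting) AS rn,
                       dense_rank() OVER (ORDER BY waiting) AS rnk
                FROM faithful"))
      ```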
      cc shivaram sun-rui
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10016 from yanboliang/SPARK-12025.
  4. Nov 19, 2015
    • [SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR · 1a93323c
      felixcheung authored
      
      Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can still be called.
      
      For those we can't call, added them to the SparkR programming guide.
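      When a name is masked, qualifying the call with its namespace recovers the original, e.g.:
      ```
      library(SparkR)
      base::sample(1:10, 3)             # base R's sample(), not SparkR's
      stats::filter(1:10, rep(1/3, 3))  # stats' moving-average filter()
      ```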
      
      It would seem to me that `table, sample, subset, filter, cov` not working is not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.:
      ```
      > methods("transform")
      [1] transform,ANY-method       transform.data.frame
      [3] transform,DataFrame-method transform.default
      see '?methods' for accessing help and source code
      > methods("subset")
      [1] subset.data.frame       subset,DataFrame-method subset.default
      [4] subset.matrix
      see '?methods' for accessing help and source code
      Warning message:
      In .S3methods(generic.function, class, parent.frame()) :
        function 'subset' appears not to be S3 generic; found functions that look like S3 methods
      ```
      Any idea?
      
      More information on masking:
      http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
      http://www.sfu.ca/~sweldon/howTo/guide4.pdf
      
      This is what the output doc looks like (minus css):
      ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9785 from felixcheung/rmasked.
  5. Nov 15, 2015
    • [SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritable · 835a79d7
      Sun Rui authored
      The basic idea is that:
      The archive of the SparkR package itself, sparkr.zip, is created during the build process and is contained in the Spark binary distribution. It is not changed after the distribution is installed, as the directory it resides in ($SPARK_HOME/R/lib) may not be writable.
      
      When there is R source code contained in jars or in Spark packages specified with the "--jars" or "--packages" command line options, a temporary directory is created by calling Utils.createTempDir(), and the R packages built from that source code are installed there. The temporary directory is writable, does not interfere with other concurrent SparkR sessions, and is deleted when the SparkR session ends. The R binary packages installed in the temporary directory are then packed into an archive named rpkg.zip.
      
      sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes.
      
      The distribution of rpkg.zip in Standalone modes is not supported in this PR and will be addressed in another PR.
      
      Various R files are updated to accept multiple lib paths (one for the SparkR package, the other for other R packages) so that these packages can be accessed in R.
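      A rough sketch of the library-path mechanism on the R side (names and paths are illustrative assumptions, not the PR's actual code):
      ```
      # Install R packages extracted from jars/Spark packages into a writable,
      # per-session temporary library, then search it alongside the SparkR lib.
      tmpLib <- file.path(tempdir(), "spark-rpkg-lib")
      dir.create(tmpLib, recursive = TRUE, showWarnings = FALSE)
      install.packages("path/to/extracted/pkg", repos = NULL, type = "source", lib = tmpLib)
      .libPaths(c(tmpLib, .libPaths()))  # multiple lib paths, as described above
      ```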
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #9390 from sun-rui/SPARK-10500.
    • [SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame · d7d9fa0b
      zero323 authored
      Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame`
      
      At the moment, SparkR's createDataFrame uses a nested loop to convert factors to character when called on a local data.frame. It works, but is incredibly slow, especially with data.table (~2 orders of magnitude slower than the PySpark / Pandas version on a DataFrame of 1M rows x 2 columns).
      
      A simple improvement is to apply `dropFactors` column-wise and then reshape the output list, as sketched below.
      
      It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277).
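      A minimal sketch of the column-wise idea (the helper name here is an illustration, not the PR's exact code):
      ```
      # Convert factor columns to character once per column instead of per cell.
      dropFactorsColumnwise <- function(df) {
        df[] <- lapply(df, function(col) {
          if (is.factor(col)) as.character(col) else col
        })
        df
      }
      d <- data.frame(x = factor(c("a", "b")), y = 1:2)
      str(dropFactorsColumnwise(d))  # x is now character; y is untouched
      ```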
      
      Author: zero323 <matthew.szymkiewicz@gmail.com>
      
      Closes #9099 from zero323/SPARK-11086.
  6. Nov 12, 2015
    • [SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Documentation · ed04846e
      felixcheung authored
      Clean out hundreds of `style: Commented code should be removed.` from lintr
      
      Like these:
      ```
      /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:513:3: style: Commented code should be removed.
      # sc <- sparkR.init()
        ^~~~~~~~~~~~~~~~~~~
      /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:514:3: style: Commented code should be removed.
      # sqlContext <- sparkRSQL.init(sc)
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:515:3: style: Commented code should be removed.
      # path <- "path/to/file.json"
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
      ```
      
      Tried without export or rdname; neither works.
      Instead, added `#' @noRd` to suppress .Rd file generation.
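      For reference, a roxygen block using this pattern might look like the following (a hedged example, not a real SparkR function):
      ```
      #' An internal helper: roxygen keeps the example readable in source,
      #' but @noRd prevents any .Rd file from being generated for it.
      #' @noRd
      #' @examples
      #' \dontrun{
      #' sc <- sparkR.init()
      #' sqlContext <- sparkRSQL.init(sc)
      #' }
      internalHelper <- function(sc) invisible(sc)
      ```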
      
      Also updated `family` for DataFrame functions to use longer descriptive text instead of `dataframe_funcs`:
      ![image](https://cloud.githubusercontent.com/assets/8969467/10933937/17bf5b1e-8291-11e5-9777-40fc632105dc.png)
      
      This covers *most* of the 'Commented code' warnings, but I left out a few that look legitimate.
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9463 from felixcheung/rlintr.
    • [SPARK-11420] Updating Stddev support via Imperative Aggregate · d292f748
      JihongMa authored
      Switched stddev support from DeclarativeAggregate to ImperativeAggregate.
      
      Author: JihongMa <linlin200605@gmail.com>
      
      Closes #9380 from JihongMA/SPARK-11420.
  7. Nov 10, 2015
    • [ML][R] SparkR::glm summary result to compare with native R · f14e9511
      Yanbo Liang authored
      Follow-up to #9561. Since [SPARK-11587](https://issues.apache.org/jira/browse/SPARK-11587) has been fixed, we should compare the SparkR::glm summary result with native R output rather than a hard-coded one. mengxr
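      A hedged sketch of the comparison such a test performs (the dataset and tolerance are assumptions; note that SparkR replaces '.' with '_' in column names):
      ```
      training <- createDataFrame(sqlContext, iris)
      m1 <- glm(Sepal_Length ~ Sepal_Width, data = training, family = "gaussian")
      m2 <- stats::glm(Sepal.Length ~ Sepal.Width, data = iris, family = gaussian())
      # Compare SparkR's estimates against native R's coefficients.
      all.equal(unname(summary(m1)$coefficients[, 1]),
                unname(coef(m2)), tolerance = 1e-4)
      ```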
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9590 from yanboliang/glm-r-test.
    • [SPARK-10863][SPARKR] Method coltypes() (New version) · 47735cdc
      Oscar D. Lara Yejas authored
      This is a follow-up to PR #8984, as the corresponding branch for that PR was damaged.
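      Usage is straightforward; a hedged example (assumes an initialized `sqlContext`):
      ```
      df <- createDataFrame(sqlContext, data.frame(name = c("a", "b"), age = c(1L, 2L)))
      coltypes(df)  # the R type of each column, e.g. c("character", "integer")
      ```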
      
      Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
      
      Closes #9579 from olarayej/SPARK-10863_NEW14.
    • [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to evaluate AggregateExpression1s · e0701c75
      Yin Huai authored
      
      https://issues.apache.org/jira/browse/SPARK-9830
      
      This PR contains the following main changes.
      * Removing `AggregateExpression1`.
      * Removing `Aggregate` operator, which is used to evaluate `AggregateExpression1`.
      * Removing planner rule used to plan `Aggregate`.
      * Linking `MultipleDistinctRewriter` to analyzer.
      * Renaming `AggregateExpression2` to `AggregateExpression` and `AggregateFunction2` to `AggregateFunction`.
      * Updating places where we create aggregate expressions. The way to create aggregate expressions is `AggregateExpression(aggregateFunction, mode, isDistinct)`.
      * Changing `val`s in `DeclarativeAggregate`s that touch children of this function to `lazy val`s (when we create aggregate expression in DataFrame API, children of an aggregate function can be unresolved).
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #9556 from yhuai/removeAgg1.