  1. Oct 07, 2015
    • Marcelo Vanzin's avatar
      [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. · 94fc57af
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8775 from vanzin/SPARK-10300.
      94fc57af
    • Josh Rosen's avatar
      [SPARK-10941] [SQL] Refactor AggregateFunction2 and AlgebraicAggregate... · a9ecd061
      Josh Rosen authored
      [SPARK-10941] [SQL] Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve code clarity
      
      This patch refactors several of the Aggregate2 interfaces in order to improve code clarity.
      
      The biggest change is a refactoring of the `AggregateFunction2` class hierarchy. In the old code, we had a class named `AlgebraicAggregate` that inherited from `AggregateFunction2`, added a new set of methods, then banned the use of the inherited methods. I found this to be fairly confusing.
      
      If you look carefully at the existing code, you'll see that subclasses of `AggregateFunction2` fall into two disjoint categories: imperative aggregation functions which directly extended `AggregateFunction2` and declarative, expression-based aggregate functions which extended `AlgebraicAggregate`. In order to make this more explicit, this patch refactors things so that `AggregateFunction2` is a sealed abstract class with two subclasses, `ImperativeAggregateFunction` and `ExpressionAggregateFunction`. The superclass, `AggregateFunction2`, now only contains methods and fields that are common to both subclasses.
      
      After making this change, I updated the various AggregationIterator classes to comply with this new naming scheme. I also performed several small renamings in the aggregate interfaces themselves in order to improve clarity and rewrote or expanded a number of comments.
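
The two-category split described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not Spark's Scala code: the three class names come from the commit message, but every method name (`aggregate`, `initialize`, `update`, `result`, `update_expr`), the `initial_value` attribute, and the two `Sum` classes are hypothetical. Python also has no `sealed` modifier, so that part of the design is not modeled here.

```python
from abc import ABC, abstractmethod
from functools import reduce


class AggregateFunction2(ABC):
    """Base class: holds only what both kinds of aggregate share."""

    @abstractmethod
    def aggregate(self, values):
        """Fold an iterable of input values into a single result."""


class ImperativeAggregateFunction(AggregateFunction2):
    """Imperative style: subclasses mutate a buffer step by step."""

    @abstractmethod
    def initialize(self): ...

    @abstractmethod
    def update(self, buffer, value): ...

    @abstractmethod
    def result(self, buffer): ...

    def aggregate(self, values):
        buffer = self.initialize()
        for value in values:
            self.update(buffer, value)
        return self.result(buffer)


class ExpressionAggregateFunction(AggregateFunction2):
    """Declarative style: subclasses describe an initial value and a pure update."""

    initial_value = None

    @abstractmethod
    def update_expr(self, acc, value): ...

    def aggregate(self, values):
        return reduce(self.update_expr, values, self.initial_value)


class ImperativeSum(ImperativeAggregateFunction):
    def initialize(self):
        return [0]  # one-slot mutable buffer

    def update(self, buffer, value):
        buffer[0] += value

    def result(self, buffer):
        return buffer[0]


class ExpressionSum(ExpressionAggregateFunction):
    initial_value = 0

    def update_expr(self, acc, value):
        return acc + value
```

Both `ImperativeSum().aggregate([1, 2, 3])` and `ExpressionSum().aggregate([1, 2, 3])` go through the shared `aggregate` entry point, while neither subclass inherits methods it is forbidden to use.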
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8973 from JoshRosen/tungsten-agg-comments.
      a9ecd061
    • Holden Karau's avatar
      [SPARK-9841] [ML] Make clear public · 5be5d247
      Holden Karau authored
      It is currently impossible to clear Param values once set. It would be helpful to be able to do so.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8619 from holdenk/SPARK-9841-params-clear-needs-to-be-public.
      5be5d247
    • Marcelo Vanzin's avatar
      [SPARK-10964] [YARN] Correctly register the AM with the driver. · 6ca27f85
      Marcelo Vanzin authored
      The `self` method returns null when called from the constructor;
      instead, registration should happen in the `onStart` method, at
      which point the `self` reference has already been initialized.
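
The ordering problem described here can be sketched as follows. This is a simplified Python model rather than Spark's RPC code: `Endpoint`, `Env`, `Driver`, `BrokenAM`, `FixedAM`, `self_ref`, `on_start`, and `register_am` are all hypothetical names chosen for illustration.

```python
class Endpoint:
    """Hypothetical endpoint: its own reference is unset until registration."""

    def __init__(self):
        self.self_ref = None  # not yet available while constructing

    def on_start(self):
        """Hook invoked by the environment once self_ref has been set."""


class Env:
    """Hypothetical environment that assigns references and starts endpoints."""

    def register(self, endpoint):
        endpoint.self_ref = f"ref:{type(endpoint).__name__}"
        endpoint.on_start()
        return endpoint


class Driver:
    """Records whatever reference each application master registers with."""

    def __init__(self):
        self.registered = []

    def register_am(self, ref):
        self.registered.append(ref)


class BrokenAM(Endpoint):
    def __init__(self, driver):
        super().__init__()
        # Too early: self_ref is still None at this point.
        driver.register_am(self.self_ref)


class FixedAM(Endpoint):
    def __init__(self, driver):
        super().__init__()
        self.driver = driver

    def on_start(self):
        # The environment has already assigned self_ref.
        self.driver.register_am(self.self_ref)
```

With this model, registering a `BrokenAM` leaves `None` in the driver's list, while a `FixedAM` registers a usable reference.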
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9005 from vanzin/SPARK-10964.
      6ca27f85
    • Marcelo Vanzin's avatar
      [SPARK-10812] [YARN] Fix shutdown of token renewer. · 4b747551
      Marcelo Vanzin authored
      A recent change to fix the referenced bug caused this exception in
      the `SparkContext.stop()` path:
      
      org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode!
              at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:167)
              at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:182)
              at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:440)
              at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1579)
              at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1730)
              at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
              at org.apache.spark.SparkContext.stop(SparkContext.scala:1729)
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8996 from vanzin/SPARK-10812.
      4b747551
    • Michael Armbrust's avatar
      [SPARK-10966] [SQL] Codegen framework cleanup · f5d154bc
      Michael Armbrust authored
      This PR is mostly cosmetic and cleans up some warts in codegen (nearly all of which were inherited from the original quasiquote version).
       - Add line numbers to errors (in stack traces when debug logging is on, and always for compile failures)
       - Use a variable for input row instead of hardcoding "i" everywhere
       - Rename `primitive` -> `value` (since it's often actually an object)
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #9006 from marmbrus/codegen-cleanup.
      f5d154bc
    • Kevin Cox's avatar
      [SPARK-10952] Only add hive to classpath if HIVE_HOME is set. · 9672602c
      Kevin Cox authored
      Currently, if it isn't set, the script scans `/lib/*` and adds every dir to the
      classpath, which makes the env too large and causes every command called
      afterwards to fail.
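
The guard described here can be sketched as follows. This is a Python illustration of the logic only, not the script the commit changes; `unguarded_hive_glob` and `hive_classpath_entries` are hypothetical helper names.

```python
def unguarded_hive_glob(env):
    """Old behaviour: an unset HIVE_HOME silently becomes an empty prefix."""
    return f"{env.get('HIVE_HOME', '')}/lib/*"


def hive_classpath_entries(env):
    """Guarded behaviour: contribute Hive entries only when HIVE_HOME is set."""
    hive_home = env.get("HIVE_HOME")
    if not hive_home:
        return []  # nothing to add, so the classpath stays small
    return [f"{hive_home.rstrip('/')}/lib/*"]
```

With an empty environment, `unguarded_hive_glob({})` yields `/lib/*`, which is the root-level scan the commit message describes, whereas `hive_classpath_entries({})` yields no entries at all.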
      
      Author: Kevin Cox <kevincox@kevincox.ca>
      
      Closes #8994 from kevincox/kevincox-only-add-hive-to-classpath-if-var-is-set.
      9672602c
    • Sun Rui's avatar
      [SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions. · f57c63d4
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8869 from sun-rui/SPARK-10752.
      f57c63d4
    • Xin Ren's avatar
      [SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib · 27cdde2f
      Xin Ren authored
      In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md
      This JIRA is just for spark.mllib, not spark.ml.
      
      Please let me know if more work is needed, thanks a lot.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #8977 from keypointt/SPARK-10669.
      27cdde2f
  2. Oct 06, 2015
  3. Oct 05, 2015
    • zsxwing's avatar
      [SPARK-10900] [STREAMING] Add output operation events to StreamingListener · be7c5ff1
      zsxwing authored
      Add output operation events to StreamingListener so as to implement the following UI features:
      
      1. Progress bar of a batch in the batch list.
      2. Be able to display the output operation `description` and `duration` when there is no Spark job in a Streaming job.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8958 from zsxwing/output-operation-events.
      be7c5ff1
    • Wenchen Fan's avatar
      [SPARK-10934] [SQL] handle hashCode of unsafe array correctly · a609eb20
      Wenchen Fan authored
      `Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned bytes, but the bytes of an unsafe array are not word-aligned.
      
      Author: Wenchen Fan <cloud0fan@163.com>
      
      Closes #8987 from cloud-fan/hash.
      a609eb20
    • Wenchen Fan's avatar
      [SPARK-10585] [SQL] only copy data once when generate unsafe projection · c4871369
      Wenchen Fan authored
      This PR is a complete rewrite of GenerateUnsafeProjection, to accomplish the goal of copying data only once. The old code of GenerateUnsafeProjection is still there to reduce review difficulty.
      
      Instead of creating unsafe conversion code for struct, array and map, we generate code that writes the content to the global row buffer.
      
      Author: Wenchen Fan <cloud0fan@163.com>
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8747 from cloud-fan/copy-once.
      c4871369
  4. Oct 04, 2015
  5. Oct 03, 2015
  6. Oct 02, 2015
  7. Oct 01, 2015
  8. Sep 30, 2015
    • Oscar D. Lara Yejas's avatar
      [SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collect · f21e2da0
      Oscar D. Lara Yejas authored
      Created method as.data.frame as a synonym for collect().
      
      Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
      Author: olarayej <oscar.lara.yejas@us.ibm.com>
      Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
      
      Closes #8908 from olarayej/SPARK-10807.
      f21e2da0
    • Nathan Howell's avatar
      [SPARK-9617] [SQL] Implement json_tuple · 89ea0041
      Nathan Howell authored
      This is an implementation of Hive's `json_tuple` function using Jackson Streaming.
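
For readers unfamiliar with `json_tuple`, its general behaviour (pull several top-level fields out of a JSON string in one call) can be sketched as below. This is a Python approximation under stated assumptions, not the commit's code: it uses the standard `json` module instead of Jackson Streaming, and the handling of missing keys, invalid input, and non-string values (all chosen here as `None` or re-serialized JSON text) is an assumption rather than a statement of Hive's or Spark's exact semantics.

```python
import json


def json_tuple(json_str, *fields):
    """Return one value per requested top-level field, or None when unavailable."""
    try:
        obj = json.loads(json_str)
    except (TypeError, ValueError):
        # Assumption: unparseable input yields None for every field.
        return tuple(None for _ in fields)
    if not isinstance(obj, dict):
        return tuple(None for _ in fields)

    values = []
    for field in fields:
        value = obj.get(field)
        if value is None:
            values.append(None)
        elif isinstance(value, str):
            values.append(value)
        else:
            # Assumption: non-string values are rendered as JSON text.
            values.append(json.dumps(value))
    return tuple(values)
```

A call such as `json_tuple('{"a": "x", "b": 2}', "a", "b", "c")` returns one slot per requested field, with the missing field `c` left as `None`.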
      
      Author: Nathan Howell <nhowell@godaddy.com>
      
      Closes #7946 from NathanHowell/SPARK-9617.
      89ea0041
    • Reynold Xin's avatar
      [SPARK-10770] [SQL] SparkPlan.executeCollect/executeTake should return... · 03cca5dc
      Reynold Xin authored
      [SPARK-10770] [SQL] SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8900 from rxin/SPARK-10770-1.
      03cca5dc
    • Sun Rui's avatar
      [SPARK-10851] [SPARKR] Exception not failing R applications (in yarn cluster mode) · c7b29ae6
      Sun Rui authored
      The YARN backend doesn't like when user code calls System.exit, since it cannot know the exit status and thus cannot set an appropriate final status for the application.
      
      This PR removes the use of System.exit to exit the RRunner. Instead, when the R process running a SparkR script returns an exit code other than 0, it throws SparkUserAppException, which is caught by ApplicationMaster so that ApplicationMaster knows the application failed. For other failures, it throws SparkException.
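
The control flow described here, raising instead of exiting the process, can be sketched as follows. This is a Python analogue rather than the RRunner code: the two exception classes are local stand-ins that borrow the names from the commit message, and `run_user_script` is a hypothetical helper.

```python
import subprocess


class SparkException(Exception):
    """Stand-in for a general launcher failure."""


class SparkUserAppException(Exception):
    """Stand-in for a failure reported by the user's own script."""

    def __init__(self, exit_code):
        super().__init__(f"User application exited with {exit_code}")
        self.exit_code = exit_code


def run_user_script(argv):
    """Run the user's process and report failure by raising, never by exiting."""
    try:
        completed = subprocess.run(argv)
    except OSError as exc:
        # The process could not be started at all.
        raise SparkException(f"Failed to launch {argv[0]}") from exc
    if completed.returncode != 0:
        # The caller can catch this and record a failed final status.
        raise SparkUserAppException(completed.returncode)
```

Because the exit code travels inside the exception, the caller that catches it can decide the final application status itself instead of having the process terminated underneath it.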
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8938 from sun-rui/SPARK-10851.
      c7b29ae6