  1. Oct 06, 2015
  2. Oct 05, 2015
    • [SPARK-10900] [STREAMING] Add output operation events to StreamingListener · be7c5ff1
      zsxwing authored
      Add output operation events to StreamingListener so as to implement the following UI features:
      
      1. Progress bar of a batch in the batch list.
      2. Display the output operation `description` and `duration` even when a Streaming job contains no Spark job.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8958 from zsxwing/output-operation-events.
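To make the two UI features above concrete, here is a minimal sketch of a listener that consumes output operation events. The class name, method names, and event fields are hypothetical and chosen for illustration; they are not Spark's `StreamingListener` API.

```python
class BatchProgressListener:
    """Hypothetical listener; names and fields are illustrative, not Spark's API."""

    def __init__(self, total_output_ops):
        self.total = total_output_ops   # output operations expected in the batch
        self.completed = []             # (description, duration_ms) per finished op
        self._started = {}              # op_id -> (description, start_ms)

    def on_output_operation_started(self, op_id, description, start_ms):
        self._started[op_id] = (description, start_ms)

    def on_output_operation_completed(self, op_id, end_ms):
        description, start_ms = self._started.pop(op_id)
        self.completed.append((description, end_ms - start_ms))

    def progress(self):
        # Fraction of output operations finished; drives a progress bar.
        return len(self.completed) / self.total if self.total else 1.0
```

Because the events carry a description and timestamps of their own, a UI built this way can show a duration even for an output operation that launches no Spark job.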
    • [SPARK-10934] [SQL] handle hashCode of unsafe array correctly · a609eb20
      Wenchen Fan authored
      `Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned bytes, but the bytes of an unsafe array are not word-aligned.
      
      Author: Wenchen Fan <cloud0fan@163.com>
      
      Closes #8987 from cloud-fan/hash.
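The alignment problem can be illustrated with a small sketch. It assumes an 8-byte word and uses a toy mixing function rather than Murmur3; `hash_words` and `hash_bytes` are hypothetical names, not the Spark methods.

```python
WORD = 8            # assumed word size in bytes
MASK = 0xFFFFFFFF   # keep the running hash in 32 bits


def _mix(h, value):
    # Toy mixing step for illustration only; this is NOT Murmur3.
    return ((h ^ value) * 0x01000193) & MASK


def hash_words(data, seed=42):
    """Hash whole words only; rejects input that is not word-aligned."""
    if len(data) % WORD != 0:
        raise ValueError("length must be a multiple of 8 bytes")
    h = seed
    for i in range(0, len(data), WORD):
        word = int.from_bytes(data[i:i + WORD], "little")
        h = _mix(h, word & MASK)   # low half of the word
        h = _mix(h, word >> 32)    # high half of the word
    return h


def hash_bytes(data, seed=42):
    """Hash the aligned prefix by words, then the leftover bytes one by one."""
    aligned = len(data) - len(data) % WORD
    h = hash_words(data[:aligned], seed)
    for b in data[aligned:]:
        h = _mix(h, b)
    return h
```

A word-only hash fails on a 5-byte payload, while the byte-aware variant accepts any length and agrees with the word hash on aligned input.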
    • [SPARK-10585] [SQL] only copy data once when generate unsafe projection · c4871369
      Wenchen Fan authored
      This PR is a complete rewrite of GenerateUnsafeProjection, with the goal of copying data only once. The old GenerateUnsafeProjection code is kept in place to make the review easier.
      
      Instead of generating unsafe conversion code for struct, array and map, we generate code that writes their content directly to the global row buffer.
      
      Author: Wenchen Fan <cloud0fan@163.com>
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #8747 from cloud-fan/copy-once.
  3. Oct 04, 2015
  4. Oct 03, 2015
  5. Oct 02, 2015
  6. Oct 01, 2015
  7. Sep 30, 2015
    • [SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collect · f21e2da0
      Oscar D. Lara Yejas authored
      Created method as.data.frame as a synonym for collect().
      
      Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
      Author: olarayej <oscar.lara.yejas@us.ibm.com>
      Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
      
      Closes #8908 from olarayej/SPARK-10807.
    • [SPARK-9617] [SQL] Implement json_tuple · 89ea0041
      Nathan Howell authored
      This is an implementation of Hive's `json_tuple` function using Jackson Streaming.
      
      Author: Nathan Howell <nhowell@godaddy.com>
      
      Closes #7946 from NathanHowell/SPARK-9617.
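As a rough illustration of what a `json_tuple`-style function does, the sketch below extracts several top-level fields from a JSON string in one pass. It uses Python's `json` module rather than Jackson Streaming, and the exact rendering of nested values and nulls is an assumption, not a statement about the Hive or Spark behavior.

```python
import json


def json_tuple(json_str, *fields):
    """Return one string (or None) per requested top-level field."""
    try:
        obj = json.loads(json_str)
    except (TypeError, ValueError):
        # Malformed or missing input: every field is None.
        return tuple(None for _ in fields)
    if not isinstance(obj, dict):
        return tuple(None for _ in fields)

    def render(value):
        if value is None:
            return None
        if isinstance(value, str):
            return value  # strings come back without quotes
        # Numbers, booleans, arrays and objects come back as JSON text.
        return json.dumps(value, separators=(",", ":"))

    return tuple(render(obj.get(name)) for name in fields)
```

Parsing the document once and pulling out all requested fields is the main appeal over calling a single-field extractor repeatedly.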
    • [SPARK-10770] [SQL] SparkPlan.executeCollect/executeTake should return InternalRow rather than external Row · 03cca5dc
      Reynold Xin authored
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8900 from rxin/SPARK-10770-1.
    • [SPARK-10851] [SPARKR] Exception not failing R applications (in yarn cluster mode) · c7b29ae6
      Sun Rui authored
      The YARN backend does not cope well with user code calling System.exit, since it cannot know the exit status and thus cannot set an appropriate final status for the application.
      
      This PR removes the use of System.exit to exit the RRunner. Instead, when the R process running a SparkR script returns an exit code other than 0, RRunner throws SparkUserAppException, which is caught by ApplicationMaster so that it knows the application failed. For other failures, it throws SparkException.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8938 from sun-rui/SPARK-10851.
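The pattern described above, raising an exception that carries the exit code instead of exiting the process, can be sketched as follows. `UserAppError`, `run_user_script`, and `supervise` are hypothetical stand-ins for SparkUserAppException, RRunner, and ApplicationMaster; this is not the Spark code.

```python
import subprocess
import sys


class UserAppError(Exception):
    """Hypothetical stand-in for an exception that carries the child's exit code."""

    def __init__(self, exit_code):
        super().__init__(f"user application exited with code {exit_code}")
        self.exit_code = exit_code


def run_user_script(argv):
    # Run the user process and surface a non-zero exit code as an exception
    # instead of terminating the current process.
    result = subprocess.run(argv)
    if result.returncode != 0:
        raise UserAppError(result.returncode)


def supervise(argv):
    # The caller stays alive, so it can record a final status either way.
    try:
        run_user_script(argv)
        return "SUCCEEDED"
    except UserAppError as err:
        return f"FAILED (exit code {err.exit_code})"
```

For example, `supervise([sys.executable, "-c", "import sys; sys.exit(3)"])` reports a failure with exit code 3 while the supervising process keeps running.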
    • [SPARK-9741] [SQL] Approximate Count Distinct using the new UDAF interface. · 16fd2a2f
      Herman van Hovell authored
      This PR implements a HyperLogLog based Approximate Count Distinct function using the new UDAF interface.
      
      The implementation is inspired by the ClearSpring HyperLogLog implementation and should produce the same results.
      
      There is still some documentation and testing left to do.
      
      cc yhuai
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #8362 from hvanhovell/SPARK-9741.
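For readers unfamiliar with HyperLogLog, here is a minimal textbook-style sketch of the estimator. It is not the code from this PR or from ClearSpring: the hash function (SHA-1 truncated to 64 bits), the default of 2^12 registers, and the small-range correction are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        idx = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)         # linear counting for small sets
        return raw
```

With 4096 registers the expected relative error is about 1.04 / sqrt(4096), roughly 1.6%, and adding the same item twice never changes the estimate because each register only keeps a maximum.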
    • [SPARK-10736] [ML] Use 1 for all ratings if $(ratingCol) = "" · 2931e89f
      Yanbo Liang authored
      For some implicit-feedback datasets, ratings may not exist in the training data. In this case, we can assume all observed pairs are positive and treat their ratings as 1. This happens when users set `ratingCol` to an empty string.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8937 from yanboliang/spark-10736.
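The defaulting rule can be sketched in a few lines. The function name `to_ratings` and the dict-based rows are hypothetical and used only to illustrate the behavior; this is not the ALS code.

```python
def to_ratings(rows, user_col, item_col, rating_col):
    """Build (user, item, rating) triples; an empty rating_col means rating 1.0."""
    triples = []
    for row in rows:
        # With no rating column, every observed pair counts as a positive.
        rating = float(row[rating_col]) if rating_col != "" else 1.0
        triples.append((row[user_col], row[item_col], rating))
    return triples
```

For example, `to_ratings([{"u": 1, "i": 7}], "u", "i", "")` yields `[(1, 7, 1.0)]`, while passing a real column name reads the stored rating instead.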
    • [SPARK-10811] [SQL] Eliminates unnecessary byte array copying · 4d5a005b
      Cheng Lian authored
      When reading Parquet string and binary-backed decimal values, Parquet `Binary.getBytes` always returns a copied byte array, which is unnecessary. Since the underlying implementation of `Binary` values there is guaranteed to be `ByteArraySliceBackedBinary`, and Parquet itself never reuses underlying byte arrays, we can use `Binary.toByteBuffer.array()` to steal the underlying byte arrays without copying them.
      
      This brings performance benefits when scanning Parquet string and binary-backed decimal columns. Note that this trick doesn't cover binary-backed decimals with precision greater than 18.
      
      In my micro-benchmark, this brings a ~15% performance boost for scanning the TPC-DS `store_sales` table (scale factor 15).
      
      Another minor optimization in this PR is that we now construct a Java `BigDecimal` directly in `Decimal.toJavaBigDecimal` without constructing a Scala `BigDecimal` first. This brings another ~5% performance gain.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8907 from liancheng/spark-10811/eliminate-array-copying.
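The trade-off between copying a slice and sharing the underlying buffer can be shown with a small Python analogy. It uses `bytes` and `memoryview` and does not reproduce the Parquet `Binary` API; the helper names are hypothetical.

```python
def slice_copy(buf, start, end):
    # Allocates a new bytes object: safe, but costs one copy per value read.
    return bytes(buf[start:end])


def slice_view(buf, start, end):
    # Shares the underlying buffer: no copy, but later writes to buf show through.
    return memoryview(buf)[start:end]


backing = bytearray(b"hello world")
copied = slice_copy(backing, 0, 5)
shared = slice_view(backing, 0, 5)

# Simulate the producer reusing its buffer after handing out the slices.
backing[0] = ord("J")
```

After the write, `copied` still holds the original bytes while `shared` reflects the change, which is why sharing is only safe when the producer never reuses its buffers, as the message above states for Parquet.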
  8. Sep 29, 2015