  1. Jan 20, 2016
      [SPARK-12616][SQL] Making Logical Operator `Union` Support Arbitrary Number of Children · 8f90c151
      gatorsmile authored
      The existing `Union` logical operator supports only two children. This change adds a new logical operator, `Unions`, which can have an arbitrary number of children, to replace the existing one.
      
      The `Union` logical plan is a binary node. However, a typical use case for union is to union a very large number of input sources (DataFrames, RDDs, or files); it is not uncommon to union hundreds of thousands of files. In that case, the optimizer can become very slow because of the large number of logical `Union` nodes. We should change the `Union` logical plan to support an arbitrary number of children and add a single optimizer rule that collapses all adjacent `Unions` into a single `Unions`. Note that this problem does not exist in the physical plan, because the physical `Unions` already supports an arbitrary number of children.
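A minimal Python sketch of the collapsing rule described above. It is not Catalyst code: `Relation`, `Unions`, `collapse_unions`, and `left_deep` are hypothetical, simplified stand-ins for the logical plan nodes and the optimizer rule. The rule walks nested `Unions` nodes iteratively, so a very deep chain does not hit Python's recursion limit, and it keeps the children in their original left-to-right order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf input source (stands in for a DataFrame, RDD, or file)."""
    name: str


@dataclass(frozen=True)
class Unions:
    """Hypothetical n-ary union node: any number of children."""
    children: tuple


def collapse_unions(plan):
    """Flatten all directly nested Unions nodes into a single Unions node."""
    if not isinstance(plan, Unions):
        return plan
    flat = []
    # Iterative depth-first walk; children are pushed in reverse so they pop left to right.
    stack = list(reversed(plan.children))
    while stack:
        node = stack.pop()
        if isinstance(node, Unions):
            stack.extend(reversed(node.children))
        else:
            flat.append(node)
    return Unions(tuple(flat))


def left_deep(n):
    """Build the chain of binary unions produced by repeatedly calling union()."""
    plan = Relation("f0")
    for i in range(1, n):
        plan = Unions((plan, Relation(f"f{i}")))
    return plan


collapsed = collapse_unions(left_deep(10_000))
```

Collapsing a 10,000-input left-deep chain this way yields one node with 10,000 children, so later optimizer rules visit a single union instead of thousands of nested ones.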
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #10577 from gatorsmile/unionAllMultiChildren.
      [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60
      Shixiong Zhu authored
      Include the following changes:
      
      1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
      2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
      3. Update the ActorWordCount example and add the JavaActorWordCount example
      4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10744 from zsxwing/streaming-akka-2.
      [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events · 944fdadf
      Shixiong Zhu authored
      This includes the following changes:
      
      1. Add StreamingListenerForwardingBus, which processes WrappedStreamingListenerEvents in `onOtherEvent` and forwards them to StreamingListener
      2. Remove StreamingListenerBus
      3. Merge AsynchronousListenerBus and LiveListenerBus into a single class, LiveListenerBus
      4. Add a `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents
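The single-bus design in the list above can be sketched in Python. All class and method names here are hypothetical simplifications, not Spark's API, and unlike Spark's LiveListenerBus, which delivers events asynchronously on its own thread, this sketch dispatches synchronously. Streaming events travel on the shared bus wrapped in `WrappedStreamingEvent`; a forwarding listener unwraps them in `on_other_event`, and the event logger uses the `log_event` flag to skip them.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Event:
    # Mirrors the idea of SparkListenerEvent.logEvent: should the event log keep it?
    log_event: bool = True


@dataclass
class JobStart(Event):
    job_id: int = 0


@dataclass
class WrappedStreamingEvent(Event):
    # Wraps a streaming event so it can be posted on the shared bus.
    payload: Any = None
    log_event: bool = False  # the event logger ignores wrapped streaming events


class Listener:
    def on_job_start(self, event: JobStart) -> None:
        pass

    def on_other_event(self, event: Event) -> None:
        pass


class ListenerBus:
    """One bus and one dispatch path for both core and streaming events."""

    def __init__(self) -> None:
        self.listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def post(self, event: Event) -> None:
        for listener in self.listeners:
            if isinstance(event, JobStart):
                listener.on_job_start(event)
            else:
                listener.on_other_event(event)


class StreamingForwarder(Listener):
    """Unwraps streaming events in on_other_event and hands them to a streaming listener."""

    def __init__(self, streaming_listener: Callable[[Any], None]) -> None:
        self.streaming_listener = streaming_listener

    def on_other_event(self, event: Event) -> None:
        if isinstance(event, WrappedStreamingEvent):
            self.streaming_listener(event.payload)


class EventLogger(Listener):
    """Records only events whose log_event flag is set."""

    def __init__(self) -> None:
        self.logged: List[Event] = []

    def on_job_start(self, event: JobStart) -> None:
        if event.log_event:
            self.logged.append(event)

    def on_other_event(self, event: Event) -> None:
        if event.log_event:
            self.logged.append(event)


# Usage: both kinds of events go through the same bus.
received: List[Any] = []
logger = EventLogger()
bus = ListenerBus()
bus.add_listener(StreamingForwarder(received.append))
bus.add_listener(logger)
bus.post(JobStart(job_id=1))
bus.post(WrappedStreamingEvent(payload="batchCompleted"))
```

Because there is only one bus, core and streaming events are observed in a single, consistent order.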
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10779 from zsxwing/streaming-listener.
      [SPARK-10263][ML] Add @Since annotation to ml.param and ml.* · e3727c40
      Takahashi Hiroshi authored
      Add `@Since` annotations to `ml.param` and `ml.*`.
      
      Author: Takahashi Hiroshi <takahashi.hiroshi@lab.ntt.co.jp>
      Author: Hiroshi Takahashi <takahashi.hiroshi@lab.ntt.co.jp>
      
      Closes #8935 from taishi-oss/issue10263.
      [SPARK-12898] Consider having dummyCallSite for HiveTableScan · ab4a6bfd
      Rajesh Balamohan authored
      Currently, HiveTableScan runs with getCallSite, which is expensive and shows up in profiles when scanning large partitioned tables (e.g., TPC-DS), slowing down the overall runtime of the job. It would be good to use dummyCallSite in HiveTableScan instead.
      
      Author: Rajesh Balamohan <rbalamohan@apache.org>
      
      Closes #10825 from rajeshbalamohan/SPARK-12898.
      [SPARK-12925][SQL] Improve HiveInspectors.unwrap for StringObjectIns… · e75e340a
      Rajesh Balamohan authored
      Hadoop `Text` is already UTF-8 encoded, so converting it via `UTF8String.fromString` incurs a redundant decode and re-encode, which turns out to be expensive. Profiler snapshot details are attached to the JIRA (ref: https://issues.apache.org/jira/secure/attachment/12783331/SPARK-12925_profiler_cpu_samples.png).
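The redundancy can be illustrated with a small Python sketch. `FakeText`, `via_string`, and `via_bytes` are hypothetical names, not Hive or Spark APIs; `FakeText` models a `Text`-like object whose backing buffer can be longer than its valid length. Both paths yield the same UTF-8 bytes, but the first decodes and re-encodes every character while the second only copies the valid prefix.

```python
class FakeText:
    """Hypothetical stand-in for a Text-like object: a reusable byte buffer plus a valid length."""

    def __init__(self, buf: bytearray, length: int) -> None:
        self.buf = buf
        self.length = length


def via_string(text: FakeText) -> bytes:
    # Old path: decode the UTF-8 bytes into a string, then encode it back to UTF-8.
    return bytes(text.buf[: text.length]).decode("utf-8").encode("utf-8")


def via_bytes(text: FakeText) -> bytes:
    # New path: the bytes are already UTF-8, so copy only the valid prefix as-is.
    return bytes(text.buf[: text.length])


payload = "héllo wörld".encode("utf-8")
# The buffer may be longer than the valid data, as with a reused object.
t = FakeText(bytearray(payload) + bytearray(8), len(payload))
```

Respecting the valid length matters in the byte path: copying the whole buffer would include the trailing padding.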
      
      Author: Rajesh Balamohan <rbalamohan@apache.org>
      
      Closes #10848 from rajeshbalamohan/SPARK-12925.
      [SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero. · 9753835c
      Imran Younus authored
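      This fixes the behavior of WeightedLeastSquares.fit() when the standard deviation of the target variable is zero. If `fitIntercept` is true, there is no need to train: the target is constant, so all coefficients are zero and the intercept equals the (weighted) mean of the target.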
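The special case can be illustrated with a one-feature weighted least squares sketch in Python. This is not Spark's `WeightedLeastSquares`: `weighted_ls_1d` is a hypothetical helper that, like a solver working on a standardized label, divides the label by its weighted standard deviation, which is exactly where a constant label would cause a division by zero. With an intercept, the sketch returns the constant label as the intercept without training. Falling back to a scale of 1.0 in the no-intercept case is an assumption of this sketch, not necessarily what Spark does.

```python
import math
from typing import Sequence, Tuple


def weighted_ls_1d(
    x: Sequence[float], y: Sequence[float], w: Sequence[float], fit_intercept: bool = True
) -> Tuple[float, float]:
    """Return (slope, intercept) for weighted least squares on one feature."""
    sw = sum(w)
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    y_std = math.sqrt(sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y)) / sw)

    if y_std == 0.0 and fit_intercept:
        # Constant label: the flat line y = y_mean fits exactly, so skip training.
        return 0.0, y_mean

    # Standardize the label; assumption of this sketch: use 1.0 instead of a zero std.
    scale = y_std if y_std > 0.0 else 1.0
    ys = [yi / scale for yi in y]

    if fit_intercept:
        x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ys_mean = y_mean / scale
        cov = sum(wi * (xi - x_mean) * (yi - ys_mean) for wi, xi, yi in zip(w, x, ys)) / sw
        x_var = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x)) / sw
        slope = cov / x_var
        intercept = ys_mean - slope * x_mean
    else:
        slope = sum(wi * xi * yi for wi, xi, yi in zip(w, x, ys)) / sum(
            wi * xi * xi for wi, xi in zip(w, x)
        )
        intercept = 0.0

    # Undo the label scaling.
    return slope * scale, intercept * scale
```

For example, `weighted_ls_1d([1, 2, 3], [5, 5, 5], [1, 1, 1])` takes the early return and gives a zero slope with intercept 5.0, whereas dividing by the zero standard deviation would fail.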
      
      Author: Imran Younus <iyounus@us.ibm.com>
      
      Closes #10274 from iyounus/SPARK-12230_bug_fix_in_weighted_least_squares.
      [SPARK-11295][PYSPARK] Add packages to JUnit output for Python tests · 9bb35c5b
      Gábor Lipták authored
      This is #9263 from gliptak (improving grouping/display of test case results) with a small fix to the bisecting k-means unit test.
      
      Author: Gábor Lipták <gliptak@gmail.com>
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #10850 from mengxr/SPARK-11295.
      [SPARK-6519][ML] Add spark.ml API for bisecting k-means · 9376ae72
      Yu ISHIKAWA authored
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #9604 from yu-iskw/SPARK-6519.
      [SPARK-12881] [SQL] subexpression elimination in mutable projection · 8e4f894e
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10814 from davies/mutable_subexpr.
      [SPARK-12912][SQL] Add a test suite for EliminateSubQueries · 753b1945
      Reynold Xin authored
      Also updated documentation to explain why ComputeCurrentTime and EliminateSubQueries are in the optimizer rather than analyzer.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10837 from rxin/optimizer-analyzer-comment.
  2. Jan 19, 2016
  3. Jan 18, 2016