Commits · 4ee8191e57cb823a23ceca17908af86e70354554 · cs525-sp18-g07 / spark

Jan 25, 2016

[SPARK-12755][CORE] Stop the event logger before the DAG scheduler · 4ee8191e

Michael Allman authored 9 years ago

[SPARK-12755][CORE] Stop the event logger before the DAG scheduler to avoid a race condition where the standalone master attempts to build the app's history UI before the event log is stopped.

This contribution is my original work, and I license this work to the Spark project under the project's open source license.

Author: Michael Allman <michael@videoamp.com>

Closes #10700 from mallman/stop_event_logger_first.

4ee8191e

[SPARK-12932][JAVA API] improved error message for java type inference failure · d8e48052

Andy Grove authored 9 years ago

Author: Andy Grove <andygrove73@gmail.com>

Closes #10865 from andygrove/SPARK-12932.

d8e48052

[SPARK-12901][SQL] Refactor options for JSON and CSV datasource (not case class and same format). · 3adebfc9

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12901
This PR refactors the options in JSON and CSV datasources.

In more detail:

1. `JSONOptions` uses the same format as `CSVOptions`.
2. Neither is a case class any longer.
3. `CSVRelation` no longer has to be serializable (it was declared `with Serializable`, which I removed).

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #10895 from HyukjinKwon/SPARK-12901.

3adebfc9

Jan 24, 2016

[SPARK-12624][PYSPARK] Checks row length when converting Java arrays to Python rows · 3327fd28

Cheng Lian authored 9 years ago

When the actual row length doesn't match the number of fields in the specified schema, we should give a better error message instead of throwing an unintuitive `ArrayIndexOutOfBoundsException`.
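The idea of checking the row length up front can be sketched in plain Python. This is only an illustration under stated assumptions: `to_row` and its error message are hypothetical and are not the code or wording used in PySpark.

```python
def to_row(values, field_names):
    """Pair values with schema field names, failing clearly on a length mismatch."""
    if len(values) != len(field_names):
        # Report both lengths so the user can see which side is wrong,
        # instead of surfacing an index-out-of-bounds error later.
        raise ValueError(
            "Row has %d values but the schema declares %d fields: %r"
            % (len(values), len(field_names), field_names)
        )
    return dict(zip(field_names, values))
```

Calling `to_row([1], ["id", "name"])` raises a `ValueError` that names both lengths, while a matching row is converted normally.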

Author: Cheng Lian <lian@databricks.com>

Closes #10886 from liancheng/spark-12624.

3327fd28

[SPARK-12120][PYSPARK] Improve exception message when failing to init… · e789b1d2

Jeff Zhang authored 9 years ago

…ialize HiveContext in PySpark

davies, would you mind reviewing?

This is the error message after this PR:

```
15/12/03 16:59:53 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
/Users/jzhang/github/spark/python/pyspark/sql/context.py:689: UserWarning: You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
  warnings.warn("You must build Spark with Hive. "
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 663, in read
    return DataFrameReader(self)
  File "/Users/jzhang/github/spark/python/pyspark/sql/readwriter.py", line 56, in __init__
    self._jreader = sqlContext._ssql_ctx.read()
  File "/Users/jzhang/github/spark/python/pyspark/sql/context.py", line 692, in _ssql_ctx
    raise e
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.net.ConnectException: Call From jzhangMBPr.local/127.0.0.1 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
```

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10126 from zjffdu/SPARK-12120.

e789b1d2

[SPARK-10498][TOOLS][BUILD] Add requirements.txt file for dev python tools · a8340013

Holden Karau authored 9 years ago

Minor since so few people use them, but it would probably be good to have a requirements file for our python release tools for easier setup (also version pinning).

cc JoshRosen who looked at the original JIRA.

Author: Holden Karau <holden@us.ibm.com>

Closes #10871 from holdenk/SPARK-10498-add-requirements-file-for-dev-python-tools.

a8340013

[SPARK-12971] Fix Hive tests which fail in Hadoop-2.3 SBT build · f4004601

Josh Rosen authored 9 years ago

ErrorPositionSuite and one of the HiveComparisonTest tests have been consistently failing on the Hadoop 2.3 SBT build (but on no other builds). I believe that this is due to test isolation issues (e.g. tests sharing state via the sets of temporary tables that are registered to TestHive).

This patch attempts to improve the isolation of these tests in order to address this issue.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #10884 from JoshRosen/fix-failing-hadoop-2.3-hive-tests.

f4004601

Jan 23, 2016

[STREAMING][MINOR] Scaladoc + logs · cfdcef70

Jacek Laskowski authored 9 years ago

Found while doing code review

Author: Jacek Laskowski <jacek@japila.pl>

Closes #10878 from jaceklaskowski/streaming-scaladoc-logs-tiny-fixes.

cfdcef70

[SPARK-12904][SQL] Strength reduction for integral and decimal literal comparisons · 423783a0

Reynold Xin authored 9 years ago

This pull request implements strength reduction for comparing integral expressions and decimal literals, which is more common now because we switch to parsing fractional literals as decimal types (rather than doubles). I added the rules to the existing DecimalPrecision rule with some refactoring to simplify the control flow. I also moved DecimalPrecision rule into its own file due to the growing size.
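The general idea behind this kind of strength reduction can be sketched in plain Python: a comparison between an integer expression and a decimal literal is rewritten into an equivalent integer-only comparison. This is a sketch of the technique, not the rule added to `DecimalPrecision`; the function name `reduce_comparison` and its return convention are assumptions made for illustration.

```python
import math
from decimal import Decimal


def reduce_comparison(op, literal):
    """Rewrite `int_expr <op> decimal_literal` into an integer-only comparison.

    Returns (new_op, int_literal), or a bool when the result is constant.
    """
    floor = math.floor(literal)
    ceil = math.ceil(literal)
    if op == ">":
        return (">", floor)
    if op == ">=":
        return (">=", ceil)
    if op == "<":
        return ("<", ceil)
    if op == "<=":
        return ("<=", floor)
    if op == "==":
        # An integer can only equal a decimal that has no fractional part.
        return ("==", floor) if floor == ceil else False
    raise ValueError("unsupported operator: " + op)


# Example: `int_col > 2.5` holds for exactly the same integers as `int_col > 2`.
reduced = reduce_comparison(">", Decimal("2.5"))
```

The rewritten comparison involves only integers, so no decimal arithmetic is needed when it is evaluated per row.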

Author: Reynold Xin <rxin@databricks.com>

Closes #10882 from rxin/SPARK-12904-1.

423783a0

[SPARK-11137][STREAMING] Make StreamingContext.stop() exception-safe · 5f569801

jayadevanmurali authored 9 years ago

Make StreamingContext.stop() exception-safe

Author: jayadevanmurali <jayadevan.m@tcs.com>

Closes #10807 from jayadevanmurali/branch-0.1-SPARK-11137.

5f569801

[SPARK-12760][DOCS] inaccurate description for difference between local vs... · aca2a016

Sean Owen authored 9 years ago

[SPARK-12760][DOCS] inaccurate description for difference between local vs cluster mode in closure handling

Clarify that modifying a driver local variable won't have the desired effect in cluster modes, and may or may not work as intended in local mode

Author: Sean Owen <sowen@cloudera.com>

Closes #10866 from srowen/SPARK-12760.

aca2a016

[SPARK-12760][DOCS] invalid lambda expression in python example for … · 56f57f89

Mortada Mehyar authored 9 years ago

…local vs cluster

srowen, thanks for the PR at https://github.com/apache/spark/pull/10866! Sorry it took me a while.

This is related to https://github.com/apache/spark/pull/10866: the assignment inside the lambda expression in the Python example is invalid syntax.

```
In [1]: data = [1, 2, 3, 4, 5]
In [2]: counter = 0
In [3]: rdd = sc.parallelize(data)
In [4]: rdd.foreach(lambda x: counter += x)
  File "<ipython-input-4-fcb86c182bad>", line 1
    rdd.foreach(lambda x: counter += x)
                                   ^
SyntaxError: invalid syntax
```
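The error above comes from Python's grammar rather than from Spark: a lambda body must be a single expression, and `counter += x` is a statement. The sketch below reproduces this in plain Python and shows a named-function alternative. The names `add_to_counter` and `totals` are hypothetical, and a local loop stands in for `rdd.foreach`. As SPARK-12760 explains, even a syntactically valid function that mutates a driver-side variable will not update the driver's copy in cluster mode.

```python
# A lambda body must be an expression; augmented assignment is a statement.
try:
    compile("f = lambda x: counter += x", "<example>", "exec")
    lambda_is_valid = True
except SyntaxError:
    lambda_is_valid = False

# Hypothetical mutable holder for the running total.
totals = {"counter": 0}


def add_to_counter(x):
    # A named function may contain statements such as augmented assignment.
    totals["counter"] += x


# Plain local iteration standing in for rdd.foreach in this sketch.
for value in [1, 2, 3, 4, 5]:
    add_to_counter(value)
```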

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #10867 from mortada/doc_python_fix.

56f57f89

[SPARK-12859][STREAMING][WEB UI] Names of input streams with receivers don't fit in Streaming page · 358a33bb

Alex Bozarth authored 9 years ago

Added CSS style to force names of input streams with receivers to wrap

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #10873 from ajbozarth/spark12859.

358a33bb

[SPARK-12933][SQL] Initial implementation of Count-Min sketch · 1c690dda

Cheng Lian authored 9 years ago

This PR adds an initial implementation of count min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1].

As required by the [design doc][2], spark-sketch should have no external dependency.
Two classes, `Murmur3_x86_32` and `Platform` are copied to spark-sketch from spark-unsafe for hashing facilities. They'll also be used in the upcoming bloom filter implementation.

The following features will be added in future follow-up PRs:

- Serialization support
- DataFrame API integration

[1]: https://github.com/addthis/stream-lib/blob/aac6b4d23a8686b000f80baa447e0922ecac3bcb/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java
[2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf
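For readers unfamiliar with the data structure, here is a toy Count-Min sketch in plain Python. It is not the spark-sketch implementation: it salts SHA-256 per row instead of using `Murmur3_x86_32`, and the class layout and parameter names are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.sha256(("%d:%s" % (row, item)).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over the rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


sketch = CountMinSketch(depth=4, width=64)
for word in ["spark", "spark", "sketch", "spark"]:
    sketch.add(word)
```

An estimate never falls below the true count; it can only exceed it when items collide in every row.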

Author: Cheng Lian <lian@databricks.com>

Closes #10851 from liancheng/count-min-sketch.

1c690dda

[SPARK-12872][SQL] Support to specify the option for compression codec for JSON datasource · 5af5a021

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12872

This PR lets the JSON datasource compress its output through an option instead of requiring Hadoop configurations to be set manually.
Codecs are resolved by name, similar to https://github.com/apache/spark/pull/10805.

Because `CSVCompressionCodecs` can be shared with other datasources, it was moved into a separate, shared class named `CompressionCodecs`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #10858 from HyukjinKwon/SPARK-12872.

5af5a021

[HOTFIX]Remove rpcEnv.awaitTermination to avoid dead-lock in some test · ea5c38fe

Shixiong Zhu authored 9 years ago

It looks like rpcEnv.awaitTermination may block some tests forever. Just remove it and investigate the tests.

ea5c38fe

Jan 22, 2016

[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming · bc1babd6

Shixiong Zhu authored 9 years ago

- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka.
- Remove HttpFileServer
- Remove Akka configs from SparkConf and SSLOptions
- Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think this config is still worth keeping, because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it.
- Update comments and docs

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10854 from zsxwing/remove-akka.

bc1babd6

[HOTFIX][BUILD][TEST-MAVEN] Remove duplicate dependency · d8fefab4

Shixiong Zhu authored 9 years ago

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10868 from zsxwing/hotfix-akka-pom.

d8fefab4

[SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method · 8a88e121

Narine Kokhlikyan authored 9 years ago

I've tried to solve some of the issues mentioned in https://issues.apache.org/jira/browse/SPARK-12629.
Please let me know what you think.
Thanks!

Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>

Closes #10580 from NarineK/sparkrSavaAsRable.

8a88e121

[SPARK-12959][SQL] Writing Bucketed Data with Disabled Bucketing in SQLConf · e13c147e

gatorsmile authored 9 years ago

When users turn off bucketing in SQLConf, we should issue a message telling them that bucketed writes will fall back to the normal, non-bucketed write path.

Also added a test case for this scenario and fixed the helper function.

cloud-fan, do you think this PR is helpful when using bucketed tables? Thank you!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10870 from gatorsmile/bucketTableWritingTestcases.

e13c147e

Jan 21, 2016

[SPARK-12960] [PYTHON] Some examples are missing support for python2 · 006906db

Mark Grover authored 9 years ago

Without importing `print_function`, later lines such as ```print("Usage: direct_kafka_wordcount.py <broker_list> <topic>", file=sys.stderr)``` fail under Python 2.x. Adding the import fixes that problem and doesn't break anything on Python 3 either.

Author: Mark Grover <mark@apache.org>

Closes #10872 from markgrover/python2_compat.

006906db

[SPARK-12747][SQL] Use correct type name for Postgres JDBC's real array · 55c7dd03

Liang-Chi Hsieh authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12747

Postgres JDBC driver uses "FLOAT4" or "FLOAT8" not "real".

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #10695 from viirya/fix-postgres-jdbc.

55c7dd03

[SPARK-12908][ML] Add warning message for LogisticRegression for potential converge issue · b4574e38

DB Tsai authored 9 years ago

When all labels are the same, LogisticRegression without an intercept may fail to converge. GLMNET doesn't support this case and will just exit. GLM can train, but emits a warning saying the algorithm doesn't converge.

Author: DB Tsai <dbt@netflix.com>

Closes #10862 from dbtsai/add-tests.

b4574e38

[SPARK-12534][DOC] update documentation to list command line equivalent to properties · 85200c09

felixcheung authored 9 years ago

Several Spark properties equivalent to Spark submit command line options are missing.

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #10491 from felixcheung/sparksubmitdoc.

85200c09

Jan 20, 2016

[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. · 1b2a918e

Sun Rui authored 9 years ago

Author: Sun Rui <rui.sun@intel.com>

Closes #10201 from sun-rui/SPARK-12204.

1b2a918e

[SPARK-12910] Fixes : R version for installing sparkR · d7415991

smishra8 authored 9 years ago

Testing code:
```
$ ./install-dev.sh
USING R_HOME = /usr/bin
ERROR: this R is version 2.15.1, package 'SparkR' requires R >= 3.0
```

Using the new argument:
```
$ ./install-dev.sh /content/username/SOFTWARE/R-3.2.3
USING R_HOME = /content/username/SOFTWARE/R-3.2.3/bin
* installing *source* package ‘SparkR’ ...
** R
** inst
** preparing package for lazy loading
Creating a new generic function for ‘colnames’ in package ‘SparkR’
Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
Creating a new generic function for ‘cov’ in package ‘SparkR’
Creating a new generic function for ‘na.omit’ in package ‘SparkR’
Creating a new generic function for ‘filter’ in package ‘SparkR’
Creating a new generic function for ‘intersect’ in package ‘SparkR’
Creating a new generic function for ‘sample’ in package ‘SparkR’
Creating a new generic function for ‘transform’ in package ‘SparkR’
Creating a new generic function for ‘subset’ in package ‘SparkR’
Creating a new generic function for ‘summary’ in package ‘SparkR’
Creating a new generic function for ‘lag’ in package ‘SparkR’
Creating a new generic function for ‘rank’ in package ‘SparkR’
Creating a new generic function for ‘sd’ in package ‘SparkR’
Creating a new generic function for ‘var’ in package ‘SparkR’
Creating a new generic function for ‘predict’ in package ‘SparkR’
Creating a new generic function for ‘rbind’ in package ‘SparkR’
Creating a generic function for ‘lapply’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘Filter’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘alias’ from package ‘stats’ in package ‘SparkR’
Creating a generic function for ‘substr’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘%in%’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘mean’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘unique’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘nrow’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘ncol’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘head’ from package ‘utils’ in package ‘SparkR’
Creating a generic function for ‘factorial’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘atan2’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘ifelse’ from package ‘base’ in package ‘SparkR’
** help
No man pages found in package  ‘SparkR’
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (SparkR)

```

Author: Shubhanshu Mishra <smishra8@illinois.edu>

Closes #10836 from napsternxg/master.

d7415991

[SPARK-8968] [SQL] [HOT-FIX] Fix scala 2.11 build. · d60f8d74
Yin Huai authored 9 years ago

d60f8d74

[SPARK-8968][SQL] external sort by the partition columns when dynamic... · 015c8efb

wangfei authored 9 years ago

[SPARK-8968][SQL] external sort by the partition columns when dynamic partitioning to optimize the memory overhead

Currently, the hash-based writer for dynamic partitioning performs poorly on large data: it produces many small files and causes high GC overhead. With this patch, we do an external sort first so that only one writer needs to be open at a time.

before this patch:
![gc](https://cloud.githubusercontent.com/assets/7018048/9149788/edc48c6e-3dec-11e5-828c-9995b56e4d65.PNG)

after this patch:
![gc-optimize-externalsort](https://cloud.githubusercontent.com/assets/7018048/9149794/60f80c9c-3ded-11e5-8a56-7ae18ddc7a2f.png)
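Why sorting lets a single writer suffice can be sketched in plain Python: once rows are ordered by partition key, all rows for a key are contiguous, so each writer can be opened, filled, and closed before the next one starts. In this sketch an in-memory `sorted()` stands in for the external, spill-to-disk sort the patch describes, lists stand in for file writers, and `write_partitioned` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def write_partitioned(rows):
    """Group (partition_key, value) rows with at most one 'writer' live at a time."""
    output = {}
    by_key = itemgetter(0)
    # sorted() is an in-memory stand-in for an external sort.
    for key, group in groupby(sorted(rows, key=by_key), key=by_key):
        # All rows for `key` are contiguous, so one writer (here just a list)
        # is created, filled, and finished before the next key begins.
        current = []
        for _, value in group:
            current.append(value)
        output[key] = current
    return output


rows = [("2016-01-20", "a"), ("2016-01-21", "b"), ("2016-01-20", "c")]
written = write_partitioned(rows)
```

A hash-based approach would instead keep one writer open per partition key seen so far, which is where the memory pressure comes from when there are many partitions.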

Author: wangfei <wangfei_hello@126.com>
Author: scwf <wangfei1@huawei.com>

Closes #7336 from scwf/dynamic-optimize-basedon-apachespark.

015c8efb

[SPARK-12797] [SQL] Generated TungstenAggregate (without grouping keys) · b362239d

Davies Liu authored 9 years ago

As discussed in #10786, the generated TungstenAggregate does not support imperative functions.

For a query
```
sqlContext.range(10).filter("id > 1").groupBy().count()
```

The generated code will look like:
```
/* 032 */     if (!initAgg0) {
/* 033 */       initAgg0 = true;
/* 034 */
/* 035 */       // initialize aggregation buffer
/* 037 */       long bufValue2 = 0L;
/* 038 */
/* 039 */
/* 040 */       // initialize Range
/* 041 */       if (!range_initRange5) {
/* 042 */         range_initRange5 = true;
       ...
/* 071 */       }
/* 072 */
/* 073 */       while (!range_overflow8 && range_number7 < range_partitionEnd6) {
/* 074 */         long range_value9 = range_number7;
/* 075 */         range_number7 += 1L;
/* 076 */         if (range_number7 < range_value9 ^ 1L < 0) {
/* 077 */           range_overflow8 = true;
/* 078 */         }
/* 079 */
/* 085 */         boolean primitive11 = false;
/* 086 */         primitive11 = range_value9 > 1L;
/* 087 */         if (!false && primitive11) {
/* 092 */           // do aggregate and update aggregation buffer
/* 099 */           long primitive17 = -1L;
/* 100 */           primitive17 = bufValue2 + 1L;
/* 101 */           bufValue2 = primitive17;
/* 105 */         }
/* 107 */       }
/* 109 */
/* 110 */       // output the result
/* 112 */       bufferHolder25.reset();
/* 114 */       rowWriter26.initialize(bufferHolder25, 1);
/* 118 */       rowWriter26.write(0, bufValue2);
/* 120 */       result24.pointTo(bufferHolder25.buffer, bufferHolder25.totalSize());
/* 121 */       currentRow = result24;
/* 122 */       return;
/* 124 */     }
/* 125 */
```
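As a reading aid, the fused loop in the listing above can be mirrored in plain Python: Range, Filter, and the count aggregate all run inside one loop that updates a single buffer. This is only a paraphrase of the listing; the function and variable names are assumptions, and the overflow check is omitted because Python integers do not overflow.

```python
def count_ids_greater_than_one(start=0, end=10):
    buf_value = 0              # aggregation buffer (bufValue2 in the listing)
    number = start
    while number < end:        # Range: produce ids start..end-1
        value = number
        number += 1
        if value > 1:          # Filter: id > 1
            buf_value += 1     # Aggregate: count()
    return buf_value


result = count_ids_greater_than_one()
```

For `range(10)` the ids 2 through 9 pass the filter, so `result` is 8.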

cc nongli

Author: Davies Liu <davies@databricks.com>

Closes #10840 from davies/gen_agg.

b362239d

[SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal · 10173279

Herman van Hovell authored 9 years ago

The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. This PR changes that behavior: a decimal literal is now converted into an exact ```BigDecimal```.

The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double.

This PR replaces the ```BigDecimal``` literal by a ```Double``` literal, because the ```BigDecimal``` is the default now. You can use the double literal by appending a 'D' to the value, for instance: ```3.141527D```
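The difference between an exact decimal and a binary double can be demonstrated in plain Python, using the standard `decimal` module as a stand-in for `BigDecimal`. This illustrates the motivation only; it does not involve Spark's parser, and the variable names are chosen for this example.

```python
from decimal import Decimal

# Binary floating point cannot represent 12.1 exactly...
as_double = 12.1
# ...whereas a decimal type keeps the digits exactly as written.
as_decimal = Decimal("12.1")

double_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Decimal(float) expands the double to its exact binary value,
# which differs from the decimal literal 12.1.
double_matches_decimal = (Decimal(as_double) == as_decimal)
```

Only the decimal sum compares equal to 0.3, and the double 12.1 does not equal the decimal 12.1 once expanded.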

cc davies rxin

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #10796 from hvanhovell/SPARK-12848.

10173279

[SPARK-12888][SQL] benchmark the new hash expression · f3934a8d

Wenchen Fan authored 9 years ago

Benchmarked on 4 different schemas; the results:
```
Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
Hash For simple:                   Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
interpreted version                       31.47           266.54         1.00 X
codegen version                           64.52           130.01         0.49 X
```

```
Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
Hash For normal:                   Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
interpreted version                     4068.11             0.26         1.00 X
codegen version                         1175.92             0.89         3.46 X
```

```
Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
Hash For array:                    Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
interpreted version                     9276.70             0.06         1.00 X
codegen version                        14762.23             0.04         0.63 X
```

```
Intel(R) Core(TM) i7-4960HQ CPU  2.60GHz
Hash For map:                      Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
interpreted version                    58869.79             0.01         1.00 X
codegen version                         9285.36             0.06         6.34 X
```

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10816 from cloud-fan/hash-benchmark.

f3934a8d

[SPARK-12616][SQL] Making Logical Operator `Union` Support Arbitrary Number of Children · 8f90c151

gatorsmile authored 9 years ago

The existing `Union` logical operator only supports two children. This PR therefore adds a new logical operator, `Unions`, which can have an arbitrary number of children and replaces the existing one.

`Union` logical plan is a binary node. However, a typical use case for union is to union a very large number of input sources (DataFrames, RDDs, or files). It is not uncommon to union hundreds of thousands of files. In this case, our optimizer can become very slow due to the large number of logical unions. We should change the Union logical plan to support an arbitrary number of children, and add a single rule in the optimizer to collapse all adjacent `Unions` into a single `Unions`. Note that this problem doesn't exist in physical plan, because the physical `Unions` already supports arbitrary number of children.
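The collapsing step can be sketched in plain Python on a toy plan tree. This is a minimal illustration of flattening directly nested unions into one n-ary node; `Leaf`, `UnionNode`, and `flatten_union` are hypothetical names and do not correspond to Catalyst classes or rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    name: str


@dataclass
class UnionNode:
    children: List[object]


def flatten_union(plan):
    """Collapse directly nested UnionNode instances into one n-ary UnionNode."""
    if not isinstance(plan, UnionNode):
        return plan
    flat = []
    for child in plan.children:
        child = flatten_union(child)
        if isinstance(child, UnionNode):
            # Splice the grandchildren in place of the nested union.
            flat.extend(child.children)
        else:
            flat.append(child)
    return UnionNode(flat)


# A left-deep chain of binary unions, as produced by repeated a.union(b).
nested = UnionNode([UnionNode([UnionNode([Leaf("a"), Leaf("b")]), Leaf("c")]), Leaf("d")])
flat = flatten_union(nested)
```

After flattening, the plan has one union with four children instead of three nested binary unions, so later rules visit far fewer nodes.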

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #10577 from gatorsmile/unionAllMultiChildren.

8f90c151

[SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project · b7d74a60

Shixiong Zhu authored 9 years ago

Include the following changes:

1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
3. Update the ActorWordCount example and add the JavaActorWordCount example
4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10744 from zsxwing/streaming-akka-2.

b7d74a60

[SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all... · 944fdadf

Shixiong Zhu authored 9 years ago

[SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events

Including the following changes:

1. Add StreamingListenerForwardingBus, which forwards the WrappedStreamingListenerEvents it receives in `onOtherEvent` to StreamingListeners
2. Remove StreamingListenerBus
3. Merge AsynchronousListenerBus and LiveListenerBus to the same class LiveListenerBus
4. Add `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10779 from zsxwing/streaming-listener.

944fdadf

[SPARK-10263][ML] Add @Since annotation to ml.param and ml.* · e3727c40

Takahashi Hiroshi authored 9 years ago

Add Since annotations to ml.param and ml.*

Author: Takahashi Hiroshi <takahashi.hiroshi@lab.ntt.co.jp>
Author: Hiroshi Takahashi <takahashi.hiroshi@lab.ntt.co.jp>

Closes #8935 from taishi-oss/issue10263.

e3727c40

[SPARK-12898] Consider having dummyCallSite for HiveTableScan · ab4a6bfd

Rajesh Balamohan authored 9 years ago

Currently, HiveTableScan runs with getCallSite, which is really expensive. It shows up when scanning large partitioned tables (e.g., TPC-DS) and slows down the overall runtime of the job. It would be good to consider using dummyCallSite in HiveTableScan.

Author: Rajesh Balamohan <rbalamohan@apache.org>

Closes #10825 from rajeshbalamohan/SPARK-12898.

ab4a6bfd

[SPARK-12925][SQL] Improve HiveInspectors.unwrap for StringObjectIns… · e75e340a

Rajesh Balamohan authored 9 years ago

Text is in UTF-8, and converting it via "UTF8String.fromString" incurs decoding and encoding, which turns out to be expensive and redundant. Profiler snapshot details are attached in the JIRA (ref: https://issues.apache.org/jira/secure/attachment/12783331/SPARK-12925_profiler_cpu_samples.png)

Author: Rajesh Balamohan <rbalamohan@apache.org>

Closes #10848 from rajeshbalamohan/SPARK-12925.

e75e340a

[SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero... · 9753835c

Imran Younus authored 9 years ago

[SPARK-12230][ML] WeightedLeastSquares.fit() should handle division by zero properly if standard deviation of target variable is zero.

This fixes the behavior of WeightedLeastSquares.fit() when the standard deviation of the target variable is zero. If fitIntercept is true, there is no need to train.
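Why no training is needed in that case can be sketched in plain Python: if every label is the same and an intercept is fitted, zero coefficients plus an intercept equal to that label reproduce the targets exactly. The function name and the `None` return convention below are assumptions for illustration, and the sketch says nothing about how the patch treats the case without an intercept.

```python
def fit_if_constant_label(labels, num_features, fit_intercept):
    """Return (coefficients, intercept) when no training is needed, else None.

    Assumes `labels` is non-empty.
    """
    is_constant = all(y == labels[0] for y in labels)
    if is_constant and fit_intercept:
        # Zero coefficients plus intercept = the constant label reproduce
        # every label exactly, so the solver can be skipped.
        return [0.0] * num_features, labels[0]
    # Hypothetical convention: None means "fall through to the regular solver".
    return None
```

For example, `fit_if_constant_label([3.0, 3.0, 3.0], 2, True)` yields two zero coefficients and an intercept of 3.0 without dividing by the zero standard deviation.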

Author: Imran Younus <iyounus@us.ibm.com>

Closes #10274 from iyounus/SPARK-12230_bug_fix_in_weighted_least_squares.

9753835c

[SPARK-11295][PYSPARK] Add packages to JUnit output for Python tests · 9bb35c5b

Gábor Lipták authored 9 years ago

This is #9263 from gliptak (improving grouping/display of test case results) with a small fix of bisecting k-means unit test.

Author: Gábor Lipták <gliptak@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #10850 from mengxr/SPARK-11295.

9bb35c5b

[SPARK-6519][ML] Add spark.ml API for bisecting k-means · 9376ae72

Yu ISHIKAWA authored 9 years ago

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #9604 from yu-iskw/SPARK-6519.

9376ae72