Commits · 18ea11c3a84e5eafd81fa0fe7c09224e79c4e93f · cs525-sp18-g07 / spark

Dec 16, 2015

Revert "[HOTFIX] Compile error from commit 31b39101" · 18ea11c3
Reynold Xin authored 9 years ago
```
This reverts commit 840bd2e0.
```
18ea11c3
Style fix for the previous 3 JDBC filter push down commits. · 554d840a
Reynold Xin authored 9 years ago

554d840a

[SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource. · 2aad2d37

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12315
`IsNotNull` filter is not being pushed down for JDBC datasource.

It looks it is SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip) and I believe most databases support this.

In this PR, I simply added the case for `IsNotNull` filter to produce a proper filter string.

Author: hyukjinkwon <gurwls223@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #10287 from HyukjinKwon/SPARK-12315.

2aad2d37

[SPARK-12314][SQL] isnull operator not pushed down for JDBC datasource. · 7f443a68

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12314
`IsNull` filter is not being pushed down for JDBC datasource.

It looks it is SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip) and I believe most databases support this.

In this PR, I simply added the case for `IsNull` filter to produce a proper filter string.

Author: hyukjinkwon <gurwls223@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #10286 from HyukjinKwon/SPARK-12314.

7f443a68

[SPARK-12249][SQL] JDBC non-equality comparison operator not pushed down. · 0f6936b5

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12249
Currently `!=` operator is not pushed down correctly.
I simply added a case for this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #10233 from HyukjinKwon/SPARK-12249.

0f6936b5

Dec 15, 2015

[SPARK-12304][STREAMING] Make Spark Streaming web UI display more fri… · d52bf47e

proflin authored 9 years ago

…endly Receiver graphs

Currently, the Spark Streaming web UI uses the same maxY when displays 'Input Rate Times& Histograms' and 'Per-Receiver Times& Histograms'.

This may lead to somewhat un-friendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground.

This issue proposes to calculate a new maxY against the original one, which is shared among all the `Per-Receiver Times& Histograms' graphs.

Before:
![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png)

After:
![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png)

Author: proflin <proflin.me@gmail.com>

Closes #10318 from proflin/SPARK-12304.

d52bf47e

[SPARK-4117][YARN] Spark on Yarn handle AM being told command from RM · ca0690b5

Devaraj K authored 9 years ago

Spark on Yarn handle AM being told command from RM

When RM throws ApplicationAttemptNotFoundException for allocate
invocation, making the ApplicationMaster to finish immediately without any
retries.

Author: Devaraj K <devaraj@apache.org>

Closes #10129 from devaraj-kavali/SPARK-4117.

ca0690b5

[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability · a89e8b61
Wenchen Fan authored 9 years ago
```
Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8645 from cloud-fan/test.
```
a89e8b61

[SPARK-12062][CORE] Change Master to asyc rebuild UI when application completes · c5b6b398

Bryan Cutler authored 9 years ago

This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked and allow new workers to register/remove if the event log history is very large and takes a long time to rebuild.

Author: Bryan Cutler <bjcutler@us.ibm.com>

Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.

c5b6b398

[SPARK-9886][CORE] Fix to use ShutdownHookManager in · 8a215d23

Naveen authored 9 years ago

ExternalBlockStore.scala

Author: Naveen <naveenminchu@gmail.com>

Closes #10313 from naveenminchu/branch-fix-SPARK-9886.

8a215d23

[SPARK-10123][DEPLOY] Support specifying deploy mode from configuration · 63ccdef8

jerryshao authored 9 years ago

Please help to review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #10195 from jerryshao/SPARK-10123.

63ccdef8

[SPARK-9026][SPARK-4514] Modifications to JobWaiter, FutureAction, and... · 765a4884

Richard W. Eggert II authored 9 years ago

[SPARK-9026][SPARK-4514] Modifications to JobWaiter, FutureAction, and AsyncRDDActions to support non-blocking operation

These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts.

Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026
Also fixes: https://issues.apache.org/jira/browse/SPARK-4514

This pull request contains all my own original work, which I release to the Spark project under its open source license.

Author: Richard W. Eggert II <richard.eggert@gmail.com>

Closes #9264 from reggert/fix-futureaction.

765a4884

[SPARK-9516][UI] Improvement of Thread Dump Page · a63d9edc

CodingCat authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-9516

- [x] new look of Thread Dump Page

- [x] click column title to sort

- [x] grep

- [x] search as you type

squito JoshRosen It's ready for the review now

Author: CodingCat <zhunansjtu@gmail.com>

Closes #7910 from CodingCat/SPARK-9516.

a63d9edc

[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode. · c2de99a7

Timothy Chen authored 9 years ago

Adding more documentation about submitting jobs with mesos cluster mode.

Author: Timothy Chen <tnachen@gmail.com>

Closes #10086 from tnachen/mesos_supervise_docs.

c2de99a7

[SPARK-12130] Replace shuffleManagerClass with shortShuffleMgrNames in ExternalShuffleBlockResolver · 369127f0

Lianhui Wang authored 9 years ago

Replace shuffleManagerClassName with shortShuffleMgrName is to reduce time of string's comparison. and put sort's comparison on the front. cc JoshRosen andrewor14

Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #10131 from lianhuiwang/spark-12130.

369127f0

[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf · f725b2ec

tedyu authored 9 years ago

This is continuation of SPARK-12056 where change is applied to SqlNewHadoopRDD.scala

andrewor14
FYI

Author: tedyu <yuzhihong@gmail.com>

Closes #10164 from tedyu/master.

f725b2ec

[HOTFIX] Compile error from commit 31b39101 · 840bd2e0
Andrew Or authored 9 years ago

840bd2e0
[SPARK-12105] [SQL] add convenient show functions · 31b39101
Jean-Baptiste Onofré authored 9 years ago
```
Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes #10130 from jbonofre/SPARK-12105.
```
31b39101

[SPARK-12236][SQL] JDBC filter tests all pass if filters are not really pushed down · 28112657

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12236
Currently JDBC filters are not tested properly. All the tests pass even if the filters are not pushed down due to Spark-side filtering.

In this PR,
Firstly, I corrected the tests to properly check the pushed down filters by removing Spark-side filtering.
Also, `!=` was being tested which is actually not pushed down. So I removed them.
Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils` as this functions would be shared for all tests for pushed down filters. This function would be also shared with ORC datasource as the filters for that are also not being tested properly.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #10221 from HyukjinKwon/SPARK-12236.

28112657

[SPARK-12271][SQL] Improve error message when Dataset.as[ ] has incompatible schemas. · 86ea64dd
Nong Li authored 9 years ago
```
Author: Nong Li <nong@databricks.com>

Closes #10260 from nongli/spark-11271.
```
86ea64dd

[MINOR][ML] Rename weights to coefficients for examples/DeveloperApiExample · b24c12d7

Yanbo Liang authored 9 years ago

Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample.

cc mengxr jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10280 from yanboliang/spark-coefficients.

b24c12d7

[STREAMING][MINOR] Fix typo in function name of StateImpl · bc1ff9f4

jerryshao authored 9 years ago

cc\ tdas zsxwing , please review. Thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #10305 from jerryshao/fix-typo-state-impl.

bc1ff9f4

[SPARK-12332][TRIVIAL][TEST] Fix minor typo in ResetSystemProperties · c59df8c5

Holden Karau authored 9 years ago

Fix a minor typo (unbalanced bracket) in ResetSystemProperties.

Author: Holden Karau <holden@us.ibm.com>

Closes #10303 from holdenk/SPARK-12332-trivial-typo-in-ResetSystemProperties-comment.

c59df8c5

Dec 14, 2015

[SPARK-12288] [SQL] Support UnsafeRow in Coalesce/Except/Intersect. · 606f99b9

gatorsmile authored 9 years ago

Support UnsafeRow for the Coalesce/Except/Intersect.

Could you review if my code changes are ok? davies Thank you!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10285 from gatorsmile/unsafeSupportCIE.

606f99b9

[SPARK-12188][SQL][FOLLOW-UP] Code refactoring and comment correction in Dataset APIs · d13ff82c

gatorsmile authored 9 years ago

marmbrus This PR is to address your comment. Thanks for your review!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10214 from gatorsmile/followup12188.

d13ff82c

[SPARK-12274][SQL] WrapOption should not have type constraint for child · 9ea1a8ef

Wenchen Fan authored 9 years ago

I think it was a mistake, and we have not catched it so far until https://github.com/apache/spark/pull/10260 which begin to check if the `fromRowExpression` is resolved.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10263 from cloud-fan/encoder.

9ea1a8ef

[SPARK-12327] Disable commented code lintr temporarily · fb3778de

Shivaram Venkataraman authored 9 years ago

cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #10300 from shivaram/comment-lintr-disable.

fb3778de

[SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark · b51a4cdf

Liang-Chi Hsieh authored 9 years ago

JIRA: https://issues.apache.org/jira/browse/SPARK-12016

We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark.

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #10100 from viirya/fix-load-py-wordvecmodel.

b51a4cdf

[MINOR][DOC] Fix broken word2vec link · e25f1fe4

BenFradet authored 9 years ago

Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10282 from BenFradet/SPARK-12199.

e25f1fe4

[SPARK-12275][SQL] No plan for BroadcastHint in some condition · ed87f6d3

yucai authored 9 years ago

When SparkStrategies.BasicOperators's "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". It makes many strategies have no change to process this plan, which probably leads to "No plan" issue, so we use planLater to go through all strategies.

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <yucai.yu@intel.com>

Closes #10265 from yucai/broadcast_hint.

ed87f6d3

[SPARK-12213][SQL] use multiple partitions for single distinct query · 834e7148

Davies Liu authored 9 years ago

Currently, we could generate different plans for query with single distinct (depends on spark.sql.specializeSingleDistinctAggPlanning), one works better on low cardinality columns, the other
works better for high cardinality column (default one).

This PR change to generate a single plan (three aggregations and two exchanges), which work better in both cases, then we could safely remove the flag `spark.sql.specializeSingleDistinctAggPlanning` (introduced in 1.6).

For a query like `SELECT COUNT(DISTINCT a) FROM table` will be
```
AGG-4 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-3 (count distinct, no grouping)
      Partial-AGG-2 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
```

This PR also includes large refactor for aggregation (reduce 500+ lines of code)

cc yhuai nongli marmbrus

Author: Davies Liu <davies@databricks.com>

Closes #10228 from davies/single_distinct.

834e7148

[SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in the shutdown hook · 2aecda28

Shixiong Zhu authored 9 years ago

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.

This should fix the potential exceptions when exiting a local cluster
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
	at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
	at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10269 from zsxwing/executor-state.

2aecda28

Dec 12, 2015

[SPARK-12267][CORE] Store the remote RpcEnv address to send the correct disconnetion message · 8af2f8c6
Shixiong Zhu authored 9 years ago
```
Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10261 from zsxwing/SPARK-12267.
```
8af2f8c6

[SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md · 98b212d3

Xusen Yin authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md

mengxr

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10193 from yinxusen/SPARK-12199.

98b212d3

[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in... · 03138b67

Jean-Baptiste Onofré authored 9 years ago

[SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver

Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes #10203 from jbonofre/SPARK-11193.

03138b67

Dec 11, 2015

[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases · 1e3526c2

gatorsmile authored 9 years ago

The existing sample functions miss the parameter `seed`, however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value.

This could cause SparkR unit tests failed. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10160 from gatorsmile/sampleR.

1e3526c2

[SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions · 1e799d61

Ankur Dave authored 9 years ago

Modifies the String overload to call the Column overload and ensures this is called in a test.

Author: Ankur Dave <ankurdave@gmail.com>

Closes #10271 from ankurdave/SPARK-12298.

1e799d61

[SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to dataframe_example.py · a0ff6d16

Yanbo Liang authored 9 years ago

Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to avoid confusion.
#9873 finished the work of Scala example, here we focus on the Python one.
Move dataset_example.py to ```examples/ml``` and rename to ```dataframe_example.py```.
BTW, fix minor missing issues of #9873.
cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9957 from yanboliang/SPARK-11978.

a0ff6d16

[SPARK-12217][ML] Document invalid handling for StringIndexer · aea676ca

BenFradet authored 9 years ago

Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.

I wonder if I should also add a snippet to the code example, input welcome.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10257 from BenFradet/SPARK-12217.

aea676ca

[SPARK-11497][MLLIB][PYTHON] PySpark RowMatrix Constructor Has Type Erasure Issue · 1b822038

Mike Dusenberry authored 9 years ago

As noted in PR #9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor. As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark. Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`. Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`. As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type. `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue likely due to their related helper functions in `PythonMLlibAPI` creating the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.

This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`. This PR blocks #9441, so once this is merged, the other can be rebased.

cc holdenk

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

1b822038