Commits · 38d9795a4fa07086d65ff705ce86648345618736 · cs525-sp18-g07 / spark

Dec 16, 2015

[SPARK-10248][CORE] track exceptions in dagscheduler event loop in tests · 38d9795a

Imran Rashid authored 9 years ago

`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs).  However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception.

This was suggested by mateiz on https://github.com/apache/spark/pull/7699.  It may have already turned up an issue in "zero split job".

Author: Imran Rashid <irashid@cloudera.com>

Closes #8466 from squito/SPARK-10248.

38d9795a

MAINTENANCE: Automated closing of pull requests. · ce5fd400

Andrew Or authored 9 years ago

This commit exists to close the following pull requests on Github:

Closes #1217 (requested by ankurdave, srowen)
Closes #4650 (requested by andrewor14)
Closes #5307 (requested by vanzin)
Closes #5664 (requested by andrewor14)
Closes #5713 (requested by marmbrus)
Closes #5722 (requested by andrewor14)
Closes #6685 (requested by srowen)
Closes #7074 (requested by srowen)
Closes #7119 (requested by andrewor14)
Closes #7997 (requested by jkbradley)
Closes #8292 (requested by srowen)
Closes #8975 (requested by andrewor14, vanzin)
Closes #8980 (requested by andrewor14, davies)

ce5fd400

[MINOR] Add missing interpolation in NettyRPCEnv · 861549ac

Andrew Or authored 9 years ago

```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```

Author: Andrew Or <andrew@databricks.com>

Closes #10334 from andrewor14/rpc-typo.

861549ac

[SPARK-12380] [PYSPARK] use SQLContext.getOrCreate in mllib · 27b98e99

Davies Liu authored 9 years ago

MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext.

Author: Davies Liu <davies@databricks.com>

Closes #10338 from davies/create_context.

27b98e99

[SPARK-9690][ML][PYTHON] pyspark CrossValidator random seed · 3a44aebd

Martin Menestret authored 9 years ago

Extend CrossValidator with HasSeed in PySpark.

This PR replaces [https://github.com/apache/spark/pull/7997]

CC: yanboliang thunterdb mmenestret  Would one of you mind taking a look?  Thanks!

Author: Joseph K. Bradley <joseph@databricks.com>
Author: Martin MENESTRET <mmenestret@ippon.fr>

Closes #10268 from jkbradley/pyspark-cv-seed.

3a44aebd

[SPARK-11677][SQL] ORC filter tests all pass if filters are actually not pushed down. · 9657ee87

hyukjinkwon authored 9 years ago

Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or disabled. In this PR, I add some logics for this.
Since ORC does not filter record by record fully, this checks the count of the result and if it contains the expected values.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #9687 from HyukjinKwon/SPARK-11677.

9657ee87

[SPARK-12164][SQL] Decode the encoded values and then display · edf65cd9

gatorsmile authored 9 years ago

Based on the suggestions from marmbrus cloud-fan in https://github.com/apache/spark/pull/10165 , this PR is to print the decoded values(user objects) in `Dataset.show`
```scala
    implicit val kryoEncoder = Encoders.kryo[KryoClassData]
    val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
    ds.show(20, false);
```
The current output is like
```
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
After the fix, it will be like the below if and only if the users override the `toString` function in the class `KryoClassData`
```scala
override def toString: String = s"KryoClassData($a, $b)"
```
```
+-------------------+
|value              |
+-------------------+
|KryoClassData(a, 1)|
|KryoClassData(b, 2)|
|KryoClassData(c, 3)|
+-------------------+
```

If users do not override the `toString` function, the results will be like
```
+---------------------------------------+
|value                                  |
+---------------------------------------+
|org.apache.spark.sql.KryoClassData68ef|
|org.apache.spark.sql.KryoClassData6915|
|org.apache.spark.sql.KryoClassData693b|
+---------------------------------------+
```

Question: Should we add another optional parameter in the function `show`? It will decide if the function `show` will display the hex values or the object values?

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10215 from gatorsmile/showDecodedValue.

edf65cd9

[SPARK-12320][SQL] throw exception if the number of fields does not line up for Tuple encoder · a783a8ed
Wenchen Fan authored 9 years ago
```
Author: Wenchen Fan <wenchen@databricks.com>

Closes #10293 from cloud-fan/err-msg.
```
a783a8ed

[SPARK-12364][ML][SPARKR] Add ML example for SparkR · 1a8b2a17

Yanbo Liang authored 9 years ago

We have DataFrame example for SparkR, we also need to add ML example under ```examples/src/main/r```.

cc mengxr jkbradley shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10324 from yanboliang/spark-12364.

1a8b2a17

[SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6 · 8148cc7a

Joseph K. Bradley authored 9 years ago

No known breaking changes, but some deprecations and changes of behavior.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #10235 from jkbradley/mllib-guide-update-1.6.

8148cc7a

[SPARK-12361][PYSPARK][TESTS] Should set PYSPARK_DRIVER_PYTHON before Python tests · 6a880afa

Jeff Zhang authored 9 years ago

Although this patch still doesn't solve the issue why the return code is 0 (see JIRA description), it resolves the issue of python version mismatch.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10322 from zjffdu/SPARK-12361.

6a880afa

[SPARK-12309][ML] Use sqlContext from MLlibTestSparkContext for spark.ml test suites · d252b2d5

Yanbo Liang authored 9 years ago

Use ```sqlContext``` from ```MLlibTestSparkContext``` rather than creating new one for spark.ml test suites. I have checked thoroughly and found there are four test cases need to update.

cc mengxr jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10279 from yanboliang/spark-12309.

d252b2d5

[SPARK-9694][ML] Add random seed Param to Scala CrossValidator · 860dc7f2

Yanbo Liang authored 9 years ago

Add random seed Param to Scala CrossValidator

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9108 from yanboliang/spark-9694.

860dc7f2

[SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means · 7b6dc29d

Yu ISHIKAWA authored 9 years ago

This PR includes only an example code in order to finish it quickly.
I'll send another PR for the docs soon.

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #9952 from yu-iskw/SPARK-6518.

7b6dc29d

[SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with Mesos cluster mode. · ad8c1f0b

Timothy Chen authored 9 years ago

SPARK_HOME is now causing problem with Mesos cluster mode since spark-submit script has been changed recently to take precendence when running spark-class scripts to look in SPARK_HOME if it's defined.

We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead.

Author: Timothy Chen <tnachen@gmail.com>

Closes #10332 from tnachen/scheduler_ui.

ad8c1f0b

[SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml · 26d70bd2
Yu ISHIKAWA authored 9 years ago
```
cc jkbradley

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #10244 from yu-iskw/SPARK-12215.
```
26d70bd2

[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR · 22f6cd86

Yanbo Liang authored 9 years ago

Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10281 from yanboliang/spark-12310.

22f6cd86

[SPARK-12318][SPARKR] Save mode in SparkR should be error by default · 2eb5af5f
Jeff Zhang authored 9 years ago
```
shivaram  Please help review.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #10290 from zjffdu/SPARK-12318.
```
2eb5af5f

[SPARK-8745] [SQL] remove GenerateProjection · 54c512ba

Davies Liu authored 9 years ago

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #10316 from davies/remove_generate_projection.

54c512ba

[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation · a6325fc4

Timothy Hunter authored 9 years ago

This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow.
Credit goes to the original author Titan-C (mentioned in the NOTICE).

Note that I am not a CSS expert, so I can only address comments up to some extent.

Default view:
<img width="936" alt="screen shot 2015-12-14 at 12 46 39 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png">

When collapsed manually by the user:
<img width="1004" alt="screen shot 2015-12-14 at 12 54 02 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png">

Disappears when column is too narrow:
<img width="697" alt="screen shot 2015-12-14 at 12 47 22 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png">

Can still be opened by the user if necessary:
<img width="651" alt="screen shot 2015-12-14 at 12 51 15 pm" src="https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png">

Author: Timothy Hunter <timhunter@databricks.com>

Closes #10297 from thunterdb/12324.

a6325fc4

Revert "[SPARK-12105] [SQL] add convenient show functions" · 1a3d0cd9
Reynold Xin authored 9 years ago
```
This reverts commit 31b39101.
```
1a3d0cd9
Revert "[HOTFIX] Compile error from commit 31b39101" · 18ea11c3
Reynold Xin authored 9 years ago
```
This reverts commit 840bd2e0.
```
18ea11c3
Style fix for the previous 3 JDBC filter push down commits. · 554d840a
Reynold Xin authored 9 years ago

554d840a

[SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource. · 2aad2d37

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12315
`IsNotNull` filter is not being pushed down for JDBC datasource.

It looks it is SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip) and I believe most databases support this.

In this PR, I simply added the case for `IsNotNull` filter to produce a proper filter string.

Author: hyukjinkwon <gurwls223@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #10287 from HyukjinKwon/SPARK-12315.

2aad2d37

[SPARK-12314][SQL] isnull operator not pushed down for JDBC datasource. · 7f443a68

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12314
`IsNull` filter is not being pushed down for JDBC datasource.

It looks it is SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip) and I believe most databases support this.

In this PR, I simply added the case for `IsNull` filter to produce a proper filter string.

Author: hyukjinkwon <gurwls223@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>

Closes #10286 from HyukjinKwon/SPARK-12314.

7f443a68

[SPARK-12249][SQL] JDBC non-equality comparison operator not pushed down. · 0f6936b5

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12249
Currently `!=` operator is not pushed down correctly.
I simply added a case for this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #10233 from HyukjinKwon/SPARK-12249.

0f6936b5

Dec 15, 2015

[SPARK-12304][STREAMING] Make Spark Streaming web UI display more fri… · d52bf47e

proflin authored 9 years ago

…endly Receiver graphs

Currently, the Spark Streaming web UI uses the same maxY when displays 'Input Rate Times& Histograms' and 'Per-Receiver Times& Histograms'.

This may lead to somewhat un-friendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground.

This issue proposes to calculate a new maxY against the original one, which is shared among all the `Per-Receiver Times& Histograms' graphs.

Before:
![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png)

After:
![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png)

Author: proflin <proflin.me@gmail.com>

Closes #10318 from proflin/SPARK-12304.

d52bf47e

[SPARK-4117][YARN] Spark on Yarn handle AM being told command from RM · ca0690b5

Devaraj K authored 9 years ago

Spark on Yarn handle AM being told command from RM

When RM throws ApplicationAttemptNotFoundException for allocate
invocation, making the ApplicationMaster to finish immediately without any
retries.

Author: Devaraj K <devaraj@apache.org>

Closes #10129 from devaraj-kavali/SPARK-4117.

ca0690b5

[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability · a89e8b61
Wenchen Fan authored 9 years ago
```
Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #8645 from cloud-fan/test.
```
a89e8b61

[SPARK-12062][CORE] Change Master to asyc rebuild UI when application completes · c5b6b398

Bryan Cutler authored 9 years ago

This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked and allow new workers to register/remove if the event log history is very large and takes a long time to rebuild.

Author: Bryan Cutler <bjcutler@us.ibm.com>

Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.

c5b6b398

[SPARK-9886][CORE] Fix to use ShutdownHookManager in · 8a215d23

Naveen authored 9 years ago

ExternalBlockStore.scala

Author: Naveen <naveenminchu@gmail.com>

Closes #10313 from naveenminchu/branch-fix-SPARK-9886.

8a215d23

[SPARK-10123][DEPLOY] Support specifying deploy mode from configuration · 63ccdef8

jerryshao authored 9 years ago

Please help to review, thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #10195 from jerryshao/SPARK-10123.

63ccdef8

[SPARK-9026][SPARK-4514] Modifications to JobWaiter, FutureAction, and... · 765a4884

Richard W. Eggert II authored 9 years ago

[SPARK-9026][SPARK-4514] Modifications to JobWaiter, FutureAction, and AsyncRDDActions to support non-blocking operation

These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts.

Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026
Also fixes: https://issues.apache.org/jira/browse/SPARK-4514

This pull request contains all my own original work, which I release to the Spark project under its open source license.

Author: Richard W. Eggert II <richard.eggert@gmail.com>

Closes #9264 from reggert/fix-futureaction.

765a4884

[SPARK-9516][UI] Improvement of Thread Dump Page · a63d9edc

CodingCat authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-9516

- [x] new look of Thread Dump Page

- [x] click column title to sort

- [x] grep

- [x] search as you type

squito JoshRosen It's ready for the review now

Author: CodingCat <zhunansjtu@gmail.com>

Closes #7910 from CodingCat/SPARK-9516.

a63d9edc

[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode. · c2de99a7

Timothy Chen authored 9 years ago

Adding more documentation about submitting jobs with mesos cluster mode.

Author: Timothy Chen <tnachen@gmail.com>

Closes #10086 from tnachen/mesos_supervise_docs.

c2de99a7

[SPARK-12130] Replace shuffleManagerClass with shortShuffleMgrNames in ExternalShuffleBlockResolver · 369127f0

Lianhui Wang authored 9 years ago

Replace shuffleManagerClassName with shortShuffleMgrName is to reduce time of string's comparison. and put sort's comparison on the front. cc JoshRosen andrewor14

Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #10131 from lianhuiwang/spark-12130.

369127f0

[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf · f725b2ec

tedyu authored 9 years ago

This is continuation of SPARK-12056 where change is applied to SqlNewHadoopRDD.scala

andrewor14
FYI

Author: tedyu <yuzhihong@gmail.com>

Closes #10164 from tedyu/master.

f725b2ec

[HOTFIX] Compile error from commit 31b39101 · 840bd2e0
Andrew Or authored 9 years ago

840bd2e0
[SPARK-12105] [SQL] add convenient show functions · 31b39101
Jean-Baptiste Onofré authored 9 years ago
```
Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes #10130 from jbonofre/SPARK-12105.
```
31b39101

[SPARK-12236][SQL] JDBC filter tests all pass if filters are not really pushed down · 28112657

hyukjinkwon authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-12236
Currently JDBC filters are not tested properly. All the tests pass even if the filters are not pushed down due to Spark-side filtering.

In this PR,
Firstly, I corrected the tests to properly check the pushed down filters by removing Spark-side filtering.
Also, `!=` was being tested which is actually not pushed down. So I removed them.
Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils` as this functions would be shared for all tests for pushed down filters. This function would be also shared with ORC datasource as the filters for that are also not being tested properly.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #10221 from HyukjinKwon/SPARK-12236.

28112657