- Jan 04, 2016
-
Josh Rosen authored
[SPARK-10359][PROJECT-INFRA] Use more random number in dev/test-dependencies.sh; fix version switching
This patch aims to fix another potential source of flakiness in the `dev/test-dependencies.sh` script. pwendell's original patch and my version used `$(date +%s | tail -c6)` to generate a suffix to use when installing temporary Spark versions into the local Maven cache, but this value only changes once per second and thus is highly collision-prone when concurrent builds launch on AMPLab Jenkins. In order to reduce the potential for conflicts, this patch updates the script to call Python's random number generator instead. I also fixed a bug in how we captured the original project version; the bug was causing the exit handler code to fail. Author: Josh Rosen <joshrosen@databricks.com> Closes #10558 from JoshRosen/build-dep-tests-round-3.
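The idea, as a minimal Python sketch (the range and version string below are illustrative, not taken from the actual script):

```python
# Illustrative only: a call into Python's random module gives a far less
# collision-prone suffix than `date +%s | tail -c6`, which only changes
# once per second.
import random

# A wide random range makes it very unlikely that two concurrent Jenkins
# builds pick the same temporary version suffix.
suffix = random.randrange(100000, 99999999)
temp_version = "spark-test-%d" % suffix  # hypothetical version string
print(temp_version)
```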
-
Josh Rosen authored
There are a couple of places in the `dev/run-tests-*.py` scripts which deal with Hadoop profiles, but the set of profiles that they handle does not include all Hadoop profiles defined in our POM. Similarly, the `hadoop-2.2` and `hadoop-2.6` profiles were missing from `dev/deps`. This patch updates these scripts to include all four Hadoop profiles defined in our POM. Author: Josh Rosen <joshrosen@databricks.com> Closes #10565 from JoshRosen/add-missing-hadoop-profiles-in-test-scripts.
-
- Jan 03, 2016
-
Xiu Guo authored
Author: Xiu Guo <xguo27@gmail.com> Closes #10515 from xguo27/SPARK-12562.
-
Holden Karau authored
Previously (when the PR was first created), not specifying `b=` explicitly was fine and treated as a default null; this change makes the test explicit about `b` being None. Author: Holden Karau <holden@us.ibm.com> Closes #10564 from holdenk/SPARK-12611-fix-test-infer-schema-local.
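A minimal sketch of the pattern, assuming a PySpark `Row`-based test row (field names are illustrative):

```python
from pyspark.sql import Row

# Being explicit that b is None, rather than omitting it and relying on an
# implicit default, keeps the test's intent and inferred schema unambiguous.
row = Row(a=1, b=None)
print(row)
```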
-
Cazen authored
We can provides the option to choose JSON parser can be enabled to accept quoting of all character or not. Author: Cazen <Cazen@korea.com> Author: Cazen Lee <cazen.lee@samsung.com> Author: Cazen Lee <Cazen@korea.com> Author: cazen.lee <cazen.lee@samsung.com> Closes #10497 from Cazen/master.
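A sketch of how such a reader option is typically switched on (the option name `allowBackslashEscapingAnyCharacter`, the running `sqlContext`, and the file path are assumptions, not taken from this commit):

```python
# Assumed option name and path; shown only to illustrate the reader-option
# mechanism for the JSON data source.
df = sqlContext.read \
    .option("allowBackslashEscapingAnyCharacter", "true") \
    .json("people.json")
```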
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #10561 from rxin/update-mima.
-
thomastechs authored
Avoids the "No such table" exception and throws an AnalysisException instead, as described in SPARK-12533. Author: thomastechs <thomas.sebastian@tcs.com> Closes #10529 from thomastechs/topic-branch.
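A sketch of the behavior from the Python side, assuming PySpark's `AnalysisException` wrapper and a running `sqlContext`:

```python
from pyspark.sql.utils import AnalysisException

# With this change, referencing a missing table surfaces as an
# AnalysisException rather than a raw "No such table" error.
try:
    sqlContext.table("table_that_does_not_exist")
except AnalysisException as e:
    print("analysis error:", e)
```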
-
felixcheung authored
cc shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #10408 from felixcheung/rcodecomment.
-
Reynold Xin authored
This reverts commit 44ee920f.
-
Reynold Xin authored
callUDF has been deprecated. However, we do not have an alternative for users to specify the output data type without type tags. This pull request introduces a new API for that and replaces invocations of the deprecated callUDF with it. Author: Reynold Xin <rxin@databricks.com> Closes #10547 from rxin/SPARK-12599.
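For comparison, on the Python side the output type has always been passed explicitly as a `DataType`; a small sketch of that existing pattern (the lambda is illustrative):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# The return type is supplied as an explicit DataType, with no type tags.
plus_one = udf(lambda x: x + 1, IntegerType())
```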
-
- Jan 02, 2016
-
Sean Owen authored
[SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x
Remove use of deprecated Hadoop APIs now that 2.2+ is required. Author: Sean Owen <sowen@cloudera.com> Closes #10446 from srowen/SPARK-12481.
-
hyukjinkwon authored
This PR is a follow-up to https://github.com/apache/spark/pull/8391. The previous PR fixed JDBCRDD to support null-safe equality comparison for the JDBC data source. This PR fixes the problem that the comparison can actually return null, which caused errors when the result of that comparison was used. Author: hyukjinkwon <gurwls223@gmail.com> Author: HyukjinKwon <gurwls223@gmail.com> Closes #8743 from HyukjinKwon/SPARK-10180.
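A sketch of a null-safe comparison over a JDBC source (the connection URL, table, credentials, and column are placeholders):

```python
# Unlike `=`, the SQL null-safe operator `<=>` never evaluates to NULL,
# so its result is safe to use directly in a filter.
df = sqlContext.read.jdbc("jdbc:postgresql:dbserver", "people",
                          properties={"user": "u", "password": "p"})
df.filter("age <=> NULL").show()
```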
-
Herman van Hovell authored
This PR inlines the Hive SQL parser in Spark SQL. The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase. This PR is a WIP and should not be merged until we have sorted out the build issues. Author: Herman van Hovell <hvanhovell@questtec.nl> Author: Nong Li <nong@databricks.com> Author: Nong Li <nongli@gmail.com> Closes #10525 from hvanhovell/SPARK-12362.
-
- Jan 01, 2016
-
Reynold Xin authored
This reverts commit 0da7bd50.
-
Davies Liu authored
It's confusing that some operators output UnsafeRow while others do not, which makes mistakes easy. This PR changes all operators (SparkPlan) to output only UnsafeRow and removes the rule that inserted Unsafe/Safe conversions. For operators that cannot output UnsafeRow directly, an UnsafeProjection is added. Closes #10330 cc JoshRosen rxin Author: Davies Liu <davies@databricks.com> Closes #10511 from davies/unsafe_row.
-
Reynold Xin authored
-
Cheng Lian authored
There's a hack in `TestHive.reset()` that was intended to mute noisy Hive loggers. However, Spark testing loggers are also muted. Author: Cheng Lian <lian@databricks.com> Closes #10540 from liancheng/spark-12592.dont-mute-spark-loggers.
-
Liang-Chi Hsieh authored
[SPARK-12409][SPARK-12387][SPARK-12391][SQL] Refactor filter pushdown for JDBCRDD and add few filters
This patch refactors the filter pushdown for JDBCRDD and also adds a few filters. The added filters are basically from #10468, with some refactoring. Test cases are from #10468. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10470 from viirya/refactor-jdbc-filter.
-
Marcelo Vanzin authored
A slight adjustment to the checker configuration was needed; there are a handful of warnings still left, but those are because of a bug in the checker that I'll fix separately (before enabling errors for the checker, of course). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10535 from vanzin/SPARK-3873-mllib.
-
Liang-Chi Hsieh authored
A follow-up PR for #9712. Moves the test for arrayOfUDT. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #10538 from viirya/move-udt-test.
-
- Dec 31, 2015
-
Josh Rosen authored
This patch includes multiple fixes for the `dev/test-dependencies.sh` script (which was introduced in #10461):
- Use `build/mvn --force` instead of `mvn` in one additional place.
- Explicitly set a zero exit code on success.
- Set `LC_ALL=C` to make `sort` results agree across machines (see https://stackoverflow.com/questions/28881/).
- Set `should_run_build_tests=True` for the `build` module (this somehow got lost).
Author: Josh Rosen <joshrosen@databricks.com> Closes #10543 from JoshRosen/dep-script-fixes.
-
Marcelo Vanzin authored
Also included a few miscellaneous other modules that had very few violations. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10532 from vanzin/SPARK-3873-streaming.
-
Yin Huai authored
[SPARK-12039][SQL] Re-enable HiveSparkSubmitSuite's SPARK-9757 Persist Parquet relation with decimal column
https://issues.apache.org/jira/browse/SPARK-12039 Since we do not support hadoop1, we can re-enable this test in master. Author: Yin Huai <yhuai@databricks.com> Closes #10533 from yhuai/SPARK-12039-enable.
-
Shixiong Zhu authored
### Remove AkkaRpcEnv
Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring the Streaming actorStream API.
### Remove systemName
There are 2 places using `systemName`:
* `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in the log is useful, I keep `RpcEnvConfig.name`.
* `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it, so we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.
### Remove RpcEnv.uriOf
`uriOf` exists because Akka uses different URI formats with and without authentication, e.g., `akka.ssl.tcp...` and `akka.tcp://...`. But `NettyRpcEnv` uses the same format, so it's not necessary after removing `AkkaRpcEnv`.
Author: Shixiong Zhu <shixiong@databricks.com> Closes #10459 from zsxwing/remove-akka-rpc-env.
-
Davies Liu authored
Right now, numFields is passed in by pointTo() and then bitSetWidthInBytes is calculated, making pointTo() a little bit heavy. It should be part of the constructor of UnsafeRow. Author: Davies Liu <davies@databricks.com> Closes #10528 from davies/numFields.
-
- Dec 30, 2015
-
Reynold Xin authored
Closes #5400 Closes #5408 Closes #5423 Closes #5668 Closes #6757 Closes #6745 Closes #6613
-
Reynold Xin authored
-
Reynold Xin authored
Closes #5358 Closes #3744 Closes #3677 Closes #3536 Closes #3249 Closes #3221 Closes #2446 Closes #3794 Closes #3815 Closes #3816 Closes #3866 Closes #4286 Closes #5184 Closes #5170 Closes #5142 Closes #5025 Closes #5005 Closes #4897 Closes #4887 Closes #4849 Closes #4632 Closes #4622 Closes #4456 Closes #4449 Closes #4417 Closes #5483 Closes #5325 Closes #6545 Closes #6449 Closes #6433 Closes #6416 Closes #6403 Closes #6386 Closes #6263 Closes #6245 Closes #6213 Closes #6155 Closes #6133 Closes #6018 Closes #5978 Closes #5869 Closes #5852 Closes #5848 Closes #5754 Closes #5598 Closes #5503 Closes #4380
-
Reynold Xin authored
It was research code and has been deprecated since 1.0.0. No one really uses it since they can just use event logging. Author: Reynold Xin <rxin@databricks.com> Closes #10530 from rxin/SPARK-12561.
-
Marcelo Vanzin authored
There's one warning left, caused by a bug in the checker. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10537 from vanzin/SPARK-3873-graphx.
-
Marcelo Vanzin authored
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10536 from vanzin/SPARK-3873-yarn.
-
Reynold Xin authored
We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0. Author: Reynold Xin <rxin@databricks.com> Closes #10531 from rxin/SPARK-12588.
-
Herman van Hovell authored
This PR is a follow-up for PR https://github.com/apache/spark/pull/9819. It adds documentation for the window functions and a couple of NULL tests. The documentation was largely based on the documentation in (the source of) Hive and Presto: * https://prestodb.io/docs/current/functions/window.html * https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics I am not sure if we need to add the licenses of these two projects to the licenses directory. They are both under the ASL. srowen any thoughts? cc yhuai Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10402 from hvanhovell/SPARK-8641-docs.
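As a companion to the new documentation, a small PySpark sketch of a typical window-function call (`df`, `dept`, and `salary` are assumed example names):

```python
from pyspark.sql import Window
from pyspark.sql import functions as F

# Rank rows within each department by salary, the canonical window example.
w = Window.partitionBy("dept").orderBy("salary")
df.select("dept", "salary", F.rank().over(w).alias("rank")).show()
```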
-
Carson Wang authored
I got an exception when accessing the below REST API with an unknown application Id: `http://<server-url>:18080/api/v1/applications/xxx/jobs`. Instead of an exception, I expect an error message "no such app: xxx", similar to the error message returned when I access `/api/v1/applications/xxx`.
```
org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
  at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
  at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
  at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
  at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
  at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
  at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
  at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
  at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
```
Author: Carson Wang <carson.wang@intel.com> Closes #10352 from carsonwang/unknownAppFix.
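For illustration, a quick way to exercise the endpoint against a hypothetical local history server (`requests` is a third-party HTTP client; host and port are placeholders):

```python
import requests

# With the fix, an unknown application id yields a "no such app: xxx" error
# message instead of the unhandled exception shown above.
resp = requests.get("http://localhost:18080/api/v1/applications/xxx/jobs")
print(resp.status_code, resp.text)
```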
-
Takeshi YAMAMURO authored
This is a rework of #10386 and adds more tests and LIKE push-down support. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10468 from maropu/SupportMorePushdownInJdbc.
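A minimal PySpark sketch of the kind of predicate that now qualifies for push-down (the connection URL and table name are placeholders):

```python
# A LIKE filter of this shape can now be pushed down to the JDBC source
# instead of being evaluated in Spark after fetching all rows.
df = sqlContext.read.jdbc("jdbc:postgresql:dbserver", "people")
df.filter(df.name.like("A%")).show()
```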
-
Josh Rosen authored
This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests which modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath. This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs. This patch is based on pwendell's work in #8531. Closes #8531. Author: Josh Rosen <joshrosen@databricks.com> Author: Patrick Wendell <patrick@databricks.com> Closes #10461 from JoshRosen/SPARK-10359.
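The core idea, as a minimal Python sketch (the manifest handling and the `resolved_jars` input are hypothetical stand-ins, not the actual script logic):

```python
def check_dependencies(manifest_path, resolved_jars):
    """Fail if the resolved runtime classpath differs from the manifest."""
    with open(manifest_path) as f:
        expected = set(f.read().split())
    if set(resolved_jars) != expected:
        # A mismatch means the PR changed direct or transitive dependencies.
        raise SystemExit("Classpath changed; regenerate the dependency manifest.")
```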
-
Holden Karau authored
Current schema inference for local Python collections halts as soon as there are no NullTypes. This is different from what happens when we specify a sampling ratio of 1.0 on a distributed collection, and it can result in incomplete schema information. Author: Holden Karau <holden@us.ibm.com> Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
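A sketch of the kind of local collection affected, assuming a running `sqlContext` (field names and values are illustrative):

```python
from pyspark.sql import Row

# b starts out as None; complete inference must keep scanning past the
# point where no NullTypes remain, as with samplingRatio=1.0 on an RDD.
data = [Row(a=1, b=None), Row(a=2, b="x")]
df = sqlContext.createDataFrame(data)
df.printSchema()
```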
-
Wenchen Fan authored
In most cases we should propagate null when calling `NewInstance`, and so far there is only one case where we should stop null propagation: creating a product/Java bean. So I think it makes more sense to propagate null by default. This also fixes a bug when encoding a null array/map, which was first discovered in https://github.com/apache/spark/pull/10401 Author: Wenchen Fan <wenchen@databricks.com> Closes #10443 from cloud-fan/encoder.
-
Neelesh Srinivas Salian authored
Updated the Worker unit IllegalStateException message to indicate that values less than 1MB (rather than 0) are not allowed, to help solve this. Requesting review. Author: Neelesh Srinivas Salian <nsalian@cloudera.com> Closes #10483 from nssalian/SPARK-12263.
-
Reynold Xin authored
This reverts commit b600bccf due to non-deterministic build breaks.
-