  1. Oct 30, 2016
  2. Oct 28, 2016
• [SPARK-18167][SQL] Add debug code for SQLQuerySuite flakiness when metastore... · d2d438d1
      Eric Liang authored
      [SPARK-18167][SQL] Add debug code for SQLQuerySuite flakiness when metastore partition pruning is enabled
      
      ## What changes were proposed in this pull request?
      
      org.apache.spark.sql.hive.execution.SQLQuerySuite is flaking when hive partition pruning is enabled.
      Based on the stack traces, it seems to be an old issue where Hive fails to cast a numeric partition column ("Invalid character string format for type DECIMAL"). There are two possibilities here: either we are somehow corrupting the partition table to have non-decimal values in that column, or there is a transient issue with Derby.
      
      This PR logs the result of the retry when this exception is encountered, so we can confirm what is going on.
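
A rough sketch of the kind of debug hook this adds (names and structure here are illustrative, not the PR's actual code):

```scala
import scala.util.Try

// Hypothetical sketch: when the Hive DECIMAL cast error appears, retry once and
// log what the retry returns, to tell a corrupted partition table apart from a
// transient Derby problem.
def pruneWithDebug[T](prune: () => T): T =
  try prune() catch {
    case e: Exception if Option(e.getMessage).exists(_.contains("Invalid character string format")) =>
      val retried = Try(prune())
      println(s"Partition pruning failed ($e); retry returned: $retried") // stand-in for logWarning
      throw e
  }
```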
      
      ## How was this patch tested?
      
      n/a
      
      cc yhuai
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #15676 from ericl/spark-18167.
      d2d438d1
• [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception · 59cccbda
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Fixed the issue that ForeachSink didn't rethrow the exception.
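
A minimal repro sketch, assuming a local socket source purely for illustration: before this fix, an exception thrown from `process` was swallowed and the query appeared healthy; after it, the Spark job fails.

```scala
import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}

val spark = SparkSession.builder().master("local[2]").appName("repro").getOrCreate()

val query = spark.readStream
  .format("socket").option("host", "localhost").option("port", 9999)
  .load()
  .writeStream
  .foreach(new ForeachWriter[Row] {
    override def open(partitionId: Long, version: Long): Boolean = true
    override def process(value: Row): Unit =
      throw new RuntimeException("error") // must now fail the streaming query
    override def close(errorOrNull: Throwable): Unit = {}
  })
  .start()
```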
      
      ## How was this patch tested?
      
      The fixed unit test.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15674 from zsxwing/foreach-sink-error.
      59cccbda
• [SPARK-5992][ML] Locality Sensitive Hashing · ac26e9cf
      Yunni authored
      ## What changes were proposed in this pull request?
      
      Implement Locality Sensitive Hashing along with approximate nearest neighbors and approximate similarity join based on the [design doc](https://docs.google.com/document/d/1D15DTDMF_UWTTyWqXfG7y76iZalky4QmifUYQ6lH5GM/edit).
      
      Detailed changes are as follows:
      (1) Implement abstract LSH, LSHModel classes as Estimator-Model
(2) Implement approxNearestNeighbors and approxSimilarityJoin in the abstract LSHModel (see the usage sketch below)
      (3) Implement Random Projection as LSH subclass for Euclidean distance, Min Hash for Jaccard Distance
      (4) Implement unit test utility methods including checkLshProperty, checkNearestNeighbor and checkSimilarityJoin
      
      Things that will be implemented in a follow-up PR:
       - Bit Sampling for Hamming Distance, SignRandomProjection for Cosine Distance
       - PySpark Integration for the scala classes and methods.
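
A hypothetical usage sketch of the API described in (2) and (3); the class name follows the PR description, while `setBucketLength`, `datasetA` and `datasetB` are assumptions:

```scala
import org.apache.spark.ml.feature.RandomProjection
import org.apache.spark.ml.linalg.Vectors

// datasetA and datasetB are assumed DataFrames with a "features" vector column.
val rp = new RandomProjection()
  .setInputCol("features")
  .setOutputCol("hashes")
  .setBucketLength(2.0) // assumed parameter

val model = rp.fit(datasetA)

// Approximate nearest neighbors of a key point.
val key = Vectors.dense(1.0, 2.0)
val neighbors = model.approxNearestNeighbors(datasetA, key, numNearestNeighbors = 5)

// Approximate similarity join within a distance threshold.
val joined = model.approxSimilarityJoin(datasetA, datasetB, threshold = 2.5)
```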
      
      ## How was this patch tested?
Unit tests are implemented for all of the new classes and algorithms. A scalability test on Uber's dataset was performed internally.
      
      Tested the methods on [WEX dataset](https://aws.amazon.com/items/2345) from AWS, with the steps and results [here](https://docs.google.com/document/d/19BXg-67U83NVB3M0I84HVBVg3baAVaESD_mrg_-vLro/edit).
      
      ## References
      Gionis, Aristides, Piotr Indyk, and Rajeev Motwani. "Similarity search in high dimensions via hashing." VLDB 7 Sep. 1999: 518-529.
      Wang, Jingdong et al. "Hashing for similarity search: A survey." arXiv preprint arXiv:1408.2927 (2014).
      
      Author: Yunni <Euler57721@gmail.com>
      Author: Yun Ni <yunn@uber.com>
      
      Closes #15148 from Yunni/SPARK-5992-yunn-lsh.
      ac26e9cf
• [SPARK-18133][EXAMPLES][ML] Python ML Pipeline Example has syntax errors · e9746f87
      Jagadeesan authored
      ## What changes were proposed in this pull request?
      
In Python 3, there is only one integer type (i.e., int), which mostly behaves like the long type in Python 2. Since Python 3 won't accept the "L" suffix, it has been removed from all examples.
      
      ## How was this patch tested?
      
      Unit tests.
      
      Author: Jagadeesan <as2@us.ibm.com>
      
      Closes #15660 from jagadeesanas2/SPARK-18133.
      e9746f87
• [SPARK-18109][ML] Add instrumentation to GMM · 569788a5
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      
      Add instrumentation to GMM
      
      ## How was this patch tested?
      
      Test in spark-shell
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15636 from zhengruifeng/gmm_instr.
      569788a5
  3. Oct 27, 2016
• [SPARK-18121][SQL] Unable to query global temp views when hive support is enabled · ab5f938b
      Sunitha Kambhampati authored
      ## What changes were proposed in this pull request?
      
Issue:
Querying a global temp view throws a "Table or view not found" exception.

Fix:
Update lookupRelation in HiveSessionCatalog to check for global temp views, similar to SessionCatalog.lookupRelation.

Before fix:
Querying a global temp view (e.g. `SELECT * FROM global_temp.v1`) throws a "Table or view not found" exception.

After fix:
The query succeeds and returns the right result.
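
A minimal repro sketch (the view name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").enableHiveSupport().getOrCreate()
spark.range(10).createGlobalTempView("v1")
// Threw "Table or view not found" before the fix; returns rows 0-9 after it.
spark.sql("SELECT * FROM global_temp.v1").show()
```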
      
      ## How was this patch tested?
      - Two unit tests are added to check for global temp view for the code path when hive support is enabled.
      - Regression unit tests were run successfully. ( build/sbt -Phive hive/test, build/sbt sql/test, build/sbt catalyst/test)
      
      Author: Sunitha Kambhampati <skambha@us.ibm.com>
      
      Closes #15649 from skambha/lookuprelationChanges.
      ab5f938b
• [SPARK-17970][SQL] store partition spec in metastore for data source table · ccb11543
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
We should follow Hive tables and also store the partition spec in the metastore for data source tables.
This brings two benefits:

1. It's more flexible to manage the table's data files, as users can use `ADD PARTITION`, `DROP PARTITION` and `RENAME PARTITION` (see the sketch below)
2. We don't need to cache all file statuses for data source tables anymore.
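
For example, benefit 1 enables standard partition DDL against data source tables; assuming a SparkSession `spark` and a hypothetical partitioned table `logs`:

```scala
spark.sql("ALTER TABLE logs ADD PARTITION (dt='2016-10-27')")
spark.sql("ALTER TABLE logs DROP PARTITION (dt='2016-10-26')")
spark.sql("ALTER TABLE logs PARTITION (dt='2016-10-27') RENAME TO PARTITION (dt='2016-10-28')")
```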
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Eric Liang <ekl@databricks.com>
      Author: Michael Allman <michael@videoamp.com>
      Author: Eric Liang <ekhliang@gmail.com>
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #15515 from cloud-fan/partition.
      ccb11543
• [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection" · 79fd0cc0
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
A follow-up PR for #14553 to fix the flaky test. It's flaky because the file list API doesn't guarantee any ordering of the returned list.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15661 from zsxwing/fix-StreamingQuerySuite.
      79fd0cc0
• [SPARK-17219][ML] enhanced NaN value handling in Bucketizer · 0b076d4c
      VinceShieh authored
      ## What changes were proposed in this pull request?
      
This PR is an enhancement of the PR with commit ID 57dc326b.
NaN is a special value that is commonly treated as invalid, but there are cases where NaN values are meaningful and need special handling. When dealing with NaN values, users now have three options: reserve an extra bucket for NaN values, remove the NaN values, or report an error, by setting handleNaN to "keep", "skip", or "error" (the default), respectively.
      
**Before:**
```scala
val bucketizer: Bucketizer = new Bucketizer()
  .setInputCol("feature")
  .setOutputCol("result")
  .setSplits(splits)
```
**After:**
```scala
val bucketizer: Bucketizer = new Bucketizer()
  .setInputCol("feature")
  .setOutputCol("result")
  .setSplits(splits)
  .setHandleNaN("keep")
```
      
      ## How was this patch tested?
      Tests added in QuantileDiscretizerSuite, BucketizerSuite and DataFrameStatSuite
      
Signed-off-by: VinceShieh <vincent.xie@intel.com>
      
      Author: VinceShieh <vincent.xie@intel.com>
      Author: Vincent Xie <vincent.xie@intel.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #15428 from VinceShieh/spark-17219_followup.
      0b076d4c
• [SPARK-17813][SQL][KAFKA] Maximum data per trigger · 10423258
      cody koeninger authored
      ## What changes were proposed in this pull request?
      
Adds a `maxOffsetsPerTrigger` option for rate limiting, apportioned proportionally to the volume of the different topic-partitions.
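
A usage sketch, assuming a SparkSession `spark` (the broker address and topic are placeholders):

```scala
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .option("maxOffsetsPerTrigger", "10000") // cap per micro-batch, split
  .load()                                  // proportionally across topic-partitions
```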
      
      ## How was this patch tested?
      
      Added unit test
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15527 from koeninger/SPARK-17813.
      10423258
• [SPARK-CORE][TEST][MINOR] Fix the wrong comment in test · 701a9d36
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
While reading the core scheduler code, I found two lines of incorrect comments. This PR simply corrects them.
      
      ## How was this patch tested?
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #15631 from wangmiao1981/Rbug.
      701a9d36
• [SQL][DOC] updating doc for JSON source to link to jsonlines.org · 44c8bfda
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      API and programming guide doc changes for Scala, Python and R.
      
      ## How was this patch tested?
      
      manual test
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #15629 from felixcheung/jsondoc.
      44c8bfda
• [SPARK-17157][SPARKR][FOLLOW-UP] doc fixes · 1dbe9896
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
a couple of small, late-discovered doc fixes
      
      ## How was this patch tested?
      
      manually
      wangmiao1981
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #15650 from felixcheung/logitfix.
      1dbe9896
• [SPARK-18132] Fix checkstyle · d3b4831d
      Yin Huai authored
      This PR fixes checkstyle.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #15656 from yhuai/fix-format.
      d3b4831d
• [SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on... · dd4f088c
      Dilip Biswal authored
      [SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterator() on dataframe produced by RunnableCommand
      
      ## What changes were proposed in this pull request?
A short code snippet that uses toLocalIterator() on a DataFrame produced by a RunnableCommand
reproduces the problem. toLocalIterator() is called by the Thrift server when
`spark.sql.thriftServer.incrementalCollect` is set, to handle queries that produce a large
result set.
      
      **Before**
```scala
      scala> spark.sql("show databases")
      res0: org.apache.spark.sql.DataFrame = [databaseName: string]
      
      scala> res0.toLocalIterator()
      16/10/26 03:00:24 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
      java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeRow
      ```
      
      **After**
```scala
      scala> spark.sql("drop database databases")
      res30: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("show databases")
      res31: org.apache.spark.sql.DataFrame = [databaseName: string]
      
      scala> res31.toLocalIterator().asScala foreach println
      [default]
      [parquet]
      ```
      ## How was this patch tested?
      Added a test in DDLSuite
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #15642 from dilipbiswal/SPARK-18009.
      dd4f088c
  4. Oct 26, 2016
• [SPARK-17770][CATALYST] making ObjectType public · f1aeed8b
      ALeksander Eskilson authored
      ## What changes were proposed in this pull request?
      
      In order to facilitate the writing of additional Encoders, I proposed opening up the ObjectType SQL DataType. This DataType is used extensively in the JavaBean Encoder, but would also be useful in writing other custom encoders.
      
      As mentioned by marmbrus, it is understood that the Expressions API is subject to potential change.
      
      ## How was this patch tested?
      
      The change only affects the visibility of the ObjectType class, and the existing SQL test suite still runs without error.
      
      Author: ALeksander Eskilson <alek.eskilson@cerner.com>
      
      Closes #15453 from bdrillard/master.
      f1aeed8b
• [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes · 5b27598f
      frreiss authored
      ## What changes were proposed in this pull request?
      
      This PR contains changes to the Source trait such that the scheduler can notify data sources when it is safe to discard buffered data. Summary of changes:
* Added a method `commit(end: Offset)` that tells the Source it is OK to discard all offsets up to `end`, inclusive (see the sketch after this list).
      * Changed the semantics of a `None` value for the `getBatch` method to mean "from the very beginning of the stream"; as opposed to "all data present in the Source's buffer".
      * Added notes that the upper layers of the system will never call `getBatch` with a start value less than the last value passed to `commit`.
      * Added a `lastCommittedOffset` method to allow the scheduler to query the status of each Source on restart. This addition is not strictly necessary, but it seemed like a good idea -- Sources will be maintaining their own persistent state, and there may be bugs in the checkpointing code.
      * The scheduler in `StreamExecution.scala` now calls `commit` on its stream sources after marking each batch as complete in its checkpoint.
      * `MemoryStream` now cleans committed batches out of its internal buffer.
      * `TextSocketSource` now cleans committed batches from its internal buffer.
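
A simplified sketch of the revised contract (the real trait has more members; signatures follow the description above, not the merged code):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Offset

trait Source {
  /** Data between `start` (exclusive) and `end` (inclusive). A `start` of None
    * now means "from the very beginning of the stream", not "everything
    * currently buffered". */
  def getBatch(start: Option[Offset], end: Offset): DataFrame

  /** Tells the source it is OK to discard all offsets up to `end`, inclusive.
    * The scheduler never calls getBatch with a start below this point. */
  def commit(end: Offset): Unit

  /** Highest committed offset, so the scheduler can query status on restart. */
  def lastCommittedOffset: Option[Offset]
}
```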
      
      ## How was this patch tested?
      Existing regression tests already exercise the new code.
      
      Author: frreiss <frreiss@us.ibm.com>
      
      Closes #14553 from frreiss/fred-16963.
      5b27598f
• [SPARK-18126][SPARK-CORE] getIteratorZipWithIndex accepts negative value as index · a76846cf
      Miao Wang authored
      ## What changes were proposed in this pull request?
      
`Utils.getIteratorZipWithIndex` was added to deal with more than 2147483647 records in one partition.

The method `getIteratorZipWithIndex` accepts a `startIndex` < 0, which leads to a negative index.

This PR just adds a defensive check on `startIndex` to make sure it is >= 0.
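
A minimal sketch of the check, simplified from `Utils.getIteratorZipWithIndex`:

```scala
def getIteratorZipWithIndex[T](iter: Iterator[T], startIndex: Long): Iterator[(T, Long)] = {
  require(startIndex >= 0, "startIndex should be >= 0.")
  new Iterator[(T, Long)] {
    // A Long index avoids overflow past 2147483647 records.
    private var index: Long = startIndex - 1L
    def hasNext: Boolean = iter.hasNext
    def next(): (T, Long) = {
      index += 1
      (iter.next(), index)
    }
  }
}
```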
      
      ## How was this patch tested?
      
      Add a new unit test.
      
      Author: Miao Wang <miaowang@Miaos-MacBook-Pro.local>
      
      Closes #15639 from wangmiao1981/zip.
      a76846cf
• [SPARK-17157][SPARKR] Add multiclass logistic regression SparkR Wrapper · 29cea8f3
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      As we discussed in #14818, I added a separate R wrapper spark.logit for logistic regression.
      
      This single interface supports both binary and multinomial logistic regression. It also has "predict" and "summary" for binary logistic regression.
      
      ## How was this patch tested?
      
      New unit tests are added.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #15365 from wangmiao1981/glm.
      29cea8f3
• [SPARK-18094][SQL][TESTS] Move group analytics test cases from `SQLQuerySuite`... · 5b7d403c
      jiangxingbo authored
      [SPARK-18094][SQL][TESTS] Move group analytics test cases from `SQLQuerySuite` into a query file test.
      
      ## What changes were proposed in this pull request?
      
Currently we have several test cases for group analytics (ROLLUP/CUBE/GROUPING SETS) in `SQLQuerySuite`; they are better moved into a query file test.
      The following test cases are moved to `group-analytics.sql`:
      ```
      test("rollup")
      test("grouping sets when aggregate functions containing groupBy columns")
      test("cube")
      test("grouping sets")
      test("grouping and grouping_id")
      test("grouping and grouping_id in having")
      test("grouping and grouping_id in sort")
      ```
      
      This is followup work of #15582
      
      ## How was this patch tested?
      
      Modified query file `group-analytics.sql`, which will be tested by `SQLQueryTestSuite`.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #15624 from jiangxb1987/group-analytics-test.
      5b7d403c
• [SPARK-14300][DOCS][MLLIB] Scala MLlib examples code merge and clean up · dcdda197
      Xin Ren authored
      ## What changes were proposed in this pull request?
      
      https://issues.apache.org/jira/browse/SPARK-14300
      
Duplicated code was found in scala/examples/mllib; the following files are all deleted in this PR:
      
      - DenseGaussianMixture.scala
      - StreamingLinearRegression.scala
      
      ## delete reasons:
      
      #### delete: mllib/DenseGaussianMixture.scala
      
      - duplicate of mllib/GaussianMixtureExample
      
      #### delete: mllib/StreamingLinearRegression.scala
      
      - duplicate of mllib/StreamingLinearRegressionExample
      
When merging and cleaning this code, be careful not to disturb the existing example on/off blocks.
      
      ## How was this patch tested?
      
      Test with `SKIP_API=1 jekyll` manually to make sure that works well.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #12195 from keypointt/SPARK-14300.
      dcdda197
• [SPARK-17961][SPARKR][SQL] Add storageLevel to DataFrame for SparkR · fb0a8a8d
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Add storageLevel to DataFrame for SparkR.
This is similar to this PR: https://github.com/apache/spark/pull/13780,
but in R I do not create a class for `StorageLevel`;
instead I add a method `storageToString`.
      
      ## How was this patch tested?
      
      test added.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #15516 from WeichenXu123/storageLevel_df_r.
      fb0a8a8d
• [MINOR][ML] Refactor clustering summary. · ea3605e8
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
Abstract `ClusteringSummary` from `KMeansSummary`, `GaussianMixtureSummary` and `BisectingKMeansSummary`, and eliminate duplicated pieces of code.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #15555 from yanboliang/clustering-summary.
      ea3605e8
• [SPARK-18104][DOC] Don't build KafkaSource doc · 7d10631c
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
We don't need to build docs for KafkaSource because users should access it through the data source APIs; all KafkaSource APIs are internal.
      
      ## How was this patch tested?
      
      Verified manually.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15630 from zsxwing/kafka-unidoc.
      7d10631c
• [SPARK-18063][SQL] Failed to infer constraints over multiple aliases · fa7d9d70
      jiangxingbo authored
      ## What changes were proposed in this pull request?
      
The `UnaryNode.getAliasedConstraints` function fails to replace all expressions with their aliases when the constraints contain more than one expression to be replaced.
      For example:
      ```
      val tr = LocalRelation('a.int, 'b.string, 'c.int)
      val multiAlias = tr.where('a === 'c + 10).select('a.as('x), 'c.as('y))
      multiAlias.analyze.constraints
      ```
      currently outputs:
      ```
ExpressionSet(Seq(
    IsNotNull(resolveColumn(multiAlias.analyze, "x")),
    IsNotNull(resolveColumn(multiAlias.analyze, "y"))
))
      ```
The constraint `resolveColumn(multiAlias.analyze, "x") === resolveColumn(multiAlias.analyze, "y") + 10` is missing.
      
      ## How was this patch tested?
      
      Add new test cases in `ConstraintPropagationSuite`.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #15597 from jiangxb1987/alias-constraints.
      fa7d9d70
• [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL · 7ac70e7b
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Calling `Await.result` will allow other tasks to be run on the same thread when using ForkJoinPool. However, SQL uses a `ThreadLocal` execution id to trace Spark jobs launched by a query, which doesn't work perfectly in ForkJoinPool.
      
This PR just uses `Awaitable.result` instead to prevent ForkJoinPool from running other tasks in the current waiting thread.
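
A sketch of the idea, close to the `ThreadUtils.awaitResult` helper this PR introduces (treat the details as an approximation):

```scala
import scala.concurrent.{Awaitable, CanAwait}
import scala.concurrent.duration.Duration

// Await.result wraps the wait in `blocking { ... }`, which lets a ForkJoinPool
// run other tasks on the waiting thread and corrupts SQL's ThreadLocal
// execution id; calling Awaitable.result directly skips the blocking wrapper.
def awaitResult[T](awaitable: Awaitable[T], atMost: Duration): T = {
  // CanAwait is only a compile-time capability marker, so a null permit works.
  val permit = null.asInstanceOf[CanAwait]
  awaitable.result(atMost)(permit)
}
```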
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #15520 from zsxwing/SPARK-13747.
      7ac70e7b
• [SPARK-17748][FOLLOW-UP][ML] Reorg variables of WeightedLeastSquares. · 312ea3f7
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
This is follow-up work of #15394.
Reorganize some variables of `WeightedLeastSquares` and fix one minor issue in `WeightedLeastSquaresSuite`.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #15621 from yanboliang/spark-17748.
      312ea3f7
• [SPARK-18093][SQL] Fix default value test in SQLConfSuite to work regardless of warehouse dir's existence · 4bee9540
      Mark Grover authored
      ## What changes were proposed in this pull request?
Append a trailing slash, if one isn't already there, before
comparing the two paths. This doesn't take away from
the essence of the check, but removes any potential mismatch
due to a missing trailing slash.
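
A minimal sketch of the normalization (the helper name is hypothetical):

```scala
def withTrailingSlash(path: String): String =
  if (path.endsWith("/")) path else path + "/"

// Both spellings of the warehouse dir now compare equal.
assert(withTrailingSlash("/tmp/warehouse") == withTrailingSlash("/tmp/warehouse/"))
```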
      
      ## How was this patch tested?
      Ran unit tests and they passed.
      
      Author: Mark Grover <mark@apache.org>
      
      Closes #15623 from markgrover/spark-18093.
      4bee9540
• [SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates for query · 3c023570
      jiangxingbo authored
      ## What changes were proposed in this pull request?
      
The functions `QueryPlan.inferAdditionalConstraints` and `UnaryNode.getAliasedConstraints` can produce a non-converging set of constraints for recursive functions. For instance, if we have two constraints of the form (where `a` is an alias):
`a = b, a = f(b, c)`
Applying both these rules in the next iteration would infer:
`f(b, c) = f(f(b, c), c)`
As this process repeats, the iteration won't converge and the set of constraints grows larger and larger until OOM.
      
      ~~To fix this problem, we collect alias from expressions and skip infer constraints if we are to transform an `Expression` to another which contains it.~~
To fix this problem, we apply an additional check in `inferAdditionalConstraints`: when it is possible to generate recursive constraints, we skip generating them.
      
      ## How was this patch tested?
      
      Add new testcase in `SQLQuerySuite`/`InferFiltersFromConstraintsSuite`.
      
      Author: jiangxingbo <jiangxb1987@gmail.com>
      
      Closes #15319 from jiangxb1987/constraints.
      3c023570
• [SPARK-17802] Improved caller context logging. · 402205dd
      Shuai Lin authored
      ## What changes were proposed in this pull request?
      
[SPARK-16757](https://issues.apache.org/jira/browse/SPARK-16757) sets the hadoop `CallerContext` when calling hadoop/hdfs apis to make spark applications more diagnosable in hadoop/hdfs logs. However, the `org.apache.hadoop.ipc.CallerContext` class is only added since [hadoop 2.8](https://issues.apache.org/jira/browse/HDFS-9184), which is not officially released yet. So each time `utils.CallerContext.setCurrentContext()` is called (e.g. [when a task is created](https://github.com/apache/spark/blob/b678e46/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L95-L96)), a "java.lang.ClassNotFoundException: org.apache.hadoop.ipc.CallerContext"
error is logged, which pollutes the Spark logs when there are lots of tasks.
      
      This patch improves this behaviour by only logging the `ClassNotFoundException` once.
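
A sketch of the log-once approach (the structure is an assumption, with `println` standing in for Spark's logging):

```scala
object CallerContextSketch {
  // Probe once for the Hadoop 2.8+ class, so the ClassNotFoundException is
  // reported at most one time instead of once per task.
  private lazy val callerContextSupported: Boolean =
    try {
      Class.forName("org.apache.hadoop.ipc.CallerContext")
      true
    } catch {
      case _: ClassNotFoundException =>
        println("Hadoop CallerContext not available (requires Hadoop 2.8+)") // logged once
        false
    }

  def setCurrentContext(context: String): Unit =
    if (callerContextSupported) {
      // invoke org.apache.hadoop.ipc.CallerContext via reflection here
    }
}
```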
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Shuai Lin <linshuai2012@gmail.com>
      
      Closes #15377 from lins05/spark-17802-improve-callercontext-logging.
      402205dd
• [SPARK-4411][WEB UI] Add "kill" link for jobs in the UI · 5d0f81da
      Alex Bozarth authored
      ## What changes were proposed in this pull request?
      
      Currently users can kill stages via the web ui but not jobs directly (jobs are killed if one of their stages is). I've added the ability to kill jobs via the web ui. This code change is based on #4823 by lianhuiwang and updated to work with the latest code matching how stages are currently killed. In general I've copied the kill stage code warning and note comments and all. I also updated applicable tests and documentation.
      
      ## How was this patch tested?
      
      Manually tested and dev/run-tests
      
      ![screen shot 2016-10-11 at 4 49 43 pm](https://cloud.githubusercontent.com/assets/13952758/19292857/12f1b7c0-8fd4-11e6-8982-210249f7b697.png)
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      Author: Lianhui Wang <lianhuiwang09@gmail.com>
      
      Closes #15441 from ajbozarth/spark4411.
      5d0f81da
• [SPARK-18027][YARN] .sparkStaging not clean on RM ApplicationNotFoundException · 29781364
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      Cleanup YARN staging dir on all `KILLED`/`FAILED` paths in `monitorApplication`
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15598 from srowen/SPARK-18027.
      29781364
• [SPARK-18022][SQL] java.lang.NullPointerException instead of real exception when saving DF to MySQL · 6c7d094e
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
When the next exception in a JDBC exception chain is null, don't set it as the cause or as a suppressed exception.
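
A sketch of the guard, simplified from the JDBC write path:

```scala
import java.sql.SQLException

// Previously the next exception was attached unconditionally; when it was
// null, that raised a NullPointerException that masked the real error.
def chainNextException(e: SQLException): Unit = {
  val next = e.getNextException
  if (next != null && e.getCause == null) e.initCause(next)
  else if (next != null) e.addSuppressed(next)
}
```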
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15599 from srowen/SPARK-18022.
      6c7d094e
• [SPARK-17693][SQL] Fixed Insert Failure To Data Source Tables when the Schema has the Comment Field · 93b8ad18
      gatorsmile authored
      ### What changes were proposed in this pull request?
      ```SQL
      CREATE TABLE tab1(col1 int COMMENT 'a', col2 int) USING parquet
      INSERT INTO TABLE tab1 SELECT 1, 2
      ```
The insert attempt will fail if the target table has a column with comments. The error is confusing to external users:
      ```
      assertion failed: No plan for InsertIntoTable Relation[col1#15,col2#16] parquet, false, false
      +- Project [1 AS col1#19, 2 AS col2#20]
         +- OneRowRelation$
      ```
      
      This PR is to fix the above bug by checking the metadata when comparing the schema between the table and the query. If not matched, we also copy the metadata. This is an alternative to https://github.com/apache/spark/pull/15266
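
A sketch of the metadata-insensitive comparison (the helper name is hypothetical):

```scala
import org.apache.spark.sql.types.StructType

// Compare table and query schemas while ignoring per-column metadata such as
// comments; only name, data type and nullability must match.
def sameSchemaIgnoringMetadata(a: StructType, b: StructType): Boolean =
  a.fields.map(f => (f.name, f.dataType, f.nullable)).toSeq ==
    b.fields.map(f => (f.name, f.dataType, f.nullable)).toSeq
```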
      
      ### How was this patch tested?
      Added a test case
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #15615 from gatorsmile/insertDataSourceTableWithCommentSolution2.
      93b8ad18
  5. Oct 25, 2016
• [SPARK-18007][SPARKR][ML] update SparkR MLP - add initialWeights parameter · 12b3e8d2
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
Update SparkR MLP: add an initialWeights parameter.
      
      ## How was this patch tested?
      
      test added.
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #15552 from WeichenXu123/mlp_r_add_initialWeight_param.
      12b3e8d2
• [SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show... · c329a568
      hayashidac authored
      [SPARK-16988][SPARK SHELL] spark history server log needs to be fixed to show https url when ssl is enabled
      
      Author: chie8842 <chie@chie-no-Mac-mini.local>
      
      Closes #15611 from hayashidac/SPARK-16988.
      c329a568
• [SPARK-18019][ML] Add instrumentation to GBTs · 2c7394ad
      sethah authored
      ## What changes were proposed in this pull request?
      
      Add instrumentation for logging in ML GBT, part of umbrella ticket [SPARK-14567](https://issues.apache.org/jira/browse/SPARK-14567)
      
      ## How was this patch tested?
      
      Tested locally:
      
```
      16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: training: numPartitions=1 storageLevel=StorageLevel(1 replicas)
      16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"maxIter":1}
      16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numFeatures":2}
      16/10/20 10:24:51 INFO Instrumentation: GBTRegressor-gbtr_2b460d3e2e93-1207021668-45: {"numClasses":0}
      ...
      16/10/20 15:54:21 INFO Instrumentation: GBTRegressor-gbtr_065fad465377-1922077832-22: training finished
```
      
      Author: sethah <seth.hendrickson16@gmail.com>
      
      Closes #15574 from sethah/gbt_instr.
      2c7394ad
• [SPARK-18070][SQL] binary operator should not consider nullability when comparing input types · a21791e3
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
A binary operator requires its inputs to be of the same type, but it should not consider nullability; e.g. `EqualTo` should be able to compare an element-nullable array and an element-non-nullable array.
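
A hypothetical repro of the element-nullability case, assuming an active SparkSession `spark`:

```scala
import spark.implicits._

// `a` encodes as array<int> with non-nullable elements, `b` with nullable
// ones; EqualTo should accept the comparison despite that difference.
val df = Seq((Seq(1, 2), Seq(Option(1), None))).toDF("a", "b")
df.filter($"a" === $"b").show()
```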
      
      ## How was this patch tested?
      
      a regression test in `DataFrameSuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #15606 from cloud-fan/type-bug.
      a21791e3
• [SPARK-18010][CORE] Reduce work performed for building up the application list... · c5fe3dd4
      Vinayak authored
      [SPARK-18010][CORE] Reduce work performed for building up the application list for the History Server app list UI page
      
      ## What changes were proposed in this pull request?
Allow ReplayListenerBus to skip deserializing and replaying certain events, using an inexpensive check of the raw event log entry. Use this to ensure that when event log replay is triggered to build the application list, ReplayListenerBus skips all but the few events needed for that immediate purpose. Refer to [SPARK-18010] for the motivation behind this change.
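
A sketch of the inexpensive pre-filter (the event names and helper are illustrative, not the PR's exact code):

```scala
// Only a handful of events matter when building the application list; check
// the raw JSON line for their type tags before paying for deserialization.
val neededEvents = Set(
  "SparkListenerApplicationStart",
  "SparkListenerApplicationEnd",
  "SparkListenerEnvironmentUpdate")

def shouldReplay(rawJsonLine: String): Boolean =
  neededEvents.exists(rawJsonLine.contains)
```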
      
      ## How was this patch tested?
      
      Tested with existing HistoryServer and ReplayListener unit test suites. All tests pass.
      
      Author: Vinayak <vijoshi5@in.ibm.com>
      
      Closes #15556 from vijoshi/SAAS-467_master.
      c5fe3dd4