  1. Apr 12, 2016
    • Shixiong Zhu's avatar
      [SPARK-14579][SQL] Fix a race condition in StreamExecution.processAllAvailable · 768b3d62
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      There is a race condition in `StreamExecution.processAllAvailable`. Here is an execution order to reproduce it.
      
      | Time        |Thread 1           | MicroBatchThread  |
      |:-------------:|:-------------:|:-----:|
      | 1 | |  `dataAvailable in constructNextBatch` returns false  |
      | 2 | addData(newData)      |   |
      | 3 | `noNewData = false` in  processAllAvailable |  |
      | 4 | | noNewData = true |
      | 5 | `noNewData` is true so just return | |
      
      The root cause is that checking `dataAvailable` and changing `noNewData` to true is not atomic. This PR puts these two actions into a `synchronized` block to make sure they are atomic.
      
      In addition, this PR also has the following changes:
      
      - Make `committedOffsets` and `availableOffsets` volatile to make sure they can be seen by other threads.
      - Copy the reference of `availableOffsets` to a local variable so that `sourceStatuses` can use a snapshot of `availableOffsets`.
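The fix described above can be sketched with a toy version of the two flags (this is illustrative only, not Spark's actual `StreamExecution` code):

```scala
// Illustrative sketch: checking dataAvailable and flipping noNewData
// happen inside one synchronized block, so an addData() cannot slip
// in between the two steps as in the interleaving table above.
class BatchState {
  private var dataAvailable = false
  private var noNewData = false

  def addData(): Unit = synchronized {
    dataAvailable = true
    noNewData = false // new data invalidates an earlier "no new data" verdict
  }

  // Before the fix, these two steps ran without a common lock.
  def markIdleIfNoData(): Unit = synchronized {
    if (!dataAvailable) noNewData = true
  }

  def isIdle: Boolean = synchronized { noNewData }
}
```

With the lock held across both steps, a writer can no longer produce a spurious `noNewData = true` between the check and the update.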
      
      ## How was this patch tested?
      
      Existing unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #12339 from zsxwing/race-condition.
      768b3d62
    • Davies Liu's avatar
      [SPARK-14578] [SQL] Fix codegen for CreateExternalRow with nested wide schema · 372baf04
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      With a wide schema, the expressions for the fields are split into multiple functions, but the variable `loopVar` can't be accessed in the split functions. This PR changes it into a class member.
      
      ## How was this patch tested?
      
      Added regression test.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12338 from davies/nested_row.
      372baf04
    • Sital Kedia's avatar
      [SPARK-14363] Fix executor OOM due to memory leak in the Sorter · d187e7de
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
      Fix a memory leak in the Sorter. When the UnsafeExternalSorter spills the data to disk, it does not free up the underlying pointer array. As a result, we see a lot of executor OOMs and also memory underutilization.
      This is a regression partially introduced in PR https://github.com/apache/spark/pull/9241
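A minimal sketch of the leak and the fix (a hypothetical class, not the actual UnsafeExternalSorter): the spill path must also drop the pointer array, not only the record pages.

```scala
// Hypothetical in-memory sorter: before the fix, spill() wrote records
// to disk but kept the large pointer array alive; the fix releases it.
class InMemorySorter {
  private var pointerArray: Array[Long] = new Array[Long](1 << 20)

  def memoryUsedBytes: Long =
    if (pointerArray == null) 0L else pointerArray.length.toLong * 8

  def spill(): Unit = {
    // ... write sorted records to disk (omitted) ...
    pointerArray = null // the fix: free the array so the memory is reclaimable
  }

  def reset(): Unit = pointerArray = new Array[Long](1 << 20)
}
```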
      
      ## How was this patch tested?
      
      Tested by running a job and observed around 30% speedup after this change.
      
      Author: Sital Kedia <skedia@fb.com>
      
      Closes #12285 from sitalkedia/executor_oom.
      d187e7de
    • Reynold Xin's avatar
      [SPARK-14547] Avoid DNS resolution for reusing connections · c439d88e
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch changes the connection creation logic in the network client module to avoid DNS resolution when reusing connections.
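The idea can be sketched as keying the connection pool on an unresolved address (illustrative only; the real client pool lives in Spark's network module):

```scala
import java.net.InetSocketAddress
import scala.collection.mutable

// Sketch: InetSocketAddress.createUnresolved never touches DNS, so a
// pool lookup for an existing connection is resolution-free; only a
// pool miss (a genuinely new connection) would need to resolve.
val pool = mutable.Map.empty[InetSocketAddress, String]

def getConnection(host: String, port: Int): String = {
  val key = InetSocketAddress.createUnresolved(host, port)
  pool.getOrElseUpdate(key, {
    // Only here would we resolve and open a socket, e.g.:
    // val resolved = new InetSocketAddress(host, port) // performs DNS lookup
    s"connection-to-$host:$port"
  })
}
```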
      
      ## How was this patch tested?
      Testing in production. This is too difficult to test in isolation (for high fidelity unit tests, we'd need to change the DNS resolution behavior in the JVM).
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #12315 from rxin/SPARK-14547.
      c439d88e
    • Davies Liu's avatar
      [SPARK-14544] [SQL] improve performance of SQL UI tab · 1ef5f8cf
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR improves the performance of the SQL UI by:

      1) removing the details column from the all-executions page (the first page in the SQL tab). We can check the details by entering the execution page.
      2) switching from break-all to break-word, since break-all has become super slow in Chrome recently.
      3) using "display: none" to hide a block.
      4) using one JS closure for all the executions, not one for each.
      5) removing the height limitation of details, so we don't need to scroll it in a tiny window.
      
      ## How was this patch tested?
      
      Existing tests.
      
      ![ui](https://cloud.githubusercontent.com/assets/40902/14445712/68d7b258-0004-11e6-9b48-5d329b05d165.png)
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12311 from davies/ui_perf.
      1ef5f8cf
    • Terence Yim's avatar
      [SPARK-14513][CORE] Fix threads left behind after stopping SparkContext · 3e53de4b
      Terence Yim authored
      ## What changes were proposed in this pull request?
      
      Shut down the `QueuedThreadPool` used by the Jetty `Server` to avoid thread leakage after the SparkContext is stopped.
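The shape of the fix can be illustrated with a JDK executor (Jetty's `QueuedThreadPool` has its own start/stop lifecycle, so this is only an analogy): stopping the server must also stop its worker pool, otherwise the pool's threads outlive the context.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Analogy using a JDK pool: without an explicit shutdown, the worker
// threads keep running after the owning component is "stopped".
val pool = Executors.newFixedThreadPool(2)
pool.submit(new Runnable { def run(): Unit = () })

// On stop: shut the pool down and wait for its threads to terminate.
pool.shutdown()
val stopped = pool.awaitTermination(5, TimeUnit.SECONDS)
```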
      
      Note: If this fix is going to be applied to `branch-1.6`, one more patch on the `NettyRpcEnv` class is needed so that `NettyRpcEnv._fileServer.shutdown` is called in the `NettyRpcEnv.cleanup` method. This is due to the removal of the `_fileServer` field from the `NettyRpcEnv` class in the master branch. Please advise if a second PR is necessary to bring this fix back to `branch-1.6`.
      
      ## How was this patch tested?
      
      Ran the ./dev/run-tests locally
      
      Author: Terence Yim <terence@cask.co>
      
      Closes #12318 from chtyim/fixes/SPARK-14513-thread-leak.
      3e53de4b
    • bomeng's avatar
      [SPARK-14414][SQL] improve the error message class hierarchy · bcd20762
      bomeng authored
      ## What changes were proposed in this pull request?
      
      Previously we used `AnalysisException`, `ParseException`, `NoSuchFunctionException`, etc. when a parsing error was encountered. I am trying to make this consistent, with **minimum** impact on the current implementation, by changing the class hierarchy.
      1. `NoSuchItemException` is removed, since it is an abstract class that just takes a message string.
      2. `NoSuchDatabaseException`, `NoSuchTableException`, `NoSuchPartitionException` and `NoSuchFunctionException` now extend `AnalysisException`, as does `ParseException`; they are all under the `AnalysisException` umbrella, but you can still handle them in a granular way.
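A miniature of the described hierarchy (simplified; the real `AnalysisException` carries more context, such as line and position):

```scala
// Simplified model: the specific "no such X" exceptions all extend the
// AnalysisException umbrella, so callers can catch either the umbrella
// type or a specific subclass.
class AnalysisException(message: String) extends Exception(message)

class NoSuchTableException(table: String)
  extends AnalysisException(s"Table not found: $table")

class NoSuchFunctionException(fn: String)
  extends AnalysisException(s"Function not found: $fn")
```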
      
      ## How was this patch tested?
      The existing test cases should cover this patch.
      
      Author: bomeng <bmeng@us.ibm.com>
      
      Closes #12314 from bomeng/SPARK-14414.
      bcd20762
    • Davies Liu's avatar
      [SPARK-14562] [SQL] improve constraints propagation in Union · 85e68b4b
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      Currently, Union only takes the intersection of the constraints from its children; all others are dropped. We should try to merge them together.
      
      This PR tries to merge constraints that have the same reference but come from different children; for example, `a > 10` and `a < 100` could be merged as `a > 10 || a < 100`.
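The merge can be sketched over a toy constraint type (illustrative; Catalyst's real constraints are `Expression`s):

```scala
// Toy constraints: a predicate attached to an attribute name. Constraints
// on the same attribute from different Union children are combined with
// OR instead of being dropped.
case class Constraint(attr: String, pred: Int => Boolean)

def mergeByAttr(left: Seq[Constraint], right: Seq[Constraint]): Seq[Constraint] =
  for {
    l <- left
    r <- right
    if l.attr == r.attr
  } yield Constraint(l.attr, x => l.pred(x) || r.pred(x))

// `a > 10` from one child, `a < 100` from the other:
val merged = mergeByAttr(Seq(Constraint("a", _ > 10)), Seq(Constraint("a", _ < 100)))
```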
      
      ## How was this patch tested?
      
      Added more cases in existing test.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12328 from davies/union_const.
      85e68b4b
    • Liwei Lin's avatar
      [SPARK-14556][SQL] Code clean-ups for package o.a.s.sql.execution.streaming.state · 852bbc6c
      Liwei Lin authored
      ## What changes were proposed in this pull request?
      
      - `StateStoreConf.**max**DeltasForSnapshot` was renamed to `StateStoreConf.**min**DeltasForSnapshot`
      - some state switch checks were added
      - improved consistency between method names and string literals
      - other comments & typo fix
      
      ## How was this patch tested?
      
      N/A
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #12323 from lw-lin/streaming-state-clean-up.
      852bbc6c
    • Yanbo Liang's avatar
      [SPARK-14147][ML][SPARKR] SparkR predict should not output feature column · 111a6247
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      SparkR does not support the vector type, which is the default type of the feature column in ML. R's predict also does not output an intermediate feature column. So SparkR ```predict``` should not output the feature column. In this PR, I only fix this issue for ```naiveBayes``` and ```survreg```. ```kmeans``` already has the right code path, and ```glm``` will be fixed in the SparkRWrapper refactor (#12294).
      
      ## How was this patch tested?
      No new tests.
      
      cc mengxr shivaram
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #11958 from yanboliang/spark-14147.
      111a6247
    • Xiangrui Meng's avatar
      [SPARK-14563][ML] use a random table name instead of __THIS__ in SQLTransformer · 1995c2e6
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      Use a random table name instead of `__THIS__` in SQLTransformer, and add a test for `transformSchema`. The problems of using `__THIS__` are:
      
      * It doesn't work under HiveContext (in Spark 1.6)
      * Race conditions
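The core of the change can be sketched as follows (the helper name is assumed for illustration):

```scala
import java.util.UUID

// Sketch: derive a unique temp-table name per transform() call instead of
// the fixed placeholder __THIS__, so concurrent transforms cannot collide
// on one shared registered name.
def randomTableName(): String =
  "sql_transformer_" + UUID.randomUUID().toString.replace("-", "")

val statement = "SELECT *, v + 1 AS v1 FROM __THIS__"
val resolved = statement.replace("__THIS__", randomTableName())
```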
      
      ## How was this patch tested?
      
      * Manual test with HiveContext.
      * Added a unit test for `transformSchema` to improve coverage.
      
      cc: yhuai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12330 from mengxr/SPARK-14563.
      1995c2e6
    • Kai Jiang's avatar
      [SPARK-13597][PYSPARK][ML] Python API for GeneralizedLinearRegression · 7f024c47
      Kai Jiang authored
      ## What changes were proposed in this pull request?
      
      Python API for GeneralizedLinearRegression
      JIRA: https://issues.apache.org/jira/browse/SPARK-13597
      
      ## How was this patch tested?
      
      The patch is tested with Python doctest.
      
      Author: Kai Jiang <jiangkai@gmail.com>
      
      Closes #11468 from vectorijk/spark-13597.
      7f024c47
    • Yanbo Liang's avatar
      [SPARK-13322][ML] AFTSurvivalRegression supports feature standardization · 101663f1
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      AFTSurvivalRegression should support feature standardization; it will improve the convergence rate.
      I tested the convergence rate on the [Ovarian](https://stat.ethz.ch/R-manual/R-devel/library/survival/html/ovarian.html) data, a standard dataset that comes with the survival library in R:
      * without standardization (before this PR) -> 74 iterations.
      * with standardization (after this PR) -> 38 iterations.
      
      After this fix, it will converge to the same solution with or without ```standardization```. This means ```standardization = false``` runs the same code path as ```standardization = true```, because if the features are not standardized at all, convergence issues arise when the features have very different scales. This behavior is the same as in ML [```LinearRegression``` and ```LogisticRegression```](https://issues.apache.org/jira/browse/SPARK-8522). See more discussion about this topic at #11247.
      cc mengxr
      ## How was this patch tested?
      unit test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #11365 from yanboliang/spark-13322.
      101663f1
    • Yanbo Liang's avatar
      [SPARK-12566][SPARK-14324][ML] GLM model family, link function support in SparkR:::glm · 75e05a5a
      Yanbo Liang authored
      * SparkR glm supports families and link functions which match R's signature for family.
      * SparkR glm API refactor. The comparative standard of the new API is R's glm, so I only expose the arguments that R's glm supports: ```formula, family, data, epsilon and maxit```.
      * This PR focuses on glm() and predict(); summary statistics will be done in a separate PR after this gets in.
      * This PR depends on #12287, which makes GLMs support link prediction on the Scala side. After that is merged, I will add more tests for predict() to this PR.
      
      Unit tests.
      
      cc mengxr jkbradley hhbyyh
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12294 from yanboliang/spark-12566.
      75e05a5a
    • Shixiong Zhu's avatar
      [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation · 6bf69214
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Now that we have a single location for storing checkpointed state, this PR propagates the checkpoint location into FileStreamSource so that we don't have one random log off on its own.
      
      ## How was this patch tested?
      
      test("metadataPath should be in checkpointLocation")
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #12247 from zsxwing/file-source-log-location.
      6bf69214
    • Yong Tang's avatar
      [SPARK-3724][ML] RandomForest: More options for feature subset size. · da60b34d
      Yong Tang authored
      ## What changes were proposed in this pull request?
      
      This PR tries to support more options for the feature subset size in the RandomForest implementation. Previously, RandomForest only supported "auto", "all", "sqrt", "log2" and "onethird". This PR supports any given value to allow model search.
      
      In this PR, `featureSubsetStrategy` could be passed with:
      a) a real number in the range `(0.0, 1.0]` that represents the fraction of the number of features in each subset, or
      b) an integer (`> 0`) that represents the number of features in each subset.
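A sketch of how the extended values could be interpreted (the function name and exact rounding are assumptions for illustration):

```scala
// Maps a featureSubsetStrategy string to a feature count per node:
// named strategies as before, plus a fraction in (0.0, 1.0] or an
// integer > 0 as introduced by this PR.
def numFeaturesPerNode(strategy: String, totalFeatures: Int): Int =
  strategy match {
    case "all"      => totalFeatures
    case "sqrt"     => math.sqrt(totalFeatures).ceil.toInt
    case "log2"     => math.max(1, (math.log(totalFeatures) / math.log(2)).ceil.toInt)
    case "onethird" => math.max(1, (totalFeatures / 3.0).ceil.toInt)
    case s if s.matches("""\d+""") => s.toInt
    case s if s.matches("""0?\.\d+|1\.0""") =>
      math.max(1, (s.toDouble * totalFeatures).ceil.toInt)
    case other => throw new IllegalArgumentException(s"Bad strategy: $other")
  }
```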
      
      ## How was this patch tested?
      
      Two tests `JavaRandomForestClassifierSuite` and `JavaRandomForestRegressorSuite` have been updated to check the additional options for params in this PR.
      An additional test has been added to `org.apache.spark.mllib.tree.RandomForestSuite` to cover the cases in this PR.
      
      Author: Yong Tang <yong.tang.github@outlook.com>
      
      Closes #11989 from yongtang/SPARK-3724.
      da60b34d
    • Cheng Lian's avatar
      [SPARK-14488][SPARK-14493][SQL] "CREATE TEMPORARY TABLE ... USING ... AS... · 124cbfb6
      Cheng Lian authored
      [SPARK-14488][SPARK-14493][SQL] "CREATE TEMPORARY TABLE ... USING ... AS SELECT" shouldn't create persisted table
      
      ## What changes were proposed in this pull request?
      
      When planning logical plan node `CreateTableUsingAsSelect`, we neglected its `temporary` field and always generates a `CreateMetastoreDataSourceAsSelect`. This PR fixes this issue generating `CreateTempTableUsingAsSelect` when `temporary` is true.
      
      This PR also fixes SPARK-14493, since its root cause is that `CreateMetastoreDataSourceAsSelect` uses the default Hive warehouse location when the `PATH` data source option is absent.
      
      ## How was this patch tested?
      
      Added a test case to create a temporary table using the target syntax and check whether it's indeed a temporary table.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #12303 from liancheng/spark-14488-fix-ctas-using.
      124cbfb6
    • Dongjoon Hyun's avatar
      [SPARK-14508][BUILD] Add a new ScalaStyle Rule `OmitBracesInCase` · b0f5497e
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [Scala Style Guide](http://docs.scala-lang.org/style/control-structures.html#curlybraces), we had better enforce the following rule.
        ```
        case: Always omit braces in case clauses.
        ```
      This PR makes a new ScalaStyle rule, 'OmitBracesInCase', and enforces it to the code.
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (including Scala style checking)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12280 from dongjoon-hyun/SPARK-14508.
      b0f5497e
    • Wenchen Fan's avatar
      [SPARK-14535][SQL] Remove buildInternalScan from FileFormat · 678b96e7
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Now that `HadoopFsRelation` with all kinds of file formats can be handled in `FileSourceStrategy`, we can remove the branches for `HadoopFsRelation` in `FileSourceStrategy` and the `buildInternalScan` API from `FileFormat`.
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12300 from cloud-fan/remove.
      678b96e7
    • Wenchen Fan's avatar
      [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns · 52a80112
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      In https://github.com/apache/spark/pull/12047/files#diff-94a1f59bcc9b6758c4ca874652437634R529, we may split the field expression code in `CreateExternalRow` to support wide tables. However, the whole-stage codegen framework doesn't support this, because the input for expressions is not always the input row but can be `CodeGenContext.currentVars`, which doesn't work well with `CodeGenContext.splitExpressions`.

      Actually we do have a check to guard against these cases, but it's incomplete: it only checks output fields.

      This PR improves the whole-stage codegen support check to disable it if there are too many input fields, so that we can avoid splitting the field expression code in `CreateExternalRow` for whole-stage codegen.
      
      TODO: Is it a better solution if we can make `CodeGenContext.currentVars` work well with `CodeGenContext.splitExpressions`?
      
      ## How was this patch tested?
      
      new test in DatasetSuite.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #12322 from cloud-fan/codegen.
      52a80112
    • gatorsmile's avatar
      [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table · 2d81ba54
      gatorsmile authored
      #### What changes were proposed in this pull request?
      In this PR, we are trying to address the comment in the original PR: https://github.com/apache/spark/commit/dfce9665c4b2b29a19e6302216dae2800da68ff9#commitcomment-17057030
      
      In this PR, we check whether the table/view exists at the beginning, and then do not need to capture the exceptions, including `NoSuchTableException` and `InvalidTableException`. We still capture NonFatal exceptions when doing `sqlContext.cacheManager.tryUncacheQuery`.
      
      #### How was this patch tested?
      The existing test cases should cover the code changes of this PR.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12321 from gatorsmile/dropViewFollowup.
      2d81ba54
  2. Apr 11, 2016
    • Andrew Or's avatar
      [SPARK-14132][SPARK-14133][SQL] Alter table partition DDLs · 83fb9640
      Andrew Or authored
      ## What changes were proposed in this pull request?
      
      This implements a few alter table partition commands using the `SessionCatalog`. In particular:
      ```
      ALTER TABLE ... ADD PARTITION ...
      ALTER TABLE ... DROP PARTITION ...
      ALTER TABLE ... RENAME PARTITION ... TO ...
      ```
      The following operations are not supported, and an `AnalysisException` with a helpful error message will be thrown if the user tries to use them:
      ```
      ALTER TABLE ... EXCHANGE PARTITION ...
      ALTER TABLE ... ARCHIVE PARTITION ...
      ALTER TABLE ... UNARCHIVE PARTITION ...
      ALTER TABLE ... TOUCH ...
      ALTER TABLE ... COMPACT ...
      ALTER TABLE ... CONCATENATE
      MSCK REPAIR TABLE ...
      ```
      
      ## How was this patch tested?
      
      `DDLSuite`, `DDLCommandSuite` and `HiveDDLCommandSuite`
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #12220 from andrewor14/alter-partition-ddl.
      83fb9640
    • Joseph K. Bradley's avatar
      [MINOR][ML] Fixed MLlib build warnings · e9e1adc0
      Joseph K. Bradley authored
      ## What changes were proposed in this pull request?
      
      Fixes to eliminate warnings during package and doc builds.
      
      ## How was this patch tested?
      
      Existing unit tests
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #12263 from jkbradley/warning-cleanups.
      e9e1adc0
    • Liang-Chi Hsieh's avatar
      [SPARK-14520][SQL] Use correct return type in VectorizedParquetInputFormat · 26d7af91
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      JIRA: https://issues.apache.org/jira/browse/SPARK-14520
      
      `VectorizedParquetInputFormat` inherits `ParquetInputFormat` and overrides `createRecordReader`. However, its overridden `createRecordReader` returns a `ParquetRecordReader`. It should return a `RecordReader`. Otherwise, `ClassCastException` will be thrown.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
      
      Closes #12292 from viirya/fix-vectorized-input-format.
      26d7af91
    • Eric Liang's avatar
      [SPARK-14475] Propagate user-defined context from driver to executors · 6f27027d
      Eric Liang authored
      ## What changes were proposed in this pull request?
      
      This adds a new API call `TaskContext.getLocalProperty` for getting properties set in the driver from executors. These local properties are automatically propagated from the driver to executors. For streaming, the context for streaming tasks will be the initial driver context when ssc.start() is called.
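The propagation mechanism can be sketched with plain `java.util.Properties` (the class names here are stand-ins, not Spark's real types):

```scala
import java.util.Properties

// Sketch: the driver snapshots its local properties into each task
// description when the task is shipped, and the executor exposes that
// snapshot through the task's context.
class DriverSide {
  val localProperties = new Properties()
  def setLocalProperty(k: String, v: String): Unit =
    localProperties.setProperty(k, v)

  // Snapshot taken when the task is serialized and shipped:
  def taskProperties: Properties = {
    val p = new Properties()
    localProperties.stringPropertyNames().forEach { k =>
      p.setProperty(k, localProperties.getProperty(k))
    }
    p
  }
}

class TaskContextSketch(props: Properties) {
  def getLocalProperty(key: String): String = props.getProperty(key)
}
```

Because the snapshot is taken at ship time, later changes on the driver do not affect tasks already in flight.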
      
      ## How was this patch tested?
      
      Unit tests.
      
      cc JoshRosen
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #12248 from ericl/sc-2813.
      6f27027d
    • Luciano Resende's avatar
      [SPARK-10521][SQL] Utilize Docker for test DB2 JDBC Dialect support · 94de6305
      Luciano Resende authored
      Add integration tests based on docker to test DB2 JDBC dialect support
      
      Author: Luciano Resende <lresende@apache.org>
      
      Closes #9893 from lresende/SPARK-10521.
      94de6305
    • Yanbo Liang's avatar
      [SPARK-14298][ML][MLLIB] Add unit test for EM LDA disable checkpointing · 3f0f4080
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      This is follow up for #12089, add unit test for EM LDA which test disable checkpointing when set ```checkpointInterval = -1```.
      ## How was this patch tested?
      unit test.
      
      cc jkbradley
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #12286 from yanboliang/spark-14298-followup.
      3f0f4080
    • Oliver Pierson's avatar
      [SPARK-13600][MLLIB] Use approxQuantile from DataFrame stats in QuantileDiscretizer · 89a41c5b
      Oliver Pierson authored
      ## What changes were proposed in this pull request?
      QuantileDiscretizer can return an unexpected number of buckets in certain cases. This PR proposes to fix this issue and also refactor QuantileDiscretizer to use approxQuantile from the DataFrame stats functions.
      ## How was this patch tested?
      QuantileDiscretizerSuite unit tests (some existing tests will change or even be removed in this PR)
      
      Author: Oliver Pierson <ocp@gatech.edu>
      
      Closes #11553 from oliverpierson/SPARK-13600.
      89a41c5b
    • Shixiong Zhu's avatar
      [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink · 2dacc81e
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Make sure accesses to mutable variables in MemoryStream and MemorySink are protected by `synchronized`.
      This is probably why MemorySinkSuite failed here: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.2/650/testReport/junit/org.apache.spark.sql.streaming/MemorySinkSuite/registering_as_a_table/
      
      ## How was this patch tested?
      Existing unit tests.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #12261 from zsxwing/memory-race-condition.
      2dacc81e
    • Dongjoon Hyun's avatar
      [SPARK-14502] [SQL] Add optimization for Binary Comparison Simplification · 5de26194
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      We can simplify binary comparisons with semantically-equal operands:
      
      1. Replace '<=>' with 'true' literal.
      2. Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable.
      3. Replace '<' and '>' with 'false' literal if both operands are non-nullable.
      
      For example, the following example plan
      ```
      scala> sql("SELECT * FROM (SELECT explode(array(1,2,3)) a) T WHERE a BETWEEN a AND a+7").explain()
      ...
      :  +- Filter ((a#59 >= a#59) && (a#59 <= (a#59 + 7)))
      ...
      ```
      will be optimized into the following.
      ```
      :  +- Filter (a#47 <= (a#47 + 7))
      ```
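The three rules can be sketched as a rewrite over a toy expression tree (illustrative; the real rule operates on Catalyst `Expression`s):

```scala
// Toy rewrite: comparisons whose two sides are semantically equal
// collapse to literals; '=', '<=', '>=' and '<', '>' additionally
// require the operand to be non-nullable, while '<=>' does not.
sealed trait Expr
case class Attr(name: String, nullable: Boolean) extends Expr
case class Cmp(op: String, l: Expr, r: Expr) extends Expr
case class Lit(b: Boolean) extends Expr

def simplify(e: Expr): Expr = e match {
  case Cmp("<=>", l, r) if l == r => Lit(true)
  case Cmp(op, l @ Attr(_, false), r) if l == r && Set("=", "<=", ">=")(op) => Lit(true)
  case Cmp(op, l @ Attr(_, false), r) if l == r && Set("<", ">")(op)       => Lit(false)
  case other => other
}
```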
      
      ## How was this patch tested?
      
      Pass the Jenkins tests including new `BinaryComparisonSimplificationSuite`.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12267 from dongjoon-hyun/SPARK-14502.
      5de26194
    • Davies Liu's avatar
      [SPARK-14528] [SQL] Fix same result of Union · 652c4703
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      This PR fixes `sameResult()` for Union.
      
      ## How was this patch tested?
      
      Added regression test.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12295 from davies/fix_sameResult.
      652c4703
    • DB Tsai's avatar
      [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pom · efaf7d18
      DB Tsai authored
      ## What changes were proposed in this pull request?
      
      In order to separate the linear algebra and vector/matrix classes into a standalone jar, we need to set up the build first. This PR will create a new jar called mllib-local with minimal dependencies.
      
      The previous PR was failing the build because of `spark-core:test` dependency, and that was reverted. In this PR, `FunSuite` with `// scalastyle:ignore funsuite` in mllib-local test was used, similar to sketch.
      
      Thanks.
      
      ## How was this patch tested?
      
      Unit tests
      
      mengxr tedyu holdenk
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #12298 from dbtsai/dbtsai-mllib-local-build-fix.
      efaf7d18
    • Zheng RuiFeng's avatar
      [SPARK-14510][MLLIB] Add args-checking for LDA and StreamingKMeans · 643b4e22
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Add argument checking for LDA and StreamingKMeans.
      
      ## How was this patch tested?
      manual tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #12062 from zhengruifeng/initmodel.
      643b4e22
    • Xiangrui Meng's avatar
      [SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs · 1c751fcf
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      This PR updates MLlib APIs to accept `Dataset[_]` as input where `DataFrame` was the input type. This PR doesn't change the output type. In Java, `Dataset[_]` maps to `Dataset<?>`, which includes `Dataset<Row>`. Some implementations were changed in order to return `DataFrame`. Tests and examples were updated. Note that this is a breaking change for subclasses of Transformer/Estimator.
      
      Lol, we don't have to rename the input argument, which has been `dataset` since Spark 1.2.
      
      TODOs:
      - [x] update MiMaExcludes (seems all covered by explicit filters from SPARK-13920)
      - [x] Python
      - [x] add a new test to accept Dataset[LabeledPoint]
      - [x] remove unused imports of Dataset
      
      ## How was this patch tested?
      
      Existing unit tests with some modifications.
      
      cc: rxin jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #12274 from mengxr/SPARK-14500.
      1c751fcf
    • Rekha Joshi's avatar
      [SPARK-14372][SQL] Dataset.randomSplit() needs a Java version · e82d95bf
      Rekha Joshi authored
      ## What changes were proposed in this pull request?
      
      1. Added method randomSplitAsList() in Dataset for Java, for https://issues.apache.org/jira/browse/SPARK-14372
      
      ## How was this patch tested?
      
      TestSuite
      
      Author: Rekha Joshi <rekhajoshm@gmail.com>
      Author: Joshi <rekhajoshm@gmail.com>
      
      Closes #12184 from rekhajoshm/SPARK-14372.
      e82d95bf
    • Dongjoon Hyun's avatar
      [MINOR][DOCS] Fix wrong data types in JSON Datasets example. · 1a0cca1f
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the `age` data types from `integer` to `long` in `SQL Programming Guide: JSON Datasets`.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12290 from dongjoon-hyun/minor_fix_type_in_json_example.
      1a0cca1f
  3. Apr 10, 2016
    • gatorsmile's avatar
      [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table · 9f838bd2
      gatorsmile authored
      #### What changes were proposed in this pull request?
      This PR is to address the comment: https://github.com/apache/spark/pull/12146#discussion-diff-59092238. It removes the function `isViewSupported` from `SessionCatalog`. After the removal, we can still catch user errors when users try to drop a table using `DROP VIEW`.
      
      #### How was this patch tested?
      Modified the existing test cases
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #12284 from gatorsmile/followupDropTable.
      9f838bd2
    • Davies Liu's avatar
      [SPARK-14419] [MINOR] coding style cleanup · fbf8d008
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      Making them more consistent.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #12289 from davies/cleanup_style.
      fbf8d008
    • Dongjoon Hyun's avatar
      [SPARK-14415][SQL] All functions should show usages by command `DESC FUNCTION` · a7ce473b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Currently, many functions do not show usages, like the following.
      ```
      scala> sql("desc function extended `sin`").collect().foreach(println)
      [Function: sin]
      [Class: org.apache.spark.sql.catalyst.expressions.Sin]
      [Usage: To be added.]
      [Extended Usage:
      To be added.]
      ```
      
      This PR adds descriptions for functions and adds a test case to prevent adding functions without usage.
      ```
      scala>  sql("desc function extended `sin`").collect().foreach(println);
      [Function: sin]
      [Class: org.apache.spark.sql.catalyst.expressions.Sin]
      [Usage: sin(x) - Returns the sine of x.]
      [Extended Usage:
      > SELECT sin(0);
       0.0]
      ```
      
      The only exceptions are `cube`, `grouping`, `grouping_id`, `rollup`, `window`.
      
      ## How was this patch tested?
      
      Pass the Jenkins tests (including new testcases.)
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12185 from dongjoon-hyun/SPARK-14415.
      a7ce473b
    • Örjan Lundberg's avatar
      Update KMeansExample.scala · b5c78562
      Örjan Lundberg authored
      ## What changes were proposed in this pull request?
      The example does not work without the DataFrame import.
      
      ## How was this patch tested?
      
      Example doc only.
      
      Author: Örjan Lundberg <orjan.lundberg@gmail.com>
      
      Closes #12277 from oluies/patch-1.
      b5c78562