Commits · e84815dc333a69368a48e0152f02934980768a14 · cs525-sp18-g07 / spark

Jun 07, 2015

[SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ · e84815dc

Sean Owen authored 9 years ago

Update build to use Java 7, and remove some comments and special-case support for Java 6.

Author: Sean Owen <sowen@cloudera.com>

Closes #6265 from srowen/SPARK-7733 and squashes the following commits:

59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6

e84815dc

[SPARK-7952][SQL] use internal Decimal instead of java.math.BigDecimal · db81b9d8

Wenchen Fan authored 9 years ago

This PR fixes a bug introduced in https://github.com/apache/spark/pull/6505.
Decimal literal's value is not `java.math.BigDecimal`, but Spark SQL internal type: `Decimal`.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #6574 from cloud-fan/fix and squashes the following commits:

b0e3549 [Wenchen Fan] rename to BooleanEquality
1987b37 [Wenchen Fan] use Decimal instead of java.math.BigDecimal
f93c420 [Wenchen Fan] compare literal

db81b9d8

[SPARK-8004][SQL] Quote identifier in JDBC data source. · d6d601a0

Reynold Xin authored 9 years ago

This is a follow-up patch to #6577 to replace columnEnclosing to quoteIdentifier.

I also did some minor cleanup to the JdbcDialect file.

Author: Reynold Xin <rxin@databricks.com>

Closes #6689 from rxin/jdbc-quote and squashes the following commits:

bad365f [Reynold Xin] Fixed test compilation...
e39e14e [Reynold Xin] Fixed compilation.
db9a8e0 [Reynold Xin] [SPARK-8004][SQL] Quote identifier in JDBC data source.

d6d601a0

[DOC] [TYPO] Fix typo in standalone deploy scripts description · 835f1380

Yijie Shen authored 9 years ago

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #6691 from yijieshen/patch-2 and squashes the following commits:

b40a4b0 [Yijie Shen] [DOC][TYPO] Fix typo in standalone deploy scripts description

835f1380

[SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x · ca8dafcc

Konstantin Shaposhnikov authored 9 years ago

Both akka 2.3.x and hadoop-2.x use protobuf 2.5 so only hadoop-1 build needs
custom 2.3.4-spark akka version that shades protobuf-2.5

This change also updates akka version (for hadoop-2.x profiles only) to the
latest 2.3.11 as akka-zeromq_2.11 is not available for akka 2.3.4.

This partially fixes SPARK-7042 (for hadoop-2.x builds)

Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>

Closes #6492 from kostya-sh/SPARK-7042 and squashes the following commits:

dc195b0 [Konstantin Shaposhnikov] [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x

ca8dafcc

[SPARK-8118] [SQL] Mutes noisy Parquet log output reappeared after upgrading Parquet to 1.7.0 · 8c321d66

Cheng Lian authored 9 years ago

Author: Cheng Lian <lian@databricks.com>

Closes #6670 from liancheng/spark-8118 and squashes the following commits:

b6e85a6 [Cheng Lian] Suppresses unnecesary ParquetRecordReader log message (PARQUET-220)
385603c [Cheng Lian] Mutes noisy Parquet log output reappeared after upgrading Parquet to 1.7.0

8c321d66

[SPARK-8146] DataFrame Python API: Alias replace in df.na · 0ac47083

Reynold Xin authored 9 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #6688 from rxin/df-alias-replace and squashes the following commits:

774c19c [Reynold Xin] [SPARK-8146] DataFrame Python API: Alias replace in DataFrameNaFunctions.

0ac47083

[SPARK-8141] [SQL] Precompute datatypes for partition columns and reuse it · 26d07f1e

Liang-Chi Hsieh authored 9 years ago

JIRA: https://issues.apache.org/jira/browse/SPARK-8141

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6687 from viirya/reuse_partition_column_types and squashes the following commits:

dab0688 [Liang-Chi Hsieh] Reuse partitionColumnTypes.

26d07f1e

[SPARK-8145] [WEBUI] Trigger a double click on the span to show full job description. · 081db947

979969786 authored 9 years ago

When using the Spark SQL, Jobs tab and Stages tab display only part of SQL. I change it to  display full SQL by double-click on the description span

before：
![before](https://cloud.githubusercontent.com/assets/5399861/8022257/9f8e0a22-0cf8-11e5-98c8-da4d7a615e7e.png)

after double click on the description span：
![after](https://cloud.githubusercontent.com/assets/5399861/8022261/dac08d4a-0cf8-11e5-8fe7-74c96c6ce933.png)

Author: 979969786 <q79969786@gmail.com>

Closes #6646 from 979969786/master and squashes the following commits:

b5ba20e [979969786] Trigger a double click on the span to show full job description.

081db947

[SPARK-8004][SQL] Enclose column names by JDBC Dialect · 901a552c

Liang-Chi Hsieh authored 9 years ago

JIRA: https://issues.apache.org/jira/browse/SPARK-8004

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6577 from viirya/enclose_jdbc_columns and squashes the following commits:

614606a [Liang-Chi Hsieh] For comment.
bc50182 [Liang-Chi Hsieh] Enclose column names by JDBC Dialect.

901a552c

Jun 06, 2015

[SPARK-7955] [CORE] Ensure executors with cached RDD blocks are not re… · 3285a511

Hari Shreedharan authored 9 years ago

…moved if dynamic allocation is enabled.

This is a work in progress. This patch ensures that an executor that has cached RDD blocks are not removed,
but makes no attempt to find another executor to remove. This is meant to get some feedback on the current
approach, and if it makes sense then I will look at choosing another executor to remove. No testing has been done either.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6508 from harishreedharan/dymanic-caching and squashes the following commits:

dddf1eb [Hari Shreedharan] Minor configuration description update.
10130e2 [Hari Shreedharan] Fix compile issue.
5417b53 [Hari Shreedharan] Add documentation for new config. Remove block from cachedBlocks when it is dropped.
875916a [Hari Shreedharan] Make some code more readable.
39940ca [Hari Shreedharan] Handle the case where the executor has not yet registered.
90ad711 [Hari Shreedharan] Remove unused imports and unused methods.
063985c [Hari Shreedharan] Send correct message instead of recursively calling same method.
ec2fd7e [Hari Shreedharan] Add file missed in last commit
5d10fad [Hari Shreedharan] Update cached blocks status using local info, rather than doing an RPC.
193af4c [Hari Shreedharan] WIP. Use local state rather than via RPC.
ae932ff [Hari Shreedharan] Fix config param name.
272969d [Hari Shreedharan] Fix seconds to millis bug.
5a1993f [Hari Shreedharan] Add timeout for cache executors. Ignore broadcast blocks while checking if there are cached blocks.
57fefc2 [Hari Shreedharan] [SPARK-7955][Core] Ensure executors with cached RDD blocks are not removed if dynamic allocation is enabled.

3285a511

[SPARK-8136] [YARN] Fix flakiness in YarnClusterSuite. · ed2cc3ee

Hari Shreedharan authored 9 years ago

Instead of actually downloading the logs, just verify that the logs link is actually
a URL and is in the expected format.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #6680 from harishreedharan/simplify-am-log-tests and squashes the following commits:

3183aeb [Hari Shreedharan] Remove check for hostname which can fail on machines with several hostnames. Removed some unused imports.
50d69a7 [Hari Shreedharan] [SPARK-8136][YARN] Fix flakiness in YarnClusterSuite.

ed2cc3ee

[SPARK-7169] [CORE] Allow metrics system to be configured through SparkConf. · 18c4fceb

Marcelo Vanzin authored 9 years ago

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>

Closes #6560 from vanzin/SPARK-7169 and squashes the following commits:

737266f [Marcelo Vanzin] Feedback.
702d5a3 [Marcelo Vanzin] Scalastyle.
ce66e7e [Marcelo Vanzin] Remove metrics config handling from SparkConf.
439938a [Jacek Lewandowski] SPARK-7169: Metrics can be additionally configured from Spark configuration

18c4fceb

[SPARK-7639] [PYSPARK] [MLLIB] Python API for KernelDensity · 5aa804f3

MechCoder authored 9 years ago

Python API for KernelDensity

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6387 from MechCoder/spark-7639 and squashes the following commits:

17abc62 [MechCoder] add tests
2de6540 [MechCoder] style tests
bf4acc0 [MechCoder] Added doctests
84359d5 [MechCoder] [SPARK-7639] Python API for KernelDensity

5aa804f3

[SPARK-8079] [SQL] Makes InsertIntoHadoopFsRelation job/task abortion more robust · 16fc4961

Cheng Lian authored 9 years ago

As described in SPARK-8079, when writing a DataFrame to a `HadoopFsRelation`, if `HadoopFsRelation.prepareForWriteJob` throws exception, an unexpected NPE will be thrown during job abortion. (This issue doesn't bring much damage since the job is failing anyway.)

This PR makes the job/task abortion logic in `InsertIntoHadoopFsRelation` more robust to avoid such confusing exceptions.

Author: Cheng Lian <lian@databricks.com>

Closes #6612 from liancheng/spark-8079 and squashes the following commits:

87cd81e [Cheng Lian] Addresses @rxin's comment
1864c75 [Cheng Lian] Addresses review comments
9e6dbb3 [Cheng Lian] Makes InsertIntoHadoopFsRelation job/task abortion more robust

16fc4961

[SPARK-6973] remove skipped stage ID from completed set on the allJobsPage · a8077e5c

Xu Tingjun authored 9 years ago

Though totalStages = allStages - skippedStages is understandable. But consider the problem [SPARK-6973], I think totalStages = allStages is more reasonable. Like "2/1 (2 failed) (1 skipped)", this item also shows the skipped num, it also will be understandable.

Author: Xu Tingjun <xutingjun@huawei.com>
Author: Xutingjun <xutingjun@huawei.com>
Author: meiyoula <1039320815@qq.com>

Closes #5550 from XuTingjun/allJobsPage and squashes the following commits:

a742541 [Xu Tingjun] delete the loop
40ce94b [Xutingjun] remove stage id from completed set if it retries again
6459238 [meiyoula] delete space
9e23c71 [Xu Tingjun] recover numSkippedStages
b987ea7 [Xutingjun] delete skkiped stages from completed set
47525c6 [Xu Tingjun] modify total stages/tasks on the allJobsPage

a8077e5c

[SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ round 3. · a71be0a3

Reynold Xin authored 9 years ago

Author: Reynold Xin <rxin@databricks.com>

Closes #6677 from rxin/test-wildcard and squashes the following commits:

8a17b33 [Reynold Xin] Fixed line length.
6663813 [Reynold Xin] [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ round 3.

a71be0a3

Jun 05, 2015

[SPARK-6964] [SQL] Support Cancellation in the Thrift Server · eb19d3f7

Dong Wang authored 9 years ago

Support runInBackground in SparkExecuteStatementOperation, and add cancellation

Author: Dong Wang <dong@databricks.com>

Closes #6207 from dongwang218/SPARK-6964-jdbc-cancel and squashes the following commits:

687c113 [Dong Wang] fix 100 characters
7bfa2a7 [Dong Wang] fix merge
380480f [Dong Wang] fix for liancheng's comments
eb3e385 [Dong Wang] small nit
341885b [Dong Wang] small fix
3d8ebf8 [Dong Wang] add spark.sql.hive.thriftServer.async flag
04142c3 [Dong Wang] set SQLSession for async execution
184ec35 [Dong Wang] keep hive conf
819ae03 [Dong Wang] [SPARK-6964][SQL][WIP] Support Cancellation in the Thrift Server

eb19d3f7

[SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ cont'd. · 6ebe419f

Reynold Xin authored 9 years ago

Fixed the following packages:
sql.columnar
sql.jdbc
sql.json
sql.parquet

Author: Reynold Xin <rxin@databricks.com>

Closes #6667 from rxin/testsqlcontext_wildcard and squashes the following commits:

134a776 [Reynold Xin] Fixed compilation break.
6da7b69 [Reynold Xin] [SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ cont'd.

6ebe419f

[SPARK-7991] [PySpark] Adding support for passing lists to describe. · 356a4a9b

amey authored 9 years ago

This is a minor change.

Author: amey <amey@skytree.net>

Closes #6655 from ameyc/JIRA-7991/support-passing-list-to-describe and squashes the following commits:

e8a1dff [amey] Adding support for passing lists to describe.

356a4a9b

[SPARK-7747] [SQL] [DOCS] spark.sql.planner.externalSort · 4060526c

Luca Martinetti authored 9 years ago

Add documentation for spark.sql.planner.externalSort

Author: Luca Martinetti <luca@luca.io>

Closes #6272 from lucamartinetti/docs-externalsort and squashes the following commits:

985661b [Luca Martinetti] [SPARK-7747] [SQL] [DOCS] Add documentation for spark.sql.planner.externalSort

4060526c

[SPARK-8112] [STREAMING] Fix the negative event count issue · 4f16d3fe

zsxwing authored 9 years ago

Author: zsxwing <zsxwing@gmail.com>

Closes #6659 from zsxwing/SPARK-8112 and squashes the following commits:

a5d7da6 [zsxwing] Address comments
d255b6e [zsxwing] Fix the negative event count issue

4f16d3fe

[SPARK-7699] [CORE] Lazy start the scheduler for dynamic allocation · 3f80bc84

jerryshao authored 9 years ago

This patch propose to lazy start the scheduler for dynamic allocation to avoid fast ramp down executor numbers is load is less.

This implementation will:
1. immediately start the scheduler is `numExecutorsTarget` is 0, this is the expected behavior.
2. if `numExecutorsTarget` is not zero, start the scheduler until the number is satisfied, if the load is less, this initial started executors will last for at least 60 seconds, user will have a window to submit a job, no need to revamp the executors.
3. if `numExecutorsTarget` is not satisfied until the timeout, this means resource is not enough, the scheduler will start until this timeout, will not wait infinitely.

Please help to review, thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #6430 from jerryshao/SPARK-7699 and squashes the following commits:

02cac8e [jerryshao] Address the comments
7242450 [jerryshao] Remove the useless import
ecc0b00 [jerryshao] Address the comments
6f75f00 [jerryshao] Style changes
8b8decc [jerryshao] change the test name
fb822ca [jerryshao] Change the solution according to comments
1cc74e5 [jerryshao] Lazy start the scheduler for dynamic allocation

3f80bc84

[SPARK-8099] set executor cores into system in yarn-cluster mode · 0992a0a7

Xutingjun authored 9 years ago

Author: Xutingjun <xutingjun@huawei.com>
Author: xutingjun <xutingjun@huawei.com>

Closes #6643 from XuTingjun/SPARK-8099 and squashes the following commits:

80b18cd [Xutingjun] change to STANDALONE | YARN
ce33148 [Xutingjun] set executor cores into system
e51cc9e [Xutingjun] set executor cores into system
0600861 [xutingjun] set executor cores into system

0992a0a7

Revert "[MINOR] [BUILD] Use custom temp directory during build." · 4036d05c
Andrew Or authored 9 years ago
```
This reverts commit b16b5434.
```
4036d05c

[SPARK-8085] [SPARKR] Support user-specified schema in read.df · 12f5eaee

Shivaram Venkataraman authored 9 years ago

cc davies sun-rui

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6620 from shivaram/sparkr-read-schema and squashes the following commits:

16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df

12f5eaee

[SQL] Simplifies binary node pattern matching · bc0d76a2

Cheng Lian authored 9 years ago

This PR is a simpler version of #2764, and adds `unapply` methods to the following binary nodes for simpler pattern matching:

- `BinaryExpression`
- `BinaryComparison`
- `BinaryArithmetics`

This enables nested pattern matching for binary nodes. For example, the following pattern matching

```scala
case p: BinaryComparison if p.left.dataType == StringType &&
                            p.right.dataType == DateType =>
  p.makeCopy(Array(p.left, Cast(p.right, StringType)))
```

can be simplified to

```scala
case p  BinaryComparison(l  StringType(), r  DateType()) =>
  p.makeCopy(Array(l, Cast(r, StringType)))
```

Author: Cheng Lian <lian@databricks.com>

Closes #6537 from liancheng/binary-node-patmat and squashes the following commits:

a3bf5fe [Cheng Lian] Fixes compilation error introduced while rebasing
b738986 [Cheng Lian] Renames `l`/`r` to `left`/`right` or `lhs`/`rhs`
14900ae [Cheng Lian] Simplifies binary node pattern matching

bc0d76a2

[SPARK-6324] [CORE] Centralize handling of script usage messages. · 700312e1

Marcelo Vanzin authored 9 years ago

Reorganize code so that the launcher library handles most of the work
of printing usage messages, instead of having an awkward protocol between
the library and the scripts for that.

This mostly applies to SparkSubmit, since the launcher lib does not do
command line parsing for classes invoked in other ways, and thus cannot
handle failures for those. Most scripts end up going through SparkSubmit,
though, so it all works.

The change adds a new, internal command line switch, "--usage-error",
which prints the usage message and exits with a non-zero status. Scripts
can override the command printed in the usage message by setting an
environment variable - this avoids having to grep the output of
SparkSubmit to remove references to the "spark-submit" script.

The only sub-optimal part of the change is the special handling for the
spark-sql usage, which is now done in SparkSubmitArguments.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5841 from vanzin/SPARK-6324 and squashes the following commits:

2821481 [Marcelo Vanzin] Merge branch 'master' into SPARK-6324
bf139b5 [Marcelo Vanzin] Filter output of Spark SQL CLI help.
c6609bf [Marcelo Vanzin] Fix exit code never being used when printing usage messages.
6bc1b41 [Marcelo Vanzin] [SPARK-6324] [core] Centralize handling of script usage messages.

700312e1

[STREAMING] Update streaming-kafka-integration.md · 019dc9f5

Akhil Das authored 9 years ago

Fixed the broken links (Examples) in the documentation.

Author: Akhil Das <akhld@darktech.ca>

Closes #6666 from akhld/patch-2 and squashes the following commits:

2228b83 [Akhil Das] Update streaming-kafka-integration.md

019dc9f5

[MINOR] [BUILD] Use custom temp directory during build. · b16b5434

Marcelo Vanzin authored 9 years ago

Even with all the efforts to cleanup the temp directories created by
unit tests, Spark leaves a lot of garbage in /tmp after a test run.
This change overrides java.io.tmpdir to place those files under the
build directory instead.

After an sbt full unit test run, I was left with > 400 MB of temp
files. Since they're now under the build dir, it's much easier to
clean them up.

Also make a slight change to a unit test to make it not pollute the
source directory with test data.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6653 from vanzin/unit-test-tmp and squashes the following commits:

31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other.
aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.

b16b5434

[MINOR] [BUILD] Change link to jenkins builds on github. · da20c8ca

Marcelo Vanzin authored 9 years ago

Link to the tail of the console log, instead of the full log. That's
bound to have the info the user is looking for, and at the same time
loads way more quickly than the (huge) full log, which is just one click
away if needed.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6664 from vanzin/jenkins-link and squashes the following commits:

ba07ed8 [Marcelo Vanzin] [minor] [build] Change link to jenkins builds on github.

da20c8ca

[MINOR] remove unused interpolation var in log message · 3a5c4da4

Sean Owen authored 9 years ago

Completely trivial but I noticed this wrinkle in a log message today; `$sender` doesn't refer to anything and isn't interpolated here.

Author: Sean Owen <sowen@cloudera.com>

Closes #6650 from srowen/Interpolation and squashes the following commits:

518687a [Sean Owen] Actually interpolate log string
7edb866 [Sean Owen] Trivial: remove unused interpolation var in log message

3a5c4da4

[DOC][Minor]Specify the common sources available for collecting · 2777ed39

Yijie Shen authored 9 years ago

I was wondering what else common sources available until search the source code. Maybe better to make this clear.

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #6641 from yijieshen/patch-1 and squashes the following commits:

b5b99b4 [Yijie Shen] Make it clear that JvmSource is the only available additional source currently
f23140c [Yijie Shen] [DOC][Minor]Specify the common sources available for collecting

2777ed39

[SPARK-8116][PYSPARK] Allow sc.range() to take a single argument. · e5054605

Ted Blackman authored 9 years ago


Author: Ted Blackman <ted.blackman@gmail.com>

Closes #6656 from belisarius222/branch-1.4 and squashes the following commits:

747cbc2 [Ted Blackman] [SPARK-8116][PYSPARK] Allow sc.range() to take a single argument.

(cherry picked from commit f02af7c8)
Signed-off-by: Reynold Xin <rxin@databricks.com>

e5054605

[SPARK-8114][SQL] Remove some wildcard import on TestSQLContext._ · 8f16b94a

Reynold Xin authored 9 years ago

I kept some of the sql import there to avoid changing too many lines.

Author: Reynold Xin <rxin@databricks.com>

Closes #6661 from rxin/remove-wildcard-import-sqlcontext and squashes the following commits:

c265347 [Reynold Xin] Fixed ListTablesSuite failure.
de9d491 [Reynold Xin] Fixed tests.
73b5365 [Reynold Xin] Mima.
8f6b642 [Reynold Xin] Fixed style violation.
443f6e8 [Reynold Xin] [SPARK-8113][SQL] Remove some wildcard import on TestSQLContext._

8f16b94a

Jun 04, 2015

[SPARK-8106] [SQL] Set derby.system.durability=test to speed up Hive compatibility tests · 74dc2a90

Josh Rosen authored 9 years ago

Derby has a `derby.system.durability` configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests.

We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change.

See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6651 from JoshRosen/hive-compat-suite-speedup and squashes the following commits:

b7a08a2 [Josh Rosen] Set derby.system.durability=test in our unit tests.

74dc2a90

[SPARK-8098] [WEBUI] Show correct length of bytes on log page · 63bc0c44

Carson Wang authored 9 years ago

The log page should only show desired length of bytes. Currently it shows bytes from the startIndex to the end of the file. The "Next" button on the page is always disabled.

Author: Carson Wang <carson.wang@intel.com>

Closes #6640 from carsonwang/logpage and squashes the following commits:

58cb3fd [Carson Wang] Show correct length of bytes on log page

63bc0c44

[SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate · 2bcdf8c2

Reynold Xin authored 9 years ago

This patch replaces Distinct with Aggregate in the optimizer, so Distinct will become
more efficient over time as we optimize Aggregate (via Tungsten).

Author: Reynold Xin <rxin@databricks.com>

Closes #6637 from rxin/replace-distinct and squashes the following commits:

b3cc50e [Reynold Xin] Mima excludes.
93d6117 [Reynold Xin] Code review feedback.
87e4741 [Reynold Xin] [SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate.

2bcdf8c2

Fixed style issues for [SPARK-6909][SQL] Remove Hive Shim code. · 65938422
Reynold Xin authored 9 years ago

65938422

[SPARK-6909][SQL] Remove Hive Shim code · 0526fea4

Cheolsoo Park authored 9 years ago

This is a follow-up on #6393. I am removing the following files in this PR.
```
./sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala
./sql/hive-thriftserver/v0.13.1/src/main/scala/org/apache/spark/sql/hive/thriftserver/Shim13.scala
```
Basically, I re-factored the shim code as follows-
* Rewrote code directly with Hive 0.13 methods, or
* Converted code into private methods, or
* Extracted code into separate classes

But for leftover code that didn't fit in any of these cases, I created a HiveShim object. For eg, helper functions which wrap Hive 0.13 methods to work around Hive bugs are placed here.

Author: Cheolsoo Park <cheolsoop@netflix.com>

Closes #6604 from piaozhexiu/SPARK-6909 and squashes the following commits:

5dccc20 [Cheolsoo Park] Remove hive shim code

0526fea4