  1. Jan 11, 2016
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space before "," or ":") · 39ae04e6
      Kousuke Saruta authored
Fix the style violation (space before `,` and `:`).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10685 from sarutak/SPARK-12692-followup-streaming.
      39ae04e6
    • Yin Huai's avatar
      [SPARK-11823] Ignores HiveThriftBinaryServerSuite's test jdbc cancel · aaa2c3b6
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-11823
      
      This test often hangs and times out, leaving hanging processes. Let's ignore it for now and improve the test.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10715 from yhuai/SPARK-11823-ignore.
      aaa2c3b6
    • Cheng Lian's avatar
[SPARK-12498][SQL][MINOR] BooleanSimplification simplification · 36d49350
      Cheng Lian authored
      Scala syntax allows binary case classes to be used as infix operator in pattern matching. This PR makes use of this syntax sugar to make `BooleanSimplification` more readable.
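As a generic illustration of the syntax sugar in question (not the actual optimizer code), any case class with exactly two parameters can be matched in infix position:

```
// Generic sketch: infix patterns are equivalent to the usual extractor form.
case class And(left: String, right: String)
case class Or(left: String, right: String)

def describe(e: Any): String = e match {
  case a And b => s"($a AND $b)"   // same as: case And(a, b) => ...
  case a Or b  => s"($a OR $b)"    // same as: case Or(a, b) => ...
  case other   => other.toString
}

// describe(And("x", "y")) == "(x AND y)"
```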
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10445 from liancheng/boolean-simplification-simplification.
      36d49350
    • wangfei's avatar
      [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due... · 473907ad
      wangfei authored
      [SPARK-12742][SQL] org.apache.spark.sql.hive.LogicalPlanToSQLSuite failure due to Table already exists exception
      
      ```
      [info] Exception encountered when attempting to run a suite with class name:
      org.apache.spark.sql.hive.LogicalPlanToSQLSuite *** ABORTED *** (325 milliseconds)
      [info]   org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
      [info]   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:296)
      [info]   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:285)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:33)
      [info]   at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.beforeAll(LogicalPlanToSQLSuite.scala:23)
      [info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
      [info]   at org.apache.spark.sql.hive.LogicalPlanToSQLSuite.run(LogicalPlanToSQLSuite.scala:23)
      [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
      [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
      [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
      [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      [info]   at java.lang.Thread.run(Thread.java:745)
      ```
      
      /cc liancheng
      
      Author: wangfei <wangfei_hello@126.com>
      
      Closes #10682 from scwf/fix-test.
      473907ad
    • Herman van Hovell's avatar
      [SPARK-12576][SQL] Enable expression parsing in CatalystQl · fe9eb0b0
      Herman van Hovell authored
      The PR allows us to use the new SQL parser to parse SQL expressions such as: ```1 + sin(x*x)```
      
      We enable this functionality in this PR, but we will not start using this actively yet. This will be done as soon as we have reached grammar parity with the existing parser stack.
      
      cc rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10649 from hvanhovell/SPARK-12576.
      fe9eb0b0
    • Yuhao Yang's avatar
      [SPARK-10809][MLLIB] Single-document topicDistributions method for LocalLDAModel · bbea8885
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-10809
      
      We could provide a single-document topicDistributions method for LocalLDAModel to allow for quick queries which avoid RDD operations. Currently, the user must use an RDD of documents.
      
Also adds some missing asserts.
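A hedged usage sketch of the kind of call this enables; the single-document method name, signature, and paths below are assumptions based on the description, not taken from the patch (`sc` is an existing SparkContext):

```
import org.apache.spark.mllib.clustering.LocalLDAModel
import org.apache.spark.mllib.linalg.Vectors

// Assumed API shape: query one document's topic mixture without building an RDD.
val model = LocalLDAModel.load(sc, "/path/to/lda-model")        // path is illustrative
val doc = Vectors.sparse(model.vocabSize, Seq((0, 1.0), (5, 2.0)))
val topics = model.topicDistribution(doc)                        // a Vector of topic weights
```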
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #9484 from hhbyyh/ldaTopicPre.
      bbea8885
    • Yuhao Yang's avatar
      [SPARK-12685][MLLIB] word2vec trainWordsCount gets overflow · 4f8eefa3
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-12685
The log of `word2vec` reports `trainWordsCount = -785727483` during computation over a large dataset.
      
      Update the priority as it will affect the computation process.
      `alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))`
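The negative count is ordinary Int wraparound; a minimal sketch of the failure mode and of accumulating with Long instead:

```
// Summing word counts of a large corpus into an Int wraps around to a negative value.
val counts = Seq(2000000000L, 800000000L)   // ~2.8 billion tokens in total
val asInt  = counts.map(_.toInt).sum        // overflows Int: negative result
val asLong = counts.sum                     // 2800000000: correct with Long
println(s"Int: $asInt, Long: $asLong")
```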
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #10627 from hhbyyh/w2voverflow.
      4f8eefa3
    • Yanbo Liang's avatar
      [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single... · ee4ee02b
      Yanbo Liang authored
      [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft
      
PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like the Scala API does.
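For reference, a minimal sketch of the Scala-side API being mirrored (the PySpark change itself is not shown here; `sc` is an existing SparkContext):

```
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors

// Scala already supports single-instance predict / predictSoft on the fitted model.
val data = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(5.0, 6.0)))
val gmm = new GaussianMixture().setK(2).run(data)
val cluster = gmm.predict(Vectors.dense(1.1, 2.1))      // Int: most likely component
val soft    = gmm.predictSoft(Vectors.dense(1.1, 2.1))  // Array[Double]: membership weights
```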
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10552 from yanboliang/spark-12603.
      ee4ee02b
    • Brandon Bradley's avatar
      [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting · a767ee8a
      Brandon Bradley authored
      Warning users about casting changes.
      
      Author: Brandon Bradley <bradleytastic@gmail.com>
      
      Closes #10708 from blbradley/spark-12758.
      a767ee8a
    • Josh Rosen's avatar
      [SPARK-12734][HOTFIX] Build changes must trigger all tests; clean after install in dep tests · a4499145
      Josh Rosen authored
      This patch fixes a build/test issue caused by the combination of #10672 and a latent issue in the original `dev/test-dependencies` script.
      
      First, changes which _only_ touched build files were not triggering full Jenkins runs, making it possible for a build change to be merged even though it could cause failures in other tests. The `root` build module now depends on `build`, so all tests will now be run whenever a build-related file is changed.
      
      I also added a `clean` step to the Maven install step in `dev/test-dependencies` in order to address an issue where the dummy JARs stuck around and caused "multiple assembly JARs found" errors in tests.
      
      /cc zsxwing
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10704 from JoshRosen/fix-build-test-problems.
      a4499145
    • Jacek Laskowski's avatar
      [STREAMING][MINOR] Typo fixes · b313bada
      Jacek Laskowski authored
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #10698 from jaceklaskowski/streaming-kafka-typo-fixes.
      b313bada
    • Anatoliy Plastinin's avatar
      [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat... · 9559ac5f
      Anatoliy Plastinin authored
      [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat integers as number of seconds
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-12744
      
      This PR makes parsing JSON integers to timestamps consistent with casting behavior.
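A hedged sketch of the behavior described, assuming an existing `sc`/`sqlContext`: an integer JSON field read with a TimestampType schema is interpreted as seconds since the epoch, matching the cast semantics:

```
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

// JSON record with an integer timestamp field, read with an explicit schema.
val schema = StructType(Seq(StructField("ts", TimestampType)))
val df = sqlContext.read.schema(schema)
  .json(sc.parallelize(Seq("""{"ts": 1451606400}""")))   // seconds, not milliseconds
df.show()   // expected: 2016-01-01 00:00:00 UTC, consistent with CAST(1451606400 AS TIMESTAMP)
```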
      
      Author: Anatoliy Plastinin <anatoliy.plastinin@gmail.com>
      
      Closes #10687 from antlypls/fix-json-timestamp-parsing.
      9559ac5f
    • BrianLondon's avatar
      [SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version · 8fe928b4
      BrianLondon authored
      The current Spark Streaming kinesis connector references a quite old version 1.9.40 of the AWS Java SDK (1.10.40 is current). Numerous AWS features including Kinesis Firehose are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively) such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL.
      
      Author: BrianLondon <brian@seatgeek.com>
      
      Closes #10256 from BrianLondon/master.
      8fe928b4
    • Udo Klein's avatar
      removed lambda from sortByKey() · bd723bd5
      Udo Klein authored
      According to the documentation the sortByKey method does not take a lambda as an argument, thus the example is flawed. Removed the argument completely as this will default to ascending sort.
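For comparison, a minimal Scala sketch of the same call: `sortByKey` takes an optional ascending flag rather than a comparison function:

```
// sortByKey accepts an ascending flag (default true), not a lambda.
val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
pairs.sortByKey().collect()                    // Array((a,1), (b,2), (c,3)) -- ascending by default
pairs.sortByKey(ascending = false).collect()   // descending order
```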
      
      Author: Udo Klein <git@blinkenlight.net>
      
      Closes #10640 from udoklein/patch-1.
      bd723bd5
    • Wenchen Fan's avatar
      [SPARK-12539][FOLLOW-UP] always sort in partitioning writer · f253feff
      Wenchen Fan authored
      address comments in #10498 , especially https://github.com/apache/spark/pull/10498#discussion_r49021259
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Reynold Xin <rxin@databricks.com>
      
      Closes #10638 from cloud-fan/bucket-write.
      f253feff
    • Josh Rosen's avatar
      [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions · f13c7f8f
      Josh Rosen authored
      This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact.
      
      While the diff here looks somewhat large, note that this is only a revert of a some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10693 from JoshRosen/netty-hotfix.
      f13c7f8f
    • Kousuke Saruta's avatar
      [SPARK-4628][BUILD] Add a resolver to MiMaBuild.scala for mqttv3(1.0.1). · 008a5582
      Kousuke Saruta authored
#10659 removed the repository `https://repo.eclipse.org/content/repositories/paho-releases`, but MiMa still needs it because `spark-streaming-mqtt(1.6.0)` depends on `mqttv3(1.0.1)`, which is provided only by that removed repository; Maven Central currently provides only `mqttv3(1.0.2)`.
Without this resolver, dev/mima will fail whenever `mqttv3(1.0.1)` is absent from the local repository.
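The change lives in the sbt build (MiMaBuild.scala); a hedged sketch of what re-adding such a resolver looks like in sbt settings (the exact placement inside MiMaBuild.scala is not reproduced here):

```
// Re-adds the Eclipse Paho repository so MiMa can resolve mqttv3 1.0.1.
resolvers += "Eclipse Paho Releases" at
  "https://repo.eclipse.org/content/repositories/paho-releases"
```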
      
      JoshRosen Do you have any other better idea?
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10688 from sarutak/SPARK-4628-followup.
      008a5582
  2. Jan 10, 2016
    • Marcelo Vanzin's avatar
      [SPARK-3873][BUILD] Enable import ordering error checking. · 6439a825
      Marcelo Vanzin authored
      Turn import ordering violations into build errors, plus a few adjustments
      to account for how the checker behaves. I'm a little on the fence about
      whether the existing code is right, but it's easier to appease the checker
      than to discuss what's the more correct order here.
      
Plus a few fixes to imports that crept in since my recent cleanups.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10612 from vanzin/SPARK-3873-enable.
      6439a825
    • Josh Rosen's avatar
      [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs · 3ab0138b
      Josh Rosen authored
      Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse).
      
      This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds.
      
      /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath.
      
      Author: Josh Rosen <rosenville@gmail.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10672 from JoshRosen/enforce-netty-exclusions.
      3ab0138b
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD][GRAPHX] Scala style: Fix the style violation (Space before "," or ":") · 3119206b
      Kousuke Saruta authored
      Fix the style violation (space before `,` and `:`).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10683 from sarutak/SPARK-12692-followup-graphx.
      3119206b
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD][MLLIB] Scala style: Fix the style violation (Space before "," or ":") · e5904bb5
      Kousuke Saruta authored
Fix the style violation (space before `,` and `:`).
      This PR is a followup for #10643.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10684 from sarutak/SPARK-12692-followup-mllib.
      e5904bb5
    • Jacek Laskowski's avatar
      [SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due t… · b78e028e
      Jacek Laskowski authored
[SPARK-12736][CORE][DEPLOY] Standalone Master cannot be started due to NoClassDefFoundError: org/spark-project/guava/collect/Maps
      
      /cc srowen rxin
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #10674 from jaceklaskowski/SPARK-12736.
      b78e028e
  3. Jan 09, 2016
  4. Jan 08, 2016
    • Liang-Chi Hsieh's avatar
      [SPARK-12577] [SQL] Better support of parentheses in partition by and order by... · 95cd5d95
      Liang-Chi Hsieh authored
      [SPARK-12577] [SQL] Better support of parentheses in partition by and order by clause of window function's over clause
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-12577
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #10620 from viirya/fix-parentheses.
      95cd5d95
    • Josh Rosen's avatar
      [SPARK-4628][BUILD] Remove all non-Maven-Central repositories from build · 090d6913
      Josh Rosen authored
      This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository.
      
      I tested this by running
      
      ```
      build/mvn \
              -Phive \
              -Phive-thriftserver \
              -Pkinesis-asl \
              -Pspark-ganglia-lgpl \
              -Pyarn \
              dependency:go-offline
      ```
      
      inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10659 from JoshRosen/SPARK-4628.
      090d6913
    • Josh Rosen's avatar
      [SPARK-12730][TESTS] De-duplicate some test code in BlockManagerSuite · 1fdf9bbd
      Josh Rosen authored
      This patch deduplicates some test code in BlockManagerSuite. I'm splitting this change off from a larger PR in order to make things easier to review.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10667 from JoshRosen/block-mgr-tests-cleanup.
      1fdf9bbd
    • Cheng Lian's avatar
      [SPARK-12593][SQL] Converts resolved logical plan back to SQL · d9447cac
      Cheng Lian authored
      This PR tries to enable Spark SQL to convert resolved logical plans back to SQL query strings.  For now, the major use case is to canonicalize Spark SQL native view support.  The major entry point is `SQLBuilder.toSQL`, which returns an `Option[String]` if the logical plan is recognized.
      
      The current version is still in WIP status, and is quite limited.  Known limitations include:
      
      1.  The logical plan must be analyzed but not optimized
      
          The optimizer erases `Subquery` operators, which contain necessary scope information for SQL generation.  Future versions should be able to recover erased scope information by inserting subqueries when necessary.
      
      1.  The logical plan must be created using HiveQL query string
      
          Query plans generated by composing arbitrary DataFrame API combinations are not supported yet.  Operators within these query plans need to be rearranged into a canonical form that is more suitable for direct SQL generation.  For example, the following query plan
      
          ```
          Filter (a#1 < 10)
           +- MetastoreRelation default, src, None
          ```
      
          need to be canonicalized into the following form before SQL generation:
      
          ```
          Project [a#1, b#2, c#3]
           +- Filter (a#1 < 10)
               +- MetastoreRelation default, src, None
          ```
      
          Otherwise, the SQL generation process will have to handle a large number of special cases.
      
      1.  Only a fraction of expressions and basic logical plan operators are supported in this PR
      
          Currently, 95.7% (1720 out of 1798) query plans in `HiveCompatibilitySuite` can be successfully converted to SQL query strings.
      
          Known unsupported components are:
      
          - Expressions
            - Part of math expressions
            - Part of string expressions (buggy?)
            - Null expressions
            - Calendar interval literal
            - Part of date time expressions
            - Complex type creators
            - Special `NOT` expressions, e.g. `NOT LIKE` and `NOT IN`
          - Logical plan operators/patterns
            - Cube, rollup, and grouping set
            - Script transformation
            - Generator
            - Distinct aggregation patterns that fit `DistinctAggregationRewriter` analysis rule
            - Window functions
      
          Support for window functions, generators, and cubes etc. will be added in follow-up PRs.
      
      This PR leverages `HiveCompatibilitySuite` for testing SQL generation in a "round-trip" manner:
      
*   For all select queries, we try to convert them back to SQL
      *   If the query plan is convertible, we parse the generated SQL into a new logical plan
      *   Run the new logical plan instead of the original one
      
      If the query plan is inconvertible, the test case simply falls back to the original logic.
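A hedged sketch of that round-trip flow; `toSQL` returning `Option[String]` comes from the description above, while the `SQLBuilder` package and constructor shape are assumptions:

```
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.hive.SQLBuilder   // assumed package for the entry point named above

// Round-trip check: analyzed plan -> SQL string -> re-parsed plan, with a fallback.
def roundTrip(sqlContext: SQLContext, originalSql: String): DataFrame = {
  val analyzed = sqlContext.sql(originalSql).queryExecution.analyzed
  new SQLBuilder(analyzed, sqlContext).toSQL match {      // assumed constructor shape
    case Some(generated) => sqlContext.sql(generated)     // run the regenerated query
    case None            => sqlContext.sql(originalSql)   // inconvertible: original logic
  }
}
```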
      
      TODO
      
      - [x] Fix failed test cases
      - [x] Support for more basic expressions and logical plan operators (e.g. distinct aggregation etc.)
      - [x] Comments and documentation
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10541 from liancheng/sql-generation.
      d9447cac
    • Sean Owen's avatar
      [SPARK-4819] Remove Guava's "Optional" from public API · 659fd9d0
      Sean Owen authored
      Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)
      
      See also https://github.com/apache/spark/pull/10512
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10513 from srowen/SPARK-4819.
      659fd9d0
    • Thomas Graves's avatar
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fail… · 553fd7b9
      Thomas Graves authored
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
      
      https://issues.apache.org/jira/browse/SPARK-12654
      
So the bug here is that WholeTextFileRDD.getPartitions has:
val conf = getConf
In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So when it clones the Hadoop configuration it is changing it from a JobConf to a Configuration and dropping the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works.
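A hedged sketch of the pitfall being described; the Hadoop API details are assumed from the explanation above rather than verified against the patch:

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Credentials (delegation tokens) live on JobConf, not on a plain Configuration,
// so cloning through the Configuration copy constructor silently drops them.
val original: JobConf = new JobConf()
// ... tokens attached via original.getCredentials ...

val clonedAsConf    = new Configuration(original)  // properties copied, credentials lost
val clonedAsJobConf = new JobConf(original)        // expected to retain the credentials
```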
      
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #10651 from tgravescs/SPARK-12654.
      553fd7b9
    • Udo Klein's avatar
      fixed numVertices in transitive closure example · 8c70cb4c
      Udo Klein authored
      Author: Udo Klein <git@blinkenlight.net>
      
      Closes #10642 from udoklein/patch-2.
      8c70cb4c
    • Jeff Zhang's avatar
      [DOCUMENTATION] doc fix of job scheduling · 00d92617
      Jeff Zhang authored
spark.shuffle.service.enabled is a Spark application-level configuration; it is not necessary to set it in yarn-site.xml
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10657 from zjffdu/doc-fix.
      00d92617
    • Bryan Cutler's avatar
      [SPARK-12701][CORE] FileAppender should use join to ensure writing thread completion · ea104b8f
      Bryan Cutler authored
Changed Logging FileAppender to use join in `awaitTermination` to ensure that the writing thread has properly finished before returning.
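A hedged sketch of the idea only (the real FileAppender internals are not reproduced); `awaitTermination` simply joins the writing thread:

```
// Illustrative only: wait for the writing thread itself rather than polling a flag.
class SimpleAppender {
  @volatile private var stopped = false
  private val writingThread = new Thread("file-appender") {
    override def run(): Unit = {
      while (!stopped) { /* read from the input stream and append to the file */ }
    }
  }
  writingThread.start()

  def stop(): Unit = { stopped = true }

  def awaitTermination(): Unit = {
    writingThread.join()   // returns only once the thread has actually finished
  }
}
```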
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
      ea104b8f
    • Liang-Chi Hsieh's avatar
      [SPARK-12687] [SQL] Support from clause surrounded by `()`. · cfe1ba56
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-12687
      
Some queries such as `(select 1 as a) union (select 2 as a)` do not work. This patch fixes it.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #10660 from viirya/fix-union.
      cfe1ba56
    • Sean Owen's avatar
      [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition · b9c83533
      Sean Owen authored
      Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10570 from srowen/SPARK-12618.
      b9c83533
    • Kousuke Saruta's avatar
      [SPARK-12692][BUILD] Scala style: check no white space before comma and colon · 794ea553
      Kousuke Saruta authored
We should not put a white space before `,` or `:`, so let's check for it.
Because there are lots of existing style violations, I'd like to first add the checker and enable it at the `warning` level.
Then I'd like to fix the style step by step.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10643 from sarutak/SPARK-12692.
      794ea553
  5. Jan 07, 2016
    • Reynold Xin's avatar
      Fix indentation for the previous patch. · 726bd3c4
      Reynold Xin authored
      726bd3c4
    • Kevin Yu's avatar
      [SPARK-12317][SQL] Support units (m,k,g) in SQLConf · 5028a001
      Kevin Yu authored
This PR continues the previously closed PR #10314.
      
In this PR, SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE accepts memory strings following the usual size-unit conventions as input.
      
For example, the user can now specify 10g for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE in SQLConf.
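A hedged usage sketch; the string key for SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE is assumed to be `spark.sql.adaptive.shuffle.targetPostShuffleInputSize`, and `sqlContext` is an existing SQLContext:

```
// Setting the target post-shuffle input size with a memory string instead of raw bytes.
sqlContext.setConf("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "10g")
// Equivalently via SQL:
sqlContext.sql("SET spark.sql.adaptive.shuffle.targetPostShuffleInputSize=10g")
```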
      
marmbrus srowen: Can you help review these code changes? Thanks.
      
      Author: Kevin Yu <qyu@us.ibm.com>
      
      Closes #10629 from kevinyu98/spark-12317.
      5028a001