- Sep 08, 2015
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #8660 from rxin/contrib.
-
Michael Armbrust authored
Author: Michael Armbrust <michael@databricks.com> Closes #8659 from marmbrus/testBuildBreak.
-
Cheng Hao authored
```scala
import org.apache.spark.sql.columnar.InMemoryColumnarTableScan
import org.apache.spark.sql.hive.execution.HiveTableScan

sql("select key, value, key + 1 from src").registerTempTable("abc")
cacheTable("abc")
val sparkPlan = sql(
  """select a.key, b.key, c.key from
    |abc a join abc b on a.key=b.key
    |join abc c on a.key=c.key""".stripMargin).queryExecution.sparkPlan
assert(sparkPlan.collect { case e: InMemoryColumnarTableScan => e }.size === 3) // failed
assert(sparkPlan.collect { case e: HiveTableScan => e }.size === 0) // failed
```
The actual plan is:
```
== Parsed Logical Plan ==
'Project [unresolvedalias('a.key),unresolvedalias('b.key),unresolvedalias('c.key)]
 'Join Inner, Some(('a.key = 'c.key))
  'Join Inner, Some(('a.key = 'b.key))
   'UnresolvedRelation [abc], Some(a)
   'UnresolvedRelation [abc], Some(b)
  'UnresolvedRelation [abc], Some(c)

== Analyzed Logical Plan ==
key: int, key: int, key: int
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Join Inner, Some((key#14 = key#61))
   Subquery a
    Subquery abc
     Project [key#14,value#15,(key#14 + 1) AS _c2#16]
      MetastoreRelation default, src, None
   Subquery b
    Subquery abc
     Project [key#61,value#62,(key#61 + 1) AS _c2#58]
      MetastoreRelation default, src, None
  Subquery c
   Subquery abc
    Project [key#66,value#67,(key#66 + 1) AS _c2#63]
     MetastoreRelation default, src, None

== Optimized Logical Plan ==
Project [key#14,key#61,key#66]
 Join Inner, Some((key#14 = key#66))
  Project [key#14,key#61]
   Join Inner, Some((key#14 = key#61))
    Project [key#14]
     InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc)
    Project [key#61]
     MetastoreRelation default, src, None
  Project [key#66]
   MetastoreRelation default, src, None

== Physical Plan ==
TungstenProject [key#14,key#61,key#66]
 BroadcastHashJoin [key#14], [key#66], BuildRight
  TungstenProject [key#14,key#61]
   BroadcastHashJoin [key#14], [key#61], BuildRight
    ConvertToUnsafe
     InMemoryColumnarTableScan [key#14], (InMemoryRelation [key#14,value#15,_c2#16], true, 10000, StorageLevel(true, true, false, true, 1), (Project [key#14,value#15,(key#14 + 1) AS _c2#16]), Some(abc))
    ConvertToUnsafe
     HiveTableScan [key#61], (MetastoreRelation default, src, None)
    ConvertToUnsafe
     HiveTableScan [key#66], (MetastoreRelation default, src, None)
```
Author: Cheng Hao <hao.cheng@intel.com> Closes #8494 from chenghao-intel/weird_cache.
-
Tathagata Das authored
[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8656 from tdas/SPARK-10492 and squashes the following commits: 986cdd6 [Tathagata Das] Added information on backpressure
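For context, a minimal sketch of the two knobs the updated guide covers, the static receiver rate limit and the adaptive backpressure mechanism, using the property names as documented for Spark 1.5:
```scala
import org.apache.spark.SparkConf

// Adaptive backpressure: the receiving rate adjusts to current batch processing delays.
val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  // Static upper bound on records/second per receiver; acts as a hard cap.
  .set("spark.streaming.receiver.maxRate", "10000")
```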
-
Vinod K C authored
`Loader.checkSchema` was called to verify the schema after `dataframe.select(...)`. Schema verification should be done before `dataframe.select(...)`. Author: Vinod K C <vinod.kc@huawei.com> Closes #8636 from vinodkc/fix_GaussianMixtureModel_load_verification.
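A hedged sketch of the ordering being fixed (the helper and column names below are illustrative, not the actual MLlib loader internals):
```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Illustrative only: validate the full on-disk schema first...
def loadChecked(dataFrame: DataFrame, expectedSchema: StructType): DataFrame = {
  require(expectedSchema.fieldNames.forall(dataFrame.schema.fieldNames.contains),
    s"Schema mismatch: expected fields ${expectedSchema.fieldNames.mkString(", ")}")
  // ...and only then narrow to the columns the model needs.
  dataFrame.select("weight", "mu", "sigma")
}
```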
-
Yin Huai authored
https://issues.apache.org/jira/browse/SPARK-10441 Author: Yin Huai <yhuai@databricks.com> Closes #8597 from yhuai/timestampJson.
-
Yanbo Liang authored
A copied model must have the same parent, but `ml.IsotonicRegressionModel.copy` did not set the parent. This fixes it and adds a test case. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8637 from yanboliang/spark-10470.
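A minimal sketch of the invariant the new test case checks (assuming `dataset` is an existing DataFrame with the default label/features columns):
```scala
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.IsotonicRegression

val ir = new IsotonicRegression()
val model = ir.fit(dataset)
val copied = model.copy(ParamMap.empty)
// The copy must point back at the estimator that produced the original model.
assert(copied.parent.eq(model.parent))
```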
-
Wenchen Fan authored
We did a lot of special handling for non-deterministic expressions in `Optimizer`. However, `PhysicalOperation` just collects all Projects and Filters and messes up their order. We should respect the operator order imposed by non-deterministic expressions in `PhysicalOperation`. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8486 from cloud-fan/fix.
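For intuition, a small sketch of why the collected order matters once a non-deterministic expression is involved (setup names like `sqlContext` are assumed):
```scala
import org.apache.spark.sql.functions.rand
import sqlContext.implicits._

val df = sqlContext.range(100).toDF("a")
// `r` is materialized by the project; the filter must observe those exact values.
val projected = df.select($"a", rand().as("r"))
val filtered = projected.filter($"r" > 0.5)
// If a rule collects the project and filter and re-evaluates rand() beneath the
// filter, the rows kept and the `r` values returned come from different random
// draws, silently changing the result.
```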
-
Yanbo Liang authored
This PR fixes two model `copy()` related issues: [SPARK-10480](https://issues.apache.org/jira/browse/SPARK-10480) `ML.LinearRegressionModel.copy()` ignored the argument `extra`, so it did not take effect when users set this parameter. [SPARK-10479](https://issues.apache.org/jira/browse/SPARK-10479) `ML.LogisticRegressionModel.copy()` should copy the model summary if available. Author: Yanbo Liang <ybliang8@gmail.com> Closes #8641 from yanboliang/linear-regression-copy.
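A hedged sketch of the expected `copy(extra)` behavior for the first issue (the param chosen is illustrative; `model` is assumed to be a fitted `LinearRegressionModel`):
```scala
import org.apache.spark.ml.param.ParamMap

// Extra params supplied at copy time must actually land on the copied model.
val extra = ParamMap(model.predictionCol -> "myPrediction")
val copied = model.copy(extra)
assert(copied.getPredictionCol == "myPrediction")
```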
-
Liang-Chi Hsieh authored
JIRA: https://issues.apache.org/jira/browse/SPARK-9170 `StandardStructObjectInspector` will implicitly lowercase column names, but I think the ORC format doesn't have such a requirement. In fact, there is an `OrcStructInspector` specified for the ORC format. We should use it when serializing rows to ORC files, so that case is preserved when writing. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7520 from viirya/use_orcstruct.
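A minimal repro sketch of the case-preservation expectation (the path and `sqlContext` setup are assumed):
```scala
import sqlContext.implicits._

// Write a mixed-case column name and expect to read the same name back.
val df = sqlContext.range(10).select($"id".as("mixedCaseName"))
df.write.format("orc").save("/tmp/orc-case-test")
val readBack = sqlContext.read.format("orc").load("/tmp/orc-case-test")
assert(readBack.schema.fieldNames.contains("mixedCaseName"))
```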
-
Jacek Laskowski authored
Author: Jacek Laskowski <jacek@japila.pl> Closes #8629 from jaceklaskowski/docs-fixes.
-
Stephen Hopper authored
… main README. Author: Stephen Hopper <shopper@shopper-osx.local> Closes #8646 from enragedginger/master.
-
- Sep 07, 2015
-
Reynold Xin authored
We introduced the Netty network module for shuffle in Spark 1.2 and have had it on by default for three releases. The old ConnectionManager is difficult to maintain. If we merge this patch now, by the time it is released, ConnectionManager will have been off by default for a year. It's time to remove it. Author: Reynold Xin <rxin@databricks.com> Closes #8161 from rxin/SPARK-9767.
-
- Sep 05, 2015
-
Holden Karau authored
From Jira: We should use assertTrue, etc. instead to make sure the asserts are not ignored in tests. Author: Holden Karau <holden@pigscanfly.ca> Closes #8607 from holdenk/SPARK-10013-remove-java-assert-from-java-unit-tests.
-
Cheng Lian authored
To keep the Parquet write path fully compatible with Spark 1.4, we should rename the innermost field name of arrays that may contain null from "array_element" to "array". Please refer to [SPARK-10434] [1] for more details. [1]: https://issues.apache.org/jira/browse/SPARK-10434 Author: Cheng Lian <lian@databricks.com> Closes #8586 from liancheng/spark-10434/fix-parquet-array-type.
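A minimal repro sketch of the write path in question (output path and `sqlContext` are assumed): arrays whose elements may be null get the three-level Parquet layout, and with this change the innermost repeated field keeps the Spark 1.4 name "array" rather than "array_element", so 1.4 readers can still resolve it.
```scala
// An array column whose elements may be null.
val df = sqlContext.createDataFrame(Seq(Tuple1(Seq(Option(1), None, Option(3))))).toDF("f")
df.write.parquet("/tmp/parquet-array-compat")
// Inspecting the file's Parquet schema should show the innermost field named
// "array" (the Spark 1.4-compatible name), not "array_element".
```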
-
Tathagata Das authored
- Fixed information around Python API tags in streaming programming guides
- Added missing stuff in python docs

Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8595 from tdas/SPARK-10440.
-
Cheng Lian authored
Jenkins master builders are currently broken by a merge conflict between PR #8584 and PR #8155. Author: Cheng Lian <lian@databricks.com> Closes #8614 from liancheng/hotfix/fix-pr-8155-8584-conflict.
-
- Sep 04, 2015
-
Yin Huai authored
This PR fixes the failed test and conflict for #8155. https://issues.apache.org/jira/browse/SPARK-9925 Closes #8155 Author: Yin Huai <yhuai@databricks.com> Author: Davies Liu <davies@databricks.com> Closes #8602 from davies/shuffle_partitions.
-
Holden Karau authored
We should make sure the scaladoc for params includes their default values throughout the models in ml/. Author: Holden Karau <holden@pigscanfly.ca> Closes #8591 from holdenk/SPARK-10402-add-scaladoc-for-default-values-of-params-in-ml.
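A sketch of the documentation convention in question (the param and trait are illustrative, not actual Spark code):
```scala
import org.apache.spark.ml.param.{DoubleParam, Params}

trait HasRegParamExample extends Params {
  /**
   * Param for the regularization parameter (>= 0).
   * Default: 0.0
   * @group param
   */
  val regParam: DoubleParam =
    new DoubleParam(this, "regParam", "regularization parameter (>= 0)")
  setDefault(regParam -> 0.0)

  /** The scaladoc's "Default" line should match the actual setDefault value. */
  def getRegParam: Double = $(regParam)
}
```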
-
xutingjun authored
[SPARK-10311] [STREAMING] Reload appId and attemptId when app starts with checkpoint file in cluster mode Author: xutingjun <xutingjun@huawei.com> Closes #8477 from XuTingjun/streaming-attempt.
-
robbins authored
Author: robbins <robbins@uk.ibm.com> Closes #8605 from robbinspg/DAGSchedulerSuite-fix.
-
Timothy Chen authored
Support running pyspark with cluster mode on Mesos! This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script via an accessible URI. Author: Timothy Chen <tnachen@gmail.com> Closes #8349 from tnachen/mesos_python.
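A hedged usage sketch (the dispatcher host, port, and script URI below are placeholders; cluster mode submits through the Mesos cluster dispatcher):
```
spark-submit \
  --master mesos://dispatcher-host:7077 \
  --deploy-mode cluster \
  http://repo.example.com/jobs/my_job.py
```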
-
Andrew Or authored
Author: Andrew Or <andrew@databricks.com> Closes #8603 from andrewor14/minor-sql-changes.
-
Wenchen Fan authored
This PR takes over https://github.com/apache/spark/pull/8389. This PR improves `checkAnswer` to print the partially analyzed plan in addition to the user-friendly error message, in order to aid debugging failing tests. In doing so, I ran into a conflict with the various ways that we bring a SQLContext into the tests. Depending on the trait we refer to the current context as `sqlContext`, `_sqlContext`, `ctx` or `hiveContext` with access modifiers `public`, `protected` and `private` depending on the defining class. I propose we refactor as follows:
1. All tests should only refer to a `protected sqlContext` when testing general features, and `protected hiveContext` when it is a method that only exists on a `HiveContext`.
2. All tests should only import `testImplicits._` (i.e., don't import `TestHive.implicits._`)

Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8584 from cloud-fan/cleanupTests.
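A minimal sketch of the proposed convention in use (the suite and query are illustrative):
```scala
import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.SharedSQLContext

class ExampleQuerySuite extends QueryTest with SharedSQLContext {
  import testImplicits._

  test("filter keeps only matching rows") {
    // Both `sqlContext` and `testImplicits` come from the shared trait.
    checkAnswer(Seq(1, 2).toDF("i").filter($"i" > 1), Row(2))
  }
}
```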
-
Michael Armbrust authored
This commit exists to close the following pull requests on Github:
Closes #1890 (requested by andrewor14, JoshRosen)
Closes #3558 (requested by JoshRosen, marmbrus)
Closes #3890 (requested by marmbrus)
Closes #3895 (requested by andrewor14, marmbrus)
Closes #4055 (requested by andrewor14)
Closes #4105 (requested by andrewor14)
Closes #4812 (requested by marmbrus)
Closes #5109 (requested by andrewor14)
Closes #5178 (requested by andrewor14)
Closes #5298 (requested by marmbrus)
Closes #5393 (requested by marmbrus)
Closes #5449 (requested by andrewor14)
Closes #5468 (requested by marmbrus)
Closes #5715 (requested by marmbrus)
Closes #6192 (requested by marmbrus)
Closes #6319 (requested by marmbrus)
Closes #6326 (requested by marmbrus)
Closes #6349 (requested by marmbrus)
Closes #6380 (requested by andrewor14)
Closes #6554 (requested by marmbrus)
Closes #6696 (requested by marmbrus)
Closes #6868 (requested by marmbrus)
Closes #6951 (requested by marmbrus)
Closes #7129 (requested by marmbrus)
Closes #7188 (requested by marmbrus)
Closes #7358 (requested by marmbrus)
Closes #7379 (requested by marmbrus)
Closes #7628 (requested by marmbrus)
Closes #7715 (requested by marmbrus)
Closes #7782 (requested by marmbrus)
Closes #7914 (requested by andrewor14)
Closes #8051 (requested by andrewor14)
Closes #8269 (requested by andrewor14)
Closes #8448 (requested by andrewor14)
Closes #8576 (requested by andrewor14)
-
Shivaram Venkataraman authored
`dev/lint-r` passes on my machine now Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8601 from shivaram/sparkr-style-fix.
-
- Sep 03, 2015
-
Andrew Or authored
Note: this is not intended to be in Spark 1.5! This patch rewrites some code in the `DAGScheduler` to make it more readable. In particular:
- there were blocks of code that were unnecessary and were removed for simplicity
- there were abstractions that were unnecessary and made the code hard to navigate
- other minor changes

Author: Andrew Or <andrew@databricks.com> Closes #8217 from andrewor14/dag-scheduler-readability and squashes the following commits:
57abca3 [Andrew Or] Move comment back into if case
574fb1e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-scheduler-readability
64a9ed2 [Andrew Or] Remove unnecessary code + minor code rewrites
-
Marcelo Vanzin authored
This avoids them being mistakenly pulled instead of the newer ones that Spark actually uses. Spark only depends on these artifacts transitively, so sometimes maven just decides to pick tachyon's version of the dependency for whatever reason. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8577 from vanzin/SPARK-10421.
-
Andrew Or authored
It's not supported yet so we should error with a clear message. Author: Andrew Or <andrew@databricks.com> Closes #8590 from andrewor14/mesos-cluster-r-guard.
-
jeanlyn authored
[SPARK-9591](https://issues.apache.org/jira/browse/SPARK-9591) When getting a broadcast variable, we can fetch the block from several locations, but currently connecting to a lost BlockManager (for example, one that was idle long enough to be removed by the driver when dynamic resource allocation is in use) causes the task to fail, and in the worst case the whole job fails. Author: jeanlyn <jeanlyn92@gmail.com> Closes #7927 from jeanlyn/catch_exception.
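A minimal sketch of the catch-and-fall-back idea (the location type and fetch function are illustrative, not the actual `BlockManager` API):
```scala
import scala.util.{Success, Try}

// Try each candidate location in turn; a dead location is skipped rather than
// failing the whole task. Only if every location fails is None returned.
def fetchFromAny[B](locations: Seq[String], fetch: String => B): Option[B] =
  locations.iterator
    .map(loc => Try(fetch(loc)))
    .collectFirst { case Success(block) => block }
```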
-
Vinod K C authored
Author: Vinod K C <vinod.kc@huawei.com> Closes #8581 from vinodkc/fix_RDDOperationScope_Hashcode.
-
Pat Shields authored
This contribution is my original work and I license the work to the project under the project's open source license. Author: Pat Shields <yeoldefortran@gmail.com> Closes #7979 from pashields/env-loading-on-driver.
-
robbins authored
Author: robbins <robbins@uk.ibm.com> Closes #8589 from robbinspg/InputStreamSuite-fix.
-
robbins authored
Author: robbins <robbins@uk.ibm.com> Closes #8582 from robbinspg/InputOutputMetricsSuite.
-
Tom Graves authored
Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8585 from tgravescs/SPARK-10432.
-
CHOIJAEHONG authored
Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R. Author: CHOIJAEHONG <redrock07@naver.com> Closes #7494 from CHOIJAEHONG1/SPARK-8951.
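A hedged sketch of the gist of the change: write length-prefixed UTF-8 bytes rather than relying on the platform default encoding, so CJK characters survive the trip to R (the exact framing, including the trailing NUL, is an assumption here):
```scala
import java.io.DataOutputStream

def writeString(out: DataOutputStream, value: String): Unit = {
  val utf8 = value.getBytes("UTF-8")
  out.writeInt(utf8.length + 1) // length includes the trailing NUL byte
  out.write(utf8, 0, utf8.length)
  out.writeByte(0)
}
```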
-
WangTaoTheTonic authored
https://issues.apache.org/jira/browse/SPARK-9596 Author: WangTaoTheTonic <wangtao111@huawei.com> Closes #7931 from WangTaoTheTonic/SPARK-9596.
-
Holden Karau authored
From Jira: Running spark-submit on YARN with --num-executors equal to 0 when not using dynamic allocation should error out. In Spark 1.5.0 it continues and ends up hanging. yarn.ClientArguments still has the check, so something else must have changed. spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 0 .... Spark 1.4.1 errors with: java.lang.IllegalArgumentException: Number of executors was 0, but must be at least 1 (or 0 if dynamic executor allocation is enabled). Author: Holden Karau <holden@pigscanfly.ca> Closes #8580 from holdenk/SPARK-10332-spark-submit-to-yarn-executors-0-message.
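A minimal sketch of the kind of fail-fast guard being restored (the method and wiring are illustrative; the real check lives in the YARN client arguments):
```scala
// Error out immediately instead of hanging when zero executors are requested
// without dynamic allocation.
def validateNumExecutors(numExecutors: Int, dynamicAllocationEnabled: Boolean): Unit = {
  require(numExecutors > 0 || dynamicAllocationEnabled,
    s"Number of executors was $numExecutors, but must be at least 1 " +
      "(or 0 if dynamic executor allocation is enabled).")
}
```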
-
zsxwing authored
New screenshots after this fix: [s1](https://cloud.githubusercontent.com/assets/1000778/9625782/4b2dba36-518b-11e5-9104-c713ff026e3d.png). Default: [s2](https://cloud.githubusercontent.com/assets/1000778/9625817/92366e50-518b-11e5-9981-cdfb774d66b8.png). After clicking `+details`: [s3](https://cloud.githubusercontent.com/assets/1000778/9625784/4ba24342-518b-11e5-8522-846a16a95d44.png). Author: zsxwing <zsxwing@gmail.com> Closes #8570 from zsxwing/SPARK-10411.
-
Davies Liu authored
Author: Davies Liu <davies@databricks.com> Closes #8543 from davies/preserve_page.
-