Commits · ca8f70e9d473d2c81866f3c330cc6545c33bdac7 · cs525-sp18-g07 / spark

Aug 11, 2015

[SPARK-9649] Fix flaky test MasterSuite again - disable REST · ca8f70e9

Andrew Or authored 9 years ago

The REST server is not actually used in most tests and so we can disable it. It is a source of flakiness because it tries to bind to a specific port in vain. There was also some code that avoided the shuffle service in tests. This is actually not necessary because the shuffle service is already off by default.

Author: Andrew Or <andrew@databricks.com>

Closes #8084 from andrewor14/fix-master-suite-again.

ca8f70e9

[SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible · afa757c9

Reynold Xin authored 9 years ago

DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so we must be able to resolve the old committer qualified name.

Author: Reynold Xin <rxin@databricks.com>

Closes #8114 from rxin/SPARK-9849.

afa757c9

[SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set. · 5a5bbc29

Marcelo Vanzin authored 9 years ago

This change allows any Spark argument to be added to the app to
be started using SparkLauncher. Known arguments are properly
validated, while unknown arguments are allowed so that the
library can launch newer Spark versions (in case SPARK_HOME points
at one).

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:

b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args to be set.

5a5bbc29

[HOTFIX] Fix style error caused by 017b5de0 · 736af95b
Andrew Or authored 9 years ago

736af95b

[SPARK-8925] [MLLIB] Add @since tags to mllib.util · 017b5de0

Sudhakar Thota authored 9 years ago

Went thru the history of changes the file MLUtils.scala and picked up the version that the change went in.

Author: Sudhakar Thota <sudhakarthota@yahoo.com>
Author: Sudhakar Thota <sudhakarthota@sudhakars-mbp-2.usca.ibm.com>

Closes #7436 from sthota2014/SPARK-8925_thotas.

017b5de0

[SPARK-9788] [MLLIB] Fix LDA Binary Compatibility · be3e2716

Feynman Liang authored 9 years ago

1. Add “asymmetricDocConcentration” and revert docConcentration changes. If the (internal) doc concentration vector is a single value, “getDocConcentration" returns it. If it is a constant vector, getDocConcentration returns the first item, and fails otherwise.
2. Give `LDAModel.gammaShape` a default value in `LDAModel` concrete class constructors.

jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8077 from feynmanliang/SPARK-9788 and squashes the following commits:

6b07bc8 [Feynman Liang] Code review changes
9d6a71e [Feynman Liang] Add asymmetricAlpha alias
bf4e685 [Feynman Liang] Asymmetric docConcentration
4cab972 [Feynman Liang] Default gammaShape

be3e2716

Closes #1290 · 423cdfd8
Xiangrui Meng authored 9 years ago
```
Closes #4934
```
423cdfd8

[SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks WeakReference · f16bc68d

zsxwing authored 9 years ago

`InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed.

This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak.

Author: zsxwing <zsxwing@gmail.com>

Closes #8108 from zsxwing/internal-accumulators-leak.

f16bc68d

[SPARK-9814] [SQL] EqualNotNull not passing to data sources · 00c02728

hyukjinkwon authored 9 years ago

Author: hyukjinkwon <gurwls223@gmail.com>
Author: 권혁진 <gurwls223@gmail.com>

Closes #8096 from HyukjinKwon/master.

00c02728

[SPARK-7726] Add import so Scaladoc doesn't fail. · 2a3be4dd

Patrick Wendell authored 9 years ago

This is another import needed so Scala 2.11 doc generation doesn't fail.
See SPARK-7726 for more detail. I tested this locally and the 2.11
install goes from failing to succeeding with this patch.

Author: Patrick Wendell <patrick@databricks.com>

Closes #8095 from pwendell/scaladoc.

2a3be4dd

[SPARK-9750] [MLLIB] Improve equals on SparseMatrix and DenseMatrix · 520ad44b

Feynman Liang authored 9 years ago

Adds unit test for `equals` on `mllib.linalg.Matrix` class and `equals` to both `SparseMatrix` and `DenseMatrix`. Supports equality testing between `SparseMatrix` and `DenseMatrix`.

mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes #8042 from feynmanliang/SPARK-9750 and squashes the following commits:

bb70d5e [Feynman Liang] Breeze compare for dense matrices as well, in case other is sparse
ab6f3c8 [Feynman Liang] Sparse matrix compare for equals
22782df [Feynman Liang] Add equality based on matrix semantics, not representation
78f9426 [Feynman Liang] Add casts
43d28fa [Feynman Liang] Fix failing test
6416fa0 [Feynman Liang] Add failing sparse matrix equals tests

520ad44b

[SPARK-9646] [SQL] Add metrics for all join and aggregate operators · 5831294a

zsxwing authored 9 years ago

This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two case:
1. The iterator is not totally consumed and the metric values will be less.
2. Recreating the iterators will make metric values look bigger than the size of the input source, such as `CartesianProduct`.

Author: zsxwing <zsxwing@gmail.com>

Closes #8060 from zsxwing/sql-metrics and squashes the following commits:

40f3fc1 [zsxwing] Mark LongSQLMetric private[metric] to avoid using incorrectly and leak memory
b1b9071 [zsxwing] Merge branch 'master' into sql-metrics
4bef25a [zsxwing] Add metrics for SortMergeOuterJoin
95ccfc6 [zsxwing] Merge branch 'master' into sql-metrics
67cb4dd [zsxwing] Add metrics for Project and TungstenProject; remove metrics from PhysicalRDD and LocalTableScan
0eb47d4 [zsxwing] Merge branch 'master' into sql-metrics
dd9d932 [zsxwing] Avoid creating new Iterators
589ea26 [zsxwing] Add metrics for all join and aggregate operators

5831294a

[SPARK-9572] [STREAMING] [PYSPARK] Added StreamingContext.getActiveOrCreate() in Python · 5b8bb1b2

Tathagata Das authored 9 years ago

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8080 from tdas/SPARK-9572 and squashes the following commits:

64a231d [Tathagata Das] Fix based on comments
741a0d0 [Tathagata Das] Fixed style
f4f094c [Tathagata Das] Tweaked test
9afcdbe [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
e21488d [Tathagata Das] Minor update
1a371d9 [Tathagata Das] Addressed comments.
60479da [Tathagata Das] Fixed indent
9c2da9c [Tathagata Das] Fixed bugs
b5bd32c [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9572
b55b348 [Tathagata Das] Removed prints
5781728 [Tathagata Das] Fix style issues
b711214 [Tathagata Das] Reverted run-tests.py
643b59d [Tathagata Das] Revert unnecessary change
150e58c [Tathagata Das] Added StreamingContext.getActiveOrCreate() in Python

5b8bb1b2

[SPARK-8764] [ML] string indexer should take option to handle unseen values · dbd778d8

Holden Karau authored 9 years ago

As a precursor to adding a public constructor add an option to handle unseen values by skipping rather than throwing an exception (default remains throwing an exception),

Author: Holden Karau <holden@pigscanfly.ca>

Closes #7266 from holdenk/SPARK-8764-string-indexer-should-take-option-to-handle-unseen-values and squashes the following commits:

38a4de9 [Holden Karau] fix long line
045bf22 [Holden Karau] Add a second b entry so b gets 0 for sure
81dd312 [Holden Karau] Update the docs for handleInvalid param to be more descriptive
7f37f6e [Holden Karau] remove extra space (scala style)
414e249 [Holden Karau] And switch to using handleInvalid instead of skipInvalid
1e53f9b [Holden Karau] update the param (codegen side)
7a22215 [Holden Karau] fix typo
100a39b [Holden Karau] Merge in master
aa5b093 [Holden Karau] Since we filter we should never go down this code path if getSkipInvalid is true
75ffa69 [Holden Karau] Remove extra newline
d69ef5e [Holden Karau] Add a test
b5734be [Holden Karau] Add support for unseen labels
afecd4e [Holden Karau] Add a param to skip invalid entries.

dbd778d8

[SPARK-8345] [ML] Add an SQL node as a feature transformer · 8cad854e

Yanbo Liang authored 9 years ago

Implements the transforms which are defined by SQL statement.
Currently we only support SQL syntax like 'SELECT ... FROM __THIS__'
where '__THIS__' represents the underlying table of the input dataset.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #7465 from yanboliang/spark-8345 and squashes the following commits:

b403fcb [Yanbo Liang] address comments
0d4bb15 [Yanbo Liang] a better transformSchema() implementation
51eb9e7 [Yanbo Liang] Add an SQL node as a feature transformer

8cad854e

Fix comment error · bce72797

Jeff Zhang authored 9 years ago

API is updated but its doc comment is not updated.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #8097 from zjffdu/dev.

bce72797

[SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering · dfe347d2

Josh Rosen authored 9 years ago

HashPartitioning compatibility is currently defined w.r.t the _set_ of expressions, but the ordering of those expressions matters when computing hash codes; this could lead to incorrect answers if we mistakenly avoided a shuffle based on the assumption that HashPartitionings with the same expressions in different orders will produce equivalent row hashcodes. The first commit adds a regression test which illustrates this problem.

The fix for this is simple: make `HashPartitioning.compatibleWith` and `HashPartitioning.guarantees` sensitive to the expression ordering (i.e. do not perform set comparison).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8074 from JoshRosen/hashpartitioning-compatiblewith-fixes and squashes the following commits:

b61412f [Josh Rosen] Demonstrate that I haven't cheated in my fix
0b4d7d9 [Josh Rosen] Update so that clusteringSet is only used in satisfies().
dc9c9d7 [Josh Rosen] Add failing regression test for SPARK-9785

dfe347d2

[SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform. · d378396f

Reynold Xin authored 9 years ago

PlatformDependent.UNSAFE is way too verbose.

Author: Reynold Xin <rxin@databricks.com>

Closes #8094 from rxin/SPARK-9815 and squashes the following commits:

229b603 [Reynold Xin] [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.

d378396f

[SPARK-9727] [STREAMING] [BUILD] Updated streaming kinesis SBT project name to be more consistent · 600031eb

Tathagata Das authored 9 years ago

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8092 from tdas/SPARK-9727 and squashes the following commits:

b1b01fd [Tathagata Das] Updated streaming kinesis project name

600031eb

[SPARK-9810] [BUILD] Remove individual commit messages from the squash commit message · 55752d88

Reynold Xin authored 9 years ago

For more information, please see the JIRA ticket and the associated dev list discussion.

https://issues.apache.org/jira/browse/SPARK-9810

http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-Removing-individual-commit-messages-from-the-squash-commit-message-td13295.html

Author: Reynold Xin <rxin@databricks.com>

Closes #8091 from rxin/SPARK-9810.

55752d88

[SPARK-9640] [STREAMING] [TEST] Do not run Python Kinesis tests when the... · 0f90d605

Tathagata Das authored 9 years ago

[SPARK-9640] [STREAMING] [TEST] Do not run Python Kinesis tests when the Kinesis assembly JAR has not been generated

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #7961 from tdas/SPARK-9640 and squashes the following commits:

974ce19 [Tathagata Das] Undo changes related to SPARK-9727
004ae26 [Tathagata Das] style fixes
9bbb97d [Tathagata Das] Minor style fies
e6a677e [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9640
ca90719 [Tathagata Das] Removed extra line
ba9cfc7 [Tathagata Das] Improved kinesis test selection logic
88d59bd [Tathagata Das] updated test modules
871fcc8 [Tathagata Das] Fixed SparkBuild
94be631 [Tathagata Das] Fixed style
b858196 [Tathagata Das] Fixed conditions and few other things based on PR comments.
e292e64 [Tathagata Das] Added filters for Kinesis python tests

0f90d605

[SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and right outer join · 91e9389f

Josh Rosen authored 9 years ago

This patch adds a new `SortMergeOuterJoin` operator that performs left and right outer joins using sort merge join.  It also refactors `SortMergeJoin` in order to improve performance and code clarity.

Along the way, I also performed a couple pieces of minor cleanup and optimization:

- Rename the `HashJoin` physical planner rule to `EquiJoinSelection`, since it's also used for non-hash joins.
- Rewrite the comment at the top of `HashJoin` to better explain the precedence for choosing join operators.
- Update `JoinSuite` to use `SqlTestUtils.withConf` for changing SQLConf settings.

This patch incorporates several ideas from adrian-wang's patch, #5717.

Closes #5717.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7904)
<!-- Reviewable:end -->

Author: Josh Rosen <joshrosen@databricks.com>
Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #7904 from JoshRosen/outer-join-smj and squashes 1 commits.

91e9389f

Aug 10, 2015

[SPARK-9340] [SQL] Fixes converting unannotated Parquet lists · 071bbad5

Damian Guy authored 9 years ago

This PR is inspired by #8063 authored by dguy. Especially, testing Parquet files added here are all taken from that PR.

**Committer who merges this PR should attribute it to "Damian Guy <damian.guygmail.com>".**

----

SPARK-6776 and SPARK-6777 followed `parquet-avro` to implement backwards-compatibility rules defined in `parquet-format` spec. However, both Spark SQL and `parquet-avro` neglected the following statement in `parquet-format`:

> This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field.

One of the consequences is that, Parquet files generated by `parquet-protobuf` containing unannotated repeated fields are not correctly converted to Catalyst arrays.

This PR fixes this issue by

1. Handling unannotated repeated fields in `CatalystSchemaConverter`.
2. Converting this kind of special repeated fields to Catalyst arrays in `CatalystRowConverter`.

Two special converters, `RepeatedPrimitiveConverter` and `RepeatedGroupConverter`, are added. They delegate actual conversion work to a child `elementConverter` and accumulates elements in an `ArrayBuffer`.

Two extra methods, `start()` and `end()`, are added to `ParentContainerUpdater`. So that they can be used to initialize new `ArrayBuffer`s for unannotated repeated fields, and propagate converted array values to upstream.

Author: Cheng Lian <lian@databricks.com>

Closes #8070 from liancheng/spark-9340/unannotated-parquet-list and squashes the following commits:

ace6df7 [Cheng Lian] Moves ParquetProtobufCompatibilitySuite
f1c7bfd [Cheng Lian] Updates .rat-excludes
420ad2b [Cheng Lian] Fixes converting unannotated Parquet lists

071bbad5

[SPARK-9801] [STREAMING] Check if file exists before deleting temporary files. · 3c9802d9

Hao Zhu authored 9 years ago

Spark streaming deletes the temp file and backup files without checking if they exist or not

Author: Hao Zhu <viadeazhu@gmail.com>

Closes #8082 from viadea/master and squashes the following commits:

242d05f [Hao Zhu] [SPARK-9801][Streaming]No need to check the existence of those files
fd143f2 [Hao Zhu] [SPARK-9801][Streaming]Check if backupFile exists before deleting backupFile files.
087daf0 [Hao Zhu] SPARK-9801

3c9802d9

[SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python · 853809e9

Prabeesh K authored 9 years ago

This PR is based on #4229, thanks prabeesh.

Closes #4229

Author: Prabeesh K <prabsmails@gmail.com>
Author: zsxwing <zsxwing@gmail.com>
Author: prabs <prabsmails@gmail.com>
Author: Prabeesh K <prabeesh.k@namshi.com>

Closes #7833 from zsxwing/pr4229 and squashes the following commits:

9570bec [zsxwing] Fix the variable name and check null in finally
4a9c79e [zsxwing] Fix pom.xml indentation
abf5f18 [zsxwing] Merge branch 'master' into pr4229
935615c [zsxwing] Fix the flaky MQTT tests
47278c5 [zsxwing] Include the project class files
478f844 [zsxwing] Add unpack
5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
734db99 [zsxwing] Merge branch 'master' into pr4229
126608a [Prabeesh K] address the comments
b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
a6747cb [Prabeesh K] wait for starting the receiver before publishing data
87fc677 [Prabeesh K] address the comments:
97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
80474d1 [Prabeesh K] fix
1f0cfe9 [Prabeesh K] python style fix
e1ee016 [Prabeesh K] scala style fix
a5a8f9f [Prabeesh K] added Python test
9767d82 [Prabeesh K] implemented Python-friendly class
a11968b [Prabeesh K] fixed python style
795ec27 [Prabeesh K] address comments
ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
3f4df12 [Prabeesh K] updated version
b34c3c1 [prabs] adress comments
3aa7fff [prabs] Added Python streaming mqtt word count example
b7d42ff [prabs] Mqtt streaming support in Python

853809e9

[SPARK-9759] [SQL] improve decimal.times() and cast(int, decimalType) · c4fd2a24

Davies Liu authored 9 years ago

This patch optimize two things:

1. passing MathContext to JavaBigDecimal.multiply/divide/reminder to do right rounding, because java.math.BigDecimal.apply(MathContext) is expensive

2. Cast integer/short/byte to decimal directly (without double)

This two optimizations could speed up the end-to-end time of a aggregation (SUM(short * decimal(5, 2)) 75% (from 19s -> 10.8s)

Author: Davies Liu <davies@databricks.com>

Closes #8052 from davies/optimize_decimal and squashes the following commits:

225efad [Davies Liu] improve decimal.times() and cast(int, decimalType)

c4fd2a24

[SPARK-9620] [SQL] generated UnsafeProjection should support many columns or large exressions · fe2fb7fb

Davies Liu authored 9 years ago

Currently, generated UnsafeProjection can reach 64k byte code limit of Java. This patch will split the generated expressions into multiple functions, to avoid the limitation.

After this patch, we can work well with table that have up to 64k columns (hit max number of constants limit in Java), it should be enough in practice.

cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #8044 from davies/wider_table and squashes the following commits:

9192e6c [Davies Liu] fix generated safe projection
d1ef81a [Davies Liu] fix failed tests
737b3d3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
ffcd132 [Davies Liu] address comments
1b95be4 [Davies Liu] put the generated class into sql package
77ed72d [Davies Liu] address comments
4518e17 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
75ccd01 [Davies Liu] Merge branch 'master' of github.com:apache/spark into wider_table
495e932 [Davies Liu] support wider table with more than 1k columns for generated projections

fe2fb7fb

[SPARK-9763][SQL] Minimize exposure of internal SQL classes. · 40ed2af5

Reynold Xin authored 9 years ago

There are a few changes in this pull request:

1. Moved all data sources to execution.datasources, except the public JDBC APIs.
2. In order to maintain backward compatibility from 1, added a backward compatibility translation map in data source resolution.
3. Moved ui and metric package into execution.
4. Added more documentation on some internal classes.
5. Renamed DataSourceRegister.format -> shortName.
6. Added "override" modifier on shortName.
7. Removed IntSQLMetric.

Author: Reynold Xin <rxin@databricks.com>

Closes #8056 from rxin/SPARK-9763 and squashes the following commits:

9df4801 [Reynold Xin] Removed hardcoded name in test cases.
d9babc6 [Reynold Xin] Shorten.
e484419 [Reynold Xin] Removed VisibleForTesting.
171b812 [Reynold Xin] MimaExcludes.
2041389 [Reynold Xin] Compile ...
79dda42 [Reynold Xin] Compile.
0818ba3 [Reynold Xin] Removed IntSQLMetric.
c46884f [Reynold Xin] Two more fixes.
f9aa88d [Reynold Xin] [SPARK-9763][SQL] Minimize exposure of internal SQL classes.

40ed2af5

[SPARK-9784] [SQL] Exchange.isUnsafe should check whether codegen and unsafe are enabled · 0fe66744

Josh Rosen authored 9 years ago

Exchange.isUnsafe should check whether codegen and unsafe are enabled.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8073 from JoshRosen/SPARK-9784 and squashes the following commits:

7a1019f [Josh Rosen] [SPARK-9784] Exchange.isUnsafe should check whether codegen and unsafe are enabled

0fe66744

Fixed AtmoicReference<> Example · d2852127

Mahmoud Lababidi authored 9 years ago

Author: Mahmoud Lababidi <lababidi@gmail.com>

Closes #8076 from lababidi/master and squashes the following commits:

af4553b [Mahmoud Lababidi] Fixed AtmoicReference<> Example

d2852127

[SPARK-9755] [MLLIB] Add docs to MultivariateOnlineSummarizer methods · 00b655cc

Feynman Liang authored 9 years ago

Adds method documentations back to `MultivariateOnlineSummarizer`, which were present in 1.4 but disappeared somewhere along the way to 1.5.

jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8045 from feynmanliang/SPARK-9755 and squashes the following commits:

af67fde [Feynman Liang] Add MultivariateOnlineSummarizer docs

00b655cc

[SPARK-9710] [TEST] Fix RPackageUtilsSuite when R is not available. · 0f3366a4

Marcelo Vanzin authored 9 years ago

RUtils.isRInstalled throws an exception if R is not installed,
instead of returning false. Fix that.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #8008 from vanzin/SPARK-9710 and squashes the following commits:

df72d8c [Marcelo Vanzin] [SPARK-9710] [test] Fix RPackageUtilsSuite when R is not available.

0f3366a4

[SPARK-9743] [SQL] Fixes JSONRelation refreshing · e3fef0f9

Cheng Lian authored 9 years ago

PR #7696 added two `HadoopFsRelation.refresh()` calls ([this] [1], and [this] [2]) in `DataSourceStrategy` to make test case `InsertSuite.save directly to the path of a JSON table` pass. However, this forces every `HadoopFsRelation` table scan to do a refresh, which can be super expensive for tables with large number of partitions.

The reason why the original test case fails without the `refresh()` calls is that, the old JSON relation builds the base RDD with the input paths, while `HadoopFsRelation` provides `FileStatus`es of leaf files. With the old JSON relation, we can create a temporary table based on a path, writing data to that, and then read newly written data without refreshing the table. This is no long true for `HadoopFsRelation`.

This PR removes those two expensive refresh calls, and moves the refresh into `JSONRelation` to fix this issue. We might want to update `HadoopFsRelation` interface to provide better support for this use case.

[1]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L63
[2]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L91

Author: Cheng Lian <lian@databricks.com>

Closes #8035 from liancheng/spark-9743/fix-json-relation-refreshing and squashes the following commits:

ec1957d [Cheng Lian] Fixes JSONRelation refreshing

e3fef0f9

[SPARK-9777] [SQL] Window operator can accept UnsafeRows · be80def0

Yin Huai authored 9 years ago

https://issues.apache.org/jira/browse/SPARK-9777

Author: Yin Huai <yhuai@databricks.com>

Closes #8064 from yhuai/windowUnsafe and squashes the following commits:

8fb3537 [Yin Huai] Set canProcessUnsafeRows to true.

be80def0

Aug 09, 2015

[CORE] [SPARK-9760] Use Option instead of Some for Ivy repos · 46025616

Shivaram Venkataraman authored 9 years ago

This was introduced in #7599

cc rxin brkyvz

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8055 from shivaram/spark-packages-repo-fix and squashes the following commits:

890f306 [Shivaram Venkataraman] Remove test case
51d69ee [Shivaram Venkataraman] Add test case for --packages without --repository
c02e0b4 [Shivaram Venkataraman] Use Option instead of Some for Ivy repos

46025616

[SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles · 23cf5af0

Josh Rosen authored 9 years ago

This pull request refactors the `EnsureRequirements` planning rule in order to avoid the addition of certain unnecessary shuffles.

As an example of how unnecessary shuffles can occur, consider SortMergeJoin, which requires clustered distribution and sorted ordering of its children's input rows. Say that both of SMJ's children produce unsorted output but are both SinglePartition. In this case, we will need to inject sort operators but should not need to inject Exchanges. Unfortunately, it looks like the EnsureRequirements unnecessarily repartitions using a hash partitioning.

This patch solves this problem by refactoring `EnsureRequirements` to properly implement the `compatibleWith` checks that were broken in earlier implementations. See the significant inline comments for a better description of how this works. The majority of this PR is new comments and test cases, with few actual changes to the code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7988 from JoshRosen/exchange-fixes and squashes the following commits:

38006e7 [Josh Rosen] Rewrite EnsureRequirements _yet again_ to make things even simpler
0983f75 [Josh Rosen] More guarantees vs. compatibleWith cleanup; delete BroadcastPartitioning.
8784bd9 [Josh Rosen] Giant comment explaining compatibleWith vs. guarantees
1307c50 [Josh Rosen] Update conditions for requiring child compatibility.
18cddeb [Josh Rosen] Rename DummyPlan to DummySparkPlan.
2c7e126 [Josh Rosen] Merge remote-tracking branch 'origin/master' into exchange-fixes
fee65c4 [Josh Rosen] Further refinement to comments / reasoning
642b0bb [Josh Rosen] Further expand comment / reasoning
06aba0c [Josh Rosen] Add more comments
8dbc845 [Josh Rosen] Add even more tests.
4f08278 [Josh Rosen] Fix the test by adding the compatibility check to EnsureRequirements
a1c12b9 [Josh Rosen] Add failing test to demonstrate allCompatible bug
0725a34 [Josh Rosen] Small assertion cleanup.
5172ac5 [Josh Rosen] Add test for requiresChildrenToProduceSameNumberOfPartitions.
2e0f33a [Josh Rosen] Write a more generic test for EnsureRequirements.
752b8de [Josh Rosen] style fix
c628daf [Josh Rosen] Revert accidental ExchangeSuite change.
c9fb231 [Josh Rosen] Rewrite exchange to fix better handle this case.
adcc742 [Josh Rosen] Move test to PlannerSuite.
0675956 [Josh Rosen] Preserving ordering and partitioning in row format converters also does not help.
cc5669c [Josh Rosen] Adding outputPartitioning to Repartition does not fix the test.
2dfc648 [Josh Rosen] Add failing test illustrating bad exchange planning.

23cf5af0

Disable JobGeneratorSuite "Do not clear received block data too soon". · a863348f
Reynold Xin authored 9 years ago

a863348f

[SPARK-9737] [YARN] Add the suggested configuration when required executor... · 86fa4ba6

Yadong Qi authored 9 years ago

[SPARK-9737] [YARN] Add the suggested configuration when required executor memory is above the max threshold of this cluster on YARN mode

Author: Yadong Qi <qiyadong2010@gmail.com>

Closes #8028 from watermen/SPARK-9737 and squashes the following commits:

48bdf3d [Yadong Qi] Add suggested configuration.

86fa4ba6

[SPARK-8930] [SQL] Throw a AnalysisException with meaningful messages if... · 68ccc6e1

Yijie Shen authored 9 years ago

[SPARK-8930] [SQL] Throw a AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions

Author: Yijie Shen <henry.yijieshen@gmail.com>

Closes #8057 from yjshen/explode_star and squashes the following commits:

eae181d [Yijie Shen] change explaination message
54c9d11 [Yijie Shen] meaning message for * in explode

68ccc6e1

[SPARK-9752][SQL] Support UnsafeRow in Sample operator. · e9c36938

Reynold Xin authored 9 years ago

In order for this to work, I had to disable gap sampling.

Author: Reynold Xin <rxin@databricks.com>

Closes #8040 from rxin/SPARK-9752 and squashes the following commits:

f9e248c [Reynold Xin] Fix the test case for real this time.
adbccb3 [Reynold Xin] Fixed test case.
589fb23 [Reynold Xin] Merge branch 'SPARK-9752' of github.com:rxin/spark into SPARK-9752
55ccddc [Reynold Xin] Fixed core test.
78fa895 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.
c9e7112 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.

e9c36938