Commits · ff9ae61a3b7bbbfc2aac93a99c05a9e1ea9c08bc · cs525-sp18-g07 / spark

Apr 14, 2016

[SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly · ff9ae61a

Mark Grover authored 8 years ago

## What changes were proposed in this pull request?

Removing references to assembly jar in documentation.
Adding an additional (previously undocumented) usage of spark-submit to run examples.

## How was this patch tested?

Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit.

Author: Mark Grover <mark@apache.org>

Closes #12365 from markgrover/spark-14601.

[SPARK-12869] Implemented an improved version of the toIndexedRowMatrix · c80586d9

Fokko Driesprong authored 8 years ago

Hi guys,

I've implemented an improved version of the `toIndexedRowMatrix` function on the `BlockMatrix`. I needed this for a project, but would like to share it with the rest of the community. For dense matrices, it can improve performance by up to 19 times:
https://github.com/Fokko/BlockMatrixToIndexedRowMatrix
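
As a rough illustration of one way to avoid expanding a block matrix into individual entries, the plain-Scala sketch below moves whole row slices instead: each block emits one slice per row, tagged with its global row index and block column, and the slices of each row are grouped and copied into place. It assumes dense blocks stored as `Array[Array[Double]]` and uses hypothetical names (`BlockToRowsSketch`, `toIndexedRows`); it is not Spark's `BlockMatrix` API or the code from this pull request.

```scala
// Hypothetical sketch; dense blocks are plain arrays, not Spark matrices.
object BlockToRowsSketch {
  // ((blockRow, blockCol), rows of the dense block)
  type Block = ((Int, Int), Array[Array[Double]])

  def toIndexedRows(blocks: Seq[Block], rowsPerBlock: Int, colsPerBlock: Int, numCols: Int): Map[Long, Array[Double]] =
    blocks
      .flatMap { case ((blockRow, blockCol), rows) =>
        // Emit one slice per block row, tagged with its global row index.
        rows.toSeq.zipWithIndex.map { case (slice, i) =>
          (blockRow.toLong * rowsPerBlock + i, (blockCol, slice))
        }
      }
      .groupBy(_._1)
      .map { case (rowIndex, parts) =>
        // Copy each slice of this row to its column offset.
        val row = new Array[Double](numCols)
        parts.foreach { case (_, (blockCol, slice)) =>
          Array.copy(slice, 0, row, blockCol * colsPerBlock, slice.length)
        }
        rowIndex -> row
      }
}
```

Grouping whole slices keeps the number of shuffled records proportional to rows times block columns rather than to the number of non-zero entries.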

If there are any questions or suggestions, please let me know. Keep up the good work! Cheers.

Author: Fokko Driesprong <f.driesprong@catawiki.nl>
Author: Fokko Driesprong <fokko@driesprongen.nl>

Closes #10839 from Fokko/master.

[SPARK-14565][ML] RandomForest should use parseInt and parseDouble for feature... · 01dd1f5c

Yong Tang authored 8 years ago

[SPARK-14565][ML] RandomForest should use parseInt and parseDouble for feature subset size instead of regexes

## What changes were proposed in this pull request?

This fix changes RandomForest to parse the supported feature subset strategies with parseInt and
parseDouble instead of regexes, for robustness and maintainability.
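
A minimal sketch of regex-free validation, assuming the accepted values are the named strategies `auto`, `all`, `onethird`, `sqrt`, and `log2`, a positive integer, or a fraction in (0, 1]. The object name `FeatureSubsetStrategy` is hypothetical, and Scala's `toInt`/`toDouble` stand in for `parseInt`/`parseDouble`; this is not the exact RandomForest code.

```scala
import scala.util.Try

// Hypothetical sketch: validate a feature subset strategy string without regexes.
object FeatureSubsetStrategy {
  // Named strategies, matched case-insensitively.
  val named: Set[String] = Set("auto", "all", "onethird", "sqrt", "log2")

  def isValid(value: String): Boolean = {
    val v = value.trim.toLowerCase
    named.contains(v) ||
      Try(v.toInt).toOption.exists(_ > 0) ||                    // a positive number of features
      Try(v.toDouble).toOption.exists(d => d > 0.0 && d <= 1.0) // a fraction of features in (0, 1]
  }
}
```

A string such as `"0.5"` fails `toInt` and falls through to `toDouble`, so no pattern has to encode both numeric forms.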

## How was this patch tested?

Existing tests passed.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #12360 from yongtang/SPARK-14565.

[SPARK-14545][SQL] Improve `LikeSimplification` by adding `a%b` rule · d7e124ed

Dongjoon Hyun authored 8 years ago

## What changes were proposed in this pull request?

Currently, `LikeSimplification` handles the following four rules:
- 'a%' => expr.StartsWith("a")
- '%b' => expr.EndsWith("b")
- '%a%' => expr.Contains("a")
- 'a' => EqualTo("a")

This PR adds the following rule.
- 'a%b' => expr.Length() >= 2 && expr.StartsWith("a") && expr.EndsWith("b")

Here, 2 is statically calculated from "a".size + "b".size.
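
To see why the length check is needed, consider the pattern `'ab%bc'`: the string `"abc"` starts with `"ab"` and ends with `"bc"`, yet it is too short to contain both without overlap, so it must not match. The plain-Scala sketch below (hypothetical names, not Catalyst's expressions) compares a regex-based reading of a `%`-only LIKE pattern with the simplified `prefix%suffix` check.

```scala
import java.util.regex.Pattern

// Hypothetical plain-Scala model, not Catalyst code.
object LikeSketch {
  // Reference semantics for a pattern whose only wildcard is '%': quote the literal
  // pieces, join them with ".*", and require a full-string match.
  def likeViaRegex(s: String, pattern: String): Boolean =
    s.matches(pattern.split("%", -1).map(piece => Pattern.quote(piece)).mkString(".*"))

  // Simplified form of 'prefix%suffix'; the length check rules out an overlapping prefix and suffix.
  def startsAndEnds(s: String, prefix: String, suffix: String): Boolean =
    s.length >= prefix.length + suffix.length && s.startsWith(prefix) && s.endsWith(suffix)
}
```

Without the length test, `startsAndEnds("abc", "ab", "bc")` would return true, which is the mismatch the statically computed bound prevents.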

**Before**
```
scala> sql("select a from (select explode(array('abc','adc')) a) T where a like 'a%c'").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Filter a#5 LIKE a%c
:     +- INPUT
+- Generate explode([abc,adc]), false, false, [a#5]
   +- Scan OneRowRelation[]
```

**After**
```
scala> sql("select a from (select explode(array('abc','adc')) a) T where a like 'a%c'").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Filter ((length(a#5) >= 2) && (StartsWith(a#5, a) && EndsWith(a#5, c)))
:     +- INPUT
+- Generate explode([abc,adc]), false, false, [a#5]
   +- Scan OneRowRelation[]
```

## How was this patch tested?

Passes the Jenkins tests (including a new test case).

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #12312 from dongjoon-hyun/SPARK-14545.

[SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib · bc748b7b

Yong Tang authored 8 years ago

## What changes were proposed in this pull request?

This fix adds a binary toggle Param to PySpark HashingTF in ML and MLlib. If this toggle is set, all non-zero counts are set to 1.

Note: This fix (SPARK-14238) extends SPARK-13963, where the Scala implementation was done.
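
A plain-Scala sketch of what the toggle does, assuming terms are bucketed with Scala's `##` hash modulo `numFeatures` (Spark's own hash function may differ). `HashingTFSketch` is a hypothetical name, not the PySpark or Scala API.

```scala
// Hypothetical model of hashed term frequencies with a binary toggle.
object HashingTFSketch {
  def transform(terms: Seq[String], numFeatures: Int, binary: Boolean): Map[Int, Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    // Bucket each term by a non-negative hash modulo numFeatures and count occurrences.
    val counts: Map[Int, Double] =
      terms.groupBy(t => Math.floorMod(t.##, numFeatures)).map { case (idx, ts) => idx -> ts.size.toDouble }
    // With binary = true, every non-zero count becomes 1.0.
    if (binary) counts.map { case (idx, _) => idx -> 1.0 } else counts
  }
}
```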

## How was this patch tested?

This fix adds two tests to cover the code changes: one for HashingTF in PySpark's ML and one for HashingTF in PySpark's MLlib.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #12079 from yongtang/SPARK-14238.

[SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc · bf65c87f

Joseph K. Bradley authored 8 years ago

## What changes were proposed in this pull request?

In Spark 1.4, we negated some metrics from RegressionEvaluator since CrossValidator always maximized metrics. This was fixed in 1.5, but the docs were not updated. This PR updates the docs.

## How was this patch tested?

no tests

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12377 from jkbradley/regeval-doc.

[SPARK-13967][PYSPARK][ML] Added binary Param to Python CountVectorizer · c5172f82

Bryan Cutler authored 8 years ago

Added binary toggle param to CountVectorizer feature transformer in PySpark.

Created a unit test for using CountVectorizer with the binary toggle on.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #12308 from BryanCutler/binary-param-python-CountVectorizer-SPARK-13967.

[SPARK-14592][SQL] Native support for CREATE TABLE LIKE DDL command · 28efdd3f

Liang-Chi Hsieh authored 8 years ago

## What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-14592

This patch adds native support for DDL command `CREATE TABLE LIKE`.

The SQL syntax is:

    CREATE TABLE table_name LIKE existing_table
    CREATE TABLE IF NOT EXISTS table_name LIKE existing_table

## How was this patch tested?
Tested in `HiveDDLCommandSuite`; `HiveQuerySuite` already tests `CREATE TABLE LIKE`.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrew@databricks.com>

Closes #12362 from viirya/create-table-like.

[SPARK-14499][SQL][TEST] Drop Partition Does Not Delete Data of External Tables · c971aee4

gatorsmile authored 8 years ago

#### What changes were proposed in this pull request?
This PR adds a test to ensure that dropping partitions of an external table does not delete data.

cc yhuai andrewor14

#### How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <andrew@databricks.com>

Closes #12350 from gatorsmile/testDropPartition.

[SPARK-14558][CORE] In ClosureCleaner, clean the outer pointer if it's a REPL line object · 1d04c86f

Wenchen Fan authored 8 years ago

## What changes were proposed in this pull request?

When we clean a closure, if its outermost parent is not a closure, we don't clone and clean it, because cloning user objects is dangerous. However, if the parent is a REPL line object, which may carry many unnecessary references (like the Hadoop conf, Spark conf, etc.), we should clean it, because it is not a user object.

This PR improves the check for user objects to exclude REPL line objects.

## How was this patch tested?

existing tests.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12327 from cloud-fan/closure.

[SPARK-14617] Remove deprecated APIs in TaskMetrics · a46f98d3

Reynold Xin authored 8 years ago

## What changes were proposed in this pull request?
This patch removes some of the deprecated APIs in TaskMetrics. This is part of my bigger effort to simplify accumulators and task metrics.

## How was this patch tested?
N/A - only removals

Author: Reynold Xin <rxin@databricks.com>

Closes #12375 from rxin/SPARK-14617.

[SPARK-14619] Track internal accumulators (metrics) by stage attempt · dac40b68

Reynold Xin authored 8 years ago

## What changes were proposed in this pull request?
When there are multiple attempts for a stage, we currently only reset internal accumulator values if all the tasks are resubmitted. It would make more sense to reset the accumulator values for each stage attempt. This will allow us to eventually get rid of the internal flag in the Accumulator class. This is part of my bigger effort to simplify accumulators and task metrics.
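
A minimal sketch of the per-attempt idea, with hypothetical names (`StageAttempt`, `StageMetricsSketch`) that are not Spark's classes: keying values by `(stageId, attemptId)` means each new attempt starts from zero regardless of how many tasks are resubmitted, and the previous attempt's values stay untouched.

```scala
import scala.collection.mutable

// Hypothetical identifiers, not Spark classes.
final case class StageAttempt(stageId: Int, attemptId: Int)

class StageMetricsSketch {
  private val values = mutable.Map.empty[StageAttempt, Long]

  // A new attempt starts from zero, whether or not all of its tasks were resubmitted.
  def startAttempt(attempt: StageAttempt): Unit = values.update(attempt, 0L)

  // Accumulate a metric update into the given attempt only.
  def add(attempt: StageAttempt, delta: Long): Unit =
    values.update(attempt, values.getOrElse(attempt, 0L) + delta)

  def value(attempt: StageAttempt): Long = values.getOrElse(attempt, 0L)
}
```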

## How was this patch tested?
Covered by existing tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #12378 from rxin/SPARK-14619.

[SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place · 9fa43a33

Sean Owen authored 8 years ago

## What changes were proposed in this pull request?

Move the json4s and breeze dependency declarations into the parent POM.

## How was this patch tested?

Should be no functional change, but Jenkins tests will test that.

Author: Sean Owen <sowen@cloudera.com>

Closes #12390 from srowen/SPARK-14612.

[SPARK-14630][BUILD][CORE][SQL][STREAMING] Code style: public abstract methods... · 3e27940a

Liwei Lin authored 8 years ago

[SPARK-14630][BUILD][CORE][SQL][STREAMING] Code style: public abstract methods should have explicit return types

## What changes were proposed in this pull request?

Currently many public abstract methods (in abstract classes as well as traits) don't declare return types explicitly, such as in [o.a.s.streaming.dstream.InputDStream](https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/InputDStream.scala#L110):
```scala
def start() // should be: def start(): Unit
def stop()  // should be: def stop(): Unit
```

These methods exist in core, sql, streaming; this PR fixes them.

## How was this patch tested?

N/A

## Which piece of scala style rule led to the changes?

The rule was added separately in https://github.com/apache/spark/pull/12396.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #12389 from lw-lin/public-abstract-methods.

[SPARK-14625] TaskUIData and ExecutorUIData shouldn't be case classes · de2ad528

Reynold Xin authored 8 years ago

## What changes were proposed in this pull request?
I was trying to understand the accumulator and metrics update source code and these two classes don't really need to be case classes. It would also be more consistent with other UI classes if they are not case classes. This is part of my bigger effort to simplify accumulators and task metrics.

## How was this patch tested?
This is a straightforward refactoring without behavior change.

Author: Reynold Xin <rxin@databricks.com>

Closes #12386 from rxin/SPARK-14625.

[SPARK-14125][SQL] Native DDL Support: Alter View · 0d22092c

gatorsmile authored 8 years ago

#### What changes were proposed in this pull request?
This PR provides native DDL support for the following three ALTER VIEW commands:

Based on the Hive DDL document:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
##### 1. ALTER VIEW RENAME
**Syntax:**
```SQL
ALTER VIEW view_name RENAME TO new_view_name
```
- changes the name of a view to a different name
- renaming a view with ALTER TABLE is not allowed

##### 2. ALTER VIEW SET TBLPROPERTIES
**Syntax:**
```SQL
ALTER VIEW view_name SET TBLPROPERTIES ('comment' = new_comment);
```
- adds metadata to a view
- setting a view's properties with ALTER TABLE is not allowed
- setting an existing key to its current value is ignored
- setting an existing key to a different value overwrites the value

##### 3. ALTER VIEW UNSET TBLPROPERTIES
**Syntax:**
```SQL
ALTER VIEW view_name UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key')
```
- removes metadata from a view
- unsetting a view's properties with ALTER TABLE is not allowed
- unsetting a key that does not exist raises an exception
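
The property semantics listed above can be modeled on a plain `Map`, as a sketch. `ViewPropertiesSketch` is hypothetical and is not Spark's catalog API, and the assumption that `IF EXISTS` suppresses the missing-key error is an inference from the syntax, not something stated here.

```scala
// Hypothetical model of a view's TBLPROPERTIES; not Spark's catalog API.
object ViewPropertiesSketch {
  // SET: an existing key with the same value is a no-op; a different value overwrites it.
  def setProperties(current: Map[String, String], updates: Map[String, String]): Map[String, String] =
    current ++ updates

  // UNSET: a missing key is an error unless IF EXISTS was given (assumed behavior).
  def unsetProperties(current: Map[String, String], keys: Seq[String], ifExists: Boolean): Map[String, String] = {
    val missing = keys.filterNot(current.contains)
    if (missing.nonEmpty && !ifExists) {
      throw new IllegalArgumentException(s"Properties not found: ${missing.mkString(", ")}")
    }
    current -- keys
  }
}
```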

#### How was this patch tested?
Added test cases to verify that it works properly.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #12324 from gatorsmile/alterView.

[SPARK-14572][DOC] Update config docs to allow -Xms in extraJavaOptions · f83ba454

Dhruve Ashar authored 8 years ago

## What changes were proposed in this pull request?
The configuration docs are updated to reflect the changes introduced in [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This allows users to specify initial heap memory settings through `extraJavaOptions` for the executor, driver, and YARN AM.

## How was this patch tested?
The changes are tested in [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This is just documenting the changes made.

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #12333 from dhruve/doc/SPARK-14572.

[SPARK-14518][SQL] Support Comment in CREATE VIEW · 3cf3db17

gatorsmile authored 8 years ago

#### What changes were proposed in this pull request?
**HQL Syntax**: [Create View](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView)
```SQL
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ...;
```
Adds support for the `[COMMENT view_comment]` clause.

#### How was this patch tested?
Modified the existing test cases to verify the correctness.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #12288 from gatorsmile/addCommentInCreateView.

[MINOR][SQL] Remove extra anonymous closure within functional transformations · 6fc3dc88

hyukjinkwon authored 8 years ago

## What changes were proposed in this pull request?

This PR removes extra anonymous closure within functional transformations.

For example,

```scala
.map(item => {
  ...
})
```

which can simply be written as:

```scala
.map { item =>
  ...
}
```

## How was this patch tested?

Related unit tests and `sbt scalastyle`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #12382 from HyukjinKwon/minor-extra-closers.

[SPARK-14573][PYSPARK][BUILD] Fix PyDoc Makefile & highlighting issues · 478af2f4

Holden Karau authored 8 years ago

## What changes were proposed in this pull request?

The PyDoc Makefile used "=" rather than "?=" to set environment variables, so it overwrote user-supplied values. As a result, the environment variables we set for linting were ignored, letting warnings through. This PR also fixes the warnings that had been introduced.

## How was this patch tested?

manual local export & make

Author: Holden Karau <holden@us.ibm.com>

Closes #12336 from holdenk/SPARK-14573-fix-pydoc-makefile.

[SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports · b4819404

hyukjinkwon authored 8 years ago

## What changes were proposed in this pull request?

The old `HadoopFsRelation` API included `buildInternalScan()`, which used `SqlNewHadoopRDD` in `ParquetRelation`.
Now that the old API has been removed, `SqlNewHadoopRDD` is no longer used.

So, this PR removes `SqlNewHadoopRDD` and several unused imports.

This was discussed in https://github.com/apache/spark/pull/12326.

## How was this patch tested?

Several related existing unit tests and `sbt scalastyle`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #12354 from HyukjinKwon/SPARK-14596.

Apr 13, 2016

[SPARK-14607] [SPARK-14484] [SQL] fix case-insensitive predicates in FileSourceStrategy · 62b7f306

Davies Liu authored 8 years ago

## What changes were proposed in this pull request?

When pruning partitions or pushing down predicates, case sensitivity is not respected. To make this work in case-insensitive mode, this PR updates the AttributeReference inside the predicate to use the name from the schema.
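
A toy illustration of the failure and the fix, with hypothetical names (`ColumnEquals`, `CaseInsensitivePruningSketch`) that are not Spark classes: partitions are keyed by the schema's spelling `Year`, so a predicate written as `year` matches nothing until its column name is replaced with the matching schema name.

```scala
// Hypothetical names, not Spark classes.
final case class ColumnEquals(column: String, value: String)

object CaseInsensitivePruningSketch {
  type Partition = Map[String, String] // partition column name -> value

  // Rewrite the predicate's column to the schema's spelling, if a case-insensitive match exists.
  def normalize(pred: ColumnEquals, schema: Seq[String]): ColumnEquals =
    schema.find(_.equalsIgnoreCase(pred.column)).fold(pred)(name => pred.copy(column = name))

  // Keep the partitions whose value for the predicate's column equals the literal.
  def prune(partitions: Seq[Partition], pred: ColumnEquals): Seq[Partition] =
    partitions.filter(_.get(pred.column).contains(pred.value))
}
```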

## How was this patch tested?

Added regression tests for case-insensitive column names.

Author: Davies Liu <davies@databricks.com>

Closes #12371 from davies/case_insensi.

[SPARK-14472][PYSPARK][ML] Cleanup ML JavaWrapper and related class hierarchy · fc3cd2f5

Bryan Cutler authored 8 years ago

Currently, JavaWrapper is only a wrapper class for pipeline classes that have Params, and JavaCallable is a separate mixin that provides methods to make Java calls. This change simplifies the class structure by defining the Java wrapper in a plain base class along with the methods to make Java calls. It also renames the Java wrapper classes to better reflect their purpose.

Ran existing Python ml tests and generated documentation to test this change.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #12304 from BryanCutler/pyspark-cleanup-JavaWrapper-SPARK-14472.

[SPARK-13089][ML] [Doc] spark.ml Naive Bayes user guide and examples · 781df499

Yuhao Yang authored 8 years ago

jira: https://issues.apache.org/jira/browse/SPARK-13089

Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus example code (using include_example to clip code from examples/ folder files).

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #11015 from hhbyyh/naiveBayesDoc.

[SPARK-14509][DOC] Add python CountVectorizerExample · fcdd6926

Zheng RuiFeng authored 8 years ago

## What changes were proposed in this pull request?
Add python CountVectorizerExample

## How was this patch tested?
manual tests

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #11917 from zhengruifeng/cv_pe.

[SPARK-14375][ML] Unit test for spark.ml KMeansSummary · a91aaf5a

Yanbo Liang authored 8 years ago

## What changes were proposed in this pull request?
* Modify ```KMeansSummary.clusterSizes``` method to make it robust to empty clusters.
* Add unit test for spark.ml ```KMeansSummary```.
* Add Since tag.
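
A sketch of the empty-cluster issue in plain Scala, assuming predictions are cluster indices in `[0, k)`: counting with `groupBy` silently drops clusters that received no points, while filling a length-`k` array keeps them as zeros. `ClusterSizesSketch` is a hypothetical name; this is not the `KMeansSummary` code.

```scala
// Hypothetical sketch, not KMeansSummary itself.
object ClusterSizesSketch {
  // Count points per cluster; empty clusters keep a size of 0 instead of disappearing.
  def clusterSizes(predictions: Seq[Int], k: Int): Array[Long] = {
    require(predictions.forall(c => c >= 0 && c < k), "cluster index out of range")
    val sizes = Array.fill(k)(0L)
    predictions.foreach(c => sizes(c) += 1)
    sizes
  }
}
```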

## How was this patch tested?
unit tests.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12254 from yanboliang/spark-14375.

[SPARK-14461][ML] GLM training summaries should provide solver · 0d17593b

Yanbo Liang authored 8 years ago

## What changes were proposed in this pull request?
GLM training summaries should provide solver.

## How was this patch tested?
Unit tests.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12253 from yanboliang/spark-14461.

[SPARK-10386][MLLIB] PrefixSpanModel supports save/load · b0adb9f5

Yanbo Liang authored 8 years ago

```PrefixSpanModel``` supports ```save/load```. It's similar to #9267.

cc jkbradley

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10664 from yanboliang/spark-10386.

[SPARK-14581] [SQL] push predicates through more logical plans · dbbe1490

Davies Liu authored 8 years ago

## What changes were proposed in this pull request?

Currently, filter pushdown only works with Project, Aggregate, Generate, and Join; filters can't be pushed through many other plans.

This PR adds support for Union, Intersect, Except, and all unary plans.
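
A toy version of the Union case, using a hypothetical mini plan (`PushdownSketch` with `Rows`, `Union`, `Filter`) rather than Catalyst's classes: a filter above a `UNION ALL` can be applied to each child separately without changing the result. Intersect and Except are not modeled here.

```scala
// Hypothetical mini logical plan, not Catalyst's classes.
object PushdownSketch {
  sealed trait Plan
  final case class Rows(data: Seq[Int]) extends Plan
  final case class Union(left: Plan, right: Plan) extends Plan // UNION ALL semantics
  final case class Filter(pred: Int => Boolean, child: Plan) extends Plan

  // Push a Filter sitting directly on a Union into both children.
  def pushDown(plan: Plan): Plan = plan match {
    case Filter(p, Union(l, r)) => Union(pushDown(Filter(p, l)), pushDown(Filter(p, r)))
    case Filter(p, child)       => Filter(p, pushDown(child))
    case Union(l, r)            => Union(pushDown(l), pushDown(r))
    case leaf                   => leaf
  }

  // Evaluate a plan to its rows, used to check that pushdown preserves results.
  def eval(plan: Plan): Seq[Int] = plan match {
    case Rows(data)   => data
    case Union(l, r)  => eval(l) ++ eval(r)
    case Filter(p, c) => eval(c).filter(p)
  }
}
```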

## How was this patch tested?

Added tests.

Author: Davies Liu <davies@databricks.com>

Closes #12342 from davies/filter_hint.

[SPARK-13783][ML] Model export/import for spark.ml: GBTs · f9d578ea

Yanbo Liang authored 8 years ago

## What changes were proposed in this pull request?
* Added save/load for ```GBTClassifier/GBTClassificationModel/GBTRegressor/GBTRegressionModel```.
* Meanwhile, I modified ```EnsembleModelReadWrite.saveImpl/loadImpl``` to support save/load ```treeWeights```.

## How was this patch tested?
Adds standard unit tests for GBT save/load.

cc jkbradley GayathriMurali

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12230 from yanboliang/spark-13783.

[SPARK-14388][SQL] Implement CREATE TABLE · 7d2ed8cc

Andrew Or authored 8 years ago

## What changes were proposed in this pull request?

This patch implements the `CREATE TABLE` command using the `SessionCatalog`. Previously we handled only `CTAS` and `CREATE TABLE ... USING`. This requires us to refactor `CatalogTable` to accept various fields (e.g. bucket and skew columns) and pass them to Hive.

WIP: Note that I haven't verified whether this actually works yet! But I believe it does.

## How was this patch tested?

Tests will come in a future commit.

Author: Andrew Or <andrew@databricks.com>
Author: Yin Huai <yhuai@databricks.com>

Closes #12271 from andrewor14/create-table-ddl.

[SPARK-14568][ML] Instrumentation framework for logistic regression · 1018a1c1

Timothy Hunter authored 8 years ago

## What changes were proposed in this pull request?

This adds extra logging information about a `LogisticRegression` estimator when being fit on a dataset. With this PR, you see the following extra lines when running the example in the documentation:

```
16/04/13 07:19:00 INFO Instrumentation: Instrumentation(LogisticRegression-logreg_55dd3c09f164-1230977381-1): training: numPartitions=1 storageLevel=StorageLevel(disk=true, memory=true, offheap=false, deserialized=true, replication=1)
16/04/13 07:19:00 INFO Instrumentation: Instrumentation(LogisticRegression-logreg_55dd3c09f164-1230977381-1): {"regParam":0.3,"elasticNetParam":0.8,"maxIter":10}
...
16/04/12 11:48:07 INFO Instrumentation: Instrumentation(LogisticRegression-logreg_a89eb23cb386-358781145):numClasses=2
16/04/12 11:48:07 INFO Instrumentation: Instrumentation(LogisticRegression-logreg_a89eb23cb386-358781145):numFeatures=692
...
16/04/13 07:19:01 INFO Instrumentation: Instrumentation(LogisticRegression-logreg_55dd3c09f164-1230977381-1): training finished
```

## How was this patch tested?

This PR was manually tested.

Author: Timothy Hunter <timhunter@databricks.com>

Closes #12331 from thunterdb/1604-instrumentation.

Revert "[SPARK-14154][MLLIB] Simplify the implementation for Kolmogorov–Smirnov test" · 323e7390

Xiangrui Meng authored 8 years ago

This reverts commit d2a819a6.
[SPARK-14537][CORE] Make TaskSchedulerImpl waiting fail if context is shut down · dd11e401

Charles Allen authored 8 years ago

This patch makes the postStartHook throw an IllegalStateException if the SparkContext is shut down while it is waiting for the backend to be ready.
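
A minimal sketch of the wait loop this describes, with hypothetical names (`BackendReadyWaiter`, `isReady`, `isStopped`) rather than Spark's `TaskSchedulerImpl` internals: the loop polls readiness and throws an `IllegalStateException` as soon as the context reports it has stopped, instead of waiting forever.

```scala
// Hypothetical sketch, not TaskSchedulerImpl's code.
class BackendReadyWaiter(isReady: () => Boolean, isStopped: () => Boolean, pollMillis: Long = 10L) {
  def waitBackendReady(): Unit = {
    while (!isReady()) {
      // Fail fast if the context was shut down while we were waiting.
      if (isStopped()) {
        throw new IllegalStateException("Context stopped while waiting for backend")
      }
      Thread.sleep(pollMillis)
    }
  }
}
```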

Author: Charles Allen <charles@allen-net.com>

Closes #12301 from drcrallen/SPARK-14537.

[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api · 23f93f55

Liwei Lin authored 8 years ago

## What changes were proposed in this pull request?

- updated `OFF_HEAP` semantics for `StorageLevels.java`
- updated `OFF_HEAP` semantics for `storagelevel.py`

## How was this patch tested?

no need to test

Author: Liwei Lin <lwlin7@gmail.com>

Closes #12126 from lw-lin/storagelevel.py.

Apr 12, 2016

[SPARK-14554][SQL][FOLLOW-UP] use checkDataset to check the result · a5f8c9b1

Wenchen Fan authored 8 years ago

## What changes were proposed in this pull request?

Addresses this review comment: https://github.com/apache/spark/pull/12322#discussion_r59417359

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12346 from cloud-fan/tmp.

[MINOR][SQL] Remove some unused imports in datasources. · 587cd554

hyukjinkwon authored 8 years ago

## What changes were proposed in this pull request?

It looks like several recent commits to datasources (possibly while removing the old `HadoopFsRelation` interface) missed removing some unused imports.

This PR removes some unused imports in datasources.

## How was this patch tested?

`sbt scalastyle` and some unit tests for them.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #12326 from HyukjinKwon/minor-imports.

[SPARK-14579][SQL] Fix a race condition in StreamExecution.processAllAvailable · 768b3d62

Shixiong Zhu authored 8 years ago

## What changes were proposed in this pull request?

There is a race condition in `StreamExecution.processAllAvailable`. Here is an execution order to reproduce it.

| Time | Thread 1 | MicroBatchThread |
|:----:|:--------:|:----------------:|
| 1 | | `dataAvailable` in `constructNextBatch` returns false |
| 2 | `addData(newData)` | |
| 3 | `noNewData = false` in `processAllAvailable` | |
| 4 | | `noNewData = true` |
| 5 | `noNewData` is true, so it just returns | |

The root cause is that checking `dataAvailable` and setting `noNewData` to true is not atomic. This PR puts these two actions inside a `synchronized` block to make them atomic.

In addition, this PR also has the following changes:

- Make `committedOffsets` and `availableOffsets` volatile to make sure they can be seen in other threads.
- Copy the reference of `availableOffsets` to a local variable so that `sourceStatuses` can use a snapshot of `availableOffsets`.
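
A minimal sketch of the locking pattern, with hypothetical names (`StreamProgressSketch`, `pending`, `runOneBatch`) rather than `StreamExecution`'s real fields: because the micro-batch thread checks for data and sets `noNewData` under the same lock that `processAllAvailable` holds while resetting the flag, the interleaving in the table above cannot occur.

```scala
// Hypothetical sketch, not StreamExecution itself.
class StreamProgressSketch {
  private val lock = new Object
  private var pending = 0          // stand-in for source data not yet processed
  private var noNewData = false

  def addData(n: Int): Unit = lock.synchronized { pending += n }

  def pendingCount: Int = lock.synchronized(pending)

  // Micro-batch thread: "is data available?" and "set noNewData" happen under one lock.
  def runOneBatch(): Unit = lock.synchronized {
    if (pending > 0) {
      pending = 0                  // process everything currently available
    } else {
      noNewData = true
      lock.notifyAll()
    }
  }

  // Caller thread: block until a batch observes that no data is left.
  def processAllAvailable(): Unit = lock.synchronized {
    noNewData = false
    while (!noNewData) lock.wait()
  }
}
```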

## How was this patch tested?

Existing unit tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #12339 from zsxwing/race-condition.

[SPARK-14578] [SQL] Fix codegen for CreateExternalRow with nested wide schema · 372baf04

Davies Liu authored 8 years ago

## What changes were proposed in this pull request?

For a wide schema, the field expressions are split into multiple functions, but the loop variable (`loopVar`) can't be accessed from the split functions. This PR changes these variables into class members.

## How was this patch tested?

Added regression test.

Author: Davies Liu <davies@databricks.com>

Closes #12338 from davies/nested_row.

[SPARK-14363] Fix executor OOM due to memory leak in the Sorter · d187e7de

Sital Kedia authored 8 years ago

## What changes were proposed in this pull request?

Fix a memory leak in the Sorter. When the UnsafeExternalSorter spills data to disk, it does not free the underlying pointer array. As a result, we see many executor OOMs and also memory underutilization.
This is a regression partially introduced in PR https://github.com/apache/spark/pull/9241
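
A simplified sketch of the fix's idea, assuming an in-memory array that grows by doubling: on spill, the sorter writes a sorted run and replaces the grown array with a fresh one of the initial size, so the large array can be reclaimed. `SpillingSorterSketch` is hypothetical; Spark's sorter manages off-heap pages and memory-manager accounting that this does not model.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch, not UnsafeExternalSorter.
class SpillingSorterSketch(initialCapacity: Int, spillThreshold: Int) {
  require(initialCapacity > 0 && spillThreshold > 0)
  private var pointers = new Array[Long](initialCapacity) // stand-in for the pointer array
  private var size = 0
  val spilledRuns = ArrayBuffer.empty[Vector[Long]]

  def capacity: Int = pointers.length

  def insert(record: Long): Unit = {
    if (size == spillThreshold) spill()
    if (size == pointers.length) { // grow by doubling
      val bigger = new Array[Long](pointers.length * 2)
      Array.copy(pointers, 0, bigger, 0, size)
      pointers = bigger
    }
    pointers(size) = record
    size += 1
  }

  def spill(): Unit = {
    spilledRuns += pointers.take(size).sorted.toVector // stand-in for writing a sorted run to disk
    pointers = new Array[Long](initialCapacity)         // drop the grown array instead of keeping it
    size = 0
  }
}
```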

## How was this patch tested?

Tested by running a job; observed around a 30% speedup after this change.

Author: Sital Kedia <skedia@fb.com>

Closes #12285 from sitalkedia/executor_oom.
