- Dec 12, 2016
Bill Chambers authored
## What changes were proposed in this pull request? This PR clarifies where accumulators will be displayed. ## How was this patch tested? No testing. Author: Bill Chambers <bill@databricks.com> Author: anabranch <wac.chambers@gmail.com> Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #16180 from anabranch/improve-acc-docs. (cherry picked from commit 70ffff21) Signed-off-by:
Sean Owen <sowen@cloudera.com>
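For context, a minimal Scala sketch of the behavior the clarified docs describe — only accumulators registered with a name are displayed in the web UI (the accumulator name and `sc` here are assumed for illustration):

```scala
// A named accumulator shows up in the web UI on the stage detail page;
// an unnamed one does not. Assumes a live SparkContext `sc`.
val counter = sc.longAccumulator("records.seen")
sc.parallelize(1 to 1000).foreach(_ => counter.add(1))
println(counter.value)  // read on the driver; tasks can only add to it
```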
-
- Dec 10, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request? According to the notice on the following Wiki front page, we can safely remove the obsolete wiki pointer from `README.md` and `docs/index.md`, too. These two lines are the last occurrences of those links.

```
All current wiki content has been merged into pages at http://spark.apache.org as of November 2016.
Each page links to the new location of its information on the Spark web site.
Obsolete wiki content is still hosted here, but carries a notice that it is no longer current.
```

## How was this patch tested? Manual.
- `README.md`: https://github.com/dongjoon-hyun/spark/tree/remove_wiki_from_readme
- `docs/index.md`:

```
cd docs
SKIP_API=1 jekyll build
```

 Author: Dongjoon Hyun <dongjoon@apache.org> Closes #16239 from dongjoon-hyun/remove_wiki_from_readme. (cherry picked from commit f3a3fed7) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Dec 09, 2016
Xiangrui Meng authored
## What changes were proposed in this pull request? There has been some confusion around "Spark ML" vs. "MLlib". This PR adds some FAQ-like entries to the MLlib user guide to explain "Spark ML" and reduce the confusion. I checked the [Spark FAQ page](http://spark.apache.org/faq.html), which seems too high-level for the content here. So I added it to the MLlib user guide instead. cc: mateiz Author: Xiangrui Meng <meng@databricks.com> Closes #16241 from mengxr/SPARK-18812. (cherry picked from commit d2493a20) Signed-off-by:
Xiangrui Meng <meng@databricks.com>
-
Jacek Laskowski authored
## What changes were proposed in this pull request? Typo fixes ## How was this patch tested? Local build. Awaiting the official build. Author: Jacek Laskowski <jacek@japila.pl> Closes #16144 from jaceklaskowski/typo-fixes. (cherry picked from commit b162cc0c) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Dec 08, 2016
Yanbo Liang authored
## What changes were proposed in this pull request? * Add all R examples for ML wrappers which were added during the 2.1 release cycle. * Split the whole `ml.R` example file into an individual example for each algorithm, which will make it convenient for users to rerun them. * Add corresponding examples to the ML user guide. * Update the ML section of the SparkR user guide. Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R `formula` to specify `featuresCol` and `labelCol`. ## How was this patch tested? Ran all examples manually. Author: Yanbo Liang <ybliang8@gmail.com> Closes #16148 from yanboliang/spark-18325. (cherry picked from commit 9bf8f3cd) Signed-off-by:
Yanbo Liang <ybliang8@gmail.com>
-
Patrick Wendell authored
-
Patrick Wendell authored
-
- Dec 07, 2016
sethah authored
## What changes were proposed in this pull request? WeightedLeastSquares now supports L1 and elastic net penalties and has an additional solver option: QuasiNewton. The docs are updated to reflect this change. ## How was this patch tested? Docs only. Generated documentation to make sure the LaTeX looks ok. Author: sethah <seth.hendrickson16@gmail.com> Closes #16139 from sethah/SPARK-18705. (cherry picked from commit 82253617) Signed-off-by:
Yanbo Liang <ybliang8@gmail.com>
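A hedged sketch of what the updated docs cover — with the `"normal"` solver, `LinearRegression` trains via WeightedLeastSquares, which now accepts L1/elastic-net penalties as well (parameter values below are illustrative):

```scala
import org.apache.spark.ml.regression.LinearRegression

val lr = new LinearRegression()
  .setSolver("normal")       // routes training through WeightedLeastSquares
  .setRegParam(0.1)          // overall regularization strength
  .setElasticNetParam(0.5)   // 0.0 = pure L2, 1.0 = pure L1
```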
-
wm624@hotmail.com authored
## What changes were proposed in this pull request? Logistic Regression summary is added in the Python API. We need to add an example and documentation for the summary. The newly added example is consistent with the Scala and Java examples. ## How was this patch tested? Manual tests: ran the example with spark-submit; copied & pasted code into pyspark; built the documentation and checked it. Author: wm624@hotmail.com <wm624@hotmail.com> Closes #16064 from wangmiao1981/py. (cherry picked from commit aad11209) Signed-off-by:
Joseph K. Bradley <joseph@databricks.com>
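The PR targets the Python API; for reference, a minimal sketch of the same summary accessors in Scala (`training`, a DataFrame of label/features columns, is assumed):

```scala
import org.apache.spark.ml.classification.LogisticRegression

val model = new LogisticRegression().fit(training)
val summary = model.summary                       // training summary
println(summary.objectiveHistory.mkString(", "))  // loss at each iteration
```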
-
- Dec 05, 2016
Nicholas Chammas authored
Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`. This change is so minor I'm not sure it needs a JIRA. But let me know if so and I'll create one. Author: Nicholas Chammas <nicholas.chammas@gmail.com> Closes #16130 from nchammas/yarn-doc-fix. (cherry picked from commit 5a92dc76) Signed-off-by:
Marcelo Vanzin <vanzin@cloudera.com>
-
Dongjoon Hyun authored
## What changes were proposed in this pull request? In the `SQL Programming Guide`, this PR uses `TRUE` instead of `True` in SparkR and adds the default value of `nullable` for `StructField` in Scala/Python/R (i.e., "Note: The default value of nullable is true."). In the Java API, `nullable` is not optional. **BEFORE** * SPARK 2.1.0 RC1 http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-docs/sql-programming-guide.html#data-types **AFTER** * R  * Scala  * Python  ## How was this patch tested? Manual.

```
cd docs
SKIP_API=1 jekyll build
open _site/index.html
```

Author: Dongjoon Hyun <dongjoon@apache.org> Closes #16141 from dongjoon-hyun/SPARK-SQL-GUIDE. (cherry picked from commit 410b7898) Signed-off-by:
Shivaram Venkataraman <shivaram@cs.berkeley.edu>
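A minimal Scala sketch of the documented default (field names are illustrative):

```scala
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType)  // nullable defaults to true when omitted
))
```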
-
Yanbo Liang authored
## What changes were proposed in this pull request? Add R examples to the ML programming guide for the following algorithms as a POC: * spark.glm * spark.survreg * spark.naiveBayes * spark.kmeans The four algorithms were added to SparkR in 2.0.0; more docs for algorithms added during the 2.1 release cycle will be addressed in a separate follow-up PR. ## How was this patch tested? These are the screenshots of the generated ML programming guide for `GeneralizedLinearRegression`:  Author: Yanbo Liang <ybliang8@gmail.com> Closes #16136 from yanboliang/spark-18279. (cherry picked from commit eb8dd681) Signed-off-by:
Yanbo Liang <ybliang8@gmail.com>
-
- Dec 04, 2016
Felix Cheung authored
## What changes were proposed in this pull request? If SparkR is running as a package and it has previously downloaded the Spark jar, it should be able to run as before without having to set SPARK_HOME. Basically, with this bug the auto-installed Spark will only work in the first session. This seems to be a regression from the earlier behavior. The fix is to always try to install or check for the cached Spark if running in an interactive session. As discussed before, we should probably only install Spark if running in an interactive session (R shell, RStudio etc.) ## How was this patch tested? Manually. Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #16077 from felixcheung/rsessioninteractive. (cherry picked from commit b019b3a8) Signed-off-by:
Shivaram Venkataraman <shivaram@cs.berkeley.edu>
-
- Dec 03, 2016
Yunni authored
## What changes were proposed in this pull request? The user guide for LSH is added to ml-features.md, with several Scala/Java examples in spark-examples. ## How was this patch tested? The doc has been generated through Jekyll and checked through manual inspection. Author: Yunni <Euler57721@gmail.com> Author: Yun Ni <yunn@uber.com> Author: Joseph K. Bradley <joseph@databricks.com> Author: Yun Ni <Euler57721@gmail.com> Closes #15795 from Yunni/SPARK-18081-lsh-guide. (cherry picked from commit 34777184) Signed-off-by:
Joseph K. Bradley <joseph@databricks.com>
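A hedged sketch of the MinHash variant from the new guide, assuming DataFrames `dfA` and `dfB` that each have a binary `Vector` column named "features":

```scala
import org.apache.spark.ml.feature.MinHashLSH

val mh = new MinHashLSH()
  .setNumHashTables(5)
  .setInputCol("features")
  .setOutputCol("hashes")

val model = mh.fit(dfA)
// approximate join on pairs within Jaccard distance 0.6
model.approxSimilarityJoin(dfA, dfB, 0.6).show()
```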
-
- Dec 02, 2016
Yanbo Liang authored
## What changes were proposed in this pull request? Update ML programming and migration guide for 2.1 release. ## How was this patch tested? Doc change, no test. Author: Yanbo Liang <ybliang8@gmail.com> Closes #16076 from yanboliang/spark-18324. (cherry picked from commit 2dc0d7ef) Signed-off-by:
Joseph K. Bradley <joseph@databricks.com>
-
- Nov 30, 2016
Yanbo Liang authored
## What changes were proposed in this pull request? API review for 2.1, except the `LSH`-related classes, which are still under development. ## How was this patch tested? Only doc changes, no new tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #16009 from yanboliang/spark-18318. (cherry picked from commit 60022bfd) Signed-off-by:
Joseph K. Bradley <joseph@databricks.com>
-
manishAtGit authored
## What changes were proposed in this pull request? Added a missing semicolon in the quick-start-guide Java example code, which wasn't compiling before. ## How was this patch tested? Locally, by running and generating the site for the docs. You can see that the last line contains ";" in the snapshot below.  Author: manishAtGit <manish@knoldus.com> Closes #16081 from manishatGit/fixed-quick-start-guide. (cherry picked from commit bc95ea0b) Signed-off-by:
Andrew Or <andrewor14@gmail.com>
-
- Nov 29, 2016
Eric Liang authored
## What changes were proposed in this pull request? This documents the partition handling changes for Spark 2.1 and how to migrate existing tables. ## How was this patch tested? Built docs locally. rxin Author: Eric Liang <ekl@databricks.com> Closes #16074 from ericl/spark-18145. (cherry picked from commit 489845f3) Signed-off-by:
Reynold Xin <rxin@databricks.com>
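A hedged sketch of the migration step those docs describe for tables with partitions written by earlier versions (`spark` is a SparkSession; the table name is a placeholder):

```scala
// Repair partition metadata once so the catalog-managed partition
// handling introduced in 2.1 can see the existing partitions.
spark.sql("MSCK REPAIR TABLE my_table")
```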
-
aokolnychyi authored
[MINOR][DOCS] Updates to the Accumulator example in the programming guide. Fixed typos, AccumulatorV2 in Java ## What changes were proposed in this pull request? This pull request contains updates to Scala and Java Accumulator code snippets in the programming guide. - For Scala, the pull request fixes the signature of the 'add()' method in the custom Accumulator, which contained two params (as the old AccumulatorParam) instead of one (as in AccumulatorV2). - The Java example was updated to use the AccumulatorV2 class since AccumulatorParam is marked as deprecated. - Scala and Java examples are more consistent now. ## How was this patch tested? This patch was tested manually by building the docs locally.  Author: aokolnychyi <okolnychyyanton@gmail.com> Closes #16024 from aokolnychyi/fixed_accumulator_example. (cherry picked from commit f045d9da) Signed-off-by:
Sean Owen <sowen@cloudera.com>
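A minimal sketch of the corrected shape the snippets now show — in `AccumulatorV2`, `add()` takes a single element, unlike the two-argument `addInPlace()` of the deprecated `AccumulatorParam` (class and type choices here are illustrative):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.util.AccumulatorV2

class VectorAccumulator extends AccumulatorV2[Double, Seq[Double]] {
  private val buf = ArrayBuffer.empty[Double]
  override def isZero: Boolean = buf.isEmpty
  override def copy(): VectorAccumulator = {
    val acc = new VectorAccumulator
    acc.buf ++= buf
    acc
  }
  override def reset(): Unit = buf.clear()
  override def add(v: Double): Unit = buf += v  // one parameter in V2
  override def merge(other: AccumulatorV2[Double, Seq[Double]]): Unit =
    buf ++= other.value
  override def value: Seq[Double] = buf.toList
}

// register on the driver before use: sc.register(new VectorAccumulator, "vectors")
```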
-
- Nov 28, 2016
Marcelo Vanzin authored
This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure. This allows shuffle encryption to work in non-YARN mode, which means that it's easier to write unit tests for areas of the code that are affected by the feature. The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk having the key being written to disk as part of the configuration (as, for example, is done when starting YARN applications). Tested by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #15981 from vanzin/SPARK-18547. (cherry picked from commit 8b325b17) Signed-off-by:
Shixiong Zhu <shixiong@databricks.com>
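A hedged configuration sketch (these are the 2.1 property names as I understand them; verify against the security docs before relying on them):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.authenticate", "true")           // required so keys can be exchanged at all
  .set("spark.io.encryption.enabled", "true")  // encrypt shuffle files; no longer YARN-only
```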
-
Patrick Wendell authored
-
Patrick Wendell authored
-
- Nov 26, 2016
Weiqing Yang authored
## What changes were proposed in this pull request? This PR fixes an incorrect `code` tag in `sql-programming-guide.md`. ## How was this patch tested? Manually. Author: Weiqing Yang <yangweiqing001@gmail.com> Closes #15941 from weiqingy/fixtag. (cherry picked from commit f4a98e42) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Nov 23, 2016
Sean Owen authored
## What changes were proposed in this pull request? Updates links to the wiki to point to the new location of the content on spark.apache.org. ## How was this patch tested? Doc builds. Author: Sean Owen <sowen@cloudera.com> Closes #15967 from srowen/SPARK-18073.1. (cherry picked from commit 7e0cd1d9) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Nov 19, 2016
Sean Owen authored
## What changes were proposed in this pull request? Avoid hard-coding spark.rpc.askTimeout to a non-default value in Client; fix the doc about the spark.rpc.askTimeout default. ## How was this patch tested? Existing tests. Author: Sean Owen <sowen@cloudera.com> Closes #15833 from srowen/SPARK-18353. (cherry picked from commit 8b1e1088) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
hyukjinkwon authored
[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation It seems `Note:`, `NOTE:`, `Note that`, `'''Note:'''` and `note` are all in use across the Scala/Java API documentation. This PR proposes to fix those to `note` to be consistent. **Before** - Scala  - Java  **After** - Scala  - Java  The notes were found via

```bash
grep -r "NOTE: " . | \                  # Note:|NOTE:|Note that|'''Note:'''
  grep -v "// NOTE: " | \               # starting with // does not appear in API documentation.
  grep -E '.scala|.java' | \            # java/scala files
  grep -v Suite | \                     # exclude tests
  grep -v Test | \                      # exclude tests
  grep -e 'org.apache.spark.api.java' \ # packages that appear in API documentation
       -e 'org.apache.spark.api.java.function' \
       -e 'org.apache.spark.api.r' \
  ...
```

and likewise for `"Note that "`, `"Note: "` and `"'''Note:'''"` — the same pipeline with only the leading pattern swapped. (Note that the `-e` arguments are regular expressions, so actual matches were mostly `org/apache/spark/api/java/functions ...`.) Then each occurrence was fixed one by one, comparing with the API documentation/access modifiers. After that, manually tested via `jekyll build`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15889 from HyukjinKwon/SPARK-18437. (cherry picked from commit d5b1d5fc) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Nov 17, 2016
Zheng RuiFeng authored
## What changes were proposed in this pull request? 1. There are two `[Graph.partitionBy]` links in `graphx-programming-guide.md`; the first one had no effect. 2. `DataFrame`, `Transformer`, `Pipeline` and `Parameter` in `ml-pipeline.md` were linked to `ml-guide.html` by mistake. 3. `PythonMLLibAPI` in `mllib-linear-methods.md` was not accessible, because the class `PythonMLLibAPI` is private. 4. Other link updates. ## How was this patch tested? Manual tests. Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #15912 from zhengruifeng/md_fix. (cherry picked from commit cdaf4ce9) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
Weiqing Yang authored
## What changes were proposed in this pull request? Remove `spark.driver.memory`, `spark.executor.memory`, `spark.driver.cores`, and `spark.executor.cores` from `running-on-yarn.md` as they are not YARN-specific, and they are also defined in `configuration.md`. ## How was this patch tested? Build passed & manually checked. Author: Weiqing Yang <yangweiqing001@gmail.com> Closes #15869 from weiqingy/yarnDoc. (cherry picked from commit a3cac7bd) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Nov 16, 2016
Holden Karau authored
## What changes were proposed in this pull request? This PR aims to provide a pip-installable PySpark package. This does a bunch of work to copy the jars over and package them with the Python code (to prevent challenges from trying to use different versions of the Python code with different versions of the JAR). It does not currently publish to PyPI, but that is the natural follow-up (SPARK-18129).

Done:
- pip installable on conda [manually tested]
- setup.py installed on a non-pip-managed system (RHEL) with YARN [manually tested]
- Automated testing of this (virtualenv)
- packaging and signing with release-build*

Possible follow-up work:
- release-build update to publish to PyPI (SPARK-18128)
- figure out who owns the pyspark package name on prod PyPI (is it someone within the project, or should we ask PyPI, or should we choose a different name to publish with, like ApachePySpark?)
- Windows support and/or testing (SPARK-18136)
- investigate details of wheel caching and see if we can avoid cleaning the wheel cache during our test
- consider how we want to number our dev/snapshot versions

Explicitly out of scope:
- Using pip-installed PySpark to start a standalone cluster
- Using pip-installed PySpark for non-Python Spark programs

*I've done some work to test release-build locally, but as a non-committer I've just done local testing. ## How was this patch tested? Automated testing with virtualenv, manual testing with conda, a system-wide install, and YARN integration. release-build changes tested locally as a non-committer (no testing of uploading artifacts to Apache staging websites). Author: Holden Karau <holden@us.ibm.com> Author: Juliet Hougland <juliet@cloudera.com> Author: Juliet Hougland <not@myemail.com> Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
-
Artur Sukhenko authored
## What changes were proposed in this pull request? Suggest users increase the `NodeManager's` heap size if the `External Shuffle Service` is enabled, as the `NM` can spend a lot of time doing GC, resulting in shuffle operations becoming a bottleneck due to `Shuffle Read blocked time` being bumped up. Also, because of GC, the `NodeManager` can use an enormous amount of CPU and cluster performance will suffer. I have seen NodeManager using 5-13G RAM and up to 2700% CPU with the `spark_shuffle` service on. ## How was this patch tested? #### Added step 5:  Author: Artur Sukhenko <artur.sukhenko@gmail.com> Closes #15906 from Devian-ua/nmHeapSize. (cherry picked from commit 55589987) Signed-off-by:
Reynold Xin <rxin@databricks.com>
-
Tathagata Das authored
## What changes were proposed in this pull request?   Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #15897 from tdas/SPARK-18461. (cherry picked from commit bb6cdfd9) Signed-off-by:
Michael Armbrust <michael@databricks.com>
-
Zheng RuiFeng authored
## What changes were proposed in this pull request? Add links to API docs for ML algos ## How was this patch tested? Manual checking for the API links Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #15890 from zhengruifeng/algo_link. (cherry picked from commit a75e3fe9) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
Weiqing Yang authored
[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation ## What changes were proposed in this pull request? Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation. ## How was this patch tested? Manually. Author: Weiqing Yang <yangweiqing001@gmail.com> Closes #15886 from weiqingy/fixTypo. (cherry picked from commit 241e04bc) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
Liwei Lin authored
## Before  ## After  Author: Liwei Lin <lwlin7@gmail.com> Closes #15903 from lw-lin/kafka-doc-lines. (cherry picked from commit 3e01f128) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Nov 15, 2016
Zheng RuiFeng authored
## What changes were proposed in this pull request? 1. Remove `runs` from the docs of mllib.KMeans 2. Add notes for `k` according to comments in the sources ## How was this patch tested? Existing tests. Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #15873 from zhengruifeng/update_doc_mllib_kmeans. (cherry picked from commit 33be4da5) Signed-off-by:
Sean Owen <sowen@cloudera.com>
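For context, a minimal sketch of the mllib API in question (`data`, an `RDD[Vector]`, is assumed); `runs` no longer has any effect, and fewer than `k` clusters may be returned when the data has fewer distinct points:

```scala
import org.apache.spark.mllib.clustering.KMeans

val model = KMeans.train(data, k = 3, maxIterations = 20)
println(model.clusterCenters.length)  // can be less than k
```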
-
- Nov 14, 2016
Zheng RuiFeng authored
## What changes were proposed in this pull request? 1. Add links to `VertexRDD` and `EdgeRDD` 2. Note in `Vertex and Edge RDDs` that not all methods are listed 3. `VertexID` -> `VertexId` ## How was this patch tested? No tests; only the docs are modified. Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #15875 from zhengruifeng/update_graphop_doc. (cherry picked from commit c31def1d) Signed-off-by:
Reynold Xin <rxin@databricks.com>
-
Noritaka Sekiyama authored
Changed HDFS default block size from 64MB to 128MB. https://issues.apache.org/jira/browse/SPARK-18432 Author: Noritaka Sekiyama <moomindani@gmail.com> Closes #15879 from moomindani/SPARK-18432. (cherry picked from commit 9d07ceee) Signed-off-by:
Kousuke Saruta <sarutak@oss.nttdata.co.jp>
-
- Nov 13, 2016
Denny Lee authored
[SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Structured Streaming Programming Guide ## What changes were proposed in this pull request? Update the Python section of the Structured Streaming Guide from `.builder()` to `.builder`. ## How was this patch tested? Validated the documentation and successfully ran the test example. The 'Builder' object is not callable, hence the change from `.builder()` to `.builder`. Author: Denny Lee <dennylee@gallifrey.local> Closes #15872 from dennyglee/master. (cherry picked from commit b91a51bb) Signed-off-by:
Reynold Xin <rxin@databricks.com>
-
- Nov 08, 2016
chie8842 authored
I created Scala and Java examples and added documentation. Author: chie8842 <hayashidac@nttdata.co.jp> Closes #15658 from hayashidac/SPARK-13770. (cherry picked from commit ee2e741a) Signed-off-by:
Sean Owen <sowen@cloudera.com>
-
- Nov 07, 2016
fidato authored
## What changes were proposed in this pull request? This pull request comprises the changes for the critical bug SPARK-16575. This change rectifies the issue with the BinaryFileRDD partition calculations: upon creating an RDD with sc.binaryFiles, the resulting RDD always consisted of just two partitions. ## How was this patch tested? The original issue, i.e. getNumPartitions on a binary files RDD (always having two partitions), was first replicated and then tested against the changes. Also, the unit tests have been checked and passed. This contribution is my original work and I license the work to the project under the project's open source license. srowen hvanhovell rxin vanzin skyluc kmader zsxwing datafarmer Please have a look. Author: fidato <fidato.july13@gmail.com> Closes #15327 from fidato13/SPARK-16575. (cherry picked from commit 6f369713) Signed-off-by:
Reynold Xin <rxin@databricks.com>
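A hedged sketch of the affected API (the path is a placeholder): after the fix, the partition count respects the requested minimum instead of always collapsing to two.

```scala
// minPartitions is a hint; before this fix the computed partition size
// left getNumPartitions stuck at 2 regardless of input volume.
val rdd = sc.binaryFiles("hdfs:///data/blobs", minPartitions = 16)
println(rdd.getNumPartitions)
```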
-