  1. Jun 29, 2016
    • [SPARK-16006][SQL] Attempting to write empty DataFrame with no fields throws non-intuitive exception · 9b1b3ae7
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR allows `emptyDataFrame.write` to succeed, since the user didn't specify any partition columns.
      
      **Before**
      ```scala
      scala> spark.emptyDataFrame.write.parquet("/tmp/t1")
      org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
      scala> spark.emptyDataFrame.write.csv("/tmp/t1")
      org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
      ```
      
      After this PR, no exception occurs and the created directory contains only one file, `_SUCCESS`, as expected.
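      
      After this change, the same commands from the **Before** snippet complete without an exception (a sketch of the behavior described above; the output paths are illustrative):
      ```scala
      scala> spark.emptyDataFrame.write.parquet("/tmp/t1")   // succeeds; the directory contains only _SUCCESS
      scala> spark.emptyDataFrame.write.csv("/tmp/t2")       // succeeds; the directory contains only _SUCCESS
      ```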
      
      ## How was this patch tested?
      
      Pass the Jenkins tests including updated test cases.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13730 from dongjoon-hyun/SPARK-16006.
      9b1b3ae7
    • [SPARK-16301] [SQL] The analyzer rule for resolving using joins should respect... · 8b5a8b25
      Yin Huai authored
      [SPARK-16301] [SQL] The analyzer rule for resolving using joins should respect the case sensitivity setting.
      
      ## What changes were proposed in this pull request?
      The analyzer rule for resolving using joins should respect the case sensitivity setting.
      
      ## How was this patch tested?
      New tests in ResolveNaturalJoinSuite
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #13977 from yhuai/SPARK-16301.
      8b5a8b25
    • [TRIVIAL] [PYSPARK] Clean up orc compression option as well · d8a87a3e
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR corrects the ORC compression option for PySpark as well. I think this was mistakenly missed in https://github.com/apache/spark/pull/13948.
      
      ## How was this patch tested?
      
      N/A
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #13963 from HyukjinKwon/minor-orc-compress.
      d8a87a3e
    • [SPARK-16256][SQL][STREAMING] Added Structured Streaming Programming Guide · 64132a14
      Tathagata Das authored
      Title defines all.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13945 from tdas/SPARK-16256.
      64132a14
    • [SPARK-14480][SQL] Remove meaningless StringIteratorReader for CSV data source. · cb1b9d34
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR removes meaningless `StringIteratorReader` for CSV data source.
      
      In `CSVParser.scala`, there is a `Reader` wrapping an `Iterator`, but this causes two problems.
      
      First, it was actually not faster than processing line by line with the `Iterator`, due to the additional logic needed to wrap the `Iterator` in a `Reader`.
      Second, it added complexity because extra logic was needed to allow each line to be read byte by byte, which made parsing issues difficult to diagnose (e.g. SPARK-14103).
      
      A benchmark was performed manually and the results are below:
      
      - Original codes with Reader wrapping Iterator
      
      |End-to-end (ns)  |   Parse Time (ns) |
      |-----------------------|------------------------|
      |14116265034      |2008277960        |
      
      - New codes with Iterator
      
      |End-to-end (ns)  |   Parse Time (ns) |
      |-----------------------|------------------------|
      |13451699644      | 1549050564       |
      
      For details on the environment, dataset, and methods, please refer to the JIRA ticket.
      
      ## How was this patch tested?
      
      Existing tests should cover this.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #13808 from HyukjinKwon/SPARK-14480-small.
      cb1b9d34
    • [SPARK-16236][SQL][FOLLOWUP] Add Path Option back to Load API in DataFrameReader · 39f2eb1d
      gatorsmile authored
      #### What changes were proposed in this pull request?
      In the Python API, we have the same issue. Thanks for identifying this issue, zsxwing! Below is an example:
      ```Python
      spark.read.format('json').load('python/test_support/sql/people.json')
      ```
      #### How was this patch tested?
      Existing test cases cover the changes by this PR
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #13965 from gatorsmile/optionPaths.
      39f2eb1d
    • [SPARK-16140][MLLIB][SPARKR][DOCS] Group k-means method in generated R doc · 8c9cd0a7
      Xin Ren authored
      https://issues.apache.org/jira/browse/SPARK-16140
      
      ## What changes were proposed in this pull request?
      
      Group the R docs of `spark.kmeans`, `predict(KM)`, `summary(KM)`, and `read/write.ml(KM)` under the Rd file `spark.kmeans`. The example code was also updated.
      
      ## How was this patch tested?
      
      Tested on my local machine
      
      On my laptop, `jekyll build` fails to build the API docs, so here I can only show the HTML I manually generated from the Rd files, with no CSS applied, but the doc content should be there.
      
      ![screenshotkmeans](https://cloud.githubusercontent.com/assets/3925641/16403203/c2c9ca1e-3ca7-11e6-9e29-f2164aee75fc.png)
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #13921 from keypointt/SPARK-16140.
      8c9cd0a7
    • [MINOR][SPARKR] Fix arguments of survreg in SparkR · c6a220d7
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Fix the wrong argument descriptions of ```survreg``` in SparkR.
      
      ## How was this patch tested?
      ```Arguments``` section of ```survreg``` doc before this PR (with wrong description for ```path``` and missing ```overwrite```):
      ![image](https://cloud.githubusercontent.com/assets/1962026/16447548/fe7a5ed4-3da1-11e6-8b96-b5bf2083b07e.png)
      
      After this PR:
      ![image](https://cloud.githubusercontent.com/assets/1962026/16447617/368e0b18-3da2-11e6-8277-45640fb11859.png)
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13970 from yanboliang/spark-16143-followup.
      c6a220d7
    • [SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn · 272a2f78
      jerryshao authored
      ## What changes were proposed in this pull request?
      
      YARN has supported rolling log aggregation since 2.6. Previously, logs were only aggregated to HDFS after the application finished, which is quite painful for long-running applications like Spark Streaming or the Thrift server; running out of disk can also occur when a log file grows too large. So this PR proposes adding support for rolling log aggregation for Spark on YARN.
      
      One limitation is that log4j should be configured to use a file appender; Spark itself uses a console appender by default, and the log file will not be created again once it is removed after aggregation. But I think many production users have already changed their log4j configuration from the default, so this is not a big problem.
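      
      As a rough sketch of how a user could opt in (assuming the `spark.yarn.rolledLog.*` configuration keys proposed here, on a cluster whose NodeManagers already enable rolling log aggregation):
      ```scala
      import org.apache.spark.SparkConf
      
      // Only log files matching the include pattern are rolled and aggregated while the
      // application is still running; the exclude pattern filters files back out.
      val conf = new SparkConf()
        .set("spark.yarn.rolledLog.includePattern", "stderr*")
        .set("spark.yarn.rolledLog.excludePattern", "*gc.log")
      ```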
      
      ## How was this patch tested?
      
      Manually verified with Hadoop 2.7.1.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #13712 from jerryshao/SPARK-15990.
      272a2f78
    • [SPARK-15858][ML] Fix calculating error by tree stack over flow prob… · 393db655
      Mahmoud Rawas authored
      ## What changes were proposed in this pull request?
      
      Improve the `evaluateEachIteration` function in MLlib, as it fails when trying to calculate the error by tree for a model that has more than 500 trees.
      
      ## How was this patch tested?
      
      The patch was tested on a production data set (2K rows x 2K features), training a gradient-boosted model without validation with a maxIteration setting of 1000 and then producing the error by tree. The new patch was able to perform the calculation within 30 seconds, while previously it would take hours and then fail.
      
      **PS**: It would be better if this PR could be cherry-picked into release branches 1.6.1 and 2.0.
      
      Author: Mahmoud Rawas <mhmoudr@gmail.com>
      Author: Mahmoud Rawas <Mahmoud.Rawas@quantium.com.au>
      
      Closes #13624 from mhmoudr/SPARK-15858.master.
      393db655
    • [SPARK-16261][EXAMPLES][ML] Fixed incorrect appNames in ML Examples · 21385d02
      Bryan Cutler authored
      ## What changes were proposed in this pull request?
      
      Some appNames in ML examples are incorrect, mostly in PySpark but one in Scala.  This corrects the names.
      
      ## How was this patch tested?
      Style, local tests
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #13949 from BryanCutler/pyspark-example-appNames-fix-SPARK-16261.
      21385d02
    • [SPARK-16157][SQL] Add New Methods for comments in StructField and StructType · 7ee9e39c
      gatorsmile authored
      #### What changes were proposed in this pull request?
      Based on the previous discussion with cloud-fan and hvanhovell in another related PR https://github.com/apache/spark/pull/13764#discussion_r67994276, it looks reasonable to add convenience methods for users to add a `comment` when defining a `StructField`.
      
      Currently, the column-related `comment` attribute is stored in `Metadata` of `StructField`. For example, users can add the `comment` attribute using the following way:
      ```Scala
      StructType(
        StructField(
          "cl1",
          IntegerType,
          nullable = false,
          new MetadataBuilder().putString("comment", "test").build()) :: Nil)
      ```
      This PR adds more user-friendly methods for the `comment` attribute when defining a `StructField`. After the changes, users have three different ways to do it:
      ```Scala
      val struct = (new StructType)
        .add("a", "int", true, "test1")
      
      val struct = (new StructType)
        .add("c", StringType, true, "test3")
      
      val struct = (new StructType)
        .add(StructField("d", StringType).withComment("test4"))
      ```
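      
      Since the comment is still stored in the field's `Metadata` under the `"comment"` key, it can also be read back through the existing metadata API (a small sketch based on the last variant above):
      ```Scala
      import org.apache.spark.sql.types._
      
      val struct = (new StructType)
        .add(StructField("d", StringType).withComment("test4"))
      
      struct("d").metadata.getString("comment")   // "test4"
      ```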
      
      #### How was this patch tested?
      Added test cases:
      - `DataTypeSuite` tests the three types of API changes
      - `DataFrameReaderWriterSuite` covers the Parquet, JSON and CSV formats, using the in-memory catalog
      - `OrcQuerySuite.scala` covers the ORC format, using the Hive metastore
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #13860 from gatorsmile/newMethodForComment.
      7ee9e39c
    • [SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions... · d1e81088
      Cheng Lian authored
      [SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that reference no input attributes
      
      ## What changes were proposed in this pull request?
      
      `MAX(COUNT(*))` is invalid since an aggregate expression can't be nested within another aggregate expression. This case should be caught at the analysis phase, but somehow sneaks through to runtime.
      
      The reason is that when checking aggregate expressions in `CheckAnalysis`, one checking branch treats all expressions that reference no input attributes as valid. However, `MAX(COUNT(*))` is translated into `MAX(COUNT(1))` at the analysis phase and also references no input attribute.
      
      This PR fixes this issue by removing the aforementioned branch.
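      
      As an illustration of the newly rejected pattern (the table and column names below are made up), such a query should now fail during analysis instead of reaching runtime:
      ```scala
      scala> spark.range(10).createOrReplaceTempView("t")
      scala> spark.sql("SELECT max(count(*)) FROM t GROUP BY id").show()
      // throws org.apache.spark.sql.AnalysisException: nesting an aggregate function inside
      // another aggregate function is not allowed (exact message wording may differ)
      ```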
      
      ## How was this patch tested?
      
      New test case added in `AnalysisErrorSuite`.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #13968 from liancheng/spark-16291-nested-agg-functions.
      d1e81088
    • [TRIVIAL][DOCS][STREAMING][SQL] The return type mentioned in the Javadoc is... · 757dc2c0
      Holden Karau authored
      [TRIVIAL][DOCS][STREAMING][SQL] The return type mentioned in the Javadoc is incorrect for toJavaRDD, …
      
      ## What changes were proposed in this pull request?
      
      Change the return type mentioned in the Javadoc for `toJavaRDD` / `javaRDD` to match the actual return type and be consistent with the Scala RDD return type.
      
      ## How was this patch tested?
      
      Docs only change.
      
      Author: Holden Karau <holden@us.ibm.com>
      
      Closes #13954 from holdenk/trivial-streaming-tojavardd-doc-fix.
      757dc2c0
    • [SPARK-16266][SQL][STREAMING] Moved DataStreamReader/Writer from pyspark.sql to... · f454a7f9
      Tathagata Das authored
      [SPARK-16266][SQL][STREAMING] Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming
      
      ## What changes were proposed in this pull request?
      
      - Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming to make them consistent with scala packaging
      - Exposed the necessary classes in sql.streaming package so that they appear in the docs
      - Added pyspark.sql.streaming module to the docs
      
      ## How was this patch tested?
      - updated unit tests.
      - generated docs for testing visibility of pyspark.sql.streaming classes.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #13955 from tdas/SPARK-16266.
      f454a7f9
  2. Jun 28, 2016
    • [SPARK-16271][SQL] Implement Hive's UDFXPathUtil · 153c2f9a
      petermaxlee authored
      ## What changes were proposed in this pull request?
      This patch ports Hive's UDFXPathUtil over to Spark, which can be used to implement xpath functionality in Spark in the near future.
      
      ## How was this patch tested?
      Added two new test suites UDFXPathUtilSuite and ReusableStringReaderSuite. They have been ported over from Hive (but rewritten in Scala in order to leverage ScalaTest).
      
      Author: petermaxlee <petermaxlee@gmail.com>
      
      Closes #13961 from petermaxlee/xpath.
      153c2f9a
    • [SPARK-16245][ML] model loading backward compatibility for ml.feature.PCA · 0df5ce1b
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      model loading backward compatibility for ml.feature.PCA.
      
      ## How was this patch tested?
      Existing unit tests and a manual test for loading models saved by Spark 1.6.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13937 from yanboliang/spark-16245.
      0df5ce1b
    • [SPARK-16248][SQL] Whitelist the list of Hive fallback functions · 363bcede
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      This patch removes the blind fallback into Hive for functions. Instead, it creates a whitelist and adds only a small number of functions to the whitelist, i.e. the ones we intend to support in the long run in Spark.
      
      ## How was this patch tested?
      Updated tests to reflect the change.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13939 from rxin/hive-whitelist.
      363bcede
    • [SPARK-16268][PYSPARK] SQLContext should import DataStreamReader · 5bf8881b
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      Fixed the following error:
      ```
      >>> sqlContext.readStream
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "...", line 442, in readStream
          return DataStreamReader(self._wrapped)
      NameError: global name 'DataStreamReader' is not defined
      ```
      
      ## How was this patch tested?
      
      The added test.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #13958 from zsxwing/fix-import.
      5bf8881b
    • [SPARKR] add csv tests · 823518c2
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add unit tests for CSV data for SparkR.
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #13904 from felixcheung/rcsv.
      823518c2
    • [MINOR][DOCS][STRUCTURED STREAMING] Minor doc fixes around `DataFrameWriter` and `DataStreamWriter` · 5545b791
      Burak Yavuz authored
      ## What changes were proposed in this pull request?
      
      Fixes a couple of old references to `DataFrameWriter.startStream`, which is now `DataStreamWriter.start()`.
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #13952 from brkyvz/minor-doc-fix.
      5545b791
    • [SPARK-16114][SQL] structured streaming network word count examples · 3554713a
      James Thomas authored
      ## What changes were proposed in this pull request?
      
      Network word count example for structured streaming
      
      ## How was this patch tested?
      
      Run locally
      
      Author: James Thomas <jamesjoethomas@gmail.com>
      Author: James Thomas <jamesthomas@Jamess-MacBook-Pro.local>
      
      Closes #13816 from jjthomas/master.
      3554713a
    • [SPARK-16100][SQL] fix bug when use Map as the buffer type of Aggregator · 8a977b06
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      The root cause is in `MapObjects`. Its parameter `loopVar` is not declared as a child, but it can sometimes be the same as `lambdaFunction` (e.g. the function that takes `loopVar` and produces `lambdaFunction` may be `identity`), which is a child. This causes trouble when calling `withNewChildren`: it may mistakenly treat `loopVar` as a child and cause an `IndexOutOfBoundsException: 0` later.
      
      This PR fixes the bug by simply pulling the parameters out of `LambdaVariable` and passing them to `MapObjects` directly.
      
      ## How was this patch tested?
      
      new test in `DatasetAggregatorSuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #13835 from cloud-fan/map-objects.
      8a977b06
    • [SPARK-16236][SQL] Add Path Option back to Load API in DataFrameReader · 25520e97
      gatorsmile authored
      #### What changes were proposed in this pull request?
      koertkuipers identified that PR https://github.com/apache/spark/pull/13727/ changed the behavior of the `load` API: after the change, the `load` API does not add the value of `path` into the `options`. Thank you!
      
      This PR adds the `path` option back to the `load()` API in `DataFrameReader`, if and only if users specify one and only one `path` in the `load` call. For example, users can see the `path` option after the following API call:
      ```Scala
      spark.read
        .format("parquet")
        .load("/test")
      ```
      
      #### How was this patch tested?
      Added test cases.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #13933 from gatorsmile/optionPath.
      25520e97
    • [SPARK-16175] [PYSPARK] handle None for UDT · 35438fb0
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      The Scala UDT bypasses all nulls and does not pass them into a UDT's serialize() and deserialize(); this PR updates the Python UDT to do the same.
      
      ## How was this patch tested?
      
      Added tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #13878 from davies/udt_null.
      35438fb0
    • [SPARK-16259][PYSPARK] cleanup options in DataFrame read/write API · 1aad8c6e
      Davies Liu authored
      ## What changes were proposed in this pull request?
      
      There is some duplicated code for options in the DataFrame reader/writer API; this PR cleans it up and also fixes a bug with `escapeQuotes` in csv().
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #13948 from davies/csv_options.
      1aad8c6e
    • [SPARK-16148][SCHEDULER] Allow for underscores in TaskLocation in the Executor ID · ae14f362
      Tom Magrino authored
      ## What changes were proposed in this pull request?
      
      Previously, the TaskLocation implementation would not allow for executor ids which include underscores.  This tweaks the string split used to get the hostname and executor id, allowing for underscores in the executor id.
      
      This addresses the JIRA found here: https://issues.apache.org/jira/browse/SPARK-16148
      
      This is moved over from a previous PR against branch-1.6: https://github.com/apache/spark/pull/13857
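      
      A rough sketch of the idea (the location string below is made up; the real parsing lives in `TaskLocation`): splitting on only the first two underscores leaves any underscores inside the executor id intact.
      ```scala
      // "executor_<host>_<executorId>", where the executor id itself contains underscores.
      val loc = "executor_host1.example.com_container_e04_000002"
      val Array(_, host, executorId) = loc.split("_", 3)
      // host = "host1.example.com", executorId = "container_e04_000002"
      ```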
      
      ## How was this patch tested?
      
      Ran existing unit tests for core and streaming.  Manually ran a simple streaming job with an executor whose id contained underscores and confirmed that the job ran successfully.
      
      This is my original work and I license the work to the project under the project's open source license.
      
      Author: Tom Magrino <tmagrino@fb.com>
      
      Closes #13858 from tmagrino/fixtasklocation.
      ae14f362
    • [MINOR][SPARKR] update sparkR DataFrame.R comment · d59ba8e3
      WeichenXu authored
      ## What changes were proposed in this pull request?
      
      Update the SparkR DataFrame.R comments: SQLContext ==> SparkSession.
      
      ## How was this patch tested?
      
      N/A
      
      Author: WeichenXu <WeichenXu123@outlook.com>
      
      Closes #13946 from WeichenXu123/sparkR_comment_update_sparkSession.
      d59ba8e3
    • [SPARK-15643][DOC][ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0 · 26252f70
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0.
      
      ## How was this patch tested?
      Docs update, no tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13378 from yanboliang/spark-13448.
      26252f70
    • [SPARK-16181][SQL] outer join with isNull filter may return wrong result · 1f2776df
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      The root cause: the output attributes of an outer join are derived from its children, while they are actually different attributes (an outer join can return null).
      
      We have already added some special logic to handle this, e.g. `PushPredicateThroughJoin` won't push predicates down through the outer-join side, and `FixNullability`.
      
      This PR adds one more special logic in `FoldablePropagation`.
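      
      A hypothetical repro sketch of the pattern (data and column names are made up): a literal column from the null-producing side of an outer join, filtered with `isNull`.
      ```scala
      import org.apache.spark.sql.functions.lit
      import spark.implicits._
      
      val left  = Seq(1, 2).toDF("id")
      val right = Seq(2, 3).toDF("id").withColumn("tag", lit("right"))
      val joined = left.join(right, Seq("id"), "left_outer")
      
      // id = 1 has no match, so its `tag` is null and the row must be kept.
      // Before this fix, FoldablePropagation could fold `tag` into the literal "right",
      // turning the filter into a constant and dropping the row.
      joined.filter($"tag".isNull).show()
      ```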
      
      ## How was this patch tested?
      
      new test in `DataFrameSuite`
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #13884 from cloud-fan/bug.
      1f2776df
    • [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to... · 0923c4f5
      Yin Huai authored
      [SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to the existing Scala SparkContext's SparkConf
      
      ## What changes were proposed in this pull request?
      When we create a SparkSession on the Python side, it is possible that a SparkContext has already been created. In this case, we need to set the SparkSession builder's configs on the existing Scala SparkContext's SparkConf (we need to do so because conf changes on an active Python SparkContext will not be propagated to the JVM side). Otherwise, we may create a wrong SparkSession (e.g. Hive support is not enabled even if enableHiveSupport is called).
      
      ## How was this patch tested?
      New tests and manual tests.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #13931 from yhuai/SPARK-16224.
      0923c4f5
    • [SPARK-16242][MLLIB][PYSPARK] Conversion between old/new matrix columns in a DataFrame (Python) · e158478a
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      This PR implements Python wrappers for #13888 to convert old/new matrix columns in a DataFrame.
      
      ## How was this patch tested?
      Doctest in python.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13935 from yanboliang/spark-16242.
      e158478a
    • [SPARK-16128][SQL] Allow setting length of characters to be truncated to, in Dataset.show function. · f6b497fc
      Prashant Sharma authored
      ## What changes were proposed in this pull request?
      
      Allowing truncation to a specific number of characters is convenient at times, especially when operating from the REPL. Sometimes those last few characters make all the difference, and showing everything brings in a whole lot of noise.
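      
      For example (a small sketch of the new overload; the values are illustrative), truncating cells to 5 characters instead of the default 20:
      ```scala
      scala> spark.range(1).selectExpr("'abcdefghij' AS s").show(20, 5)
      // the `s` column is rendered as "ab..." instead of the full 10-character string
      ```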
      
      ## How was this patch tested?
      Existing tests. + 1 new test in DataFrameSuite.
      
      For SparkR and pyspark, existing tests and manual testing.
      
      Author: Prashant Sharma <prashsh1@in.ibm.com>
      Author: Prashant Sharma <prashant@apache.org>
      
      Closes #13839 from ScrapCodes/add_truncateTo_DF.show.
      f6b497fc
    • [SPARK-16202][SQL][DOC] Correct The Description of CreatableRelationProvider's createRelation · 4cbf611c
      gatorsmile authored
      #### What changes were proposed in this pull request?
      The API description of `createRelation` in `CreatableRelationProvider` is misleading. The current description only expects users to return the relation.
      
      ```Scala
      trait CreatableRelationProvider {
        def createRelation(
            sqlContext: SQLContext,
            mode: SaveMode,
            parameters: Map[String, String],
            data: DataFrame): BaseRelation
      }
      ```
      
      However, the major goal of this API should also include saving the `DataFrame`.
      
      Since this API is critical for Data Source API developers, this PR is to correct the description.
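      
      For illustration only, a hypothetical provider (the class name and chosen format are made up) showing the corrected contract: the method is expected to persist `data` itself and then return a relation over what it wrote.
      ```Scala
      import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
      import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}
      import org.apache.spark.sql.types.StructType
      
      class ExampleProvider extends CreatableRelationProvider {
        override def createRelation(
            ctx: SQLContext,
            mode: SaveMode,
            parameters: Map[String, String],
            data: DataFrame): BaseRelation = {
          val path = parameters("path")
          data.write.mode(mode).parquet(path)   // 1) save the DataFrame (format is illustrative)
          new BaseRelation {                    // 2) return a relation describing the written data
            override val sqlContext: SQLContext = ctx
            override val schema: StructType = data.schema
          }
        }
      }
      ```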
      
      #### How was this patch tested?
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #13903 from gatorsmile/readUnderscoreFiles.
      4cbf611c
    • [SPARK-15863][SQL][DOC][FOLLOW-UP] Update SQL programming guide. · dd6b7dbe
      Yin Huai authored
      ## What changes were proposed in this pull request?
      This PR makes several updates to the SQL programming guide.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #13938 from yhuai/doc.
      dd6b7dbe
    • [SPARK-16221][SQL] Redirect Parquet JUL logger via SLF4J for WRITE operations · a0da854f
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      [SPARK-8118](https://github.com/apache/spark/pull/8196) implements redirecting the Parquet JUL logger via SLF4J, but it is currently applied only when READ operations occur. If users perform only WRITE operations, many Parquet logs still appear.
      
      This PR makes the redirection work on WRITE operations, too.
      
      **Before**
      ```scala
      scala> spark.range(10).write.format("parquet").mode("overwrite").save("/tmp/p")
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      Jun 26, 2016 9:04:38 PM INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: SNAPPY
      ............ about 70 lines Parquet Log .............
      scala> spark.range(10).write.format("parquet").mode("overwrite").save("/tmp/p")
      ............ about 70 lines Parquet Log .............
      ```
      
      **After**
      ```scala
      scala> spark.range(10).write.format("parquet").mode("overwrite").save("/tmp/p")
      SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
      SLF4J: Defaulting to no-operation (NOP) logger implementation
      SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
      scala> spark.range(10).write.format("parquet").mode("overwrite").save("/tmp/p")
      ```
      
      This PR also fixes some typos.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #13918 from dongjoon-hyun/SPARK-16221.
      a0da854f
  3. Jun 27, 2016