  1. Jul 10, 2017
  2. Jul 09, 2017
    • Wenchen Fan's avatar
      [SPARK-18016][SQL][FOLLOWUP] merge declareAddedFunctions, initNestedClasses... · 680b33f1
      Wenchen Fan authored
      [SPARK-18016][SQL][FOLLOWUP] merge declareAddedFunctions, initNestedClasses and declareNestedClasses
      
      ## What changes were proposed in this pull request?
      
      These 3 methods have to be used together, so it makes more sense to merge them into one method; the caller then only needs to call one method.
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #18579 from cloud-fan/minor.
      680b33f1
  3. Jul 08, 2017
    • Xiao Li's avatar
      [SPARK-21307][REVERT][SQL] Remove SQLConf parameters from the parser-related classes · c3712b77
      Xiao Li authored
      ## What changes were proposed in this pull request?
      Since we do not set active sessions when parsing the plan, we are unable to use SQLConf.get to find the correct active session. Since https://github.com/apache/spark/pull/18531 breaks the build, I plan to revert it first.
      
      ## How was this patch tested?
      The existing test cases
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #18568 from gatorsmile/revert18531.
      c3712b77
    • Zhenhua Wang's avatar
      [SPARK-21083][SQL] Store zero size and row count when analyzing empty table · 9fccc362
      Zhenhua Wang authored
      ## What changes were proposed in this pull request?
      
      We should be able to store zero size and row count after analyzing an empty table.

      This PR also enhances the test cases for re-analyzing tables.
      
      ## How was this patch tested?
      
      Added a new test case and enhanced some test cases.
      
      Author: Zhenhua Wang <wangzhenhua@huawei.com>
      
      Closes #18292 from wzhfy/analyzeNewColumn.
      9fccc362
    • Dongjoon Hyun's avatar
      [SPARK-21345][SQL][TEST][TEST-MAVEN] SparkSessionBuilderSuite should clean up stopped sessions. · 0b8dd2d0
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      `SparkSessionBuilderSuite` should clean up stopped sessions. Otherwise, it leaves behind some stopped `SparkContext`s interfering with other test suites using `SharedSQLContext`.

      Recently, the master branch has been failing consecutively.
      - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
      
      ## How was this patch tested?
      
      Pass Jenkins with an updated suite.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #18567 from dongjoon-hyun/SPARK-SESSION.
      0b8dd2d0
    • Michael Patterson's avatar
      [SPARK-20456][DOCS] Add examples for functions collection for pyspark · f5f02d21
      Michael Patterson authored
      ## What changes were proposed in this pull request?
      
      This adds documentation to many functions in pyspark.sql.functions.py:
      `upper`, `lower`, `reverse`, `unix_timestamp`, `from_unixtime`, `rand`, `randn`, `collect_list`, `collect_set`, `lit`
      Add units to the trigonometry functions.
      Renames columns in datetime examples to be more informative.
      Adds links between some functions.
      
      ## How was this patch tested?
      
      `./dev/lint-python`
      `python python/pyspark/sql/functions.py`
      `./python/run-tests.py --module pyspark-sql`
      
      Author: Michael Patterson <map222@gmail.com>
      
      Closes #17865 from map222/spark-20456.
      f5f02d21
    • Takeshi Yamamuro's avatar
      [SPARK-21281][SQL] Use string types by default if array and map have no argument · 7896e7b9
      Takeshi Yamamuro authored
      ## What changes were proposed in this pull request?
      This PR modifies the code to use string types by default if the `array` and `map` functions have no arguments. This behaviour is the same as Hive's:
      ```
      hive> CREATE TEMPORARY TABLE t1 AS SELECT map();
      hive> DESCRIBE t1;
      _c0   map<string,string>
      
      hive> CREATE TEMPORARY TABLE t2 AS SELECT array();
      hive> DESCRIBE t2;
      _c0   array<string>
      ```
      
      ## How was this patch tested?
      Added tests in `DataFrameFunctionsSuite`.
      
      Author: Takeshi Yamamuro <yamamuro@apache.org>
      
      Closes #18516 from maropu/SPARK-21281.
      7896e7b9
    • Andrew Ray's avatar
      [SPARK-21100][SQL] Add summary method as alternative to describe that gives... · e1a172c2
      Andrew Ray authored
      [SPARK-21100][SQL] Add summary method as alternative to describe that gives quartiles similar to Pandas
      
      ## What changes were proposed in this pull request?
      
      Adds a `summary` method that allows the user to specify which statistics and percentiles to calculate. By default it includes the existing statistics from `describe` and quartiles (25th, 50th, and 75th percentiles), similar to Pandas. Also changes the implementation of `describe` to delegate to `summary`.
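
To make the idea of selectable statistics concrete, here is a minimal pure-Python sketch. It is not Spark's implementation: the function name, the stat labels, and the nearest-rank percentile rule are assumptions chosen for illustration (Spark may compute percentiles approximately).

```python
import math
import statistics

DEFAULT_STATS = ("count", "mean", "stddev", "min", "25%", "50%", "75%", "max")


def summary(values, stats=DEFAULT_STATS):
    """Return the requested statistics for a list of numbers, keyed by name."""
    ordered = sorted(values)

    def percentile(p):
        # Nearest-rank rule (an assumption): the smallest value with at
        # least p percent of the data at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    result = {}
    for name in stats:
        if name.endswith("%"):
            result[name] = percentile(float(name[:-1]))
        elif name == "count":
            result[name] = len(ordered)
        elif name == "mean":
            result[name] = statistics.mean(ordered)
        elif name == "stddev":
            result[name] = statistics.stdev(ordered)  # sample standard deviation
        elif name == "min":
            result[name] = ordered[0]
        elif name == "max":
            result[name] = ordered[-1]
        else:
            raise ValueError(f"unknown statistic: {name}")
    return result


def describe(values):
    # Delegates to summary with the pre-existing statistics only.
    return summary(values, ("count", "mean", "stddev", "min", "max"))
```

Calling `summary(data, ("count", "50%"))` would return only the count and the median, while `describe(data)` keeps its older, quartile-free output by delegating.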
      
      ## How was this patch tested?
      
      additional unit test
      
      Author: Andrew Ray <ray.andrew@gmail.com>
      
      Closes #18307 from aray/SPARK-21100.
      e1a172c2
  4. Jul 07, 2017
    • Wang Gengliang's avatar
      [SPARK-21336] Revise rand comparison in BatchEvalPythonExecSuite · a0fe32a2
      Wang Gengliang authored
      ## What changes were proposed in this pull request?
      
      Revise rand comparison in BatchEvalPythonExecSuite
      
      In BatchEvalPythonExecSuite, two test cases use the predicate "rand() > 3".
      rand() generates a random value in [0, 1), so it is weird to compare it with 3; use 0.3 instead.
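
The point can be checked with a small Python sketch. It uses Python's `random.random()`, which also draws from [0, 1), as a stand-in for `rand()`; the seed and sample count are arbitrary.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [rng.random() for _ in range(10_000)]  # each value lies in [0.0, 1.0)

always_empty = [x for x in samples if x > 3]  # the predicate can never hold
selective = [x for x in samples if x > 0.3]   # keeps some values and drops others
```

A predicate that is always false filters out every row, so a test built on it cannot exercise the non-empty path.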
      
      ## How was this patch tested?
      
      unit test
      
      Author: Wang Gengliang <ltnwgl@gmail.com>
      
      Closes #18560 from gengliangwang/revise_BatchEvalPythonExecSuite.
      a0fe32a2
    • Wenchen Fan's avatar
      [SPARK-21335][SQL] support un-aliased subquery · fef08130
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Un-aliased subqueries have been supported by Spark SQL for a long time. Their semantics were not well defined and had confusing behaviors, and they are not standard SQL syntax, so we disallowed them in https://issues.apache.org/jira/browse/SPARK-20690 .

      However, this is a breaking change, and we do have existing queries using un-aliased subqueries. We should add the support back and fix the semantics.

      This PR fixes the un-aliased subquery by assigning a default alias name.

      After this PR, there is no syntax change from branch 2.2 to master, but we invalidate a weird use case:
      `SELECT v.i from (SELECT i FROM v)`. Now this query will throw analysis exception because users should not be able to use the qualifier inside a subquery.
      
      ## How was this patch tested?
      
      new regression test
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #18559 from cloud-fan/sub-query.
      fef08130
    • Jacek Laskowski's avatar
      [SPARK-21313][SS] ConsoleSink's string representation · 7fcbb9b5
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Add `toString` with options for `ConsoleSink` so it shows nicely in query progress.
      
      **BEFORE**
      
      ```
        "sink" : {
          "description" : "org.apache.spark.sql.execution.streaming.ConsoleSink4b340441"
        }
      ```
      
      **AFTER**
      
      ```
        "sink" : {
          "description" : "ConsoleSink[numRows=10, truncate=false]"
        }
      ```
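
The same idea in a minimal Python sketch: a class overrides its string form so that the options in effect show up instead of a class name plus hash. `ConsoleSinkSketch` and its default values are hypothetical stand-ins, not Spark's class.

```python
class ConsoleSinkSketch:
    """Hypothetical sink configured with two display options."""

    def __init__(self, num_rows=20, truncate=True):
        self.num_rows = num_rows
        self.truncate = truncate

    def __str__(self):
        # Mirror the AFTER format above: sink name plus its options.
        truncate = str(self.truncate).lower()
        return f"ConsoleSink[numRows={self.num_rows}, truncate={truncate}]"


description = str(ConsoleSinkSketch(num_rows=10, truncate=False))
```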
      
      /cc zsxwing tdas
      
      ## How was this patch tested?
      
      Local build
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #18539 from jaceklaskowski/SPARK-21313-ConsoleSink-toString.
      7fcbb9b5
    • Liang-Chi Hsieh's avatar
      [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with data writes onto DataFrameWriter operations · 5df99bd3
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Remove the time metric, since there seems to be no way to measure it without per-row tracking.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #18558 from viirya/SPARK-20703-followup.
      5df99bd3
    • Kazuaki Ishizaki's avatar
      [SPARK-21217][SQL] Support ColumnVector.Array.to<type>Array() · c09b31eb
      Kazuaki Ishizaki authored
      ## What changes were proposed in this pull request?
      
      This PR implements bulk-copy for the `ColumnVector.Array.to<type>Array()` methods (e.g. `toIntArray()`) by using `System.arraycopy()` or `Platform.copyMemory()`.

      Before this PR, when one of these methods is called, the generic method in `ArrayData` is called. It is slow because it copies element by element.

      This PR improves the performance of a benchmark program by 1.9x (ON_HEAP) and 3.2x (OFF_HEAP).
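
The difference between the two copy strategies can be sketched in Python with the standard `array` module. This is only an analogy for the JVM code: the function names are hypothetical, and slicing an `array` stands in for a single bulk memory copy.

```python
from array import array


def element_wise_copy(src, start, length):
    # One interpreter-level operation per element, like a generic getter loop.
    out = array("i")
    for i in range(start, start + length):
        out.append(src[i])
    return out


def bulk_copy(src, start, length):
    # Slicing copies the underlying buffer range in a single operation.
    return src[start:start + length]


data = array("i", range(100))
slow = element_wise_copy(data, 10, 5)
fast = bulk_copy(data, 10, 5)
```

Both functions return the same values; only the number of per-element steps differs.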
      
      Without this PR
      ```
      OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11 on Linux 4.4.0-66-generic
      Intel(R) Xeon(R) CPU E5-2667 v3  3.20GHz
      
      Int Array                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
      ------------------------------------------------------------------------------------------------
      ON_HEAP                                        586 /  628         14.3          69.9
      OFF_HEAP                                       893 /  902          9.4         106.5
      ```
      
      With this PR
      ```
      OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11 on Linux 4.4.0-66-generic
      Intel(R) Xeon(R) CPU E5-2667 v3  3.20GHz
      
      Int Array                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
      ------------------------------------------------------------------------------------------------
      ON_HEAP                                        306 /  331         27.4          36.4
      OFF_HEAP                                       282 /  287         29.8          33.6
      ```
      
      Source program
      ```
          (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode => {
            val len = 8 * 1024 * 1024
            val column = ColumnVector.allocate(len * 2, new ArrayType(IntegerType, false), memMode)
      
            val data = column.arrayData
            var i = 0
            while (i < len) {
              data.putInt(i, i)
              i += 1
            }
            column.putArray(0, 0, len)
      
            val benchmark = new Benchmark("Int Array", len, minNumIters = 20)
            benchmark.addCase(s"$memMode") { iter =>
              var i = 0
              while (i < 50) {
                column.getArray(0).toIntArray
                i += 1
              }
            }
            benchmark.run
          }}
      ```
      
      ## How was this patch tested?
      
      Added test suite
      
      Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
      
      Closes #18425 from kiszk/SPARK-21217.
      c09b31eb
  5. Jul 06, 2017
    • Jacek Laskowski's avatar
      [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode · e5bb2617
      Jacek Laskowski authored
      ## What changes were proposed in this pull request?
      
      Making EventTimeWatermarkExec explicitly UnaryExecNode
      
      /cc tdas zsxwing
      
      ## How was this patch tested?
      
      Local build.
      
      Author: Jacek Laskowski <jacek@japila.pl>
      
      Closes #18509 from jaceklaskowski/EventTimeWatermarkExec-UnaryExecNode.
      e5bb2617
    • Wenchen Fan's avatar
      [SPARK-20946][SQL] Do not update conf for existing SparkContext in SparkSession.getOrCreate · 40c7add3
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      SparkContext is shared by all sessions, so we should not update its conf for only one session.
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #18536 from cloud-fan/config.
      40c7add3
    • Tathagata Das's avatar
      [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation · 0217dfd2
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      A few changes to the Structured Streaming documentation:
      - Clarify that the entire stream input table is not materialized
      - Add information for Ganglia
      - Add Kafka Sink to the main docs
      - Removed a couple of leftover experimental tags
      - Added more associated reading material and talk videos.
      
      In addition, https://github.com/apache/spark/pull/16856 broke the link to the RDD programming guide in several places while renaming the page. This PR fixes those links. /cc sameeragarwal cloud-fan
      - Added a redirection to avoid breaking internal and possible external links.
      - Removed unnecessary redirection pages that were there since the separate scala, java, and python programming guides were merged together in 2013 or 2014.
      
      ## How was this patch tested?
      
      (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
      (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #18485 from tdas/SPARK-21267.
      0217dfd2
    • Wang Gengliang's avatar
      [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval · bf66335a
      Wang Gengliang authored
      ## What changes were proposed in this pull request?
      
      Rename org.apache.spark.sql.catalyst.plans.logical.statsEstimation.Range to ValueInterval.
      The current naming is identical to logical operator "range".
      Refactoring it to ValueInterval is more accurate.
      
      ## How was this patch tested?
      
      unit test
      
      Author: Wang Gengliang <ltnwgl@gmail.com>
      
      Closes #18549 from gengliangwang/ValueInterval.
      bf66335a
    • Liang-Chi Hsieh's avatar
      [SPARK-21204][SQL] Add support for Scala Set collection types in serialization · 48e44b24
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Currently we can't produce a `Dataset` containing `Set` in SparkSQL. This PR tries to support serialization/deserialization of `Set`.
      
      Because there's no corresponding internal data type in SparkSQL for a `Set`, the most suitable choice for serializing a set is an array.
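
A minimal Python sketch of the round trip, assuming a storage model that has arrays but no set type. The function names are hypothetical and this is not Spark's encoder code.

```python
def serialize_set(values):
    # No set type in the storage model, so store the elements as an array.
    return list(values)


def deserialize_set(stored):
    # Rebuilding the set drops any duplicates present in the stored array.
    return set(stored)


round_trip = deserialize_set(serialize_set({1, 2, 3}))
```

One consequence of this design is that an array containing duplicates deserializes to a smaller set.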
      
      ## How was this patch tested?
      
      Added unit tests.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #18416 from viirya/SPARK-21204.
      48e44b24
    • Bogdan Raducanu's avatar
      [SPARK-21228][SQL] InSet incorrect handling of structs · 26ac085d
      Bogdan Raducanu authored
      ## What changes were proposed in this pull request?
      When data type is struct, InSet now uses TypeUtils.getInterpretedOrdering (similar to EqualTo) to build a TreeSet. In other cases it will use a HashSet as before (which should be faster). Similarly, In.eval uses Ordering.equiv instead of equals.
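
One way to see why an ordering-based set can help: a hash set depends on each element's own equality and hash, so a struct type that compares by identity misses content-equal values. The Python sketch below is an analogy only; `StructValue`, `ordering_key`, and `OrderedStructSet` are hypothetical names, and a sorted list with binary search stands in for a TreeSet.

```python
import bisect


class StructValue:
    """Hypothetical struct whose default equality is identity, not content."""

    def __init__(self, *fields):
        self.fields = fields


def ordering_key(struct):
    # Interpreted ordering: compare structs field by field.
    return struct.fields


class OrderedStructSet:
    """Membership via a sorted list and binary search on the ordering key."""

    def __init__(self, structs):
        self._keys = sorted(ordering_key(s) for s in structs)

    def __contains__(self, struct):
        key = ordering_key(struct)
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key


members = [StructValue(1, "a"), StructValue(2, "b")]
hash_set = set(members)
ordered_set = OrderedStructSet(members)
probe = StructValue(1, "a")  # same content as the first member, different object
```

Here `probe in hash_set` is false because the probe is a distinct object, while `probe in ordered_set` is true because the comparison goes through the field-by-field ordering.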
      
      ## How was this patch tested?
      New test in SQLQuerySuite.
      
      Author: Bogdan Raducanu <bogdan@databricks.com>
      
      Closes #18455 from bogdanrdc/SPARK-21228.
      26ac085d
    • Wang Gengliang's avatar
      [SPARK-21273][SQL][FOLLOW-UP] Add missing test cases back and revise code style · d540dfbf
      Wang Gengliang authored
      ## What changes were proposed in this pull request?
      
      Add missing test cases back and revise code style
      
      Follow up the previous PR: https://github.com/apache/spark/pull/18479
      
      ## How was this patch tested?
      
      Unit test
      
      Author: Wang Gengliang <ltnwgl@gmail.com>
      
      Closes #18548 from gengliangwang/stat_propagation_revise.
      d540dfbf
    • wangzhenhua's avatar
      [SPARK-21324][TEST] Improve statistics test suites · b8e4d567
      wangzhenhua authored
      ## What changes were proposed in this pull request?
      
      1. move `StatisticsCollectionTestBase` to a separate file.
      2. move some test cases to `StatisticsCollectionSuite` so that `hive/StatisticsSuite` only keeps tests that need hive support.
      3. clean up some test cases.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: wangzhenhua <wangzhenhua@huawei.com>
      Author: Zhenhua Wang <wzh_zju@163.com>
      
      Closes #18545 from wzhfy/cleanStatSuites.
      b8e4d567
    • Liang-Chi Hsieh's avatar
      [SPARK-20703][SQL] Associate metrics with data writes onto DataFrameWriter operations · 6ff05a66
      Liang-Chi Hsieh authored
      ## What changes were proposed in this pull request?
      
      Right now in the UI, after SPARK-20213, we can show the operations to write data out. However, there is no way to associate metrics with data writes. We should show related metrics on the operations.
      
      #### Supported commands
      
      This change supports updating metrics for file-based data writing operations, including `InsertIntoHadoopFsRelationCommand`, `InsertIntoHiveTable`.
      
      Supported metrics:
      
      * number of written files
      * number of dynamic partitions
      * total bytes of written data
      * total number of output rows
      * average writing data out time (ms)
      * (TODO) min/med/max number of output rows per file/partition
      * (TODO) min/med/max bytes of written data per file/partition
      
      ####  Commands not supported
      
      `InsertIntoDataSourceCommand`, `SaveIntoDataSourceCommand`:
      
      The two commands use DataSource APIs to write data out, i.e., the logic of writing data out is delegated to the DataSource implementations, such as `InsertableRelation.insert` and `CreatableRelationProvider.createRelation`. So we can't obtain metrics from delegated methods for now.

      `CreateHiveTableAsSelectCommand`, `CreateDataSourceTableAsSelectCommand` :

      The two commands invoke other commands to write data out. The invoked commands can even write to non-file-based data sources. We leave them as a future TODO.
      
      #### How to update metrics of writing files out
      
      A `RunnableCommand` that wants to update metrics needs to override its `metrics` and provide the metrics data structure to `ExecutedCommandExec`.

      The metrics are prepared during the execution of `FileFormatWriter`. The callback function passed to `FileFormatWriter` accepts the metrics and updates them accordingly.

      There is a metrics-updating function in `RunnableCommand`. At runtime, the function is bound to the Spark context and the `metrics` of `ExecutedCommandExec`, and passed to `FileFormatWriter`.
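
The callback flow described above can be sketched in Python as follows. Everything here is a simplified model under stated assumptions: `write_files`, `CommandSketch`, and the metric names are hypothetical, and no real files are written.

```python
def write_files(partitions, on_metrics):
    """Hypothetical writer: processes each partition, then reports a summary."""
    num_files = 0
    num_rows = 0
    num_bytes = 0
    for rows in partitions:
        payload = "\n".join(rows).encode("utf-8")  # stand-in for one output file
        num_files += 1
        num_rows += len(rows)
        num_bytes += len(payload)
    # The writer does not own the metrics; it hands the summary to the callback.
    on_metrics({"numFiles": num_files,
                "numOutputRows": num_rows,
                "numOutputBytes": num_bytes})


class CommandSketch:
    """Hypothetical command that owns its metrics and exposes an updater."""

    def __init__(self):
        self.metrics = {"numFiles": 0, "numOutputRows": 0, "numOutputBytes": 0}

    def update_metrics(self, summary):
        for name, value in summary.items():
            self.metrics[name] += value

    def run(self, partitions):
        # The bound method plays the role of the callback passed to the writer.
        write_files(partitions, self.update_metrics)


command = CommandSketch()
command.run([["a", "bb"], ["ccc"]])
```

Keeping the metrics on the command and passing only an updater means the writer needs no knowledge of where the numbers end up.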
      
      ## How was this patch tested?
      
      Updated unit tests.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #18159 from viirya/SPARK-20703-2.
      6ff05a66
    • Tathagata Das's avatar
      [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted · 60043f22
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      
      Stopping a query while it is being initialized can throw an interrupt exception, in which case the temporary checkpoint directories will not be deleted and the test will fail.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #18442 from tdas/DatastreamReaderWriterSuite-fix.
      60043f22
    • Sumedh Wale's avatar
      [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream · 14a3bb3a
      Sumedh Wale authored
      ## What changes were proposed in this pull request?
      
      Corrects the offsetInBytes calculation in UnsafeRow.writeToStream. Known failures include writes to some DataSources that have their own SparkPlan implementations and cause an EXCHANGE in writes.
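
Why the offset matters can be shown with a small Python sketch of a row that views part of a larger byte buffer. `RowSketch` is a hypothetical model, not the `UnsafeRow` code, and it does not reproduce the exact arithmetic that was fixed.

```python
import io


class RowSketch:
    """Hypothetical row viewing `size` bytes of `buffer`, starting at `offset`."""

    def __init__(self, buffer, offset, size):
        self.buffer = buffer
        self.offset = offset
        self.size = size

    def write_to_stream(self, out):
        # Start at the row's own offset, not at the start of the backing buffer.
        out.write(self.buffer[self.offset:self.offset + self.size])


backing = bytes(range(16))
row = RowSketch(backing, offset=4, size=8)
stream = io.BytesIO()
row.write_to_stream(stream)
written = stream.getvalue()
```

With a zero offset, a wrong calculation can go unnoticed, which is why a test over a byte array with a non-zero offset is the useful case.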
      
      ## How was this patch tested?
      
      Extended UnsafeRowSuite.writeToStream to include an UnsafeRow over byte array having non-zero offset.
      
      Author: Sumedh Wale <swale@snappydata.io>
      
      Closes #18535 from sumwale/SPARK-21312.
      14a3bb3a
    • gatorsmile's avatar
      [SPARK-21308][SQL] Remove SQLConf parameters from the optimizer · 75b168fd
      gatorsmile authored
      ### What changes were proposed in this pull request?
      This PR removes SQLConf parameters from the optimizer rules
      
      ### How was this patch tested?
      The existing test cases
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18533 from gatorsmile/rmSQLConfOptimizer.
      75b168fd
  6. Jul 05, 2017
    • Shixiong Zhu's avatar
      [SPARK-21248][SS] The clean up codes in StreamExecution should not be interrupted · ab866f11
      Shixiong Zhu authored
      ## What changes were proposed in this pull request?
      
      This PR uses `runUninterruptibly` so that the cleanup code in StreamExecution is not interrupted. It also removes an optimization in `runUninterruptibly` to make sure this method never throws `InterruptedException`.
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #18461 from zsxwing/SPARK-21248.
      ab866f11
    • gatorsmile's avatar
      [SPARK-21307][SQL] Remove SQLConf parameters from the parser-related classes. · c8e7f445
      gatorsmile authored
      ### What changes were proposed in this pull request?
      This PR is to remove SQLConf parameters from the parser-related classes.
      
      ### How was this patch tested?
      The existing test cases.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18531 from gatorsmile/rmSQLConfParser.
      c8e7f445
    • Jeff Zhang's avatar
      [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFunction Should Support UDAFs · 742da086
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
      Support registering Java UDAFs in PySpark so that users can use Java UDAFs in PySpark. In addition, this PR adds an API to `UDFRegistration`.
      
      ## How was this patch tested?
      
      Unit test is added
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #17222 from zjffdu/SPARK-19439.
      742da086
    • ouyangxiaochen's avatar
      [SPARK-20383][SQL] Supporting Create [temporary] Function with the keyword 'OR... · 5787ace4
      ouyangxiaochen authored
      [SPARK-20383][SQL] Supporting Create [temporary] Function with the keyword 'OR REPLACE' and 'IF NOT EXISTS'
      
      ## What changes were proposed in this pull request?
      
      Support creating a [temporary] function with the keywords 'OR REPLACE' and 'IF NOT EXISTS'.
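
The intended behavior of the two keywords can be sketched with a toy registry in Python. The class and its rules are assumptions for illustration: in particular, rejecting the combination of both keywords is an assumption of this sketch, not something stated in the commit message.

```python
class FunctionRegistrySketch:
    """Hypothetical registry modelling OR REPLACE and IF NOT EXISTS."""

    def __init__(self):
        self.functions = {}

    def create_function(self, name, impl, or_replace=False, if_not_exists=False):
        if or_replace and if_not_exists:
            # Assumption: the two modifiers contradict each other.
            raise ValueError("OR REPLACE and IF NOT EXISTS cannot be combined")
        if name in self.functions:
            if if_not_exists:
                return False  # keep the existing function and do nothing
            if not or_replace:
                raise KeyError(f"function {name} already exists")
        self.functions[name] = impl
        return True


registry = FunctionRegistrySketch()
created = registry.create_function("f", "v1")
skipped = registry.create_function("f", "v2", if_not_exists=True)
replaced = registry.create_function("f", "v3", or_replace=True)
```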
      
      ## How was this patch tested?
      manual test and added test cases
      
      Author: ouyangxiaochen <ou.yangxiaochen@zte.com.cn>
      
      Closes #17681 from ouyangxiaochen/spark-419.
      5787ace4
    • Takuya UESHIN's avatar
      [SPARK-16167][SQL] RowEncoder should preserve array/map type nullability. · 873f3ad2
      Takuya UESHIN authored
      ## What changes were proposed in this pull request?
      
      Currently `RowEncoder` doesn't preserve nullability of `ArrayType` or `MapType`.
      It always returns `containsNull = true` for `ArrayType` and `valueContainsNull = true` for `MapType`, and the nullability of the type itself is always `true`.

      This PR fixes their nullability.
      ## How was this patch tested?
      
      Add tests to check if `RowEncoder` preserves array/map nullability.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      Author: Takuya UESHIN <ueshin@databricks.com>
      
      Closes #13873 from ueshin/issues/SPARK-16167.
      873f3ad2
    • Takuya UESHIN's avatar
      [SPARK-18623][SQL] Add `returnNullable` to `StaticInvoke` and modify it to handle properly. · a3864325
      Takuya UESHIN authored
      ## What changes were proposed in this pull request?
      
      Add `returnNullable` to `StaticInvoke`, the same as #15780 is trying to add to `Invoke`, and modify `StaticInvoke` to handle it properly.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      Author: Takuya UESHIN <ueshin@databricks.com>
      
      Closes #16056 from ueshin/issues/SPARK-18623.
      a3864325
    • Wenchen Fan's avatar
      [SPARK-21304][SQL] remove unnecessary isNull variable for collection related encoder expressions · f2c3b1dd
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      For these collection-related encoder expressions, we don't need to create `isNull` variable if the loop element is not nullable.
      
      ## How was this patch tested?
      
      existing tests.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #18529 from cloud-fan/minor.
      f2c3b1dd
  7. Jul 04, 2017
    • Takuya UESHIN's avatar
      [SPARK-21300][SQL] ExternalMapToCatalyst should null-check map key prior to... · ce10545d
      Takuya UESHIN authored
      [SPARK-21300][SQL] ExternalMapToCatalyst should null-check map key prior to converting to internal value.
      
      ## What changes were proposed in this pull request?
      
      `ExternalMapToCatalyst` should null-check map key prior to converting to internal value to throw an appropriate Exception instead of something like NPE.
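
The check-before-convert order can be sketched in Python. The function name, the converters, and the error text are hypothetical; the point is only that the null key is rejected with a clear message before any conversion runs.

```python
def to_internal_map(external, convert_key=str, convert_value=str):
    """Hypothetical conversion of an external map to an internal one."""
    internal = {}
    for key, value in external.items():
        if key is None:
            # Fail before conversion, with a message that names the real problem.
            raise ValueError("Cannot use null as map key")
        internal[convert_key(key)] = None if value is None else convert_value(value)
    return internal


converted = to_internal_map({1: "a", 2: None})
```

Null values remain allowed in this sketch; only null keys are rejected.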
      
      ## How was this patch tested?
      
      Added a test and existing tests.
      
      Author: Takuya UESHIN <ueshin@databricks.com>
      
      Closes #18524 from ueshin/issues/SPARK-21300.
      ce10545d
    • gatorsmile's avatar
      [SPARK-21295][SQL] Use qualified names in error message for missing references · de14086e
      gatorsmile authored
      ### What changes were proposed in this pull request?
      It is strange to see the following error message. Actually, the column is from another table.
      ```
      cannot resolve '`right.a`' given input columns: [a, c, d];
      ```
      
      After the PR, the error message looks like
      ```
      cannot resolve '`right.a`' given input columns: [left.a, right.c, right.d];
      ```
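
A minimal Python sketch of building such a message from qualified column names. The helper name and the `(qualifier, name)` pair representation are assumptions for illustration.

```python
def unresolved_message(name, input_columns):
    """input_columns is a list of (qualifier or None, column name) pairs."""
    # Prefix each column with its qualifier when one is known.
    qualified = [f"{q}.{c}" if q else c for q, c in input_columns]
    return f"cannot resolve '`{name}`' given input columns: [{', '.join(qualified)}];"


message = unresolved_message(
    "right.a", [("left", "a"), ("right", "c"), ("right", "d")]
)
```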
      
      ### How was this patch tested?
      Added a test case
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18520 from gatorsmile/removeSQLConf.
      de14086e
    • Dongjoon Hyun's avatar
      [SPARK-20256][SQL] SessionState should be created more lazily · 1b50e0e0
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      `SessionState` is designed to be created lazily. However, in reality, it is created immediately in `SparkSession.Builder.getOrCreate` ([here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L943)).

      This PR aims to restore the lazy behavior by keeping the options in `initialSessionOptions`. The benefit is shown below: users can start `spark-shell` and use RDD operations without any problems.
      
      **BEFORE**
      ```scala
      $ bin/spark-shell
      java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'
      ...
      Caused by: org.apache.spark.sql.AnalysisException:
          org.apache.hadoop.hive.ql.metadata.HiveException:
             MetaException(message:java.security.AccessControlException:
                Permission denied: user=spark, access=READ,
                   inode="/apps/hive/warehouse":hive:hdfs:drwx------
      ```
      As reported in SPARK-20256, this happens when this user is not allowed to access the warehouse directory.
      
      **AFTER**
      ```scala
      $ bin/spark-shell
      ...
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
            /_/
      
      Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> sc.range(0, 10, 1).count()
      res0: Long = 10
      ```
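
The lazy pattern itself can be sketched in Python with `functools.cached_property`: the constructor only remembers the options, and the expensive state is built on first access. `SessionSketch` and its fields are hypothetical and much simpler than `SparkSession`.

```python
from functools import cached_property


class SessionSketch:
    """Hypothetical session that defers building its state."""

    def __init__(self, initial_options):
        # Only remember the options; do not build the state yet.
        self.initial_options = dict(initial_options)
        self.state_builds = 0

    @cached_property
    def session_state(self):
        # Expensive step, run on first access only and cached afterwards.
        self.state_builds += 1
        return {"conf": dict(self.initial_options)}


session = SessionSketch({"some.option": "value"})
builds_before_access = session.state_builds
state = session.session_state
state_again = session.session_state
```

Code paths that never touch `session_state` never pay for it, and never hit errors raised while building it.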
      
      ## How was this patch tested?
      
      Manual.
      
      This closes #18512 .
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #18501 from dongjoon-hyun/SPARK-20256.
      1b50e0e0
    • YIHAODIAN\wangshuangshuang's avatar
      [SPARK-19726][SQL] Faild to insert null timestamp value to mysql using spark jdbc · a3c29fcb
      YIHAODIAN\wangshuangshuang authored
      ## What changes were proposed in this pull request?
      
      When creating a table like the following:
      > create table timestamp_test(id int(11), time_stamp timestamp not null default current_timestamp);

      the result of executing "insert into timestamp_test values (111, null)" differs between Spark and JDBC.
      ```
      mysql> select * from timestamp_test;
      +------+---------------------+
      | id   | time_stamp          |
      +------+---------------------+
      |  111 | 1970-01-01 00:00:00 | -> spark
      |  111 | 2017-06-27 19:32:38 | -> mysql
      +------+---------------------+
      2 rows in set (0.00 sec)
      ```
      In this case `StructField.nullable` is false, so the generated code of `InvokeLike` and `BoundReference` does not check whether the field is null. Instead, it directly uses `CodegenContext.INPUT_ROW.getLong(1)`; however, `UnsafeRow.setNullAt(1)` puts 0 in the underlying memory.

      This PR always sets `StructField.nullable` to true after obtaining metadata from the JDBC connection, since MySQL allows inserting null into a NOT NULL timestamp column. This way, Spark propagates the null to the underlying DB engine and lets the DB decide how to process NULL.
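
The mechanism that produces `1970-01-01 00:00:00` can be modelled in a few lines of Python. `UnsafeRowSketch` and `read_timestamp` are hypothetical; the sketch assumes timestamps are stored as microseconds since the epoch and that marking a field null zeroes its slot, as described above.

```python
from datetime import datetime, timezone


class UnsafeRowSketch:
    """Hypothetical row: one null flag and one fixed-width slot per field."""

    def __init__(self, num_fields):
        self.nulls = [False] * num_fields
        self.slots = [0] * num_fields

    def set_long(self, i, value):
        self.nulls[i] = False
        self.slots[i] = value

    def set_null_at(self, i):
        self.nulls[i] = True
        self.slots[i] = 0  # the slot is zeroed when the field is marked null

    def is_null_at(self, i):
        return self.nulls[i]

    def get_long(self, i):
        return self.slots[i]


def read_timestamp(row, i, nullable):
    # With nullable=False the null check is skipped entirely.
    if nullable and row.is_null_at(i):
        return None
    micros = row.get_long(i)
    return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)


row = UnsafeRowSketch(2)
row.set_long(0, 111)
row.set_null_at(1)
without_check = read_timestamp(row, 1, nullable=False)
with_check = read_timestamp(row, 1, nullable=True)
```

Without the null check the zeroed slot is read as the epoch; with the field treated as nullable, the null is preserved and can be passed on to the database.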
      
      ## How was this patch tested?
      
      Added tests.
      
      Author: YIHAODIAN\wangshuangshuang <wangshuangshuang@yihaodian.com>
      Author: Shuangshuang Wang <wsszone@gmail.com>
      
      Closes #18445 from shuangshuangwang/SPARK-19726.
      a3c29fcb
    • gatorsmile's avatar
      [SPARK-21256][SQL] Add withSQLConf to Catalyst Test · 29b1f6b0
      gatorsmile authored
      ### What changes were proposed in this pull request?
      SQLConf is moved to Catalyst. We are adding more and more test cases for verifying the conf-specific behaviors. It is nice to add a helper function to simplify the test cases.
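
The shape of such a helper can be sketched in Python as a context manager that sets some keys for the duration of a block and restores the previous values afterwards. The global `conf` dict and the function name are hypothetical stand-ins, not the Scala helper itself.

```python
from contextlib import contextmanager

conf = {}  # hypothetical stand-in for the active SQL configuration

_MISSING = object()  # marks keys that were unset before the block


@contextmanager
def with_sql_conf(pairs):
    previous = {key: conf.get(key, _MISSING) for key in pairs}
    conf.update(pairs)
    try:
        yield
    finally:
        # Restore every key, removing those that were unset before,
        # even if the body raised an exception.
        for key, old in previous.items():
            if old is _MISSING:
                conf.pop(key, None)
            else:
                conf[key] = old
```

A test would wrap its body in `with with_sql_conf({"some.key": "value"}):` so that the setting cannot leak into later tests.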
      
      ### How was this patch tested?
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18469 from gatorsmile/withSQLConf.
      29b1f6b0
  8. Jul 03, 2017
    • gatorsmile's avatar
      [TEST] Different behaviors of SparkContext Conf when building SparkSession · c79c10eb
      gatorsmile authored
      ## What changes were proposed in this pull request?
      If the created ACTIVE sparkContext is not EXPLICITLY passed through the Builder's API `sparkContext()`, the conf of this sparkContext will also contain the conf set through the API `config()`; otherwise, the conf of this sparkContext will NOT contain the conf set through the API `config()`
      
      ## How was this patch tested?
      N/A
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #18517 from gatorsmile/fixTestCase2.
      c79c10eb
    • Wenchen Fan's avatar
      [SPARK-21284][SQL] rename SessionCatalog.registerFunction parameter name · f953ca56
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Looking at the code in `SessionCatalog.registerFunction`, the parameter `ignoreIfExists` is misnamed. When `ignoreIfExists` is true, we override the function if it already exists. So `overrideIfExists` is the correct name.
      
      ## How was this patch tested?
      
      N/A
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #18510 from cloud-fan/minor.
      f953ca56
    • Takeshi Yamamuro's avatar
      [SPARK-20073][SQL] Prints an explicit warning message in case of NULL-safe equals · 363bfe30
      Takeshi Yamamuro authored
      ## What changes were proposed in this pull request?
      This PR adds code to print the same warning message as in the `===` case when using NULL-safe equals (`<=>`).
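
For readers unfamiliar with the operator, the difference between regular and NULL-safe equality can be sketched in Python, with `None` standing in for SQL NULL. This illustrates the operator's semantics only, not the warning logic this PR touches; the function names are hypothetical.

```python
def equal_to(left, right):
    # Regular equality: the result is unknown (None) if either side is null.
    if left is None or right is None:
        return None
    return left == right


def equal_null_safe(left, right):
    # NULL-safe equality: always True or False; two nulls compare equal.
    if left is None or right is None:
        return left is None and right is None
    return left == right
```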
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Takeshi Yamamuro <yamamuro@apache.org>
      
      Closes #18436 from maropu/SPARK-20073.
      363bfe30