  1. Jul 30, 2017
  2. Jun 12, 2017
      [SPARK-20345][SQL] Fix STS error handling logic on HiveSQLException · 32818d9b
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
[SPARK-5100](https://github.com/apache/spark/commit/343d3bfafd449a0371feb6a88f78e07302fa7143) added the Spark Thrift Server (STS) UI and the following logic to handle exceptions in the `case Throwable` branch.
      
      ```scala
      HiveThriftServer2.listener.onStatementError(
        statementId, e.getMessage, SparkUtils.exceptionString(e))
      ```
      
However, a case was missed after [SPARK-6964](https://github.com/apache/spark/commit/eb19d3f75cbd002f7e72ce02017a8de67f562792) implemented `Support Cancellation in the Thrift Server` by adding a `case HiveSQLException` before `case Throwable`.
      
      ```scala
      case e: HiveSQLException =>
        if (getStatus().getState() == OperationState.CANCELED) {
          return
        } else {
          setState(OperationState.ERROR)
          throw e
        }
        // Actually do need to catch Throwable as some failures don't inherit from Exception and
        // HiveServer will silently swallow them.
      case e: Throwable =>
        val currentState = getStatus().getState()
        logError(s"Error executing query, currentState $currentState, ", e)
        setState(OperationState.ERROR)
        HiveThriftServer2.listener.onStatementError(
          statementId, e.getMessage, SparkUtils.exceptionString(e))
        throw new HiveSQLException(e.toString)
      ```
      
Logically, we should add `HiveThriftServer2.listener.onStatementError` to the `case HiveSQLException` branch, too.
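
The combined error handling can be sketched as follows. This is a simplified, self-contained model, not Spark's actual classes: `StatementListener`, `SqlError`, and `runStatement` are hypothetical stand-ins for `HiveThriftServer2.listener`, `HiveSQLException`, and the operation's `run` method.

```scala
// Simplified model of the fixed error handling (hypothetical names, not
// Spark's real API): the listener is notified of the error in BOTH the
// SQL-exception case and the generic Throwable case, unless the statement
// was cancelled.
object OperationState extends Enumeration {
  val RUNNING, CANCELED, ERROR = Value
}

class StatementListener {
  var errors: List[(String, String)] = Nil
  def onStatementError(id: String, msg: String, trace: String): Unit =
    errors = (id, msg) :: errors
}

class SqlError(msg: String) extends Exception(msg)

def runStatement(listener: StatementListener, statementId: String,
                 state: OperationState.Value)(body: => Unit): Unit =
  try body catch {
    case e: SqlError =>
      if (state == OperationState.CANCELED) {
        () // cancelled: swallow the exception, as before
      } else {
        // Previously missing: record the error for the STS UI here, too.
        listener.onStatementError(statementId, e.getMessage, e.toString)
        throw e
      }
    case e: Throwable =>
      listener.onStatementError(statementId, e.getMessage, e.toString)
      throw new SqlError(e.toString)
  }
```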
      
      ## How was this patch tested?
      
      N/A
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #17643 from dongjoon-hyun/SPARK-20345.
  3. Jun 07, 2017
  4. May 30, 2017
  5. May 10, 2017
[SPARK-20393][WEBUI] Strengthen Spark to prevent XSS vulnerabilities · b512233a
      NICHOLAS T. MARION authored
      ## What changes were proposed in this pull request?
      
Adds `stripXSS` and `stripXSSMap` to Spark Core's `UIUtils`, and calls these functions wherever `getParameter` is called on an `HttpServletRequest`.
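
A minimal sketch of the sanitizing idea, assuming hypothetical names (`UiSanitizer` here; the PR adds the functions to `UIUtils`, and the exact character set stripped is an illustration, not the PR's actual rule):

```scala
// Hedged sketch, not Spark's actual implementation: strip characters that
// commonly appear in XSS payloads from request-parameter keys and values.
object UiSanitizer {
  def stripXSS(value: String): String =
    if (value == null) null
    else value.replaceAll("[<>\"'%;()&+]", "") // drop script-injection characters

  def stripXSSMap(params: Map[String, String]): Map[String, String] =
    params.map { case (k, v) => stripXSS(k) -> stripXSS(v) }
}
```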
      
      ## How was this patch tested?
      
      Unit tests, IBM Security AppScan Standard no longer showing vulnerabilities, manual verification of WebUI pages.
      
      Author: NICHOLAS T. MARION <nmarion@us.ibm.com>
      
      Closes #17686 from n-marion/xss-fix.
      b512233a
  6. Apr 24, 2017
  7. Apr 15, 2017
  8. Apr 12, 2017
      [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins · ceaf77ae
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
This PR proposes running Spark unidoc on Jenkins to test the Javadoc 8 build, since it is easily broken again.
      
      There are several problems with it:
      
- It adds a little extra time to the test run. In my case, it took about 1.5 minutes more (`Elapsed :[94.8746569157]`). How it was tested is described in "How was this patch tested?".
      
      - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.
      
        (see  joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))
      
To complete this automated build, this PR also fixes the existing Javadoc breaks, including ones introduced by test code as described above.
      
These fixes are similar to instances fixed previously; please refer to https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013.
      
Note that this only fixes **errors**, not **warnings**. Please see my observation at https://github.com/apache/spark/pull/17389#issuecomment-288438704 about spurious errors caused by warnings.
      
      ## How was this patch tested?
      
Manually via `jekyll build` for the documentation build. Also tested by running `./dev/run-tests`.
      
      This was tested via manually adding `time.time()` as below:
      
      ```diff
           profiles_and_goals = build_profiles + sbt_goals
      
           print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
                 " ".join(profiles_and_goals))
      
      +    import time
      +    st = time.time()
           exec_sbt(profiles_and_goals)
      +    print("Elapsed :[%s]" % str(time.time() - st))
      ```
      
      produces
      
      ```
      ...
      ========================================================================
      Building Unidoc API Documentation
      ========================================================================
      ...
      [info] Main Java API documentation successful.
      ...
      Elapsed :[94.8746569157]
...
```

      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17477 from HyukjinKwon/SPARK-18692.
      ceaf77ae
  9. Apr 10, 2017
[SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String toLowerCase "Turkish locale bug" causes Spark problems · a26e3ed5
Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
      
      The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
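
The bug is easy to demonstrate with plain `java.util.Locale` (a minimal illustration, not Spark code):

```scala
import java.util.Locale

// In the Turkish locale, uppercase 'I' lowercases to dotless 'ı' (U+0131),
// so locale-sensitive lowercasing of identifiers and keywords can break.
val turkishLocale = new Locale("tr", "TR")
val turkish  = "JOIN".toLowerCase(turkishLocale) // "joın", not "join"
val portable = "JOIN".toLowerCase(Locale.ROOT)   // "join" in every locale
```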
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #17527 from srowen/SPARK-20156.
      a26e3ed5
  10. Apr 02, 2017
  11. Mar 29, 2017
  12. Mar 28, 2017
  13. Feb 25, 2017
  14. Jan 10, 2017
  15. Jan 04, 2017
      [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo · a1e40b1f
      Niranjan Padmanabhan authored
      ## What changes were proposed in this pull request?
      There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words.
      
      ## How was this patch tested?
      N/A since only docs or comments were updated.
      
      Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com>
      
      Closes #16455 from neurons/np.structure_streaming_doc.
      a1e40b1f
  16. Dec 27, 2016
      [SPARK-18992][SQL] Move spark.sql.hive.thriftServer.singleSession to SQLConf · 5ac62043
      gatorsmile authored
      ### What changes were proposed in this pull request?
      
      Since `spark.sql.hive.thriftServer.singleSession` is a configuration of SQL component, this conf can be moved from `SparkConf` to `StaticSQLConf`.
      
When we introduced `spark.sql.hive.thriftServer.singleSession`, all SQL configurations were session-specific and could be modified in different sessions.

In Spark 2.1, static SQL configuration was added; it is a perfect fit for `spark.sql.hive.thriftServer.singleSession`. Previously, we made the same move for `spark.sql.warehouse.dir` from `SparkConf` to `StaticSQLConf`.
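
The static-vs-session distinction can be sketched like this (simplified stand-ins for `StaticSQLConf` and a per-session conf, not Spark's real classes):

```scala
import scala.collection.mutable

// Illustrative sketch: a static conf is fixed when the shared state is
// created (no setter), while each session can override session-scoped confs
// without affecting other sessions.
class StaticConf(entries: Map[String, String]) {
  def get(key: String): Option[String] = entries.get(key) // immutable after startup
}

class SessionConf(static: StaticConf) {
  private val overrides = mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = overrides(key) = value
  def get(key: String): Option[String] =
    overrides.get(key).orElse(static.get(key)) // session value wins
}
```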
      
      ### How was this patch tested?
      Added test cases in HiveThriftServer2Suites.scala
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #16392 from gatorsmile/hiveThriftServerSingleSession.
      5ac62043
  17. Dec 21, 2016
      [SPARK-17807][CORE] split test-tags into test-JAR · afd9bc1d
      Ryan Williams authored
Remove spark-tags' compile-scope dependency (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
      afd9bc1d
  18. Dec 02, 2016
  19. Nov 10, 2016
  20. Nov 07, 2016
      [SPARK-18086] Add support for Hive session vars. · 9b0593d5
      Ryan Blue authored
      ## What changes were proposed in this pull request?
      
      This adds support for Hive variables:
      
      * Makes values set via `spark-sql --hivevar name=value` accessible
      * Adds `getHiveVar` and `setHiveVar` to the `HiveClient` interface
      * Adds a SessionVariables trait for sessions like Hive that support variables (including Hive vars)
      * Adds SessionVariables support to variable substitution
      * Adds SessionVariables support to the SET command
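
The substitution idea can be sketched as follows (a simplified stand-in for Spark's variable substitution; the function name and the single `${hivevar:...}` prefix are assumptions for illustration):

```scala
// Hypothetical sketch: replace ${hivevar:name} occurrences with the values
// the user supplied via `spark-sql --hivevar name=value`.
def substituteHiveVars(sql: String, hiveVars: Map[String, String]): String =
  hiveVars.foldLeft(sql) { case (current, (name, value)) =>
    current.replace(s"$${hivevar:$name}", value)
  }
```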
      
      ## How was this patch tested?
      
      * Adds a test to all supported Hive versions for accessing Hive variables
      * Adds HiveVariableSubstitutionSuite
      
      Author: Ryan Blue <blue@apache.org>
      
      Closes #15738 from rdblue/SPARK-18086-add-hivevar-support.
      9b0593d5
  21. Nov 01, 2016
      [SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server · 6e629815
      Josh Rosen authored
In SPARK-4761 / #3621 (December 2014) we enabled Kryo serialization by default in the Spark Thrift Server. However, I don't think the original rationale for doing this still holds, now that most Spark SQL serialization is performed via encoders and our UnsafeRow format.
      
      In addition, the use of Kryo as the default serializer can introduce performance problems because the creation of new KryoSerializer instances is expensive and we haven't performed instance-reuse optimizations in several code paths (including DirectTaskResult deserialization).
      
      Given all of this, I propose to revert back to using JavaSerializer as the default serializer in the Thrift Server.
      
      /cc liancheng
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #14906 from JoshRosen/disable-kryo-in-thriftserver.
      6e629815
  22. Oct 16, 2016
      [SPARK-17819][SQL] Support default database in connection URIs for Spark Thrift Server · 59e3eb5a
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
Currently, the Spark Thrift Server ignores the default database in the connection URI. This PR adds support for it, as follows.
      
      ```sql
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "create database testdb"
      $ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "create table t(a int)"
      $ bin/beeline -u jdbc:hive2://localhost:10000/testdb -e "show tables"
      ...
      +------------+--------------+--+
      | tableName  | isTemporary  |
      +------------+--------------+--+
      | t          | false        |
      +------------+--------------+--+
      1 row selected (0.347 seconds)
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "show tables"
      ...
      +------------+--------------+--+
      | tableName  | isTemporary  |
      +------------+--------------+--+
      +------------+--------------+--+
      No rows selected (0.098 seconds)
      ```
      
      ## How was this patch tested?
      
      Manual.
      
Note: I tried to add a test case for this, but I could not find a suitable test suite. I'll add a test case if advice is given.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #15399 from dongjoon-hyun/SPARK-17819.
      59e3eb5a
  23. Oct 07, 2016
      [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished · cff56075
      Sean Owen authored
      ## What changes were proposed in this pull request?
      
      This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daemon threads. It should otherwise result in exactly the same configuration, because the other args are copied from the constructor that is currently called.
      
      (I'm not sure we should change the Hive Thriftserver impl, but I did anyway.)
      
      This also adds `sc.stop()` to the quick start guide example.
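
The daemon-thread point can be illustrated with a plain JDK executor (illustrative only; Jetty's `ScheduledExecutorScheduler` and `ServerConnector` are configured differently):

```scala
import java.util.concurrent.{Executors, ThreadFactory}

// Non-daemon scheduler threads keep the JVM alive after main() returns,
// which is how a UI thread can prevent spark-submit from finishing.
// Marking them as daemon threads lets the JVM exit normally.
val daemonFactory = new ThreadFactory {
  def newThread(r: Runnable): Thread = {
    val t = new Thread(r, "ui-scheduler")
    t.setDaemon(true) // JVM may exit even while this thread is running
    t
  }
}
val scheduler = Executors.newSingleThreadScheduledExecutor(daemonFactory)
```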
      
      ## How was this patch tested?
      
      Existing tests; _pending_ at least manual verification of the fix.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15381 from srowen/SPARK-17707.
      cff56075
  24. Oct 03, 2016
      [SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver · c571cfb2
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      Currently, Spark Thrift Server raises `IllegalArgumentException` for queries whose column types are `NullType`, e.g., `SELECT null` or `SELECT if(true,null,null)`. This PR fixes that by returning `void` like Hive 1.2.
      
      **Before**
      ```sql
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
      Connecting to jdbc:hive2://localhost:10000
      Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
      Driver: Hive JDBC (version 1.2.1.spark2)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
      Closing: 0: jdbc:hive2://localhost:10000
      
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
      Connecting to jdbc:hive2://localhost:10000
      Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
      Driver: Hive JDBC (version 1.2.1.spark2)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
      Closing: 0: jdbc:hive2://localhost:10000
      ```
      
      **After**
      ```sql
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
      Connecting to jdbc:hive2://localhost:10000
      Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
      Driver: Hive JDBC (version 1.2.1.spark2)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      +-------+--+
      | NULL  |
      +-------+--+
      | NULL  |
      +-------+--+
      1 row selected (3.242 seconds)
      Beeline version 1.2.1.spark2 by Apache Hive
      Closing: 0: jdbc:hive2://localhost:10000
      
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
      Connecting to jdbc:hive2://localhost:10000
      Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
      Driver: Hive JDBC (version 1.2.1.spark2)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      +-------------------------+--+
      | (IF(true, NULL, NULL))  |
      +-------------------------+--+
      | NULL                    |
      +-------------------------+--+
      1 row selected (0.201 seconds)
      Beeline version 1.2.1.spark2 by Apache Hive
      Closing: 0: jdbc:hive2://localhost:10000
      ```
      
      ## How was this patch tested?
      
* Pass the Jenkins tests with a new test suite.
* Also, manually, after starting the Spark Thrift Server, run the following commands.
      ```sql
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
      $ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
      ```
      
      **Hive 1.2**
      ```sql
      hive> create table null_table as select null;
      hive> desc null_table;
      OK
      _c0                     void
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #15325 from dongjoon-hyun/SPARK-17112.
      c571cfb2
  25. Aug 24, 2016
      [SPARK-17190][SQL] Removal of HiveSharedState · 4d0706d6
      gatorsmile authored
      ### What changes were proposed in this pull request?
      Since `HiveClient` is used to interact with the Hive metastore, it should be hidden in `HiveExternalCatalog`. After moving `HiveClient` into `HiveExternalCatalog`, `HiveSharedState` becomes a wrapper of `HiveExternalCatalog`. Thus, removal of `HiveSharedState` becomes straightforward. After removal of `HiveSharedState`, the reflection logic is directly applied on the choice of `ExternalCatalog` types, based on the configuration of `CATALOG_IMPLEMENTATION`.
      
      ~~`HiveClient` is also used/invoked by the other entities besides HiveExternalCatalog, we defines the following two APIs: getClient and getNewClient~~
      
      ### How was this patch tested?
      The existing test cases
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #14757 from gatorsmile/removeHiveClient.
      4d0706d6
  26. Aug 11, 2016
  27. Aug 08, 2016
      [SPARK-16563][SQL] fix spark sql thrift server FetchResults bug · e17a76ef
      Alice authored
      ## What changes were proposed in this pull request?
      
Add a constant iterator that points to the head of the result. This head is used to reset the iterator when results are fetched from the first row repeatedly.
      JIRA ticket https://issues.apache.org/jira/browse/SPARK-16563
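
The fix idea can be modeled with a small resettable cursor (a simplified stand-in, not the actual Thrift Server code; the class and method names are hypothetical):

```scala
// Hypothetical sketch: remember the head of the result so a fetch-from-first
// request can re-read from the beginning instead of hitting an exhausted
// iterator.
class ResettableResults[T](results: Seq[T]) {
  private var pos = 0
  def fetchNext(n: Int): Seq[T] = {
    val batch = results.slice(pos, pos + n)
    pos += n
    batch
  }
  def resetToFirst(): Unit = pos = 0 // the "constant" head pointer
}
```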
      
      ## How was this patch tested?
      
This bug was found when using Cloudera HUE to connect to the Spark SQL Thrift Server: a SQL statement's result could only be fetched once. The fix was tested manually with Cloudera HUE; with this fix, HUE can fetch Spark SQL results repeatedly through the Thrift Server.
      
      Author: Alice <alice.gugu@gmail.com>
      Author: Alice <guhq@garena.com>
      
      Closes #14218 from alicegugu/SparkSQLFetchResultsBug.
      e17a76ef
  28. Jul 19, 2016
  29. Jul 11, 2016
      [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT · ffcb6e05
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #14130 from rxin/SPARK-16477.
      ffcb6e05
  30. Jul 05, 2016
      [SPARK-15730][SQL] Respect the --hiveconf in the spark-sql command line · 920cb5fe
      Cheng Hao authored
      ## What changes were proposed in this pull request?
This PR makes spark-sql (backed by SparkSQLCLIDriver) respect confs set via --hiveconf, which is what we did in previous versions. The change is that when SparkSQLCLIDriver starts, it explicitly sets confs passed through --hiveconf in the SQLContext's conf (basically treating them as Spark SQL confs).
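
The startup behavior can be sketched as follows (a hypothetical helper, not the actual SparkSQLCLIDriver code):

```scala
// Collect `--hiveconf key=value` pairs from the command line so they can be
// applied to the SQL conf at startup.
def parseHiveConf(args: Array[String]): Map[String, String] =
  args.sliding(2).collect {
    case Array("--hiveconf", kv) if kv.contains('=') =>
      val idx = kv.indexOf('=')
      kv.substring(0, idx) -> kv.substring(idx + 1)
  }.toMap
```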
      
      ## How was this patch tested?
      A new test in CliSuite.
      
      Closes #13542
      
      Author: Cheng Hao <hao.cheng@intel.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #14058 from yhuai/hiveConfThriftServer.
      920cb5fe
  31. Jun 24, 2016
  32. Jun 15, 2016
  33. Jun 14, 2016
      doc fix of HiveThriftServer · 53bb0308
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
Just a minor doc fix.
      
/cc yhuai
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #13659 from zjffdu/doc_fix.
      53bb0308
  34. May 27, 2016
  35. May 26, 2016
      [SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession · 0f61d6ef
      Reynold Xin authored
      ## What changes were proposed in this pull request?
SparkSession has a number of unnecessary private[sql] methods. These methods cause trouble because private[sql] does not apply in Java. Where they are easy to remove, we can simply remove them; this patch does that.
      
      As part of this pull request, I also replaced a bunch of protected[sql] with private[sql], to tighten up visibility.
      
      ## How was this patch tested?
      Updated test cases to reflect the changes.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #13319 from rxin/SPARK-15552.
      0f61d6ef
  36. May 25, 2016
      [MINOR][MLLIB][STREAMING][SQL] Fix typos · 02c8072e
      lfzCarlosC authored
Fixed typos in the source code of the [MLLIB], [STREAMING], and [SQL] components.
      
No tests; the changes are trivial and obvious.
      
      Author: lfzCarlosC <lfz.carlos@gmail.com>
      
      Closes #13298 from lfzCarlosC/master.
      02c8072e