  1. Jun 19, 2017
• [MINOR][BUILD] Fix Java linter errors · ecc56313
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This PR cleans up a few Java linter errors for Apache Spark 2.2 release.
      
      ## How was this patch tested?
      
      ```bash
      $ dev/lint-java
      Using `mvn` from path: /usr/local/bin/mvn
      Checkstyle checks passed.
      ```
      
      We can check the result at Travis CI, [here](https://travis-ci.org/dongjoon-hyun/spark/builds/244297894).
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #18345 from dongjoon-hyun/fix_lint_java_2.
• [SPARK-19975][PYTHON][SQL] Add map_keys and map_values functions to Python · e5387018
      Yong Tang authored
      ## What changes were proposed in this pull request?
      
This fix addresses the issue in SPARK-19975 where we
have `map_keys` and `map_values` functions in SQL yet
there are no Python equivalents.
      
      This fix adds `map_keys` and `map_values` functions to Python.
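
A minimal usage sketch of the new functions, along the lines of the docstring examples this fix refers to (the session setup is boilerplate, not part of the change):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import map_keys, map_values

spark = SparkSession.builder.getOrCreate()
df = spark.sql("SELECT map(1, 'a', 2, 'b') AS data")

# Extract the keys and the values of the map column separately.
df.select(map_keys("data").alias("keys"),
          map_values("data").alias("values")).show()
# +------+------+
# |  keys|values|
# +------+------+
# |[1, 2]|[a, b]|
# +------+------+
```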
      
      ## How was this patch tested?
      
This fix was tested manually (see the Python docs for examples).
      
      Author: Yong Tang <yong.tang.github@outlook.com>
      
      Closes #17328 from yongtang/SPARK-19975.
• [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table · 66a792cd
      assafmendelson authored
      ## What changes were proposed in this pull request?
      
The descriptions for several options of the File Source for Structured Streaming appeared under the File Sink description instead.

This pull request has two commits: the first fixes the options as they appeared in Spark 2.1, and the second handles an additional option added for Spark 2.2.
      
      ## How was this patch tested?
      
Built the documentation with `SKIP_API=1 jekyll build` and visually inspected the Structured Streaming programming guide.

The original documentation was written by tdas and lw-lin.
      
      Author: assafmendelson <assaf.mendelson@gmail.com>
      
      Closes #18342 from assafmendelson/spark-21123.
• [SPARK-19688][STREAMING] Not to read `spark.yarn.credentials.file` from checkpoint. · e92ffe6f
      saturday_s authored
      ## What changes were proposed in this pull request?
      
      Reload the `spark.yarn.credentials.file` property when restarting a streaming application from checkpoint.
      
      ## How was this patch tested?
      
Manually tested with 1.6.3 and 2.1.1.
I didn't test this against master because of some compile problems, but I expect the same result.
      
      ## Notice
      
      This should be merged into maintenance branches too.
      
      jira: [SPARK-21008](https://issues.apache.org/jira/browse/SPARK-21008)
      
      Author: saturday_s <shi.indetail@gmail.com>
      
      Closes #18230 from saturday-shi/SPARK-21008.
• [MINOR] Bump SparkR and PySpark version to 2.3.0. · 9a145fd7
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
#17753 bumps the master branch version to 2.3.0-SNAPSHOT, but it seems the SparkR and PySpark versions were omitted.
      
      ditto of https://github.com/apache/spark/pull/16488 / https://github.com/apache/spark/pull/17523
      
      ## How was this patch tested?
      
      N/A
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18341 from HyukjinKwon/r-version.
• [SPARK-21132][SQL] DISTINCT modifier of function arguments should not be silently ignored · 9413b84b
      Xiao Li authored
      ### What changes were proposed in this pull request?
We should not silently ignore `DISTINCT` when it is not supported in the function arguments. This PR blocks these cases and issues the error messages.
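
A hedged illustration in PySpark (the specific function and the error wording are assumptions for illustration, not quoted from this PR):

```python
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.getOrCreate()
spark.range(10).createOrReplaceTempView("t")

try:
    # DISTINCT is not supported here; before this change it was silently
    # dropped, after it the query fails analysis with an explicit error.
    spark.sql("SELECT first(DISTINCT id) OVER (ORDER BY id) FROM t").collect()
except AnalysisException as err:
    print(err)
```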
      
      ### How was this patch tested?
      Added test cases for both regular functions and window functions
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #18340 from gatorsmile/firstCount.
• [SPARK-19824][CORE] Update JsonProtocol to keep consistent with the UI · ea542d29
      Xingbo Jiang authored
      ## What changes were proposed in this pull request?
      
Fix the parts of JsonProtocol that are inconsistent with the UI.
This PR also contains the modifications in #17181.
      
      ## How was this patch tested?
      
      Updated JsonProtocolSuite.
      
      Before this change, localhost:8080/json shows:
      ```
      {
        "url" : "spark://xingbos-MBP.local:7077",
        "workers" : [ {
          "id" : "worker-20170615172946-192.168.0.101-49450",
          "host" : "192.168.0.101",
          "port" : 49450,
          "webuiaddress" : "http://192.168.0.101:8081",
          "cores" : 8,
          "coresused" : 8,
          "coresfree" : 0,
          "memory" : 15360,
          "memoryused" : 1024,
          "memoryfree" : 14336,
          "state" : "ALIVE",
          "lastheartbeat" : 1497519481722
        }, {
          "id" : "worker-20170615172948-192.168.0.101-49452",
          "host" : "192.168.0.101",
          "port" : 49452,
          "webuiaddress" : "http://192.168.0.101:8082",
          "cores" : 8,
          "coresused" : 8,
          "coresfree" : 0,
          "memory" : 15360,
          "memoryused" : 1024,
          "memoryfree" : 14336,
          "state" : "ALIVE",
          "lastheartbeat" : 1497519484160
        }, {
          "id" : "worker-20170615172951-192.168.0.101-49469",
          "host" : "192.168.0.101",
          "port" : 49469,
          "webuiaddress" : "http://192.168.0.101:8083",
          "cores" : 8,
          "coresused" : 8,
          "coresfree" : 0,
          "memory" : 15360,
          "memoryused" : 1024,
          "memoryfree" : 14336,
          "state" : "ALIVE",
          "lastheartbeat" : 1497519486905
        } ],
        "cores" : 24,
        "coresused" : 24,
        "memory" : 46080,
        "memoryused" : 3072,
        "activeapps" : [ {
          "starttime" : 1497519426990,
          "id" : "app-20170615173706-0001",
          "name" : "Spark shell",
          "user" : "xingbojiang",
          "memoryperslave" : 1024,
          "submitdate" : "Thu Jun 15 17:37:06 CST 2017",
          "state" : "RUNNING",
          "duration" : 65362
        } ],
        "completedapps" : [ {
          "starttime" : 1497519250893,
          "id" : "app-20170615173410-0000",
          "name" : "Spark shell",
          "user" : "xingbojiang",
          "memoryperslave" : 1024,
          "submitdate" : "Thu Jun 15 17:34:10 CST 2017",
          "state" : "FINISHED",
          "duration" : 116895
        } ],
        "activedrivers" : [ ],
        "status" : "ALIVE"
      }
      ```
      
      After the change:
      ```
      {
        "url" : "spark://xingbos-MBP.local:7077",
        "workers" : [ {
          "id" : "worker-20170615175032-192.168.0.101-49951",
          "host" : "192.168.0.101",
          "port" : 49951,
          "webuiaddress" : "http://192.168.0.101:8081",
          "cores" : 8,
          "coresused" : 8,
          "coresfree" : 0,
          "memory" : 15360,
          "memoryused" : 1024,
          "memoryfree" : 14336,
          "state" : "ALIVE",
          "lastheartbeat" : 1497520292900
        }, {
          "id" : "worker-20170615175034-192.168.0.101-49953",
          "host" : "192.168.0.101",
          "port" : 49953,
          "webuiaddress" : "http://192.168.0.101:8082",
          "cores" : 8,
          "coresused" : 8,
          "coresfree" : 0,
          "memory" : 15360,
          "memoryused" : 1024,
          "memoryfree" : 14336,
          "state" : "ALIVE",
          "lastheartbeat" : 1497520280301
        }, {
          "id" : "worker-20170615175037-192.168.0.101-49955",
          "host" : "192.168.0.101",
          "port" : 49955,
          "webuiaddress" : "http://192.168.0.101:8083",
          "cores" : 8,
          "coresused" : 8,
          "coresfree" : 0,
          "memory" : 15360,
          "memoryused" : 1024,
          "memoryfree" : 14336,
          "state" : "ALIVE",
          "lastheartbeat" : 1497520282884
        } ],
        "aliveworkers" : 3,
        "cores" : 24,
        "coresused" : 24,
        "memory" : 46080,
        "memoryused" : 3072,
        "activeapps" : [ {
          "id" : "app-20170615175122-0001",
          "starttime" : 1497520282115,
          "name" : "Spark shell",
          "cores" : 24,
          "user" : "xingbojiang",
          "memoryperslave" : 1024,
          "submitdate" : "Thu Jun 15 17:51:22 CST 2017",
          "state" : "RUNNING",
          "duration" : 10805
        } ],
        "completedapps" : [ {
          "id" : "app-20170615175058-0000",
          "starttime" : 1497520258766,
          "name" : "Spark shell",
          "cores" : 24,
          "user" : "xingbojiang",
          "memoryperslave" : 1024,
          "submitdate" : "Thu Jun 15 17:50:58 CST 2017",
          "state" : "FINISHED",
          "duration" : 9876
        } ],
        "activedrivers" : [ ],
        "completeddrivers" : [ ],
        "status" : "ALIVE"
      }
      ```
      
      Author: Xingbo Jiang <xingbo.jiang@databricks.com>
      
      Closes #18303 from jiangxb1987/json-protocol.
  2. Jun 18, 2017
• [SPARK-21090][CORE] Optimize the unified memory manager code · 112bd9bf
      liuxian authored
      ## What changes were proposed in this pull request?
1. In `acquireStorageMemory`, when the memory mode is OFF_HEAP, `maxOffHeapMemory` should be changed to `maxOffHeapStorageMemory`; after this PR it behaves the same as the ON_HEAP memory mode. When the requested memory is between `maxOffHeapStorageMemory` and `maxOffHeapMemory`, the acquisition is certain to fail, so if the requested memory is greater than `maxOffHeapStorageMemory` (even if not greater than `maxOffHeapMemory`), we should fail fast.
2. When borrowing memory from execution, changing `numBytes` to `numBytes - storagePool.memoryFree` is more reasonable, because we only need to acquire `(numBytes - storagePool.memoryFree)`; borrowing the full `numBytes` from execution is unnecessary. A schematic sketch of both fixes follows below.
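
A schematic sketch of the two fixes, in Python pseudocode with illustrative names (the actual change is in Spark's Scala memory manager):

```python
def acquire_storage_memory(num_bytes, storage_free, execution_free,
                           max_storage_memory):
    # Fix 1: requests larger than the storage region can never succeed,
    # so fail fast instead of attempting to borrow from execution.
    if num_bytes > max_storage_memory:
        return False
    if num_bytes > storage_free:
        # Fix 2: borrow only the shortfall (num_bytes - storage_free)
        # from execution, not the full num_bytes.
        storage_free += min(num_bytes - storage_free, execution_free)
    return num_bytes <= storage_free
```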
      
      ## How was this patch tested?
      added unit test case
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #18296 from 10110346/wip-lx-0614.
• [SPARK-20948][SQL] Built-in SQL Function UnaryMinus/UnaryPositive support string type · f913f158
      Yuming Wang authored
      ## What changes were proposed in this pull request?
      
Make the built-in SQL functions UnaryMinus/UnaryPositive support the string type: if the argument is a string, it is converted to the double type. After this PR:
      ```sql
      spark-sql> select positive('-1.11'), negative('-1.11');
      -1.11   1.11
      spark-sql>
      ```
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Yuming Wang <wgyumg@gmail.com>
      
      Closes #18173 from wangyum/SPARK-20948.
• [SPARK-20749][SQL][FOLLOWUP] Support character_length · ce49428e
      Yuming Wang authored
      ## What changes were proposed in this pull request?
      
The function `char_length` is shorthand for the `character_length` function. Both Hive and PostgreSQL support `character_length`, so this PR adds support for it.
      
      Ref:
      https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
      https://www.postgresql.org/docs/current/static/functions-string.html
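
A quick check of the new alias (session setup is boilerplate, not part of the change):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Both the shorthand and the new long form return 5 for 'Spark'.
spark.sql("SELECT char_length('Spark'), character_length('Spark')").show()
```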
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Yuming Wang <wgyumg@gmail.com>
      
      Closes #18330 from wangyum/SPARK-20749-character_length.
• [SPARK-20892][SPARKR] Add SQL trunc function to SparkR · 110ce1f2
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      
Add the SQL `trunc` function to SparkR.
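
The PR adds the SparkR binding; for consistency with the other sketches here, the same underlying SQL function is shown through PySpark (the dates are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Truncate a date to the start of its year or month.
spark.sql("SELECT trunc('2017-06-18', 'year'), trunc('2017-06-18', 'mm')").show()
# 2017-01-01, 2017-06-01
```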
      
      ## How was this patch tested?
      standard test
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #18291 from actuaryzhang/sparkRTrunc2.
• [SPARK-21128][R] Remove both "spark-warehouse" and "metastore_db" before listing files in R tests · 05f83c53
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
This PR proposes to list the files in the test _after_ removing both "spark-warehouse" and "metastore_db", so that the next run of the R tests passes; otherwise the leftover directories make subsequent runs fail, which is sometimes a bit annoying.
      
      ## How was this patch tested?
      
      Manually running multiple times R tests via `./R/run-tests.sh`.
      
      **Before**
      
      Second run:
      
      ```
      SparkSQL functions: Spark package found in SPARK_HOME: .../spark
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ....................................................................................................1234.......................
      
      Failed -------------------------------------------------------------------------
      1. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3384)
      length(list1) not equal to length(list2).
      1/1 mismatches
      [1] 25 - 23 == 2
      
      2. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3384)
      sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
      10/25 mismatches
      x[16]: "metastore_db"
      y[16]: "pkg"
      
      x[17]: "pkg"
      y[17]: "R"
      
      x[18]: "R"
      y[18]: "README.md"
      
      x[19]: "README.md"
      y[19]: "run-tests.sh"
      
      x[20]: "run-tests.sh"
      y[20]: "SparkR_2.2.0.tar.gz"
      
      x[21]: "metastore_db"
      y[21]: "pkg"
      
      x[22]: "pkg"
      y[22]: "R"
      
      x[23]: "R"
      y[23]: "README.md"
      
      x[24]: "README.md"
      y[24]: "run-tests.sh"
      
      x[25]: "run-tests.sh"
      y[25]: "SparkR_2.2.0.tar.gz"
      
      3. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3388)
      length(list1) not equal to length(list2).
      1/1 mismatches
      [1] 25 - 23 == 2
      
      4. Failure: No extra files are created in SPARK_HOME by starting session and making calls (test_sparkSQL.R#3388)
      sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
      10/25 mismatches
      x[16]: "metastore_db"
      y[16]: "pkg"
      
      x[17]: "pkg"
      y[17]: "R"
      
      x[18]: "R"
      y[18]: "README.md"
      
      x[19]: "README.md"
      y[19]: "run-tests.sh"
      
      x[20]: "run-tests.sh"
      y[20]: "SparkR_2.2.0.tar.gz"
      
      x[21]: "metastore_db"
      y[21]: "pkg"
      
      x[22]: "pkg"
      y[22]: "R"
      
      x[23]: "R"
      y[23]: "README.md"
      
      x[24]: "README.md"
      y[24]: "run-tests.sh"
      
      x[25]: "run-tests.sh"
      y[25]: "SparkR_2.2.0.tar.gz"
      
      DONE ===========================================================================
      ```
      
      **After**
      
      Second run:
      
      ```
      SparkSQL functions: Spark package found in SPARK_HOME: .../spark
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................................................
      ...............................................................................................................................
      ```
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18335 from HyukjinKwon/SPARK-21128.
• [MINOR][R] Add knitr and rmarkdown packages/improve output for version info in AppVeyor tests · 75a6d058
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes three things as below:
      
**Install packages per documentation** - this does not affect the tests themselves (only the CRAN checks, which we are not running via AppVeyor), to my knowledge.
      
      This adds `knitr` and `rmarkdown` per https://github.com/apache/spark/blob/45824fb608930eb461e7df53bb678c9534c183a9/R/WINDOWS.md#unit-tests (please see https://github.com/apache/spark/commit/45824fb608930eb461e7df53bb678c9534c183a9)
      
      **Improve logs/shorten logs** - actually, long logs can be a problem on AppVeyor (e.g., see https://github.com/apache/spark/pull/17873)
      
`R -e ...` prints the R startup information again for each invocation, as below:
      
      ```
      R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
      Copyright (C) 2016 The R Foundation for Statistical Computing
      Platform: i386-w64-mingw32/i386 (32-bit)
      
      R is free software and comes with ABSOLUTELY NO WARRANTY.
      You are welcome to redistribute it under certain conditions.
      Type 'license()' or 'licence()' for distribution details.
      
        Natural language support but running in an English locale
      
      R is a collaborative project with many contributors.
      Type 'contributors()' for more information and
      'citation()' on how to cite R or R packages in publications.
      
      Type 'demo()' for some demos, 'help()' for on-line help, or
      'help.start()' for an HTML browser interface to help.
      Type 'q()' to quit R.
      ```
      
It looks like reducing the number of calls is slightly better, and printing the versions together is more readable.
      
      Before:
      
      ```
      # R information ...
      > packageVersion('testthat')
      [1] '1.0.2'
      >
      >
      
      # R information ...
      > packageVersion('e1071')
      [1] '1.6.8'
      >
      >
      ... 3 more times
      ```
      
      After:
      
      ```
      # R information ...
      > packageVersion('knitr'); packageVersion('rmarkdown'); packageVersion('testthat'); packageVersion('e1071'); packageVersion('survival')
      [1] ‘1.16’
      [1] ‘1.6’
      [1] ‘1.0.2’
      [1] ‘1.6.8’
      [1] ‘2.41.3’
      ```
      
**Add `appveyor.yml`/`dev/appveyor-install-dependencies.ps1` for triggering the test**

Changing these files might break the test, e.g., https://github.com/apache/spark/pull/16927
      
      ## How was this patch tested?
      
      Before (please see https://ci.appveyor.com/project/HyukjinKwon/spark/build/169-master)
      After (please see the AppVeyor build in this PR):
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #18336 from HyukjinKwon/minor-add-knitr-and-rmarkdown.
• [SPARK-21126] The configuration which named... · 0d8604bb
      liuzhaokun authored
      [SPARK-21126] The configuration which named "spark.core.connection.auth.wait.timeout" hasn't been used in spark
      
      [https://issues.apache.org/jira/browse/SPARK-21126](https://issues.apache.org/jira/browse/SPARK-21126)
The configuration named "spark.core.connection.auth.wait.timeout" isn't used anywhere in Spark, so I think it should be removed from configuration.md.
      
      Author: liuzhaokun <liu.zhaokun@zte.com.cn>
      
      Closes #18333 from liu-zhaokun/new3.
  3. Jun 16, 2017
  4. Jun 15, 2017
• [SPARK-21072][SQL] TreeNode.mapChildren should only apply to the children node. · 87ab0cec
      Xianyang Liu authored
      ## What changes were proposed in this pull request?
      
As the function name and comments of `TreeNode.mapChildren` indicate, the function should apply only to the node's current children. So the code below should check whether the element is actually a child node.
      
      https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L342
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Xianyang Liu <xianyang.liu@intel.com>
      
      Closes #18284 from ConeyLiu/treenode.
• [SPARK-21112][SQL] ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT · 5d35d5c1
      Xiao Li authored
      ### What changes were proposed in this pull request?
`ALTER TABLE SET TBLPROPERTIES` should not overwrite `COMMENT`, even when the input properties do not include `COMMENT`. This PR fixes the issue.
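
A hedged sketch of the behavior (the table name and property are illustrative; a Hive-enabled session is assumed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("CREATE TABLE t_comment (id INT) COMMENT 'original comment'")
spark.sql("ALTER TABLE t_comment SET TBLPROPERTIES ('purpose' = 'demo')")
# With the fix, the original table comment still shows up here instead
# of being silently dropped.
spark.sql("DESCRIBE EXTENDED t_comment").show(truncate=False)
```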
      
      ### How was this patch tested?
      Covered by the existing tests.
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #18318 from gatorsmile/fixTableComment.
• [SPARK-20434][YARN][CORE] Move Hadoop delegation token code from yarn to core · a18d6371
      Michael Gummelt authored
      ## What changes were proposed in this pull request?
      
Move the Hadoop delegation token code from `spark-yarn` to `spark-core`, so that other schedulers (such as Mesos) may use it. In order to avoid exposing Hadoop interfaces in spark-core, the new Hadoop delegation token classes are kept private. In order to provide backward compatibility, and to allow YARN users to continue to load their own delegation token providers via Java service loading, the old YARN interfaces, as well as the client code that uses them, have been retained.
      
      Summary:
- Move registered `yarn.security.ServiceCredentialProvider` classes from `spark-yarn` to `spark-core`. Moved them into a new, private hierarchy under `HadoopDelegationTokenProvider`. Client code in `HadoopDelegationTokenManager` now loads credentials from a whitelist of three providers (`HadoopFSDelegationTokenProvider`, `HiveDelegationTokenProvider`, `HBaseDelegationTokenProvider`), instead of service loading, which means that users are not able to implement their own delegation token providers, as they can in the `spark-yarn` module.
      
- The `yarn.security.ServiceCredentialProvider` interface has been kept for backwards compatibility, and to continue to allow YARN users to implement their own delegation token provider implementations. Client code in YARN now fetches tokens via the new `YARNHadoopDelegationTokenManager` class, which fetches tokens from the core providers through `HadoopDelegationTokenManager`, and also service-loads them from `yarn.security.ServiceCredentialProvider`.
      
      Old Hierarchy:
      
      ```
      yarn.security.ServiceCredentialProvider (service loaded)
        HadoopFSCredentialProvider
        HiveCredentialProvider
        HBaseCredentialProvider
      yarn.security.ConfigurableCredentialManager
      ```
      
      New Hierarchy:
      
      ```
      HadoopDelegationTokenManager
      HadoopDelegationTokenProvider (not service loaded)
        HadoopFSDelegationTokenProvider
        HiveDelegationTokenProvider
        HBaseDelegationTokenProvider
      
      yarn.security.ServiceCredentialProvider (service loaded)
      yarn.security.YARNHadoopDelegationTokenManager
```

## How was this patch tested?
      
      unit tests
      
      Author: Michael Gummelt <mgummelt@mesosphere.io>
      Author: Dr. Stefan Schimanski <sttts@mesosphere.io>
      
      Closes #17723 from mgummelt/SPARK-20434-refactor-kerberos.
• [SPARK-16251][SPARK-20200][CORE][TEST] Flaky test:... · 7dc3e697
      Xingbo Jiang authored
      [SPARK-16251][SPARK-20200][CORE][TEST] Flaky test: org.apache.spark.rdd.LocalCheckpointSuite.missing checkpoint block fails with informative message
      
      ## What changes were proposed in this pull request?
      
Currently we don't wait to confirm the removal of the block from the slave's BlockManager; if the removal takes too long, the assertion in this test case fails.
The failure can be easily reproduced by sleeping for a while before removing the block in BlockManagerSlaveEndpoint.receiveAndReply().
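
The usual remedy is to poll until the block is confirmed gone rather than asserting immediately; a minimal sketch of that pattern (Spark's actual fix lives in the Scala test suite, so this helper is illustrative only):

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    # Poll the condition until it holds or the timeout expires, instead
    # of asserting right after requesting the block removal.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```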
      
      ## How was this patch tested?
      N/A
      
      Author: Xingbo Jiang <xingbo.jiang@databricks.com>
      
      Closes #18314 from jiangxb1987/LocalCheckpointSuite.
• [SPARK-20980][DOCS] update doc to reflect multiLine change · 1bf55e39
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      doc only change
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #18312 from felixcheung/sqljsonwholefiledoc.
• [SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - Class Splitting · b32b2123
      ALeksander Eskilson authored
      ## What changes were proposed in this pull request?
      
This pull-request exclusively includes the class splitting feature described in #16648. When the code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated, into which subsequent functions are inlined. Additional sub-classes are generated each time the code threshold is met again. This code includes 3 changes (a schematic sketch of the splitting bookkeeping follows the list):
      
      1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the `CodeGenerator` class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes.
      2. Changes `addNewFunction` to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of `addNewFunction` throughout the codebase are modified so that the returned name is properly used.
      3. Removes instances of the `this` keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the `this` keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class.
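
A schematic sketch of the splitting bookkeeping in Python with illustrative names (the real implementation is in the Scala `CodeGenerator`):

```python
THRESHOLD = 1600 * 1000  # code-size threshold per class, from the PR

class ClassSplitter:
    def __init__(self):
        self.classes = [("OuterClass", [])]  # (class name, functions)
        self.current_size = 0

    def add_new_function(self, name, code):
        if self.current_size + len(code) > THRESHOLD:
            # Start a new private nested class; subsequent functions are
            # inlined there instead of into the outer class.
            self.classes.append(("NestedClass%d" % len(self.classes), []))
            self.current_size = 0
        class_name, funcs = self.classes[-1]
        funcs.append(code)
        self.current_size += len(code)
        # Change 2: return the (possibly class-qualified) name so call
        # sites can invoke functions that landed in a nested class.
        return name if class_name == "OuterClass" else class_name + "." + name
```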
      
      ## How was this patch tested?
      
      Added a test case to the `GeneratedProjectionSuite` that increases the number of columns tested in various projections to a threshold that would previously have triggered a `JaninoRuntimeException` for the Constant Pool.
      
      Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in #16648): excess global mutable state. A second PR may be opened to resolve that issue.
      
      Author: ALeksander Eskilson <alek.eskilson@cerner.com>
      
      Closes #18075 from bdrillard/class_splitting_only.
• [SPARK-20980][SQL] Rename `wholeFile` to `multiLine` for both CSV and JSON · 20514281
      Xiao Li authored
      ### What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users: it does not mean one record per file; in fact, one file can contain multiple records. Thus, we should rename it. The proposal is `multiLine`.
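
Usage after the rename (the paths are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# New option name; a single record may span multiple lines.
json_df = spark.read.option("multiLine", True).json("path/to/records.json")
csv_df = spark.read.option("multiLine", True).csv("path/to/records.csv")
```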
      
      ### How was this patch tested?
      N/A
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #18202 from gatorsmile/renameCVSOption.
• [SPARK-21092][SQL] Wire SQLConf in logical plan and expressions · fffeb6d7
      Reynold Xin authored
      ## What changes were proposed in this pull request?
      It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf explicitly in functions). This patch exposes SQLConf in logical plan, using a thread local variable and a getter closure that's set once there is an active SparkSession.
      
      The implementation is a bit of a hack, since we didn't anticipate this need in the beginning (config was only exposed in physical plan). The implementation is described in `SQLConf.get`.
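
A minimal Python sketch of the thread-local-plus-getter pattern described above (names are illustrative; the real implementation is Scala's `SQLConf.get`):

```python
import threading

class Conf:
    """Schematic stand-in for SQLConf."""
    def __init__(self, settings=None):
        self.settings = dict(settings or {})

class ConfHolder:
    _local = threading.local()  # per-thread override
    _session_getter = None      # closure registered once a session is active

    @classmethod
    def set_session_getter(cls, getter):
        cls._session_getter = getter

    @classmethod
    def get(cls):
        # Prefer a conf pinned to the current thread; otherwise fall back
        # to the active session's conf via the registered closure.
        conf = getattr(cls._local, "conf", None)
        if conf is not None:
            return conf
        return cls._session_getter() if cls._session_getter else Conf()
```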
      
      In terms of future work, we should follow up to clean up CBO (remove the need for passing in config).
      
      ## How was this patch tested?
      Updated relevant tests for constraint propagation.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #18299 from rxin/SPARK-21092.
  5. Jun 14, 2017
  6. Jun 13, 2017
• [SPARK-19753][CORE] Un-register all shuffle output on a host in case of slave lost or fetch failure · dccc0aa3
      Sital Kedia authored
      ## What changes were proposed in this pull request?
      
      Currently, when we detect fetch failure, we only remove the shuffle files produced by the executor, while the host itself might be down and all the shuffle files are not accessible. In case we are running multiple executors on a host, any host going down currently results in multiple fetch failures and multiple retries of the stage, which is very inefficient. If we remove all the shuffle files on that host, on first fetch failure, we can rerun all the tasks on that host in a single stage retry.
      
      ## How was this patch tested?
      
      Unit testing and also ran a job on the cluster and made sure multiple retries are gone.
      
      Author: Sital Kedia <skedia@fb.com>
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #18150 from sitalkedia/cleanup_shuffle.
• [SPARK-20986][SQL] Reset table's statistics after PruneFileSourcePartitions rule. · 8b5b2e27
      lianhuiwang authored
      ## What changes were proposed in this pull request?
After the PruneFileSourcePartitions rule runs, the table's statistics need to be reset, because PruneFileSourcePartitions can filter out unnecessary partitions, after which the old statistics no longer apply.
      
      ## How was this patch tested?
      add unit test.
      
      Author: lianhuiwang <lianhuiwang09@gmail.com>
      
      Closes #18205 from lianhuiwang/SPARK-20986.
• [SPARK-12552][CORE] Correctly count the driver resource when recovering from failure for Master · 9eb09524
      jerryshao authored
Currently in Standalone HA mode, the driver's resource usage is not correctly counted in the Master when recovering from failure, which leads to unexpected behaviors such as negative values in the UI.

So this fixes it to also count the driver's resource usage.

This also changes the recovered app's state to `RUNNING` when fully recovered. Previously it would always be `WAITING` even when fully recovered.
      
      andrewor14 please help to review, thanks a lot.
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #10506 from jerryshao/SPARK-12552.
• [SPARK-21016][CORE] Improve code fault tolerance for converting string to number · 7ba8bf28
      liuxian authored
      ## What changes were proposed in this pull request?
When converting a `string` to a `number` (int, long, or double), a space before or after the string leads to unnecessary failures.
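
A one-line illustration of the tolerant conversion (the actual fix is in Spark's Scala string-to-number utilities; this Python helper is only an analogy):

```python
def to_long(s):
    # Trim surrounding whitespace before parsing, so ' 123 ' succeeds.
    return int(s.strip())
```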
      
      ## How was this patch tested?
      unit test
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #18238 from 10110346/lx-wip-0608.