  1. Jan 10, 2017
    • [SPARK-18997][CORE] Recommended upgrade libthrift to 0.9.3 · 81c94309
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Updates to libthrift 0.9.3 to address a CVE.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #16530 from srowen/SPARK-18997.
      
      (cherry picked from commit 856bae6a)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
    • [SPARK-19113][SS][TESTS] Set UncaughtExceptionHandler in onQueryStarted to ensure catching fatal errors during query initialization · e0af4b72
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      Currently, StreamTest sets `UncaughtExceptionHandler` only after starting the query, so it may miss fatal errors thrown during query initialization. This PR sets the handler in the `onQueryStarted` callback instead.
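      The ordering issue can be sketched in plain Java: a handler installed from a start callback that runs on the worker thread itself is in place before initialization code can fail. `startWorker`, `onStarted`, and `init` are hypothetical stand-ins for the query thread and `onQueryStarted`; this is not Spark's `StreamTest` code.

      ```java
      import java.util.concurrent.atomic.AtomicReference;

      final class StartCallbackDemo {
          // Hypothetical worker: runs onStarted on its own thread before init,
          // loosely mirroring a listener callback fired when a query starts.
          static Thread startWorker(Runnable onStarted, Runnable init) {
              Thread t = new Thread(() -> {
                  onStarted.run(); // runs first, on the worker thread
                  init.run();      // initialization that may throw
              }, "worker");
              t.start();
              return t;
          }

          // Installs the handler from the start callback and returns what it caught.
          static Throwable runAndCatch() {
              AtomicReference<Throwable> caught = new AtomicReference<>();
              Thread t = startWorker(
                  () -> Thread.currentThread().setUncaughtExceptionHandler((th, e) -> caught.set(e)),
                  () -> { throw new IllegalStateException("init failed"); });
              try {
                  t.join(); // the handler runs before the thread terminates
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
              return caught.get();
          }

          public static void main(String[] args) {
              System.out.println("caught: " + runAndCatch());
          }
      }
      ```

      Setting the handler from outside, after `start()`, would race with `init` instead.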
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16492 from zsxwing/SPARK-19113.
    • [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly · 69d1c4c5
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      `DataStreamReaderWriterSuite` creates test files in the source folder, as shown below. The root cause is that `withSQLConf` fails to reset an `OptionalConfigEntry` correctly: it resets the config to `Some(undefined)` instead of leaving it unset.
      
      ```bash
      $ git status
      Untracked files:
        (use "git add <file>..." to include in what will be committed)
      
              sql/core/%253Cundefined%253E/
              sql/core/%3Cundefined%3E/
      ```
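      A minimal sketch of a correct save-and-restore, assuming a `Map<String, String>` stands in for the session conf and a hypothetical `withConf` helper stands in for `withSQLConf`. It records whether the key was set at all, so an originally unset key is removed on exit. Writing back a rendered placeholder such as `<undefined>` would leave the key set, which is consistent with the stray `%3Cundefined%3E` directories above. The key name is illustrative.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      final class WithConfDemo {
          // Hypothetical helper: sets key=value while body runs, then restores the
          // previous state, including the "not set at all" state.
          static void withConf(Map<String, String> conf, String key, String value, Runnable body) {
              boolean wasSet = conf.containsKey(key); // track presence, not just the value
              String oldValue = conf.get(key);
              conf.put(key, value);
              try {
                  body.run();
              } finally {
                  if (wasSet) {
                      conf.put(key, oldValue); // restore the previous value
                  } else {
                      conf.remove(key);        // was unset: unset it again
                  }
              }
          }

          public static void main(String[] args) {
              Map<String, String> conf = new HashMap<>();
              withConf(conf, "example.optional.key", "/tmp/example", () ->
                  System.out.println("inside: " + conf.get("example.optional.key")));
              System.out.println("still set after: " + conf.containsKey("example.optional.key"));
          }
      }
      ```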
      
      ## How was this patch tested?
      
      Manual.
      ```
      build/sbt "project sql" test
      git status
      ```
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16522 from dongjoon-hyun/SPARK-19137.
      
      (cherry picked from commit d5b1dc93)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` grows beyond 64 KB · 65c866ef
      Liwei Lin authored
      
      ## What changes were proposed in this pull request?
      
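      Prior to this patch, we generated a single `compare(...)` method for `GeneratedClass$SpecificOrdering`, as shown below. With many sort fields, this led to Janino exceptions saying the code grows beyond 64 KB, the JVM's limit on the bytecode size of a single method.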
      
      ```java
      /* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
      /* ..... */   ...
      /* 10969 */   private int compare(InternalRow a, InternalRow b) {
      /* 10970 */     InternalRow i = null;  // Holds current row being evaluated.
      /* 10971 */
      /* 1.... */     code for comparing field0
      /* 1.... */     code for comparing field1
      /* 1.... */     ...
      /* 1.... */     code for comparing field449
      /* 15012 */
      /* 15013 */     return 0;
      /* 15014 */   }
      /* 15015 */ }
      ```
      
      This patch splits `compare(...)` into smaller `compare_xxx(...)` methods when necessary, so the generated code looks like this:
      
      ```java
      /* 001 */ public SpecificOrdering generate(Object[] references) {
      /* 002 */   return new SpecificOrdering(references);
      /* 003 */ }
      /* 004 */
      /* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
      /* 006 */
      /* 007 */     ...
      /* 1.... */
      /* 11290 */   private int compare_0(InternalRow a, InternalRow b) {
      /* 11291 */     InternalRow i = null;  // Holds current row being evaluated.
      /* 11292 */
      /* 11293 */     i = a;
      /* 11294 */     boolean isNullA;
      /* 11295 */     UTF8String primitiveA;
      /* 11296 */     {
      /* 11297 */
      /* 11298 */       Object obj = ((Expression) references[0]).eval(null);
      /* 11299 */       UTF8String value = (UTF8String) obj;
      /* 11300 */       isNullA = false;
      /* 11301 */       primitiveA = value;
      /* 11302 */     }
      /* 11303 */     i = b;
      /* 11304 */     boolean isNullB;
      /* 11305 */     UTF8String primitiveB;
      /* 11306 */     {
      /* 11307 */
      /* 11308 */       Object obj = ((Expression) references[0]).eval(null);
      /* 11309 */       UTF8String value = (UTF8String) obj;
      /* 11310 */       isNullB = false;
      /* 11311 */       primitiveB = value;
      /* 11312 */     }
      /* 11313 */     if (isNullA && isNullB) {
      /* 11314 */       // Nothing
      /* 11315 */     } else if (isNullA) {
      /* 11316 */       return -1;
      /* 11317 */     } else if (isNullB) {
      /* 11318 */       return 1;
      /* 11319 */     } else {
      /* 11320 */       int comp = primitiveA.compare(primitiveB);
      /* 11321 */       if (comp != 0) {
      /* 11322 */         return comp;
      /* 11323 */       }
      /* 11324 */     }
      /* 11325 */
      /* 11326 */
      /* 11327 */     i = a;
      /* 11328 */     boolean isNullA1;
      /* 11329 */     UTF8String primitiveA1;
      /* 11330 */     {
      /* 11331 */
      /* 11332 */       Object obj1 = ((Expression) references[1]).eval(null);
      /* 11333 */       UTF8String value1 = (UTF8String) obj1;
      /* 11334 */       isNullA1 = false;
      /* 11335 */       primitiveA1 = value1;
      /* 11336 */     }
      /* 11337 */     i = b;
      /* 11338 */     boolean isNullB1;
      /* 11339 */     UTF8String primitiveB1;
      /* 11340 */     {
      /* 11341 */
      /* 11342 */       Object obj1 = ((Expression) references[1]).eval(null);
      /* 11343 */       UTF8String value1 = (UTF8String) obj1;
      /* 11344 */       isNullB1 = false;
      /* 11345 */       primitiveB1 = value1;
      /* 11346 */     }
      /* 11347 */     if (isNullA1 && isNullB1) {
      /* 11348 */       // Nothing
      /* 11349 */     } else if (isNullA1) {
      /* 11350 */       return -1;
      /* 11351 */     } else if (isNullB1) {
      /* 11352 */       return 1;
      /* 11353 */     } else {
      /* 11354 */       int comp = primitiveA1.compare(primitiveB1);
      /* 11355 */       if (comp != 0) {
      /* 11356 */         return comp;
      /* 11357 */       }
      /* 11358 */     }
      /* 1.... */
      /* 1.... */   ...
      /* 1.... */
      /* 12652 */     return 0;
      /* 12653 */   }
      /* 1.... */
      /* 1.... */   ...
      /* 15387 */
      /* 15388 */   public int compare(InternalRow a, InternalRow b) {
      /* 15389 */
      /* 15390 */     int comp_0 = compare_0(a, b);
      /* 15391 */     if (comp_0 != 0) {
      /* 15392 */       return comp_0;
      /* 15393 */     }
      /* 15394 */
      /* 15395 */     int comp_1 = compare_1(a, b);
      /* 15396 */     if (comp_1 != 0) {
      /* 15397 */       return comp_1;
      /* 15398 */     }
      /* 1.... */
      /* 1.... */     ...
      /* 1.... */
      /* 15450 */     return 0;
      /* 15451 */   }
      /* 15452 */ }
      ```
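      The splitting idea can be sketched as a small Java code generator. This is a simplified illustration, not Spark's `splitExpressions`: it packs a fixed number of per-field snippets into each method, whereas the real heuristic may split on generated-code size instead. `SplitCompareCodegen` and `maxPerMethod` are hypothetical names, and the snippets are treated as opaque strings.

      ```java
      import java.util.Arrays;
      import java.util.List;

      final class SplitCompareCodegen {
          // Hypothetical generator: packs per-field comparison snippets into private
          // compare_i methods (at most maxPerMethod snippets each), then emits a public
          // compare(...) that calls them in order and returns the first non-zero result.
          static String generate(List<String> fieldComparisons, int maxPerMethod) {
              if (maxPerMethod <= 0) {
                  throw new IllegalArgumentException("maxPerMethod must be positive");
              }
              StringBuilder code = new StringBuilder();
              int methods = 0;
              for (int start = 0; start < fieldComparisons.size(); start += maxPerMethod) {
                  int end = Math.min(start + maxPerMethod, fieldComparisons.size());
                  code.append("private int compare_").append(methods)
                      .append("(InternalRow a, InternalRow b) {\n");
                  for (String snippet : fieldComparisons.subList(start, end)) {
                      code.append("  ").append(snippet).append('\n');
                  }
                  code.append("  return 0;\n}\n");
                  methods++;
              }
              code.append("public int compare(InternalRow a, InternalRow b) {\n");
              for (int i = 0; i < methods; i++) {
                  code.append("  int comp_").append(i).append(" = compare_").append(i).append("(a, b);\n");
                  code.append("  if (comp_").append(i).append(" != 0) return comp_").append(i).append(";\n");
              }
              code.append("  return 0;\n}\n");
              return code.toString();
          }

          public static void main(String[] args) {
              List<String> fields = Arrays.asList(
                  "/* compare field0 */", "/* compare field1 */", "/* compare field2 */");
              System.out.print(generate(fields, 2));
          }
      }
      ```

      Each emitted method stays small regardless of how many fields are sorted on; only the dispatching `compare(...)` grows, by two lines per chunk.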
      ## How was this patch tested?
      - a newly added test case, which
        - would fail prior to this patch
        - would pass with this patch
      - ordering correctness should already be covered by existing tests, such as those in `OrderingSuite`
      
      ## Acknowledgement
      
      A major part of this PR, the refactoring of `splitExpression()`, was done by ueshin.
      
      Author: Liwei Lin <lwlin7@gmail.com>
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      Author: Takuya Ueshin <ueshin@happy-camper.st>
      
      Closes #15480 from lw-lin/spec-ordering-64k-.
      
      (cherry picked from commit acfc5f35)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  2. Jan 09, 2017
  3. Jan 08, 2017
  4. Jan 07, 2017
  5. Jan 06, 2017
  6. Jan 04, 2017
    • [SPARK-18877][SQL][BACKPORT-2.1] `CSVInferSchema.inferField` on DecimalType should find a common type with `typeSoFar` · 1ecf1a95
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      CSV type inference throws `IllegalArgumentException` on decimal numbers with heterogeneous precisions and scales, because the current logic keeps only the last decimal type seen in a **partition**. Specifically, `inferRowType`, the **seqOp** of **aggregate**, returns the last decimal type. This PR fixes it by using `findTightestCommonType`.
      
      **decimal.csv**
      ```
      9.03E+12
      1.19E+11
      ```
      
      **BEFORE**
      ```scala
      scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
      root
       |-- _c0: decimal(3,-9) (nullable = true)
      
      scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
      16/12/16 14:32:49 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
      java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
      ```
      
      **AFTER**
      ```scala
      scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").printSchema
      root
       |-- _c0: decimal(4,-9) (nullable = true)
      
      scala> spark.read.format("csv").option("inferSchema", true).load("decimal.csv").show
      +---------+
      |      _c0|
      +---------+
      |9.030E+12|
      | 1.19E+11|
      +---------+
      ```
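      The BEFORE/AFTER types can be reproduced with a small Java sketch, assuming each value's type is read from `java.math.BigDecimal`'s precision and scale and that two decimal types widen to the larger scale plus enough integer digits for either input. `DecimalWidening`, `widen`, and `infer` are hypothetical helpers, not Spark's code, and the sketch does not cap precision at a maximum.

      ```java
      import java.math.BigDecimal;
      import java.util.Arrays;
      import java.util.List;

      final class DecimalWidening {
          // A decimal(precision, scale) type.
          static final class Dec {
              final int precision;
              final int scale;
              Dec(int precision, int scale) { this.precision = precision; this.scale = scale; }
              @Override public String toString() { return "decimal(" + precision + "," + scale + ")"; }
          }

          // Type of a single literal, e.g. "9.03E+12" -> unscaled 903, scale -10.
          static Dec typeOf(String literal) {
              BigDecimal d = new BigDecimal(literal);
              return new Dec(d.precision(), d.scale());
          }

          // Wider of two types: larger scale, and enough integer digits
          // (precision - scale) for either input. No upper bound applied here.
          static Dec widen(Dec a, Dec b) {
              int scale = Math.max(a.scale, b.scale);
              int range = Math.max(a.precision - a.scale, b.precision - b.scale);
              return new Dec(range + scale, scale);
          }

          // Fold over the column, merging with typeSoFar instead of replacing it.
          // Returns null for an empty column.
          static Dec infer(List<String> column) {
              Dec typeSoFar = null;
              for (String v : column) {
                  Dec t = typeOf(v);
                  typeSoFar = (typeSoFar == null) ? t : widen(typeSoFar, t);
              }
              return typeSoFar;
          }

          public static void main(String[] args) {
              List<String> column = Arrays.asList("9.03E+12", "1.19E+11");
              System.out.println("last type only: " + typeOf(column.get(column.size() - 1)));
              System.out.println("widened:        " + infer(column));
          }
      }
      ```

      Keeping only the last type yields `decimal(3,-9)`, which cannot hold `9.03E+12` (it needs 4 digits at scale -9), while widening yields `decimal(4,-9)`.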
      
      ## How was this patch tested?
      
      Passes the newly added test case.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16463 from dongjoon-hyun/SPARK-18877-BACKPORT-21.
  7. Jan 03, 2017
    • [SPARK-19048][SQL] Delete Partition Location when Dropping Managed Partitioned Tables in InMemoryCatalog · 77625506
      gatorsmile authored
      
      ### What changes were proposed in this pull request?
      The data of a managed table should be deleted when the table is dropped. However, if a partition's location is not under the table's location, it is not deleted as expected. Users can specify any location for a partition when they add it.
      
      This PR deletes partition locations when dropping managed partitioned tables stored in `InMemoryCatalog`.
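      The rule can be sketched with `java.nio.file`, assuming a managed table is a directory and each partition records its own location: a partition location outside the table directory is not reached by deleting the table directory, so it must be deleted explicitly. `dropManaged` and `demo` are hypothetical helpers, not the `InMemoryCatalog` implementation.

      ```java
      import java.io.IOException;
      import java.io.UncheckedIOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.Collections;
      import java.util.List;
      import java.util.stream.Collectors;
      import java.util.stream.Stream;

      final class DropManagedTable {
          // Deletes a file or directory tree; no-op if it does not exist.
          static void deleteRecursively(Path root) throws IOException {
              if (!Files.exists(root)) {
                  return;
              }
              List<Path> paths;
              try (Stream<Path> walk = Files.walk(root)) {
                  paths = walk.collect(Collectors.toCollection(ArrayList::new));
              }
              Collections.reverse(paths); // children before their parent directories
              for (Path p : paths) {
                  Files.delete(p);
              }
          }

          // Hypothetical drop of a managed partitioned table: partition directories
          // outside the table directory are deleted explicitly, then the table directory.
          static void dropManaged(Path tableDir, List<Path> partitionDirs) throws IOException {
              Path table = tableDir.toAbsolutePath().normalize();
              for (Path part : partitionDirs) {
                  Path p = part.toAbsolutePath().normalize();
                  if (!p.startsWith(table)) {
                      deleteRecursively(p);
                  }
              }
              deleteRecursively(table); // also removes partitions nested under it
          }

          // Builds a table with one inner and one outer partition, drops it,
          // and reports whether every location is gone.
          static boolean demo() {
              try {
                  Path base = Files.createTempDirectory("catalog");
                  Path table = Files.createDirectories(base.resolve("tbl"));
                  Path inner = Files.createDirectories(table.resolve("p=1"));
                  Path outer = Files.createDirectories(base.resolve("elsewhere/p=2"));
                  Files.write(outer.resolve("part-0"), "data".getBytes());
                  dropManaged(table, Arrays.asList(inner, outer));
                  boolean allGone = !Files.exists(table) && !Files.exists(inner) && !Files.exists(outer);
                  deleteRecursively(base); // clean up the temp root
                  return allGone;
              } catch (IOException e) {
                  throw new UncheckedIOException(e);
              }
          }

          public static void main(String[] args) {
              System.out.println("all locations deleted: " + demo());
          }
      }
      ```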
      
      ### How was this patch tested?
      Added test cases for both HiveExternalCatalog and InMemoryCatalog
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #16448 from gatorsmile/unsetSerdeProp.
      
      (cherry picked from commit b67b35f7)
      Signed-off-by: gatorsmile <gatorsmile@gmail.com>
  8. Jan 02, 2017
  9. Jan 01, 2017
  10. Dec 30, 2016
    • [SPARK-19016][SQL][DOC] Document scalable partition handling · 20ae1172
      Cheng Lian authored
      
      This PR documents the scalable partition handling feature in the body of the programming guide.
      
      Before this PR, we only mentioned it in the migration guide. It was not very clear that, since 2.1, external data source tables require an extra `MSCK REPAIR TABLE` command to have per-partition information persisted.
      
      N/A.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #16424 from liancheng/scalable-partition-handling-doc.
      
      (cherry picked from commit 871f6114)
      Signed-off-by: Cheng Lian <lian@databricks.com>
  11. Dec 29, 2016
    • [SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, section Design Patterns for using foreachRDD · 47ab4afe
      adesharatushar authored
      
      ## What changes were proposed in this pull request?
      
      Adds the missing Java example to the section "Design Patterns for using foreachRDD". The section now has examples in all three languages (Scala, Java, and Python), which makes the documentation more consistent.
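      The pattern that section documents (open one connection per partition, preferably from a pool, rather than one per record) can be sketched without Spark. Partitions are modeled as plain lists, and `Connection` and `ConnectionPool` are hypothetical stand-ins, not a Spark or JDBC API. In a streaming job the outer loop body would run inside `rdd.foreachPartition(...)`.

      ```java
      import java.util.ArrayDeque;
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.Deque;
      import java.util.List;

      final class PerPartitionConnections {
          // Hypothetical connection: records what was sent through it.
          static final class Connection {
              final List<String> sent = new ArrayList<>();
              void send(String record) { sent.add(record); }
          }

          // Hypothetical, lazily growing pool that reuses returned connections.
          // Not thread-safe; a pool shared by executor threads would need synchronization.
          static final class ConnectionPool {
              private final Deque<Connection> idle = new ArrayDeque<>();
              int created = 0;

              Connection getConnection() {
                  Connection c = idle.poll();
                  if (c == null) {
                      c = new Connection();
                      created++;
                  }
                  return c;
              }

              void returnConnection(Connection c) { idle.push(c); }
          }

          // One connection per partition, returned to the pool for reuse afterwards.
          static void sendAll(List<List<String>> partitions, ConnectionPool pool) {
              for (List<String> partition : partitions) {
                  Connection connection = pool.getConnection();
                  for (String record : partition) {
                      connection.send(record);
                  }
                  pool.returnConnection(connection);
              }
          }

          public static void main(String[] args) {
              ConnectionPool pool = new ConnectionPool();
              sendAll(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")), pool);
              System.out.println("connections created: " + pool.created);
          }
      }
      ```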
      
      ## How was this patch tested?
      
      Manual.
      Generated the docs with `SKIP_API=1 jekyll build` and verified the generated HTML page manually.
      
      The example's syntax was checked for correctness with sample code on Java 1.7 and Spark 2.2.0-SNAPSHOT.
      
      Author: adesharatushar <tushar_adeshara@persistent.com>
      
      Closes #16408 from adesharatushar/streaming-doc-fix.
      
      (cherry picked from commit dba81e1d)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
  12. Dec 28, 2016
  13. Dec 24, 2016
  14. Dec 23, 2016
    • [SPARK-18991][CORE] Change ContextCleaner.referenceBuffer to use ConcurrentHashMap to make it faster · 5bafdc45
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      `ConcurrentHashMap.remove` runs in expected O(1) time, whereas `ConcurrentLinkedQueue.remove(Object)` must scan the queue. Changing the type of `ContextCleaner.referenceBuffer` from `ConcurrentLinkedQueue` to `ConcurrentHashMap` makes removal much faster.
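      A minimal sketch of the data-structure swap: a concurrent set backed by `ConcurrentHashMap` (here created with `ConcurrentHashMap.newKeySet()`) removes an element with a hash lookup instead of a scan. `CleanupRef` is a hypothetical stand-in for the references the cleaner tracks, and whether Spark constructs its set exactly this way is an assumption.

      ```java
      import java.util.Set;
      import java.util.concurrent.ConcurrentHashMap;

      final class ReferenceBufferDemo {
          // Hypothetical tracked reference; inherits identity equals/hashCode from Object.
          static final class CleanupRef {
              final String name;
              CleanupRef(String name) { this.name = name; }
          }

          // Concurrent set: add/remove are expected O(1) hash operations,
          // unlike ConcurrentLinkedQueue.remove(Object), which walks the queue.
          private final Set<CleanupRef> referenceBuffer = ConcurrentHashMap.newKeySet();

          void register(CleanupRef ref) { referenceBuffer.add(ref); }

          boolean onCleaned(CleanupRef ref) { return referenceBuffer.remove(ref); }

          int pending() { return referenceBuffer.size(); }

          public static void main(String[] args) {
              ReferenceBufferDemo buffer = new ReferenceBufferDemo();
              CleanupRef a = new CleanupRef("rdd-1");
              buffer.register(a);
              buffer.register(new CleanupRef("shuffle-2"));
              buffer.onCleaned(a);
              System.out.println("pending after cleanup: " + buffer.pending());
          }
      }
      ```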
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16390 from zsxwing/SPARK-18991.
      
      (cherry picked from commit a848f0ba)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
  15. Dec 22, 2016
    • [SPARK-18972][CORE] Fix the netty thread names for RPC · 1857acc7
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      Currently, the threads created by Netty for Spark RPC are named `shuffle-client-**` and `shuffle-server-**`, which is confusing.
      
      This PR uses the module name in `TransportConf` to set the thread names. It also includes the following minor fixes:
      
      - `TransportChannelHandler.channelActive` and `channelInactive` should call the corresponding super methods.
      - Make `ShuffleBlockFetcherIterator` throw `NoSuchElementException` when it has no more elements. Otherwise, if a caller calls `next` without checking `hasNext`, it just hangs.
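      The second fix follows the `java.util.Iterator` contract, under which `next()` throws `NoSuchElementException` when no elements remain. A minimal sketch, assuming results arrive on a `BlockingQueue` and the iterator knows how many to expect; `FetchResultIterator` is hypothetical, not `ShuffleBlockFetcherIterator`. Without the guard in `next()`, an extra call would block forever in `take()`.

      ```java
      import java.util.Iterator;
      import java.util.NoSuchElementException;
      import java.util.concurrent.BlockingQueue;
      import java.util.concurrent.LinkedBlockingQueue;

      final class FetchResultIterator implements Iterator<String> {
          private final BlockingQueue<String> results;
          private final int expected;
          private int processed = 0;

          FetchResultIterator(BlockingQueue<String> results, int expected) {
              this.results = results;
              this.expected = expected;
          }

          @Override
          public boolean hasNext() {
              return processed < expected;
          }

          @Override
          public String next() {
              // Guard first: without it, take() on an empty queue would block forever.
              if (!hasNext()) {
                  throw new NoSuchElementException();
              }
              try {
                  String result = results.take(); // waits for an in-flight result
                  processed++;
                  return result;
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  throw new IllegalStateException("interrupted while waiting for a result", e);
              }
          }

          public static void main(String[] args) {
              BlockingQueue<String> queue = new LinkedBlockingQueue<>();
              queue.add("block-0");
              FetchResultIterator it = new FetchResultIterator(queue, 1);
              System.out.println(it.next());
              try {
                  it.next();
              } catch (NoSuchElementException e) {
                  System.out.println("no more elements");
              }
          }
      }
      ```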
      
      ## How was this patch tested?
      
      Jenkins
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16380 from zsxwing/SPARK-18972.
      
      (cherry picked from commit f252cb5d)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • [SPARK-18985][SS] Add missing @InterfaceStability.Evolving for Structured Streaming APIs · 5e801034
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      Adds the missing `@InterfaceStability.Evolving` annotations to Structured Streaming APIs.
      
      ## How was this patch tested?
      
      Compiled the code.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16385 from zsxwing/SPARK-18985.
      
      (cherry picked from commit 2246ce88)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • [SPARK-17807][CORE] split test-tags into test-JAR · 132f2297
      Ryan Williams authored
      
      Removes the compile-scope dependency of spark-tags (and, indirectly, spark-core's compile-scope transitive dependency) on scalatest by moving test-oriented tags into the spark-tags test JAR.
      
      Alternative to #16303.
      
      Author: Ryan Williams <ryan.blake.williams@gmail.com>
      
      Closes #16311 from ryan-williams/tt.
      
      (cherry picked from commit afd9bc1d)
      Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
    • [SPARK-18973][SQL] Remove SortPartitions and RedistributeData · f6853b3e
      Reynold Xin authored
      
      ## What changes were proposed in this pull request?
      The SortPartitions and RedistributeData logical operators are not actually used and can be removed. Note that the Sort operator (with the global flag set to false) subsumes SortPartitions.
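      What `global = false` means can be sketched with partitions modeled as Java lists: a local sort orders each partition independently (the behavior SortPartitions expressed), while a global sort produces a total order across partitions. `localSort` and `globalSort` are hypothetical helpers, not Catalyst operators; re-splitting into the original partition sizes is a simplification of how a real global sort distributes rows.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.Collections;
      import java.util.List;

      final class LocalVsGlobalSort {
          // global = false: each partition is sorted on its own; no rows move.
          static List<List<Integer>> localSort(List<List<Integer>> partitions) {
              List<List<Integer>> out = new ArrayList<>();
              for (List<Integer> p : partitions) {
                  List<Integer> copy = new ArrayList<>(p);
                  Collections.sort(copy);
                  out.add(copy);
              }
              return out;
          }

          // global = true: rows are totally ordered across partitions; here we sort
          // everything and re-split into the original partition sizes.
          static List<List<Integer>> globalSort(List<List<Integer>> partitions) {
              List<Integer> all = new ArrayList<>();
              for (List<Integer> p : partitions) {
                  all.addAll(p);
              }
              Collections.sort(all);
              List<List<Integer>> out = new ArrayList<>();
              int start = 0;
              for (List<Integer> p : partitions) {
                  out.add(new ArrayList<>(all.subList(start, start + p.size())));
                  start += p.size();
              }
              return out;
          }

          public static void main(String[] args) {
              List<List<Integer>> parts = Arrays.asList(Arrays.asList(3, 1), Arrays.asList(2, 0));
              System.out.println("local:  " + localSort(parts));
              System.out.println("global: " + globalSort(parts));
          }
      }
      ```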
      
      ## How was this patch tested?
      Updated existing test cases to reflect the removal.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #16381 from rxin/SPARK-18973.
      
      (cherry picked from commit 26151000)
      Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
    • [DOC] bucketing is applicable to all file-based data sources · ec0d6e21
      Reynold Xin authored
      
      ## What changes were proposed in this pull request?
      Starting with Spark 2.1.0, the bucketing feature is available for all file-based data sources. This patch updates some function docs that did not yet reflect that.
      
      ## How was this patch tested?
      N/A
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #16349 from rxin/ds-doc.
      
      (cherry picked from commit 2e861df9)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
    • [SQL] Minor readability improvement for partition handling code · def3690f
      Reynold Xin authored
      
      This patch includes minor changes to improve the readability of the partition handling code. I'm in the middle of implementing a new feature and found some of the naming and implicit type inference unintuitive.
      
      This patch should have no semantic change and the changes should be covered by existing test cases.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #16378 from rxin/minor-fix.
      
      (cherry picked from commit 7c5b7b3a)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
    • [SPARK-18908][SS] Creating StreamingQueryException should check if logicalPlan is created · 07e2a17d
      Shixiong Zhu authored
      
      ## What changes were proposed in this pull request?
      
      This PR audits the places that use `logicalPlan` in `StreamExecution` and ensures they all handle the case where `logicalPlan` cannot be created.
      
      In addition, this PR fixes the following issues in `StreamingQueryException`:
      - `StreamingQueryException` and `StreamExecution` are cyclically dependent: `StreamingQueryException`'s constructor calls `StreamExecution.toDebugString`, which uses `StreamingQueryException`. As a result, the error message contains a `null` value.
      - Duplicated stack traces when calling `Throwable.printStackTrace`, because `StreamingQueryException.toString` contains the stack trace.
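      The duplicated-trace issue is easy to reproduce in plain Java: `Throwable.printStackTrace` prints `toString()` and then the stack frames, so an exception whose `toString()` already embeds the frames prints them twice. `VerboseException` and `ConciseException` are hypothetical classes for illustration; the fix shown keeps `toString()` to the class name and message and lets `printStackTrace` add the frames.

      ```java
      import java.io.PrintWriter;
      import java.io.StringWriter;

      final class DuplicateTraceDemo {
          // Problematic: toString() appends the stack frames itself.
          static final class VerboseException extends RuntimeException {
              VerboseException(String message) { super(message); }

              @Override
              public String toString() {
                  StringBuilder sb = new StringBuilder(super.toString());
                  for (StackTraceElement frame : getStackTrace()) {
                      sb.append("\n\tat ").append(frame);
                  }
                  return sb.toString();
              }
          }

          // Fixed: inherits Throwable.toString(), i.e. class name and message only.
          static final class ConciseException extends RuntimeException {
              ConciseException(String message) { super(message); }
          }

          static RuntimeException createIn(boolean verbose) {
              return verbose ? new VerboseException("query failed") : new ConciseException("query failed");
          }

          // Captures what printStackTrace would write.
          static String printed(Throwable t) {
              StringWriter out = new StringWriter();
              t.printStackTrace(new PrintWriter(out, true));
              return out.toString();
          }

          static int countOccurrences(String text, String needle) {
              int count = 0;
              for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
                  count++;
              }
              return count;
          }

          public static void main(String[] args) {
              // The createIn frame appears twice in the verbose output, once in the concise one.
              System.out.println(countOccurrences(printed(createIn(true)), "createIn("));
              System.out.println(countOccurrences(printed(createIn(false)), "createIn("));
          }
      }
      ```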
      
      ## How was this patch tested?
      
      The updated `test("max files per trigger - incorrect values")`. I found this issue when I switched from `testStream` to real code to verify the failure in this test.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #16322 from zsxwing/SPARK-18907.
      
      (cherry picked from commit ff7d82a2)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
  16. Dec 21, 2016