  1. May 05, 2018
  2. May 03, 2018
  3. Apr 15, 2018
  4. Mar 01, 2018
    • [SPARK-23405] Generate additional constraints for Join's children · cdcccd7b
      KaiXinXiaoLei authored
      ## What changes were proposed in this pull request?
      
      I ran this SQL query: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table is small and contains one row. The `catalog_sales` table is large and contains 10 billion rows. The task hangs. I found that `cs_order_number` has many null values in the `catalog_sales` table. Since a null key can never satisfy the equality join condition, I think the null values should be filtered out in the logical plan. The current plan is:
      
      >== Optimized Logical Plan ==
      >Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
      >:- Project cs_order_number#1
      >   : +- Filter isnotnull(cs_order_number#1)
      >      : +- MetastoreRelation 100t, ls
      >+- Project cs_order_number#22
      >   +- MetastoreRelation 100t, catalog_sales
      
      With this patch, the plan becomes:
      >== Optimized Logical Plan ==
      >Join LeftSemi, (cs_order_number#1 = cs_order_number#22)
      >:- Project cs_order_number#1
      >   : +- Filter isnotnull(cs_order_number#1)
      >      : +- MetastoreRelation 100t, ls
      >+- Project cs_order_number#22
      >   : **+- Filter isnotnull(cs_order_number#22)**
      >     :+- MetastoreRelation 100t, catalog_sales
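
The reasoning behind the extra `isnotnull` filter can be checked with a small sketch. It is not Spark code: it assumes SQLite as a stand-in engine, uses `EXISTS` in place of `LEFT SEMI JOIN` (which SQLite does not support), and the sample rows and the helper name `semi_join_results` are invented for illustration. It only shows that dropping null keys from the right side leaves the semi join result unchanged, because `NULL = x` never evaluates to true.

```python
import sqlite3


def semi_join_results(right_keys):
    """Run the semi join with and without a null filter on the right side.

    Hypothetical helper for illustration; the table contents are made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ls (cs_order_number INTEGER)")
    conn.execute("CREATE TABLE catalog_sales (cs_order_number INTEGER)")
    # Illustrative left side: one real key and one null key.
    conn.executemany("INSERT INTO ls VALUES (?)", [(1,), (None,)])
    conn.executemany(
        "INSERT INTO catalog_sales VALUES (?)", [(k,) for k in right_keys]
    )

    # EXISTS expresses semi join semantics: keep a left row if any right row matches.
    unfiltered = conn.execute(
        "SELECT ls.cs_order_number FROM ls WHERE EXISTS ("
        " SELECT 1 FROM catalog_sales cs"
        " WHERE ls.cs_order_number = cs.cs_order_number)"
    ).fetchall()

    # Same query, but null keys are removed from the right side first.
    filtered = conn.execute(
        "SELECT ls.cs_order_number FROM ls WHERE EXISTS ("
        " SELECT 1 FROM (SELECT cs_order_number FROM catalog_sales"
        "                WHERE cs_order_number IS NOT NULL) cs"
        " WHERE ls.cs_order_number = cs.cs_order_number)"
    ).fetchall()

    conn.close()
    return unfiltered, filtered


if __name__ == "__main__":
    before, after = semi_join_results([None, None, 1, 2, None])
    print(before == after)
```

Both queries return the same rows, so the filter is a pure optimization: it lets the engine discard null keys before the join instead of carrying them through it.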
      
      ## How was this patch tested?
      
      
      Author: KaiXinXiaoLei <584620569@qq.com>
      Author: hanghang <584620569@qq.com>
      
      Closes #20670 from KaiXinXiaoLei/Spark-23405.
    • [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metastore · ff148018
      Yuming Wang authored
      ## What changes were proposed in this pull request?
      This is based on https://github.com/apache/spark/pull/20668 for supporting Hive 2.2 and Hive 2.3 metastore.
      
      When this PR is merged, the major credit should go to wangyum.
      
      ## How was this patch tested?
      Added the test cases
      
      Author: Yuming Wang <yumwang@ebay.com>
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #20671 from gatorsmile/pr-20668.
    • [SPARK-23389][CORE] When the shuffle dependency specifies aggregation ,and... · 22f3d333
      liuxian authored
      [SPARK-23389][CORE] When the shuffle dependency specifies aggregation and `dependency.mapSideCombine=false`, we should be able to use serialized sorting.
      
      ## What changes were proposed in this pull request?
      When the shuffle dependency specifies aggregation and `dependency.mapSideCombine=false`, no aggregation or sorting is needed on the map side, so we should be able to use serialized sorting.
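
The change in the eligibility rule can be sketched as a simplified model. This is not Spark's implementation: `ShuffleDep`, `serialized_ok_before`, and `serialized_ok_after` are hypothetical names, the "before" rule is an assumption inferred from the description above, and any other conditions the real check applies are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ShuffleDep:
    """Hypothetical stand-in for a shuffle dependency; not Spark's class."""
    has_aggregator: bool
    map_side_combine: bool


def serialized_ok_before(dep: ShuffleDep) -> bool:
    # Assumed old rule: any aggregator rules out serialized sorting.
    return not dep.has_aggregator


def serialized_ok_after(dep: ShuffleDep) -> bool:
    # Proposed rule: only map-side combining rules it out, because without it
    # the map side performs no aggregation.
    return not dep.map_side_combine


if __name__ == "__main__":
    # Aggregation is specified, but it is deferred to the reduce side.
    dep = ShuffleDep(has_aggregator=True, map_side_combine=False)
    print(serialized_ok_before(dep), serialized_ok_after(dep))
```

The two rules differ only for a dependency that has an aggregator but does not combine on the map side, which is exactly the case this change targets.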
      
      ## How was this patch tested?
      Existing unit test
      
      Author: liuxian <liu.xian3@zte.com.cn>
      
      Closes #20576 from 10110346/mapsidecombine.