  1. Dec 22, 2014
    • Liang-Chi Hsieh's avatar
      [Minor] Fix scala doc · a61aa669
      Liang-Chi Hsieh authored
      Minor fix for an obvious scala doc error.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #3751 from viirya/fix_scaladoc and squashes the following commits:
      
      03fddaa [Liang-Chi Hsieh] Fix scala doc.
      a61aa669
  2. Dec 19, 2014
    • Cheng Hao's avatar
      [SPARK-4901] [SQL] Hot fix for ByteWritables.copyBytes · 5479450c
      Cheng Hao authored
      HiveInspectors.scala failed to compile with Hadoop 1, as BytesWritable.copyBytes is not available in Hadoop 1.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3742 from chenghao-intel/settable_oi_hotfix and squashes the following commits:
      
      bb04d1f [Cheng Hao] hot fix for ByteWritables.copyBytes
      5479450c
  3. Dec 18, 2014
    • wangfei's avatar
      [SPARK-4861][SQL] Refactory command in spark sql · c3d91da5
      wangfei authored
      Remove ```Command``` and use ```RunnableCommand``` instead.
      
      Author: wangfei <wangfei1@huawei.com>
      Author: scwf <wangfei1@huawei.com>
      
      Closes #3712 from scwf/cmd and squashes the following commits:
      
      51a82f2 [wangfei] fix test failure
      0e03be8 [wangfei] address comments
      4033bed [scwf] remove CreateTableAsSelect in hivestrategy
      5d20010 [wangfei] address comments
      125f542 [scwf] factory command in spark sql
      c3d91da5
    • Cheng Hao's avatar
      [SPARK-4573] [SQL] Add SettableStructObjectInspector support in "wrap" function · ae9f1286
      Cheng Hao authored
      A Hive UDAF may create a customized object constructed by a SettableStructObjectInspector; this is critical when integrating Hive UDAFs with the refactored UDAF interface.
      
      The performance issue in `wrap/unwrap` (more match cases were added) will be addressed in another PR.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3429 from chenghao-intel/settable_oi and squashes the following commits:
      
      9f0aff3 [Cheng Hao] update code style issues as feedbacks
      2b0561d [Cheng Hao] Add more scala doc
      f5a40e8 [Cheng Hao] add scala doc
      2977e9b [Cheng Hao] remove the timezone setting for test suite
      3ed284c [Cheng Hao] fix the date type comparison
      f1b6749 [Cheng Hao] Update the comment
      932940d [Cheng Hao] Add more unit test
      72e4332 [Cheng Hao] Add settable StructObjectInspector support
      ae9f1286
    • ravipesala's avatar
      [SPARK-2554][SQL] Supporting SumDistinct partial aggregation · 7687415c
      ravipesala authored
      Adding support for partial aggregation of SumDistinct
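      Partial aggregation of a distinct sum can be sketched in plain Scala as follows (an illustrative model, not the actual Catalyst `SumDistinct` implementation): each partition produces its set of distinct values as partial state, the partial states are unioned on merge, and the final sum runs once over the merged set.

      ```scala
      // Hypothetical sketch of partial SumDistinct aggregation (not the actual
      // Catalyst code): per-partition distinct sets are the partial state.
      object SumDistinctSketch {
        // Partial phase: per-partition distinct values.
        def partial(partition: Seq[Int]): Set[Int] = partition.toSet

        // Merge phase: union the partial states.
        def merge(states: Seq[Set[Int]]): Set[Int] =
          states.foldLeft(Set.empty[Int])(_ union _)

        // Final phase: sum the combined distinct values.
        def evaluate(state: Set[Int]): Int = state.sum

        def sumDistinct(partitions: Seq[Seq[Int]]): Int =
          evaluate(merge(partitions.map(partial)))
      }
      ```

      For example, partitions `Seq(1, 2, 2)` and `Seq(2, 3)` contribute the distinct values 1, 2, 3, so the distinct sum is 6.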
      
      Author: ravipesala <ravindra.pesala@huawei.com>
      
      Closes #3348 from ravipesala/SPARK-2554 and squashes the following commits:
      
      fd28e4d [ravipesala] Fixed review comments
      e60e67f [ravipesala] Fixed test cases and made it as nullable
      32fe234 [ravipesala] Supporting SumDistinct partial aggregation Conflicts: 	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
      7687415c
    • YanTangZhai's avatar
      [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an... · e7de7e5f
      YanTangZhai authored
      [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references
      
      The sql "select * from spark_test::for_test where abs(20141202) is not null" has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)) and
      partitionKeyIds=AttributeSet(). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)). Then the exception "java.lang.IllegalArgumentException: requirement failed: Partition pruning predicates only supported for partitioned tables." is thrown.
      The sql "select * from spark_test::for_test_partitioned_table where abs(20141202) is not null and type_id=11 and platform = 3" with partitioned key insert_date has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202), (type_id#12 = 11), (platform#8 = 3)) and partitionKeyIds=AttributeSet(insert_date#24). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)).
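      The gist of the fix can be sketched in plain Scala (hypothetical simplified types, not the actual Catalyst classes): a predicate only qualifies as a partition-pruning predicate when its reference set is non-empty and a subset of the partition keys, so a UDF call with an empty `AttributeSet()` is excluded.

      ```scala
      // Simplified sketch of the fix (hypothetical types): a predicate whose
      // reference set is empty must not be treated as a partition-pruning
      // predicate, even though the empty set is trivially a subset of the
      // partition keys.
      case class Predicate(name: String, references: Set[String])

      def pruningPredicates(predicates: Seq[Predicate],
                            partitionKeys: Set[String]): Seq[Predicate] =
        predicates.filter { p =>
          p.references.nonEmpty && p.references.subsetOf(partitionKeys)
        }
      ```

      With this check, `IS NOT NULL abs(20141202)` (which references no attributes) no longer reaches the partition-pruning path, so the "Partition pruning predicates only supported for partitioned tables" requirement is not violated.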
      
      Author: YanTangZhai <hakeemzhai@tencent.com>
      Author: yantangzhai <tyz0303@163.com>
      
      Closes #3556 from YanTangZhai/SPARK-4693 and squashes the following commits:
      
      620ebe3 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references
      37cfdf5 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references
      70a3544 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references
      efa9b03 [YanTangZhai] Update HiveQuerySuite.scala
      72accf1 [YanTangZhai] Update HiveQuerySuite.scala
      e572b9a [YanTangZhai] Update HiveStrategies.scala
      6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
      e249846 [YanTangZhai] Merge pull request #10 from apache/master
      d26d982 [YanTangZhai] Merge pull request #9 from apache/master
      76d4027 [YanTangZhai] Merge pull request #8 from apache/master
      03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
      8a00106 [YanTangZhai] Merge pull request #6 from apache/master
      cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
      cdef539 [YanTangZhai] Merge pull request #1 from apache/master
      e7de7e5f
    • guowei2's avatar
      [SPARK-4756][SQL] FIX: sessionToActivePool grow infinitely, even as sessions expire · 22ddb6e0
      guowei2 authored
      **sessionToActivePool** in **SparkSQLOperationManager** grows without bound, even as sessions expire.
      We should remove the pool entry when a session closes, even though not every session has an entry in **sessionToActivePool**.
      
      Author: guowei2 <guowei2@asiainfo.com>
      
      Closes #3617 from guowei2/SPARK-4756 and squashes the following commits:
      
      e9b97b8 [guowei2] fix compile bug with Shim12
      cf0f521 [guowei2] Merge remote-tracking branch 'apache/master' into SPARK-4756
      e070998 [guowei2] fix: remove active pool of the session when it expired
      22ddb6e0
    • Thu Kyaw's avatar
      [SPARK-3928][SQL] Support wildcard matches on Parquet files. · b68bc6d2
      Thu Kyaw authored
      Make `parquetFile` accept Hadoop glob patterns in the path.
      
      Author: Thu Kyaw <trk007@gmail.com>
      
      Closes #3407 from tkyaw/master and squashes the following commits:
      
      19115ad [Thu Kyaw] Merge https://github.com/apache/spark
      ceded32 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files.
      d322c28 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files.
      ce677c6 [Thu Kyaw] [SPARK-3928][SQL] Support wildcard matches on Parquet files.
      b68bc6d2
    • Cheng Hao's avatar
      [SPARK-2663] [SQL] Support the Grouping Set · f728e0fe
      Cheng Hao authored
      Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the virtual column `GROUPING__ID`.
      
      More details on how to use `GROUPING SETS` can be found at: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
      https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf
      
      The general idea of the implementation is:
      1 Replace `ROLLUP` and `CUBE` with `GROUPING SETS`
      2 Explode each input row, and then feed the results to `Aggregate`
        * Each grouping set is represented as a bit mask over the `GroupBy Expression List`: for each bit, `1` means the expression is selected, otherwise `0` (the left is the lower bit, and the right is the higher bit, in the `GroupBy Expression List`)
        * Several projections are constructed according to the grouping sets, and within each projection (`Seq[Expression]`) we replace an expression with `Literal(null)` if it is not selected in the grouping set (based on the bit mask)
        * The output schema of `Explode` is `child.output :+ grouping__id`
        * The GroupBy expressions of `Aggregate` are `GroupBy Expression List :+ grouping__id`
        * The `Aggregation expressions` of the `Aggregate` are kept the same
      
      The expression substitutions happen during logical plan analysis, so we benefit from logical plan optimizations (e.g. expression constant folding, map-side aggregation, etc.); only an `Explosive` operator is added to the physical plan, which explodes the rows according to the pre-set projections.
      
      A known issue to be addressed in a follow-up PR:
      * The `ColumnPruning` optimization is not yet supported for the `Explosive` node.
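      The bit-mask substitution described above can be sketched in plain Scala (hypothetical simplified types; `None` stands in for `Literal(null)` and the appended mask plays the role of `GROUPING__ID`):

      ```scala
      // Sketch of the explode step: for each grouping set, bit i of the mask
      // says whether the i-th GroupBy expression is selected (bit 0 is the
      // leftmost expression, matching the "left is the lower bit" convention).
      // Unselected positions become None (i.e. Literal(null)), and the mask
      // itself is appended as the grouping id.
      def expandRow(groupByValues: Seq[Option[Int]],
                    masks: Seq[Int]): Seq[Seq[Option[Int]]] =
        masks.map { mask =>
          val projected = groupByValues.zipWithIndex.map { case (v, i) =>
            if (((mask >> i) & 1) == 1) v else None // None models Literal(null)
          }
          projected :+ Some(mask) // append grouping__id
        }
      ```

      For a row with GroupBy values `(1, 2)` and grouping-set masks `3, 1, 0`, the row explodes into three rows: both keys kept, only the first key kept, and both keys nulled (the grand total), each tagged with its mask.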
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #1567 from chenghao-intel/grouping_sets and squashes the following commits:
      
      fe65fcc [Cheng Hao] Remove the extra space
      3547056 [Cheng Hao] Add more doc and Simplify the Expand
      a7c869d [Cheng Hao] update code as feedbacks
      d23c672 [Cheng Hao] Add GroupingExpression to replace the Seq[Expression]
      414b165 [Cheng Hao] revert the unnecessary changes
      ec276c6 [Cheng Hao] Support Rollup/Cube/GroupingSets
      f728e0fe
  4. Dec 17, 2014
    • Venkata Ramana Gollamudi's avatar
      [SPARK-3891][SQL] Add array support to percentile, percentile_approx and... · f33d5504
      Venkata Ramana Gollamudi authored
      [SPARK-3891][SQL] Add array support to percentile, percentile_approx and constant inspectors support
      
      Supported passing array to percentile and percentile_approx UDAFs
      To support percentile_approx,  constant inspectors are supported for GenericUDAF
      Constant folding support added to CreateArray expression
      Avoided constant udf expression re-evaluation
      
      Author: Venkata Ramana G <ramana.gollamudi@huawei.com>
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #2802 from gvramana/percentile_array_support and squashes the following commits:
      
      a0182e5 [Venkata Ramana Gollamudi] fixed review comment
      a18f917 [Venkata Ramana Gollamudi] avoid constant udf expression re-evaluation - fixes failure due to return iterator and value type mismatch
      c46db0f [Venkata Ramana Gollamudi] Removed TestHive reset
      4d39105 [Venkata Ramana Gollamudi] Unified inspector creation, style check fixes
      f37fd69 [Venkata Ramana Gollamudi] Fixed review comments
      47f6365 [Venkata Ramana Gollamudi] fixed test
      cb7c61e [Venkata Ramana Gollamudi] Supported ConstantInspector for UDAF Fixed HiveUdaf wrap object issue.
      7f94aff [Venkata Ramana Gollamudi] Added foldable support to CreateArray
      f33d5504
    • Cheng Hao's avatar
      [SPARK-4856] [SQL] NullType instead of StringType when sampling against empty string or nul... · 8d0d2a65
      Cheng Hao authored
      ```
      TestSQLContext.sparkContext.parallelize(
        """{"ip":"27.31.100.29","headers":{"Host":"1.abc.com","Charset":"UTF-8"}}""" ::
        """{"ip":"27.31.100.29","headers":{}}""" ::
        """{"ip":"27.31.100.29","headers":""}""" :: Nil)
      ```
      Since the empty string value of "headers" is initially inferred as String (in lines 2 and 3), the real nested data type (the struct-typed "headers" in line 1) is ignored, and the "headers" in line 1 is also treated as StringType, which is not what we expect.
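      The intended inference can be sketched with a hypothetical mini type system (not Spark SQL's actual JSON schema-inference code): a null or empty sample yields `NullType`, which any other sampled type absorbs, instead of `StringType`, which would clash with the struct inferred from other records.

      ```scala
      // Hypothetical mini type lattice illustrating the fix: NullType yields
      // to any other type when merging sampled field types, so the struct
      // inferred from record 1 survives records 2 and 3.
      sealed trait JsonType
      case object NullType extends JsonType
      case object StringType extends JsonType
      case class StructType(fields: Set[String]) extends JsonType

      def mergeTypes(a: JsonType, b: JsonType): JsonType = (a, b) match {
        case (NullType, t) => t
        case (t, NullType) => t
        case (StructType(f1), StructType(f2)) => StructType(f1 union f2)
        case (t1, t2) if t1 == t2 => t1
        case _ => StringType // irreconcilable types fall back to string
      }
      ```

      Merging `NullType` (from `"headers":""`) with the struct type keeps the struct, whereas the old `StringType` sample would have forced the whole field to string.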
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3708 from chenghao-intel/json and squashes the following commits:
      
      e7a72e9 [Cheng Hao] add more concise unit test
      853de51 [Cheng Hao] NullType instead of StringType when sampling against empty string or null value
      8d0d2a65
    • Michael Armbrust's avatar
      [HOTFIX][SQL] Fix parquet filter suite · 19c0faad
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3727 from marmbrus/parquetNotEq and squashes the following commits:
      
      2157bfc [Michael Armbrust] Fix parquet filter suite
      19c0faad
    • Cheng Hao's avatar
      [SPARK-3739] [SQL] Update the split num base on block size for table scanning · 636d9fc4
      Cheng Hao authored
      In local mode, Hadoop/Hive ignores "mapred.map.tasks", so a small table file always yields a single input split; however, Spark SQL doesn't honor that in table scanning, and we get different results in the Hive compatibility tests. This PR fixes that.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #2589 from chenghao-intel/source_split and squashes the following commits:
      
      dff38e7 [Cheng Hao] Remove the extra blank line
      160a2b6 [Cheng Hao] fix the compiling bug
      04d67f7 [Cheng Hao] Keep 1 split for small file in table scanning
      636d9fc4
    • Daoyuan Wang's avatar
      [SPARK-4755] [SQL] sqrt(negative value) should return null · 902e4d54
      Daoyuan Wang authored
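      The new semantics can be sketched with a hypothetical helper that models SQL null as `Option`:

      ```scala
      // Sketch of the changed behavior (illustrative, not the Catalyst Sqrt
      // expression): sqrt of a negative value evaluates to null (None here)
      // instead of NaN.
      def sqrtOrNull(x: Double): Option[Double] =
        if (x < 0) None else Some(math.sqrt(x))
      ```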
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3616 from adrian-wang/sqrt and squashes the following commits:
      
      d877439 [Daoyuan Wang] fix NULLTYPE
      3effa2c [Daoyuan Wang] sqrt(negative value) should return null
      902e4d54
    • Cheng Lian's avatar
      [SPARK-4493][SQL] Don't pushdown Eq, NotEq, Lt, LtEq, Gt and GtEq predicates with nulls for Parquet · 62771353
      Cheng Lian authored
      Predicates like `a = NULL` and `a < NULL` can't be pushed down since the Parquet `Lt`, `LtEq`, `Gt`, and `GtEq` filters don't accept null values. Note that `Eq` and `NotEq` can only be used with `null` to represent predicates like `a IS NULL` and `a IS NOT NULL`.
      
      However, normally this issue doesn't cause an NPE because any value compared to `NULL` results in `NULL`, and Spark SQL automatically optimizes out `NULL` predicates in the `SimplifyFilters` rule. Only testing code that intentionally disables the optimizer may trigger this issue. (That's why this issue is not marked as blocker and I do **NOT** think we need to backport this to branch-1.1.)
      
      This PR restricts `Lt`, `LtEq`, `Gt` and `GtEq` to non-null values only, and only uses `Eq` with null value to pushdown `IsNull` and `IsNotNull`. Also, added support for Parquet `NotEq` filter for completeness and (tiny) performance gain, it's also used to pushdown `IsNotNull`.
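      The stricter conversion can be sketched with hypothetical ADTs (not the actual `ParquetFilters` code): comparison predicates push down only for non-null literals, while `IsNull`/`IsNotNull` push down as `Eq`/`NotEq` against null.

      ```scala
      // Simplified model of the pushdown rule. A source-side predicate either
      // converts to a Parquet filter or is kept out of the pushdown entirely.
      sealed trait SqlPred
      case class LessThan(col: String, value: Option[Int]) extends SqlPred
      case class IsNull(col: String) extends SqlPred
      case class IsNotNull(col: String) extends SqlPred

      sealed trait ParquetPred
      case class PLt(col: String, value: Int) extends ParquetPred
      case class PEq(col: String, value: Option[Int]) extends ParquetPred
      case class PNotEq(col: String, value: Option[Int]) extends ParquetPred

      def pushDown(p: SqlPred): Option[ParquetPred] = p match {
        case LessThan(c, Some(v)) => Some(PLt(c, v))      // non-null only
        case LessThan(_, None)    => None                 // a < NULL: not pushed
        case IsNull(c)            => Some(PEq(c, None))   // Eq against null
        case IsNotNull(c)         => Some(PNotEq(c, None))// NotEq against null
      }
      ```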
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #3367 from liancheng/filters-with-null and squashes the following commits:
      
      cc41281 [Cheng Lian] Fixes several styling issues
      de7de28 [Cheng Lian] Adds stricter rules for Parquet filters with null
      62771353
    • Michael Armbrust's avatar
      [SPARK-3698][SQL] Fix case insensitive resolution of GetField. · 7ad579ee
      Michael Armbrust authored
      Based on #2543.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3724 from marmbrus/resolveGetField and squashes the following commits:
      
      0a47aae [Michael Armbrust] Fix case insensitive resolution of GetField.
      7ad579ee
    • carlmartin's avatar
      [SPARK-4694]Fix HiveThriftServer2 cann't stop In Yarn HA mode. · 4782def0
      carlmartin authored
      HiveThriftServer2 cannot exit automatically when the standby resource manager takes over in Yarn HA mode.
      The scheduler backend was aware that the AM had exited, so it called sc.stop to exit the driver process, but a user thread (HiveThriftServer2) was still alive, which caused this problem.
      To fix it, add a daemon thread to detect whether the SparkContext is null; if the sc has stopped, call ThriftServer.stop to stop the user thread.
      
      Author: carlmartin <carlmartinmax@gmail.com>
      
      Closes #3576 from SaintBacchus/ThriftServer2ExitBug and squashes the following commits:
      
      2890b4a [carlmartin] Use SparkListener instead of the demo thread to stop the hive server.
      c15da0e [carlmartin] HiveThriftServer2 can not exit automactic when changing the standy resource manager in Yarn HA mode
      4782def0
    • Cheng Hao's avatar
      [SPARK-4625] [SQL] Add sort by for DSL & SimpleSqlParser · 5fdcbdc0
      Cheng Hao authored
      Add `sort by` support for both DSL & SqlParser.
      
      This PR is relevant with #3386, either one merged, will cause the other rebased.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3481 from chenghao-intel/sortby and squashes the following commits:
      
      041004f [Cheng Hao] Add sort by for DSL & SimpleSqlParser
      5fdcbdc0
  5. Dec 16, 2014
    • scwf's avatar
      [SPARK-4618][SQL] Make foreign DDL commands options case-insensitive · 60698801
      scwf authored
      Use lowercase for the ```options``` keys to make them case-insensitive; we should then use lower case to look values up in the parameters.
      This makes the following command work:
      ```
            create temporary table normal_parquet
            USING org.apache.spark.sql.parquet
            OPTIONS (
              PATH '/xxx/data'
            )
      ```
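      The approach can be sketched as a tiny case-insensitive map in the spirit of the `CaseInsensitiveMap` this change adds (a simplified illustration, not the actual class): keys are lowercased on the way in, and lookups lowercase the query key.

      ```scala
      // Minimal case-insensitive options wrapper: all keys are normalized to
      // lower case at construction time, so `PATH` and `path` are the same key.
      class CaseInsensitiveOptions(options: Map[String, String]) {
        private val lower = options.map { case (k, v) => (k.toLowerCase, v) }
        def get(key: String): Option[String] = lower.get(key.toLowerCase)
      }
      ```

      With this, `OPTIONS (PATH '/xxx/data')` and `OPTIONS (path '/xxx/data')` resolve to the same value.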
      
      Author: scwf <wangfei1@huawei.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #3470 from scwf/ddl-ulcase and squashes the following commits:
      
      ae78509 [scwf] address comments
      8f4f585 [wangfei] address comments
      3c132ef [scwf] minor fix
      a0fc20b [scwf] Merge branch 'master' of https://github.com/apache/spark into ddl-ulcase
      4f86401 [scwf] adding CaseInsensitiveMap
      e244e8d [wangfei] using lower case in json
      e0cb017 [wangfei] make options in-casesensitive
      60698801
    • Davies Liu's avatar
      [SPARK-4866] support StructType as key in MapType · ec5c4279
      Davies Liu authored
      This PR brings support for using StructType (and other hashable types) as keys in MapType.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3714 from davies/fix_struct_in_map and squashes the following commits:
      
      68585d7 [Davies Liu] fix primitive types in MapType
      9601534 [Davies Liu] support StructType as key in MapType
      ec5c4279
    • Cheng Hao's avatar
      [SPARK-4375] [SQL] Add 0 argument support for udf · 770d8153
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3595 from chenghao-intel/udf0 and squashes the following commits:
      
      a858973 [Cheng Hao] Add 0 arguments support for udf
      770d8153
    • Takuya UESHIN's avatar
      [SPARK-4720][SQL] Remainder should also return null if the divider is 0. · ddc7ba31
      Takuya UESHIN authored
      This is a follow-up of SPARK-4593 (#3443).
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #3581 from ueshin/issues/SPARK-4720 and squashes the following commits:
      
      c3959d4 [Takuya UESHIN] Make Remainder return null if the divider is 0.
      ddc7ba31
    • Cheng Hao's avatar
      [SPARK-4744] [SQL] Short circuit evaluation for AND & OR in CodeGen · 0aa834ad
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3606 from chenghao-intel/codegen_short_circuit and squashes the following commits:
      
      f466303 [Cheng Hao] short circuit for AND & OR
      0aa834ad
    • Cheng Lian's avatar
      [SPARK-4798][SQL] A new set of Parquet testing API and test suites · 3b395e10
      Cheng Lian authored
      This PR provides a set of Parquet testing APIs (see trait `ParquetTest`) that enables developers to write more concise test cases. A new set of Parquet test suites built upon this API is added and aims to replace the old `ParquetQuerySuite`. To avoid potential merge conflicts, the old testing code is not removed yet. The following classes can be safely removed after most Parquet-related PRs are handled:
      
      - `ParquetQuerySuite`
      - `ParquetTestData`
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #3644 from liancheng/parquet-tests and squashes the following commits:
      
      800e745 [Cheng Lian] Enforces ordering of test output
      3bb8731 [Cheng Lian] Refactors HiveParquetSuite
      aa2cb2e [Cheng Lian] Decouples ParquetTest and TestSQLContext
      7b43a68 [Cheng Lian] Updates ParquetTest Scaladoc
      7f07af0 [Cheng Lian] Adds a new set of Parquet test suites
      3b395e10
    • Jacky Li's avatar
      [SPARK-4269][SQL] make wait time configurable in BroadcastHashJoin · fa66ef6c
      Jacky Li authored
      In BroadcastHashJoin, a hard-coded value (5 minutes) is currently used to wait for the execution and broadcast of the small table.
      In my opinion, it should be a configurable value, since a broadcast may exceed 5 minutes in some cases, e.g. in a busy or congested network environment.
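      A minimal sketch of the configurable wait, using the `spark.sql.broadcastTimeout` key added by this change and the previous hard-coded 5 minutes as the default (the conf-map signature here is an illustration, not the `SQLConf` API):

      ```scala
      import scala.concurrent.duration._

      // Read the broadcast wait time from a configuration map, falling back
      // to the former hard-coded default of 5 minutes.
      def broadcastTimeout(conf: Map[String, String]): Duration =
        conf.get("spark.sql.broadcastTimeout")
          .map(s => s.toInt.seconds)
          .getOrElse(5.minutes)
      ```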
      
      Author: Jacky Li <jacky.likun@huawei.com>
      
      Closes #3133 from jackylk/timeout-config and squashes the following commits:
      
      733ac08 [Jacky Li] add spark.sql.broadcastTimeout in SQLConf.scala
      557acd4 [Jacky Li] switch to sqlContext.getConf
      81a5e20 [Jacky Li] make wait time configurable in BroadcastHashJoin
      fa66ef6c
    • Michael Armbrust's avatar
      [SPARK-4827][SQL] Fix resolution of deeply nested Project(attr, Project(Star,...)). · a66c23e1
      Michael Armbrust authored
      Since `AttributeReference` resolution and `*` expansion are currently in separate rules, each pair requires a full iteration instead of being resolvable in a single pass. Since it's pretty easy to construct queries that have many of these in a row, I combine them into a single rule in this PR.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #3674 from marmbrus/projectStars and squashes the following commits:
      
      d83d6a1 [Michael Armbrust] Fix resolution of deeply nested Project(attr, Project(Star,...)).
      a66c23e1
    • tianyi's avatar
      [SPARK-4483][SQL]Optimization about reduce memory costs during the HashOuterJoin · 30f6b85c
      tianyi authored
      In `HashOuterJoin.scala`, Spark reads data from both sides of the join operation before zipping them together, which wastes memory. We instead try to read data from only one side, put it into a hash map, and then generate the `JoinedRow` with data from the other side one row at a time.
      Currently, we can only do this optimization for `left outer join` and `right outer join`. For `full outer join`, we will do something in another issue.
      
      for
      table test_csv contains 1 million records
      table dim_csv contains 10 thousand records
      
      SQL:
      `select * from test_csv a left outer join dim_csv b on a.key = b.key`
      
      the result is:
      master:
      ```
      CSV: 12671 ms
      CSV: 9021 ms
      CSV: 9200 ms
      Current Mem Usage:787788984
      ```
      after patch:
      ```
      CSV: 10382 ms
      CSV: 7543 ms
      CSV: 7469 ms
      Current Mem Usage:208145728
      ```
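      The one-sided build described above can be sketched in plain Scala (illustrative only, not the actual `HashOuterJoin` operator):

      ```scala
      // Only the build side is materialized into a hash map; the stream side
      // is traversed once, emitting joined rows, with None for unmatched
      // stream rows (a left outer join here).
      def leftOuterJoin[K, A, B](stream: Seq[(K, A)],
                                 build: Seq[(K, B)]): Seq[(A, Option[B])] = {
        val hashed: Map[K, Seq[B]] =
          build.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }
        stream.flatMap { case (k, a) =>
          hashed.get(k) match {
            case Some(bs) => bs.map(b => (a, Some(b)))
            case None     => Seq((a, None))
          }
        }
      }
      ```

      Memory then scales with the build side only (here the 10-thousand-row dimension table), rather than with both inputs.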
      
      Author: tianyi <tianyi@asiainfo-linkage.com>
      Author: tianyi <tianyi.asiainfo@gmail.com>
      
      Closes #3375 from tianyi/SPARK-4483 and squashes the following commits:
      
      72a8aec [tianyi] avoid having mutable state stored inside of the task
      99c5c97 [tianyi] performance optimization
      d2f94d7 [tianyi] fix bug: missing output when the join-key is null.
      2be45d1 [tianyi] fix spell bug
      1f2c6f1 [tianyi] remove commented codes
      a676de6 [tianyi] optimize some codes
      9e7d5b5 [tianyi] remove commented old codes
      838707d [tianyi] Optimization about reduce memory costs during the HashOuterJoin
      30f6b85c
    • wangxiaojing's avatar
      [SPARK-4527][SQl]Add BroadcastNestedLoopJoin operator selection testsuite · ea1315e3
      wangxiaojing authored
      In `JoinSuite` add BroadcastNestedLoopJoin operator selection testsuite
      
      Author: wangxiaojing <u9jing@gmail.com>
      
      Closes #3395 from wangxiaojing/SPARK-4527 and squashes the following commits:
      
      ea0e495 [wangxiaojing] change style
      53c3952 [wangxiaojing] Add BroadcastNestedLoopJoin operator selection testsuite
      ea1315e3
    • zsxwing's avatar
      [SPARK-4812][SQL] Fix the initialization issue of 'codegenEnabled' · 6530243a
      zsxwing authored
      The problem is that `codegenEnabled` is a `val`, but it uses the `val` `sqlContext`, which can be overridden by subclasses. Here is a simple example that shows the issue.
      
      ```Scala
      scala> :paste
      // Entering paste mode (ctrl-D to finish)
      
      abstract class Foo {
      
        protected val sqlContext = "Foo"
      
        val codegenEnabled: Boolean = {
          println(sqlContext) // it will call subclass's `sqlContext` which has not yet been initialized.
          if (sqlContext != null) {
            true
          } else {
            false
          }
        }
      }
      
      class Bar extends Foo {
        override val sqlContext = "Bar"
      }
      
      println(new Bar().codegenEnabled)
      
      // Exiting paste mode, now interpreting.
      
      null
      false
      defined class Foo
      defined class Bar
      ```
      
      We should make `sqlContext` `final` to prevent subclasses from overriding it incorrectly.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #3660 from zsxwing/SPARK-4812 and squashes the following commits:
      
      1cbb623 [zsxwing] Make `sqlContext` final to prevent subclasses from overriding it incorrectly
      6530243a
    • jerryshao's avatar
      [SPARK-4847][SQL]Fix "extraStrategies cannot take effect in SQLContext" issue · dc8280dc
      jerryshao authored
      Author: jerryshao <saisai.shao@intel.com>
      
      Closes #3698 from jerryshao/SPARK-4847 and squashes the following commits:
      
      4741130 [jerryshao] Make later added extraStrategies effect when calling strategies
      dc8280dc
    • Judy Nash's avatar
      [SQL] SPARK-4700: Add HTTP protocol spark thrift server · 17688d14
      Judy Nash authored
      Add HTTP protocol support and test cases to the Spark thrift server, so users can deploy the thrift server in both TCP and HTTP mode.
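      For reference, HTTP transport in HiveServer2-style deployments is typically enabled with a `hive-site.xml` fragment like the following; the property names here follow HiveServer2's conventions and the port value is an example, so the exact settings for the Spark thrift server should be checked against the SQL programming guide:

      ```
      <!-- hive-site.xml fragment (hedged example) -->
      <property>
        <name>hive.server2.transport.mode</name>
        <value>http</value>
      </property>
      <property>
        <name>hive.server2.thrift.http.port</name>
        <value>10001</value>
      </property>
      ```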
      
      Author: Judy Nash <judynash@microsoft.com>
      Author: judynash <judynash@microsoft.com>
      
      Closes #3672 from judynash/master and squashes the following commits:
      
      526315d [Judy Nash] correct spacing on startThriftServer method
      31a6520 [Judy Nash] fix code style issues and update sql programming guide format issue
      47bf87e [Judy Nash] modify withJdbcStatement method definition to meet less than 100 line length
      2e9c11c [Judy Nash] add thrift server in http mode documentation on sql programming guide
      1cbd305 [Judy Nash] Merge remote-tracking branch 'upstream/master'
      2b1d312 [Judy Nash] updated http thrift server support based on feedback
      377532c [judynash] add HTTP protocol spark thrift server
      17688d14
  6. Dec 15, 2014
    • Sean Owen's avatar
      SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from... · 81112e4b
      Sean Owen authored
      SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionError from Hive's LazyBinaryInteger
      
      This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #3692 from srowen/SPARK-4814 and squashes the following commits:
      
      caca704 [Sean Owen] Disable assertions just for Hive
      f71e783 [Sean Owen] Enable assertions for SBT and Maven build
      81112e4b
  7. Dec 12, 2014
    • Daoyuan Wang's avatar
      [SPARK-4829] [SQL] add rule to fold count(expr) if expr is not null · 41a3f934
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3676 from adrian-wang/countexpr and squashes the following commits:
      
      dc5765b [Daoyuan Wang] add rule to fold count(expr) if expr is not null
      41a3f934
    • Sasaki Toru's avatar
      [SPARK-4742][SQL] The name of Parquet File generated by... · 8091dd62
      Sasaki Toru authored
      [SPARK-4742][SQL] The name of Parquet File generated by AppendingParquetOutputFormat should be zero padded
      
      When I write a Parquet file using ParquetOutputFormat#getDefaultWorkFile, the file name is not zero-padded, while RDD#saveAsTextFile does zero-pad its part files.
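      The expected zero-padded naming matches what `RDD#saveAsTextFile` produces; as a sketch (the exact file-name template is an assumption for illustration):

      ```scala
      // Zero-pad the split index to five digits, in the style of Hadoop
      // part-file names (template here is a hypothetical example).
      def partFileName(split: Int): String = f"part-r-$split%05d.parquet"
      ```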
      
      Author: Sasaki Toru <sasakitoa@nttdata.co.jp>
      
      Closes #3602 from sasakitoa/parquet-zeroPadding and squashes the following commits:
      
      6b0e58f [Sasaki Toru] Merge branch 'master' of git://github.com/apache/spark into parquet-zeroPadding
      20dc79d [Sasaki Toru] Fixed the name of Parquet File generated by AppendingParquetOutputFormat
      8091dd62
    • Cheng Hao's avatar
      [SPARK-4825] [SQL] CTAS fails to resolve when created using saveAsTable · 0abbff28
      Cheng Hao authored
      Fix a bug in queries like:
      ```
        test("save join to table") {
          val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString))
          sql("CREATE TABLE test1 (key INT, value STRING)")
          testData.insertInto("test1")
          sql("CREATE TABLE test2 (key INT, value STRING)")
          testData.insertInto("test2")
          testData.insertInto("test2")
          sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").saveAsTable("test")
          checkAnswer(
            table("test"),
            sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
        }
      ```
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #3673 from chenghao-intel/spark_4825 and squashes the following commits:
      
      e8cbd56 [Cheng Hao] alternate the pattern matching order for logical plan:CTAS
      e004895 [Cheng Hao] fix bug
      0abbff28
    • Daoyuan Wang's avatar
      [SQL] enable empty aggr test case · cbb634ae
      Daoyuan Wang authored
      This is fixed by SPARK-4318 #3184
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3445 from adrian-wang/emptyaggr and squashes the following commits:
      
      982575e [Daoyuan Wang] enable empty aggr test case
      cbb634ae
    • Daoyuan Wang's avatar
      [SPARK-4828] [SQL] sum and avg on empty table should always return null · acb3be6b
      Daoyuan Wang authored
      So the optimizations are not valid. Also, I think the optimization here is rarely encountered, so removing it will not affect performance.
      
      Can we merge #3445 before I add a comparison test case based on this?
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #3675 from adrian-wang/sumempty and squashes the following commits:
      
      42df763 [Daoyuan Wang] sum and avg on empty table should always return null
      acb3be6b
    • scwf's avatar
      [SQL] Remove unnecessary case in HiveContext.toHiveString · d8cf6785
      scwf authored
      a follow up of #3547
      /cc marmbrus
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #3563 from scwf/rnc and squashes the following commits:
      
      9395661 [scwf] remove unnecessary condition
      d8cf6785
    • Takuya UESHIN's avatar
      [SPARK-4293][SQL] Make Cast be able to handle complex types. · 33448036
      Takuya UESHIN authored
      Inserting data whose type includes `ArrayType.containsNull == false`, `MapType.valueContainsNull == false`, or `StructType.fields.exists(_.nullable == false)` into a Hive table will fail because the `Cast` inserted by the `HiveMetastoreCatalog.PreInsertionCasts` rule of the `Analyzer` can't handle these types correctly.
      
      Complex type cast rule proposal:
      
      - Casts between non-complex types should behave the same as before.
      - Cast for `ArrayType` can evaluate if
        - Element type can cast
        - Nullability rule doesn't break
      - Cast for `MapType` can evaluate if
        - Key type can cast
        - Nullability for casted key type is `false`
        - Value type can cast
        - Nullability rule for value type doesn't break
      - Cast for `StructType` can evaluate if
        - The field size is the same
        - Each field can cast
        - Nullability rule for each field doesn't break
      - The nested structure should be the same.
      
      Nullability rule:
      
      - If the casted type is `nullable == true`, the target nullability should be `true`
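      The array case of the proposal can be sketched with a hypothetical mini type ADT (not Catalyst's types): the element type must be castable, and a nullable source element type must not be cast to a non-nullable target element type.

      ```scala
      // Mini type ADT illustrating the cast-resolution rule for arrays.
      sealed trait DType
      case object IntT extends DType
      case object StringT extends DType
      case class ArrayT(element: DType, containsNull: Boolean) extends DType

      def canCast(from: DType, to: DType): Boolean = (from, to) match {
        case (ArrayT(fe, fn), ArrayT(te, tn)) =>
          // element must cast, and nullable source => nullable target
          canCast(fe, te) && (!fn || tn)
        case _ => true // assume all non-complex casts are allowed, as before
      }
      ```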
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #3150 from ueshin/issues/SPARK-4293 and squashes the following commits:
      
      e935939 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
      ba14003 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
      8999868 [Takuya UESHIN] Fix a test title.
      f677c30 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-4293
      287f410 [Takuya UESHIN] Add tests to insert data of types ArrayType / MapType / StructType with nullability is false into Hive table.
      4f71bb8 [Takuya UESHIN] Make Cast be able to handle complex types.
      33448036
    • Jacky Li's avatar
      [SPARK-4639] [SQL] Pass maxIterations in as a parameter in Analyzer · c152dde7
      Jacky Li authored
      fix a TODO in Analyzer:
      // TODO: pass this in as a parameter
      val fixedPoint = FixedPoint(100)
      
      Author: Jacky Li <jacky.likun@huawei.com>
      
      Closes #3499 from jackylk/config and squashes the following commits:
      
      4c1252c [Jacky Li] fix scalastyle
      820f460 [Jacky Li] pass maxIterations in as a parameter
      c152dde7