  1. Jan 10, 2016
  2. Jan 09, 2016
  3. Jan 08, 2016
    • [SPARK-12577] [SQL] Better support of parentheses in partition by and order by... · 95cd5d95
      Liang-Chi Hsieh authored
      [SPARK-12577] [SQL] Better support of parentheses in partition by and order by clause of window function's over clause
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-12577
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #10620 from viirya/fix-parentheses.
    • [SPARK-4628][BUILD] Remove all non-Maven-Central repositories from build · 090d6913
      Josh Rosen authored
      This patch removes all non-Maven-Central repositories from Spark's build, which avoids the risk of future build breaks caused by accidentally depending on an artifact that is not present in an immutable public Maven repository.
      
      I tested this by running
      
      ```
      build/mvn \
              -Phive \
              -Phive-thriftserver \
              -Pkinesis-asl \
              -Pspark-ganglia-lgpl \
              -Pyarn \
              dependency:go-offline
      ```
      
      inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10659 from JoshRosen/SPARK-4628.
    • [SPARK-12730][TESTS] De-duplicate some test code in BlockManagerSuite · 1fdf9bbd
      Josh Rosen authored
      This patch deduplicates some test code in BlockManagerSuite. I'm splitting this change off from a larger PR in order to make things easier to review.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10667 from JoshRosen/block-mgr-tests-cleanup.
    • [SPARK-12593][SQL] Converts resolved logical plan back to SQL · d9447cac
      Cheng Lian authored
      This PR tries to enable Spark SQL to convert resolved logical plans back to SQL query strings.  For now, the major use case is to canonicalize Spark SQL native view support.  The major entry point is `SQLBuilder.toSQL`, which returns an `Option[String]` that is defined only if the logical plan is recognized.
      
      The current version is still in WIP status, and is quite limited.  Known limitations include:
      
      1.  The logical plan must be analyzed but not optimized
      
          The optimizer erases `Subquery` operators, which contain necessary scope information for SQL generation.  Future versions should be able to recover erased scope information by inserting subqueries when necessary.
      
      1.  The logical plan must be created from a HiveQL query string
      
          Query plans generated by composing arbitrary DataFrame API combinations are not supported yet.  Operators within these query plans need to be rearranged into a canonical form that is more suitable for direct SQL generation.  For example, the following query plan
      
          ```
          Filter (a#1 < 10)
           +- MetastoreRelation default, src, None
          ```
      
          needs to be canonicalized into the following form before SQL generation:
      
          ```
          Project [a#1, b#2, c#3]
           +- Filter (a#1 < 10)
               +- MetastoreRelation default, src, None
          ```
      
          Otherwise, the SQL generation process will have to handle a large number of special cases.
      
      1.  Only a fraction of expressions and basic logical plan operators are supported in this PR
      
          Currently, 95.7% (1720 out of 1798) of the query plans in `HiveCompatibilitySuite` can be successfully converted to SQL query strings.
      
          Known unsupported components are:
      
          - Expressions
            - Part of math expressions
            - Part of string expressions (buggy?)
            - Null expressions
            - Calendar interval literal
            - Part of date time expressions
            - Complex type creators
            - Special `NOT` expressions, e.g. `NOT LIKE` and `NOT IN`
          - Logical plan operators/patterns
            - Cube, rollup, and grouping set
            - Script transformation
            - Generator
            - Distinct aggregation patterns that fit `DistinctAggregationRewriter` analysis rule
            - Window functions
      
          Support for window functions, generators, and cubes etc. will be added in follow-up PRs.
      
      This PR leverages `HiveCompatibilitySuite` for testing SQL generation in a "round-trip" manner:
      
      *   For each select query, we try to convert its query plan back to SQL
      *   If the query plan is convertible, we parse the generated SQL into a new logical plan
      *   Run the new logical plan instead of the original one
      
      If the query plan is inconvertible, the test case simply falls back to the original logic.
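
      As a rough illustration of the canonical form and the fallback described above, here is a minimal Java sketch.  Everything in it (`ToySqlBuilder`, `Plan`, `Relation`, `Filter`, `Project`, `queryToRun`) is a hypothetical stand-in and not Spark's `SQLBuilder` or its logical plan classes; it only recognizes the `Project` over `Filter` over relation shape and returns an empty `Optional` for anything else.

      ```java
      import java.util.Arrays;
      import java.util.List;
      import java.util.Optional;

      // Toy model only: these classes are assumptions for illustration,
      // not Spark's logical plan operators.
      final class ToySqlBuilder {
          static abstract class Plan {}

          static final class Relation extends Plan {
              final String db;
              final String table;
              Relation(String db, String table) { this.db = db; this.table = table; }
          }

          static final class Filter extends Plan {
              final String condition;
              final Plan child;
              Filter(String condition, Plan child) { this.condition = condition; this.child = child; }
          }

          static final class Project extends Plan {
              final List<String> columns;
              final Plan child;
              Project(List<String> columns, Plan child) { this.columns = columns; this.child = child; }
          }

          // Recognizes only Project -> Filter -> Relation; any other shape is
          // reported as unsupported by returning Optional.empty().
          static Optional<String> toSQL(Plan plan) {
              if (!(plan instanceof Project)) return Optional.empty();
              Project project = (Project) plan;
              if (!(project.child instanceof Filter)) return Optional.empty();
              Filter filter = (Filter) project.child;
              if (!(filter.child instanceof Relation)) return Optional.empty();
              Relation relation = (Relation) filter.child;
              return Optional.of("SELECT " + String.join(", ", project.columns)
                  + " FROM " + relation.db + "." + relation.table
                  + " WHERE " + filter.condition);
          }

          // Round-trip helper: run the generated SQL when the plan is convertible,
          // otherwise fall back to the original query string.
          static String queryToRun(String originalQuery, Plan plan) {
              return toSQL(plan).orElse(originalQuery);
          }

          public static void main(String[] args) {
              Plan bareFilter = new Filter("a < 10", new Relation("default", "src"));
              Plan canonical = new Project(Arrays.asList("a", "b", "c"), bareFilter);
              System.out.println(queryToRun("SELECT * FROM src WHERE a < 10", canonical));
              System.out.println(queryToRun("SELECT * FROM src WHERE a < 10", bareFilter));
          }
      }
      ```

      The bare `Filter` plan is deliberately rejected here, which mirrors why a canonical form keeps the generator free of special cases.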
      
      TODO
      
      - [x] Fix failed test cases
      - [x] Support for more basic expressions and logical plan operators (e.g. distinct aggregation etc.)
      - [x] Comments and documentation
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10541 from liancheng/sql-generation.
    • [SPARK-4819] Remove Guava's "Optional" from public API · 659fd9d0
      Sean Owen authored
      Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)
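
      For readers unfamiliar with the two APIs, the following Java sketch shows how typical Guava `Optional` calls map onto `java.util.Optional`.  The class `OptionalMigrationSketch` and its methods are hypothetical, and the sketch uses `java.util.Optional` directly rather than the clone introduced by this change, whose exact API is not shown in this message.

      ```java
      import java.util.HashMap;
      import java.util.Map;
      import java.util.Optional;

      // Hypothetical example class; not part of Spark.
      final class OptionalMigrationSketch {
          private static final Map<String, Integer> COUNTS = new HashMap<>();
          static {
              COUNTS.put("a", 1);
          }

          // Guava equivalent: com.google.common.base.Optional.fromNullable(value)
          static Optional<Integer> lookup(String key) {
              return Optional.ofNullable(COUNTS.get(key));
          }

          // Guava equivalent: lookup(key).or(0)
          static int countOrZero(String key) {
              return lookup(key).orElse(0);
          }

          public static void main(String[] args) {
              System.out.println(countOrZero("a"));
              System.out.println(countOrZero("missing"));
              // isPresent() and get() have the same names in both APIs.
              System.out.println(lookup("missing").isPresent());
          }
      }
      ```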
      
      See also https://github.com/apache/spark/pull/10512
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10513 from srowen/SPARK-4819.
    • [SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fail… · 553fd7b9
      Thomas Graves authored
      …s on secure Hadoop
      
      https://issues.apache.org/jira/browse/SPARK-12654
      
      The bug is that `WholeTextFileRDD.getPartitions` calls `val conf = getConf`.
      When `cloneConf=true`, `getConf` creates a new Hadoop `Configuration`, which is then used to create a new job context via `newJobContext`.
      `newJobContext` copies credentials around, but credentials are only present in a `JobConf`, not in a plain Hadoop `Configuration`. So cloning the Hadoop configuration changes it from a `JobConf` to a `Configuration` and drops the credentials that were there. `NewHadoopRDD` just uses the conf passed in for `getPartitions` (not `getConf`), which is why it works.
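
      The credential loss can be modeled with a small self-contained Java sketch.  `ToyConfiguration` and `ToyJobConf` are hypothetical stand-ins for Hadoop's `Configuration` and `JobConf`, and `preservingClone` is only one plausible way to keep the credentials, assumed here for illustration rather than taken from the patch.

      ```java
      import java.util.ArrayList;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Toy model only: these are NOT the Hadoop classes.
      final class CloneConfSketch {
          static class ToyConfiguration {
              final Map<String, String> props = new HashMap<>();
              ToyConfiguration() {}
              // Copy constructor: copies properties only.
              ToyConfiguration(ToyConfiguration other) { props.putAll(other.props); }
          }

          static class ToyJobConf extends ToyConfiguration {
              final List<String> credentials = new ArrayList<>();
              ToyJobConf() {}
              // Copy constructor: also carries credentials over when the source has any.
              ToyJobConf(ToyConfiguration other) {
                  super(other);
                  if (other instanceof ToyJobConf) {
                      credentials.addAll(((ToyJobConf) other).credentials);
                  }
              }
          }

          // Mirrors the bug: the clone is always a plain configuration,
          // so a job conf loses its credentials.
          static ToyConfiguration lossyClone(ToyConfiguration conf) {
              return new ToyConfiguration(conf);
          }

          // Assumed remedy: clone a job conf as a job conf.
          static ToyConfiguration preservingClone(ToyConfiguration conf) {
              return (conf instanceof ToyJobConf) ? new ToyJobConf(conf) : new ToyConfiguration(conf);
          }

          static int credentialCount(ToyConfiguration conf) {
              return (conf instanceof ToyJobConf) ? ((ToyJobConf) conf).credentials.size() : 0;
          }

          public static void main(String[] args) {
              ToyJobConf jobConf = new ToyJobConf();
              jobConf.credentials.add("delegation-token");
              System.out.println(credentialCount(lossyClone(jobConf)));
              System.out.println(credentialCount(preservingClone(jobConf)));
          }
      }
      ```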
      
      Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com>
      
      Closes #10651 from tgravescs/SPARK-12654.
    • fixed numVertices in transitive closure example · 8c70cb4c
      Udo Klein authored
      Author: Udo Klein <git@blinkenlight.net>
      
      Closes #10642 from udoklein/patch-2.
    • [DOCUMENTATION] doc fix of job scheduling · 00d92617
      Jeff Zhang authored
      `spark.shuffle.service.enabled` is a Spark application configuration, so it is not necessary to set it in `yarn-site.xml`.
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #10657 from zjffdu/doc-fix.
    • [SPARK-12701][CORE] FileAppender should use join to ensure writing thread completion · ea104b8f
      Bryan Cutler authored
      Changed the logging `FileAppender` to use `join` in `awaitTermination` to ensure that the writing thread has finished before returning.
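
      The general pattern can be sketched in Java as follows.  `JoinAppenderSketch` is a hypothetical toy that appends to an in-memory buffer, not Spark's `FileAppender`; it only shows `awaitTermination` blocking on `Thread.join` so that callers see all output once it returns.

      ```java
      // Hypothetical toy appender; names and structure are assumptions.
      final class JoinAppenderSketch {
          private final StringBuilder sink = new StringBuilder();
          private final Thread writer;

          JoinAppenderSketch(final String[] lines) {
              writer = new Thread(() -> {
                  for (String line : lines) {
                      sink.append(line).append('\n');
                  }
              }, "toy-file-appender");
              writer.setDaemon(true);
          }

          void start() {
              writer.start();
          }

          // Blocks until the writing thread has finished.
          void awaitTermination() {
              try {
                  writer.join();
              } catch (InterruptedException e) {
                  // Restore the interrupt flag and report the failed wait.
                  Thread.currentThread().interrupt();
                  throw new IllegalStateException("interrupted while waiting for writer", e);
              }
          }

          // Safe to call after awaitTermination(): a completed join guarantees
          // that the writer's appends are visible to the calling thread.
          String contents() {
              return sink.toString();
          }

          public static void main(String[] args) {
              JoinAppenderSketch appender = new JoinAppenderSketch(new String[] {"first", "second"});
              appender.start();
              appender.awaitTermination();
              System.out.print(appender.contents());
          }
      }
      ```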
      
      Author: Bryan Cutler <cutlerb@gmail.com>
      
      Closes #10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.
    • [SPARK-12687] [SQL] Support from clause surrounded by `()`. · cfe1ba56
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-12687
      
      Some queries, such as `(select 1 as a) union (select 2 as a)`, don't work. This patch fixes that.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #10660 from viirya/fix-union.
    • [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition · b9c83533
      Sean Owen authored
      Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10570 from srowen/SPARK-12618.
    • [SPARK-12692][BUILD] Scala style: check no white space before comma and colon · 794ea553
      Kousuke Saruta authored
      We should not put whitespace before `,` and `:`, so let's check for it.
      Because there are many existing style violations, I'd first like to add a checker, enable it, and set its level to `warning`.
      Then I'd like to fix the style violations step by step.
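
      To make the rule concrete, here is a naive Java sketch of such a check.  It is an assumption for illustration only: the class `WhitespaceBeforeTokenSketch` and its regex are hypothetical and are not the checker configured by this patch.  The regex skips the `::` operator but would still flag matches inside string literals and comments.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.regex.Pattern;

      // Hypothetical line-based check; not the real style checker.
      final class WhitespaceBeforeTokenSketch {
          // Whitespace followed by ',' or by a single ':' (a '::' operator is skipped).
          private static final Pattern VIOLATION = Pattern.compile("\\s+(,|:(?!:))");

          // Returns the 1-based numbers of the lines that violate the rule.
          static List<Integer> violatingLines(String source) {
              List<Integer> result = new ArrayList<>();
              String[] lines = source.split("\n");
              for (int i = 0; i < lines.length; i++) {
                  if (VIOLATION.matcher(lines[i]).find()) {
                      result.add(i + 1);
                  }
              }
              return result;
          }

          public static void main(String[] args) {
              String source = "val x : Int = 1\n"
                  + "foo(a , b)\n"
                  + "val xs = 1 :: Nil\n"
                  + "val y: Int = 2";
              System.out.println(violatingLines(source));
          }
      }
      ```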
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10643 from sarutak/SPARK-12692.
  4. Jan 07, 2016
  5. Jan 06, 2016