    [SPARK-2706][SQL] Enable Spark to support Hive 0.13 · 7c89a8f0
    Zhan Zhang authored
    Many users are trying to use Hive 0.13 with Spark, but hive-0.12 and hive-0.13 are incompatible at the API level. I want to propose the following approach, which has little or no impact on existing hive-0.12 support but can jump-start development of support for hive-0.13 and future versions.
    
    Approach: Introduce a “hive-version” property and adjust the pom.xml files to support different Hive versions at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically:
    
    1. For each Hive version, there is a very thin layer of shim code that handles API differences. It sits in sql/hive/hive-version, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1.
    
    2. Add a new profile, hive-default, which is active by default and picks up all existing configuration and the hive-0.12.0 shim (v0.12.0) if no hive.version is specified.
    
    3. If the user specifies a different version (currently only 0.13.1, via -Dhive.version=0.13.1), the hive-versions profile is activated. It picks up the version-specific shim layer and configuration, mainly the Hive jars and the shim for that version, e.g., v0.13.1.
    
    4. With this approach, nothing changes for the current hive-0.12 support.
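    
    The selection rule in steps 1 to 3 can be illustrated with a small shell sketch. This is only an illustration of the rule as described in this message: in the actual build the choice is made by Maven profiles in pom.xml, and the function name resolve_shim_dir is hypothetical, not part of Spark.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Spark build: maps a Hive version to
# the shim source directory described in step 1 (sql/hive/v<version>).
resolve_shim_dir() {
  # With no argument, fall back to 0.12.0, mirroring the hive-default profile.
  hive_version="${1:-0.12.0}"
  case "$hive_version" in
    0.12.0|0.13.1)
      # Only the two versions named in this message have a shim directory.
      echo "sql/hive/v${hive_version}"
      ;;
    *)
      echo "unsupported hive.version: ${hive_version}" >&2
      return 1
      ;;
  esac
}

resolve_shim_dir          # prints sql/hive/v0.12.0
resolve_shim_dir 0.13.1   # prints sql/hive/v0.13.1
```

    Rejecting unknown versions explicitly is a choice made for this sketch; the message does not say how the real build reacts to an unsupported hive.version.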
    
    No change by default: sbt/sbt -Phive
    For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
    
    To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1
    For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
    
    Note that with hive-0.13, hive-thriftserver is not enabled; this should be fixed under a separate JIRA. For now, -Phive is not needed together with -Dhive.version when building (we should probably switch to -Phive -Dhive.version=xxx once the thrift server is also supported with hive-0.13.1).
    
    Author: Zhan Zhang <zhazhan@gmail.com>
    Author: zhzhan <zhazhan@gmail.com>
    Author: Patrick Wendell <pwendell@gmail.com>
    
    Closes #2241 from zhzhan/spark-2706 and squashes the following commits:
    
    3ece905 [Zhan Zhang] minor fix
    410b668 [Zhan Zhang] solve review comments
    cbb4691 [Zhan Zhang] change run-test for new options
    0d4d2ed [Zhan Zhang] rebase
    497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default
    ab028d1 [Zhan Zhang] rebase
    4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241
    b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706
    2b50502 [Zhan Zhang] rebase
    a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    cb22863 [Zhan Zhang] correct the typo
    20f6cf7 [Zhan Zhang] solve compatability issue
    f7912a9 [Zhan Zhang] rebase and solve review feedback
    301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    10c3565 [Zhan Zhang] address review comments
    6bc9204 [Zhan Zhang] rebase and remove temparory repo
    d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706
    cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    3ced0d7 [Zhan Zhang] rebase
    d9b981d [Zhan Zhang] rebase and fix error due to rollback
    adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts
    d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    dc7bdb3 [Zhan Zhang] solve conflicts
    7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706
    68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    d48bd18 [Zhan Zhang] address review comments
    3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706
    2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    9412d24 [Zhan Zhang] address review comments
    f4af934 [Zhan Zhang] rebase
    1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being
    af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    5f5619f [Zhan Zhang] restructure the directory and different hive version support
    05d3683 [Zhan Zhang] solve conflicts
    e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark
    87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706
    921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706
    789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    f6a8a40 [Zhan Zhang] revert
    ba14f28 [Zhan Zhang] test
    dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master'
    70964fe [Zhan Zhang] revert
    fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
    70ffd93 [Zhan Zhang] revert
    42585ec [Zhan Zhang] test
    7d5fce2 [Zhan Zhang] test