    [SPARK-9974] [BUILD] [SQL] Makes sure com.twitter:parquet-hadoop-bundle:1.6.0 is in SBT assembly jar
    
PR #7967 enables Spark SQL to persist Parquet tables in a Hive-compatible format when possible. One consequence is that we have to set the table's input/output format classes to `MapredParquetInputFormat`/`MapredParquetOutputFormat`, which rely on com.twitter:parquet-hadoop:1.6.0 bundled with Hive 1.2.1.
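
For context, here is a minimal sketch of that format-class wiring, assuming Hive 1.2.1's `o.a.h.h.ql.metadata.Table` API (the database/table names are illustrative, and this is not the exact Spark SQL code path):

```scala
import org.apache.hadoop.hive.ql.metadata.Table

object FormatWiringSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative table; resolving the Parquet format classes by name is
    // what pulls parquet-hadoop classes onto the classpath at load time.
    val table = new Table("default", "users")
    table.setInputFormatClass(
      "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat")
    table.setOutputFormatClass(
      "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat")
    table.setSerializationLib(
      "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
  }
}
```

Roughly, these setters resolve the named classes reflectively, so the transitive parquet-hadoop classes must be present on the classpath when the table is loaded.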
    
When loading such a table in Spark SQL, `o.a.h.h.ql.metadata.Table` first loads these input/output format classes, and thus classes in com.twitter:parquet-hadoop:1.6.0. However, the scope of this dependency is defined as "runtime", so it is not packaged into the Spark assembly jar. This results in a `ClassNotFoundException`.
    
This issue can be worked around by asking users to add parquet-hadoop 1.6.0 via the `--driver-class-path` option. However, since the Maven build is immune to this problem, I feel this workaround is confusing and inconvenient for users.
    
So this PR fixes the issue by changing the scope of parquet-hadoop 1.6.0 to "compile" (sketched below).
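
In SBT terms, the effect of the scope change is equivalent to the following minimal sketch (Spark's SBT build actually derives its dependency graph from the Maven POMs, so the real change lives in the build definitions; the coordinates follow the commit title):

```scala
// Declaring the bundle in the "compile" configuration (the SBT default)
// ensures sbt-assembly packages it into the assembly jar, whereas the
// "runtime"-scoped dependency inherited from the POMs was left out.
libraryDependencies +=
  "com.twitter" % "parquet-hadoop-bundle" % "1.6.0" % "compile"
```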
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes #8198 from liancheng/spark-9974/bundle-parquet-1.6.0.