    SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging
    Previously, we decided whether to include the datanucleus jars based on the existence of a spark-hive-assembly jar, which was incidentally built whenever "sbt assembly" was run. This meant that a typical, previously supported build pathway would silently start using the Hive jars.
    
    This patch has the following features/bug fixes:
    
    - Use SPARK_HIVE (default false) to determine whether we include Hive in the assembly jar (see the sketch after this list).
    - Analogous feature in Maven with -Phive (previously, there was no support for adding Hive to any of the jars produced by Maven).
    - Fixed assemble-deps, since we no longer use a different ASSEMBLY_DIR.
    - Avoid adding the log message printed by compute-classpath.sh to the classpath itself :)
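
    The sbt side of this can be pictured as simply reading an environment variable and defaulting it to false. The names below are illustrative only, not the actual SparkBuild definitions:

        // Minimal sketch, not the actual SparkBuild code: read SPARK_HIVE
        // (default false) to decide whether Hive goes into the assembly.
        object HiveFlag {
          // Treat an unset or non-boolean value as false.
          val isHiveEnabled: Boolean =
            sys.env.get("SPARK_HIVE")
              .flatMap(v => scala.util.Try(v.toBoolean).toOption)
              .getOrElse(false)
        }

    With a flag like this, a Hive-enabled assembly build would be invoked along the lines of SPARK_HIVE=true sbt/sbt assembly.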
    
    Still TODO before this is mergeable:
    - We need to download the datanucleus jars outside of sbt. Perhaps spark-class could download them when SPARK_HIVE is set, similar to how sbt downloads itself (a rough sketch follows this list).
    - Spark SQL documentation updates.
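
    As a rough illustration of what a spark-class-style download could look like, the sketch below fetches a single jar from Maven Central into lib_managed/jars. The URL, artifact version, and target directory are assumptions for the sketch, not what the final change ships:

        // Hypothetical sketch: fetch one datanucleus jar from Maven Central into
        // lib_managed/jars if it is not already there. Coordinates are illustrative.
        import java.net.URL
        import java.nio.file.{Files, Paths, StandardCopyOption}

        object FetchDatanucleus {
          def main(args: Array[String]): Unit = {
            val jarName = "datanucleus-core-3.2.2.jar"
            val url = new URL(
              "https://repo1.maven.org/maven2/org/datanucleus/datanucleus-core/3.2.2/" + jarName)
            val target = Paths.get("lib_managed", "jars", jarName)
            Files.createDirectories(target.getParent)
            // Skip the download if the jar is already present.
            if (!Files.exists(target)) {
              val in = url.openStream()
              try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
              finally in.close()
            }
          }
        }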
    
    Author: Aaron Davidson <aaron@databricks.com>
    
    Closes #237 from aarondav/master and squashes the following commits:
    
    5dc4329 [Aaron Davidson] Typo fixes
    dd4f298 [Aaron Davidson] Doc update
    dd1a365 [Aaron Davidson] Eliminate need for SPARK_HIVE at runtime by d/ling datanucleus from Maven
    a9269b5 [Aaron Davidson] [WIP] Use SPARK_HIVE to determine if we include Hive in packaging