[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
## What changes were proposed in this pull request?

Like Parquet, this PR aims to depend on the latest Apache ORC 1.4 for Apache Spark 2.3. Apache ORC 1.4 brings two key benefits:

- Stability: Apache ORC 1.4.0 has many fixes, and we can rely more on the ORC community.
- Maintainability: It reduces the Hive dependency, so old legacy code can be removed later.

Later, adding the new ORCFileFormat in SPARK-20728 (#17980) will bring two more key benefits:

- Usability: Users can use ORC data sources without the hive module, i.e., `-Phive`.
- Speed: Spark ColumnarBatch and ORC RowBatch are used together, which will be faster than the current implementation in Spark.

## How was this patch tested?

Passes Jenkins.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #18640 from dongjoon-hyun/SPARK-21422.
Changed files:

- assembly/pom.xml: 6 additions, 0 deletions
- dev/deps/spark-deps-hadoop-2.6: 3 additions, 0 deletions
- dev/deps/spark-deps-hadoop-2.7: 3 additions, 0 deletions
- pom.xml: 44 additions, 0 deletions
- sql/core/pom.xml: 10 additions, 0 deletions
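
For readers unfamiliar with how a Maven build picks up a new library, the fragment below sketches what a dependency on Apache ORC 1.4.0 could look like. It is an illustration, not the diff from this commit: `org.apache.orc:orc-core` is the published artifact for ORC's core library, but the `orc.version` property name and the exact layout are assumptions, and the real change may add further artifacts, classifiers, scopes, or exclusions.

```xml
<!-- Hypothetical sketch; not copied from the actual pom.xml change. -->
<properties>
  <!-- Assumed property name, used to pin the ORC version in one place. -->
  <orc.version>1.4.0</orc.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>${orc.version}</version>
  </dependency>
</dependencies>
```

Pinning the version in a property keeps the parent POM and any module POMs, such as sql/core/pom.xml, on the same ORC release.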