[SPARK-4785][SQL] Initialize Hive UDFs on the driver and serialize them with a wrapper
Unlike Hive 0.12.0, Hive 0.13.1 expects UDF/UDAF/UDTF objects (collectively, Hive functions) to be initialized only once, on the driver side, and then serialized to executors. However, not all function objects are serializable (e.g. GenericUDF doesn't implement Serializable). Hive 0.13.1 solves this issue with either a Kryo or an XML serializer, and class o.a.h.h.q.e.Utilities provides several utility ser/de methods for this purpose. In this PR we chose Kryo for efficiency. The Kryo serializer used here is created by Hive; Spark's own Kryo serializer wasn't used because no SparkConf instance is available.

Author: Cheng Hao <hao.cheng@intel.com>
Author: Cheng Lian <lian@databricks.com>

Closes #3640 from chenghao-intel/udf_serde and squashes the following commits:

- 8e13756 [Cheng Hao] Update the comment
- 74466a3 [Cheng Hao] refactor as feedbacks
- 396c0e1 [Cheng Hao] avoid Simple UDF to be serialized
- e9c3212 [Cheng Hao] update the comment
- 19cbd46 [Cheng Hao] support udf instance ser/de after initialization
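The wrapper idea above can be sketched in a few lines of Scala. This is a minimal illustration under stated assumptions, not Spark's or Hive's actual code: `UpperUdf`, `FunctionWrapper`, and `WrapperDemo` are hypothetical names, and plain Java serialization stands in for shipping the wrapper from the driver to an executor. The sketch covers only the path where just the function's class name is shipped and the object is re-created by reflection on the receiving side (in the spirit of the "avoid Simple UDF to be serialized" commit). The branch that Kryo-serializes an already-initialized, non-Serializable function through Hive's utilities is deliberately omitted, since Kryo is not part of the JDK.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, Externalizable,
  ObjectInput, ObjectInputStream, ObjectOutput, ObjectOutputStream}

// Hypothetical stand-in for a Hive function class: it deliberately does NOT
// implement java.io.Serializable (GenericUDF in Hive 0.13.1 doesn't either).
class UpperUdf {
  def evaluate(s: String): String = s.toUpperCase
}

// Hypothetical wrapper: only the function's class name is written out; the
// function object itself is re-created by reflection after deserialization.
class FunctionWrapper(var functionClassName: String) extends Externalizable {
  // Externalizable requires a public no-arg constructor.
  def this() = this(null)

  // Not written by writeExternal, so it is rebuilt lazily on whichever JVM
  // calls createFunction.
  private var instance: AnyRef = null

  def createFunction[T <: AnyRef](): T = {
    if (instance == null) {
      instance = Class.forName(functionClassName)
        .getDeclaredConstructor()
        .newInstance()
        .asInstanceOf[AnyRef]
    }
    instance.asInstanceOf[T]
  }

  override def writeExternal(out: ObjectOutput): Unit =
    out.writeUTF(functionClassName)

  override def readExternal(in: ObjectInput): Unit =
    functionClassName = in.readUTF()
}

object WrapperDemo {
  // Java-serializes an object to bytes and reads it back, simulating
  // shipping it from the driver to an executor.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val shipped = roundTrip(new FunctionWrapper(classOf[UpperUdf].getName))
    println(shipped.createFunction[UpperUdf]().evaluate("hive")) // prints HIVE
  }
}
```

The design point is that the wrapper itself is always serializable, whether or not the wrapped class implements `Serializable`; the function object never enters the serialized form in this path.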
Changed files:
- sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala: 4 additions, 1 deletion
- sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala: 44 additions, 49 deletions
- sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUdfSuite.scala: 7 additions, 0 deletions
- sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala: 11 additions, 0 deletions
- sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala: 107 additions, 0 deletions