    [SPARK-2097][SQL] UDF Support · 158ad0bb
    Michael Armbrust authored
    This patch adds the ability to register lambda functions written in Python, Java or Scala as UDFs for use in SQL or HiveQL.
    
    Scala:
    ```scala
    registerFunction("strLenScala", (_: String).length)
    sql("SELECT strLenScala('test')")
    ```
    Python:
    ```python
    sqlCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
    sqlCtx.sql("SELECT strLenPython('test')")
    ```
    Java:
    ```java
    sqlContext.registerFunction("stringLengthJava", new UDF1<String, Integer>() {
      @Override
      public Integer call(String str) throws Exception {
        return str.length();
      }
    }, DataType.IntegerType);
    
    sqlContext.sql("SELECT stringLengthJava('test')");
    ```
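
The three snippets above share one model: a function is registered under a name (with an explicit return type in the Python and Java examples), and SQL text then refers to it by that name. The following is a toy sketch of that model in plain Python. It does not use Spark, and `ToyRegistry`, `register_function`, and `call` are hypothetical names chosen for illustration, not the API added by this patch.

```python
from typing import Any, Callable, Dict, Tuple


class ToyRegistry:
    """Hypothetical stand-in for a SQL function registry (not Spark's implementation)."""

    def __init__(self) -> None:
        # Maps a SQL-visible function name to (callable, declared return type).
        self._functions: Dict[str, Tuple[Callable[..., Any], type]] = {}

    def register_function(
        self, name: str, func: Callable[..., Any], return_type: type
    ) -> None:
        # Registering the same name again replaces the earlier entry.
        self._functions[name] = (func, return_type)

    def call(self, name: str, *args: Any) -> Any:
        # Resolve the function by name, as a query engine would when it
        # encounters the name in SQL text.
        if name not in self._functions:
            raise KeyError(f"undefined function: {name}")
        func, return_type = self._functions[name]
        result = func(*args)
        # Check the result against the declared return type.
        if not isinstance(result, return_type):
            raise TypeError(
                f"{name} returned {type(result).__name__}, "
                f"expected {return_type.__name__}"
            )
        return result


registry = ToyRegistry()
registry.register_function("strLenPython", lambda x: len(x), int)
length = registry.call("strLenPython", "test")
```

In this sketch the declared return type is only checked after the call. A real engine would presumably also use it while planning the query, which is one plausible reason the Python and Java examples pass it explicitly.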
    
    Author: Michael Armbrust <michael@databricks.com>
    
    Closes #1063 from marmbrus/udfs and squashes the following commits:
    
    9eda0fe [Michael Armbrust] newline
    747c05e [Michael Armbrust] Add some scala UDF tests.
    d92727d [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
    005d684 [Michael Armbrust] Fix naming and formatting.
    d14dac8 [Michael Armbrust] Fix last line of autogened java files.
    8135c48 [Michael Armbrust] Move UDF unit tests to pyspark.
    40b0ffd [Michael Armbrust] Merge remote-tracking branch 'apache/master' into udfs
    6a36890 [Michael Armbrust] Switch logging so that SQLContext can be serializable.
    7a83101 [Michael Armbrust] Drop toString
    795fd15 [Michael Armbrust] Try to avoid capturing SQLContext.
    e54fb45 [Michael Armbrust] Docs and tests.
    437cbe3 [Michael Armbrust] Update use of dataTypes, fix some python tests, address review comments.
    01517d6 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
    8e6c932 [Michael Armbrust] WIP
    3f96a52 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udfs
    6237c8d [Michael Armbrust] WIP
    2766f0b [Michael Armbrust] Move udfs support to SQL from hive. Add support for Java UDFs.
    0f7d50c [Michael Armbrust] Draft of native Spark SQL UDFs for Scala and Python.