[SPARK-21848][SQL] Add trait UserDefinedExpression to identify user-defined functions

Wang Gengliang authored 7 years ago

## What changes were proposed in this pull request?

Add trait UserDefinedExpression to identify user-defined functions.
UDFs can be expensive, so the optimizer may need to avoid executing the same UDF multiple times.
For example:
```scala
table.select(UDF as 'a).select('a, ('a + 1) as 'b)
```
If the UDF is expensive, the optimizer should not collapse the two projections into
```scala
table.select(UDF as 'a, (UDF + 1) as 'b)
```
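
To make the cost concrete, here is a minimal sketch in plain Scala (no Spark dependency) that mimics the two plans above on an in-memory sequence. `expensiveUdf`, `uncollapsed`, and `collapsed` are hypothetical names used only for illustration. The counter simply shows that inlining the UDF into both output columns doubles the number of invocations per row.

```scala
object CollapseCost {
  // Hypothetical stand-in for an expensive UDF; the counter records invocations.
  var calls = 0
  def expensiveUdf(x: Int): Int = { calls += 1; x * 2 }

  // Two-step shape: evaluate the UDF once per row as `a`, then derive `b` from `a`.
  def uncollapsed(rows: Seq[Int]): Seq[(Int, Int)] =
    rows.map(expensiveUdf).map(a => (a, a + 1))

  // Collapsed shape: the UDF expression is inlined into both output columns,
  // so it is evaluated twice per row.
  def collapsed(rows: Seq[Int]): Seq[(Int, Int)] =
    rows.map(x => (expensiveUdf(x), expensiveUdf(x) + 1))
}
```

Both functions return the same rows; only the number of UDF calls differs.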

Currently, UDF classes such as PythonUDF and HiveGenericUDF are not defined in catalyst, so rules in catalyst cannot match on them directly.
This PR adds a new trait to make it easier to identify user-defined functions.
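
The idea of a marker trait can be sketched as follows. This is a simplified, self-contained model, not Catalyst's actual classes: `Expr`, `Column`, `Literal`, `Add`, `ToyUdf`, and `containsUdf` are hypothetical stand-ins, and only the trait name `UserDefinedExpression` comes from this PR. How the real optimizer rules use the trait is not shown here. The sketch only illustrates that a check written against the trait needs no knowledge of concrete UDF classes defined in other modules.

```scala
object UdfMarkerSketch {
  // Simplified stand-in for an expression tree node (assumption, not Catalyst's Expression).
  sealed trait Expr { def children: Seq[Expr] }

  // Marker trait named after the one this PR adds; modeled here with no members.
  trait UserDefinedExpression

  case class Column(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class Literal(value: Int) extends Expr { val children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Hypothetical UDF node; in a real code base this could live in another module.
  case class ToyUdf(name: String, children: Seq[Expr])
    extends Expr with UserDefinedExpression

  // True if the expression, or any sub-expression, is marked as user-defined.
  def containsUdf(e: Expr): Boolean = e match {
    case _: UserDefinedExpression => true
    case other => other.children.exists(containsUdf)
  }
}
```

A rule could consult a check like `containsUdf` before deciding to duplicate an expression, for example when considering whether to merge two adjacent projections.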

## How was this patch tested?

Unit test

Author: Wang Gengliang <ltnwgl@gmail.com>

Closes #19064 from gengliangwang/UDFType.

Name	Last commit	Last update
..
compatibility/src/test/scala/org/apache/spark/sql/hive/execution
src
pom.xml