Skip to content
Snippets Groups Projects
Commit 95e37214 authored by Davies Liu's avatar Davies Liu Committed by Davies Liu
Browse files

[SPARK-14781] [SQL] support nested predicate subquery

## What changes were proposed in this pull request?

In order to support nested predicate subquery, this PR introduce an internal join type ExistenceJoin, which will emit all the rows from left, plus an additional column, which presents there are any rows matched from right or not (it's not null-aware right now). This additional column could be used to replace the subquery in Filter.

In theory, all the predicate subquery could use this join type, but it's slower than LeftSemi and LeftAnti, so it's only used for nested subquery (subquery inside OR).

For example, the following SQL:
```sql
SELECT a FROM t  WHERE EXISTS (select 0) OR EXISTS (select 1)
```

This PR also fix a bug in predicate subquery push down through join (they should not).

Nested null-aware subquery is still not supported. For example,   `a > 3 OR b NOT IN (select bb from t)`

After this, we could run TPCDS query Q10, Q35, Q45

## How was this patch tested?

Added unit tests.

Author: Davies Liu <davies@databricks.com>

Closes #12820 from davies/or_exists.
parent 6e632012
No related branches found
No related tags found
No related merge requests found
Showing
with 345 additions and 61 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment