Commit d314677c authored by hyukjinkwon, committed by Josh Rosen

[SPARK-16461][SQL] Support partition batch pruning with `<=>` predicate in InMemoryTableScanExec

## What changes were proposed in this pull request?

The `EqualNullSafe` (`<=>`) filter appears to have been missed when pruning partition batches in cached tables.

Supporting it appears to make such queries roughly 5 times faster in the benchmark below.
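
For readers unfamiliar with batch pruning, the idea can be sketched as follows. This is a minimal illustration and not the code in `InMemoryTableScanExec`: `BatchStats`, `NullSafePruningSketch`, and `mightMatch` are hypothetical names, and the sketch assumes each cached batch records the bounds of its non-null values plus a null count. A batch is skipped when its statistics show that no row can satisfy `col <=> literal`.

```scala
// Hypothetical per-batch statistics (illustrative only, not Spark's internal API).
// `bounds` is the (min, max) of the non-null values, or None if the batch is all nulls.
final case class BatchStats(bounds: Option[(Long, Long)], nullCount: Int)

object NullSafePruningSketch {
  // True when the batch might contain a row satisfying `col <=> literal`.
  // A None literal stands for SQL NULL, which `<=>` matches only against NULL.
  def mightMatch(stats: BatchStats, literal: Option[Long]): Boolean = literal match {
    case Some(v) => stats.bounds.exists { case (lo, hi) => lo <= v && v <= hi }
    case None    => stats.nullCount > 0
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      BatchStats(Some((0L, 99L)), 0),
      BatchStats(Some((100L, 199L)), 0),
      BatchStats(Some((200L, 299L)), 3)
    )
    // Only batches whose range covers the literal need to be scanned.
    val kept = batches.filter(mightMatch(_, Some(1L)))
    println(s"batches scanned for `id <=> 1`: ${kept.size} of ${batches.size}")
  }
}
```

For a non-null literal the check is the same range test that plain equality would use, which is presumably why leaving `<=>` out of the pruning rules forced a scan of every batch.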

Running the code below:

```scala
test("Null-safe equal comparison") {
  val N = 20000000
  val df = spark.range(N).repartition(20)
  val benchmark = new Benchmark("Null-safe equal comparison", N)
  df.createOrReplaceTempView("t")
  spark.catalog.cacheTable("t")
  sql("select id from t where id <=> 1").collect()

  benchmark.addCase("Null-safe equal comparison", 10) { _ =>
    sql("select id from t where id <=> 1").collect()
  }
  benchmark.run()
}
```

produces the results below:

**Before:**

```
Running benchmark: Null-safe equal comparison
  Running case: Null-safe equal comparison
  Stopped after 10 iterations, 2098 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i7-4850HQ CPU  2.30GHz

Null-safe equal comparison:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Null-safe equal comparison                     204 /  210         98.1          10.2       1.0X
```

**After:**

```
Running benchmark: Null-safe equal comparison
  Running case: Null-safe equal comparison
  Stopped after 10 iterations, 478 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i7-4850HQ CPU  2.30GHz

Null-safe equal comparison:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Null-safe equal comparison                      42 /   48        474.1           2.1       1.0X
```
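
The `Relative` column reads `1.0X` in both tables because each run contains a single case, so the speedup has to be read across the two runs. A small sketch of that arithmetic, using the best and average times reported above (`SpeedupCheck` and `speedup` are illustrative names, not part of the patch):

```scala
object SpeedupCheck {
  // Ratio of the time before the patch to the time after it.
  def speedup(beforeMs: Double, afterMs: Double): Double = beforeMs / afterMs

  def main(args: Array[String]): Unit = {
    // Best and average times (ms) taken from the Before/After tables.
    println(f"best time speedup: ${speedup(204, 42)}%.2fx")
    println(f"avg time speedup:  ${speedup(210, 48)}%.2fx")
  }
}
```

This gives about 4.9x on best time and about 4.4x on average time, consistent with the "roughly 5 times" figure.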

## How was this patch tested?

Unit tests in `PartitionBatchPruningSuite`.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #14117 from HyukjinKwon/SPARK-16461.
parent e388bd54