Commit 5d3f4615 authored by jiangxingbo, committed by Sean Owen

[SPARK-17506][SQL] Improve the check double values equality rule.

## What changes were proposed in this pull request?

In `ExpressionEvalHelper`, we check the equality of two double values by testing whether the expected value is within the absolute range [target - tolerance, target + tolerance], but this can produce a false negative when the compared values are very large.
Before:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
false
```
In fact, `val1` and `val2` represent the same value at different precisions, so we should tolerate this case by comparing against a relative range, e.g., checking that the expected value is within [target - target * tolerance_percentage, target + target * tolerance_percentage].
After:
```
val1 = 1.6358558070241E306
val2 = 1.6358558070240974E306
ExpressionEvalHelper.compareResults(val1, val2)
true
```
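To make the comparison strategy concrete, here is a minimal, self-contained Scala sketch of the relative-tolerance idea. The object name `RelativeToleranceDemo`, the method name `relTolEquals`, and the default epsilon are illustrative choices only, not part of the patch; the patch itself adds `relativeErrorComparison` to `ExpressionEvalHelper`, shown in the diff below.
```
// Standalone illustration of absolute vs. relative tolerance for large doubles.
object RelativeToleranceDemo {

  // Hypothetical helper mirroring the approach taken by this patch:
  // scale the allowed difference by the magnitude of the operands.
  def relTolEquals(x: Double, y: Double, eps: Double = 1e-8): Boolean = {
    if (x == y) {
      true
    } else {
      math.abs(x - y) < eps * math.min(math.abs(x), math.abs(y))
    }
  }

  def main(args: Array[String]): Unit = {
    val val1 = 1.6358558070241E306
    val val2 = 1.6358558070240974E306

    // Absolute tolerance fails: the raw difference is about 2.6E291,
    // vastly larger than any fixed epsilon.
    println(math.abs(val1 - val2) < 1e-8)   // false

    // Relative tolerance succeeds: the difference is ~1.6e-15 of the values' magnitude.
    println(relTolEquals(val1, val2))       // true
  }
}
```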

## How was this patch tested?

Existing test cases.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes #15059 from jiangxb1987/deq.
parent 3fe630d3
ArithmeticExpressionSuite.scala:
```
@@ -170,11 +170,9 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(Remainder(positiveLongLit, positiveLongLit), 0L)
     checkEvaluation(Remainder(negativeLongLit, negativeLongLit), 0L)
-    // TODO: the following lines would fail the test due to inconsistency result of interpret
-    // and codegen for remainder between giant values, seems like a numeric stability issue
-    // DataTypeTestUtils.numericTypeWithoutDecimal.foreach { tpe =>
-    //   checkConsistencyBetweenInterpretedAndCodegen(Remainder, tpe, tpe)
-    // }
+    DataTypeTestUtils.numericTypeWithoutDecimal.foreach { tpe =>
+      checkConsistencyBetweenInterpretedAndCodegen(Remainder, tpe, tpe)
+    }
   }

   test("Abs") {
```
ExpressionEvalHelper.scala:
```
@@ -19,6 +19,7 @@ package org.apache.spark.sql.catalyst.expressions

 import org.scalacheck.Gen
 import org.scalactic.TripleEqualsSupport.Spread
+import org.scalatest.exceptions.TestFailedException
 import org.scalatest.prop.GeneratorDrivenPropertyChecks

 import org.apache.spark.SparkFunSuite
@@ -289,13 +290,37 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
     (result, expected) match {
       case (result: Array[Byte], expected: Array[Byte]) =>
         java.util.Arrays.equals(result, expected)
+      case (result: Double, expected: Spread[Double @unchecked]) =>
+        expected.asInstanceOf[Spread[Double]].isWithin(result)
       case (result: Double, expected: Double) if result.isNaN && expected.isNaN =>
         true
+      case (result: Double, expected: Double) =>
+        relativeErrorComparison(result, expected)
       case (result: Float, expected: Float) if result.isNaN && expected.isNaN =>
         true
       case _ => result == expected
     }
   }

+  /**
+   * Private helper function for comparing two values using relative tolerance.
+   * Note that if x or y is extremely close to zero, i.e., smaller than Double.MinPositiveValue,
+   * the relative tolerance is meaningless, so the exception will be raised to warn users.
+   *
+   * TODO: this duplicates functions in spark.ml.util.TestingUtils.relTol and
+   * spark.mllib.util.TestingUtils.relTol, they could be moved to common utils sub module for the
+   * whole spark project which does not depend on other modules. See more detail in discussion:
+   * https://github.com/apache/spark/pull/15059#issuecomment-246940444
+   */
+  private def relativeErrorComparison(x: Double, y: Double, eps: Double = 1E-8): Boolean = {
+    val absX = math.abs(x)
+    val absY = math.abs(y)
+    val diff = math.abs(x - y)
+    if (x == y) {
+      true
+    } else if (absX < Double.MinPositiveValue || absY < Double.MinPositiveValue) {
+      throw new TestFailedException(
+        s"$x or $y is extremely close to zero, so the relative tolerance is meaningless.", 0)
+    } else {
+      diff < eps * math.min(absX, absY)
+    }
+  }
 }
```
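One more note on the near-zero guard in the new helper: with a relative tolerance, the allowed difference scales with the smaller operand, so it collapses to zero when either operand is (effectively) zero; this is why `relativeErrorComparison` throws a `TestFailedException` there instead of silently returning false. A small illustrative sketch of that effect (the object name `NearZeroGuardDemo` is hypothetical and not part of the patch):
```
// Why a relative tolerance is meaningless near zero: the allowed difference
// shrinks with the operands' magnitude and vanishes entirely at zero.
object NearZeroGuardDemo {
  def main(args: Array[String]): Unit = {
    val eps = 1e-8
    val x = 0.0        // below Double.MinPositiveValue, so the patch's guard would fire
    val y = 1e-300     // tiny but nonzero

    val allowedDiff = eps * math.min(math.abs(x), math.abs(y))
    println(allowedDiff)                    // 0.0 -- no nonzero difference could ever pass
    println(math.abs(x - y) < allowedDiff)  // false, even though both values are "almost zero"
  }
}
```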