Commit ee07541e authored by Sean Owen, committed by Xiangrui Meng

SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log

In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When `p` is so small that `1.0 + p == 1.0` in floating point, the result is 0.0, even though the correct answer is very close to `p`. This is why `Math.log1p` exists.
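A minimal Scala sketch of this failure mode, assuming a tiny value `p = 1e-17` chosen only for illustration. The object and method names (`Log1pDemo`, `naiveLog1p`, `accurateLog1p`) are hypothetical. `scala.math.log1p` forwards to `java.lang.Math.log1p`.

```scala
object Log1pDemo {
  // Naive form used before this commit: 1.0 + 1e-17 rounds to exactly 1.0
  // (the gap between 1.0 and the next double is about 2.2e-16), so the log is 0.0.
  def naiveLog1p(p: Double): Double = math.log(1.0 + p)

  // log1p never forms 1.0 + p explicitly, so the tiny argument survives.
  def accurateLog1p(p: Double): Double = math.log1p(p)

  def main(args: Array[String]): Unit = {
    val p = 1e-17
    println(s"log(1.0 + p) = ${naiveLog1p(p)}")
    println(s"log1p(p)     = ${accurateLog1p(p)}")
  }
}
```

For arguments this small, `log1p(p)` is essentially `p` itself, which is the "very close to `p`" answer described above.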

The same problem affects one instance of `exp(m) - 1` in GraphX, for which there is a dedicated `Math.expm1` method.
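The analogous sketch for `exp(x) - 1`, again assuming an illustrative argument of `1e-17`. `Expm1Demo` and its methods are hypothetical names. On a typical JVM, `math.exp(1e-17)` rounds to 1.0, so the naive subtraction discards the whole answer.

```scala
object Expm1Demo {
  // Naive form: exp(x) is rounded to a double near 1.0 before the subtraction,
  // so for tiny x the information in x is lost.
  def naiveExpm1(x: Double): Double = math.exp(x) - 1.0

  // expm1 computes exp(x) - 1 directly and stays accurate for tiny x.
  def accurateExpm1(x: Double): Double = math.expm1(x)

  def main(args: Array[String]): Unit = {
    val x = 1e-17
    println(s"exp(x) - 1 = ${naiveExpm1(x)}")
    println(s"expm1(x)   = ${accurateExpm1(x)}")
  }
}
```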

The errors occur only for very small arguments, but given how these functions are used in machine learning algorithms, such arguments are entirely possible.

Also note the related PR for Python: https://github.com/apache/spark/pull/1652

Author: Sean Owen <srowen@gmail.com>

Closes #1659 from srowen/SPARK-2748 and squashes the following commits:

c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments
parent 7c5fc28a
@@ -100,8 +100,10 @@ object GraphGenerators {
    */
   private def sampleLogNormal(mu: Double, sigma: Double, maxVal: Int): Int = {
     val rand = new Random()
-    val m = math.exp(mu + (sigma * sigma) / 2.0)
-    val s = math.sqrt((math.exp(sigma*sigma) - 1) * math.exp(2*mu + sigma*sigma))
+    val sigmaSq = sigma * sigma
+    val m = math.exp(mu + sigmaSq / 2.0)
+    // expm1 is exp(m)-1 with better accuracy for tiny m
+    val s = math.sqrt(math.expm1(sigmaSq) * math.exp(2*mu + sigmaSq))
     // Z ~ N(0, 1)
     var X: Double = maxVal
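In `sampleLogNormal`, `m` and `s` are the mean and standard deviation of a log-normal distribution with parameters `mu` and `sigma`: the variance is `(exp(sigma^2) - 1) * exp(2*mu + sigma^2)`. The sketch below compares the old and new expressions for `s` in isolation; the object and method names are hypothetical, and `sigma = 1e-9` is an illustrative value chosen only to expose the rounding, not one GraphX is known to use. For small `sigma` the true standard deviation is roughly `sigma * exp(mu)`.

```scala
object LogNormalStdDemo {
  // Standard deviation as written before the commit: exp(sigma^2) - 1 collapses
  // toward 0.0 when sigma^2 is tiny.
  def stdNaive(mu: Double, sigma: Double): Double =
    math.sqrt((math.exp(sigma * sigma) - 1) * math.exp(2 * mu + sigma * sigma))

  // Standard deviation as written after the commit, using expm1.
  def stdExpm1(mu: Double, sigma: Double): Double = {
    val sigmaSq = sigma * sigma
    math.sqrt(math.expm1(sigmaSq) * math.exp(2 * mu + sigmaSq))
  }

  def main(args: Array[String]): Unit = {
    val (mu, sigma) = (0.0, 1e-9)
    println(s"naive s = ${stdNaive(mu, sigma)}")
    println(s"expm1 s = ${stdExpm1(mu, sigma)}")
  }
}
```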
@@ -68,9 +68,9 @@ class LogisticGradient extends Gradient {
     val gradient = brzData * gradientMultiplier
     val loss =
       if (label > 0) {
-        math.log(1 + math.exp(margin))
+        math.log1p(math.exp(margin)) // log1p is log(1+p) but more accurate for small p
       } else {
-        math.log(1 + math.exp(margin)) - margin
+        math.log1p(math.exp(margin)) - margin
       }
     (Vectors.fromBreeze(gradient), loss)
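The loss branches of `LogisticGradient` can be isolated as a standalone sketch. `LogisticLossDemo`, `lossNaive` and `lossLog1p` are hypothetical names, and `margin` is simply taken as an input rather than computed from weights and features. With `margin = -40.0`, `math.exp(margin)` is about 4.25e-18, small enough that `1 + math.exp(margin)` rounds to 1.0 and the old expression reports a loss of exactly 0.0 for a positive label.

```scala
object LogisticLossDemo {
  // Loss as computed before the commit.
  def lossNaive(margin: Double, label: Double): Double =
    if (label > 0) math.log(1 + math.exp(margin))
    else math.log(1 + math.exp(margin)) - margin

  // Loss as computed after the commit, using log1p.
  def lossLog1p(margin: Double, label: Double): Double =
    if (label > 0) math.log1p(math.exp(margin))
    else math.log1p(math.exp(margin)) - margin

  def main(args: Array[String]): Unit = {
    val margin = -40.0
    println(s"naive loss (label 1): ${lossNaive(margin, 1.0)}")
    println(s"log1p loss (label 1): ${lossLog1p(margin, 1.0)}")
  }
}
```

The tiny, nonzero loss returned by `lossLog1p` is the value the commit preserves; for the label-0 branch at this margin the loss is about 40, where the precision issue does not arise.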
@@ -89,9 +89,9 @@ class LogisticGradient extends Gradient {
     brzAxpy(gradientMultiplier, brzData, cumGradient.toBreeze)
     if (label > 0) {
-      math.log(1 + math.exp(margin))
+      math.log1p(math.exp(margin))
     } else {
-      math.log(1 + math.exp(margin)) - margin
+      math.log1p(math.exp(margin)) - margin
     }
   }
 }
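The `else` branch subtracts `margin` from the same `log1p(exp(margin))` term. Writing $m$ for `margin`, a short derivation shows that this branch equals the logistic loss with the sign of the margin flipped, so both branches have the form $\log(1 + e^{\pm m})$:

```latex
\log\left(1 + e^{m}\right) - m
  = \log\left(1 + e^{m}\right) + \log\left(e^{-m}\right)
  = \log\left(e^{-m}\left(1 + e^{m}\right)\right)
  = \log\left(1 + e^{-m}\right)
```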