Skip to content
Snippets Groups Projects
Commit a6c72ab1 authored by Michael Armbrust's avatar Michael Armbrust Committed by Reynold Xin
Browse files

[SPARK-1994][SQL] Weird data corruption bug when running Spark SQL on data in HDFS

Basically there is a race condition (possibly a scala bug?) when these values are recomputed on all of the slaves that results in an incorrect projection being generated (possibly because the GUID uniqueness contract is broken?).

In general we should probably enforce that all expression planing occurs on the driver, as is now occurring here.

Author: Michael Armbrust <michael@databricks.com>

Closes #1004 from marmbrus/fixAggBug and squashes the following commits:

e0c116c [Michael Armbrust] Compute aggregate expression during planning instead of lazily on workers.
parent 41c4a331
No related branches found
No related tags found
No related merge requests found
......@@ -77,8 +77,7 @@ case class Aggregate(
resultAttribute: AttributeReference)
/** A list of aggregates that need to be computed for each group. */
@transient
private[this] lazy val computedAggregates = aggregateExpressions.flatMap { agg =>
private[this] val computedAggregates = aggregateExpressions.flatMap { agg =>
agg.collect {
case a: AggregateExpression =>
ComputedAggregate(
......@@ -89,8 +88,7 @@ case class Aggregate(
}.toArray
/** The schema of the result of all aggregate evaluations */
@transient
private[this] lazy val computedSchema = computedAggregates.map(_.resultAttribute)
private[this] val computedSchema = computedAggregates.map(_.resultAttribute)
/** Creates a new aggregate buffer for a group. */
private[this] def newAggregateBuffer(): Array[AggregateFunction] = {
......@@ -104,8 +102,7 @@ case class Aggregate(
}
/** Named attributes used to substitute grouping attributes into the final result. */
@transient
private[this] lazy val namedGroups = groupingExpressions.map {
private[this] val namedGroups = groupingExpressions.map {
case ne: NamedExpression => ne -> ne.toAttribute
case e => e -> Alias(e, s"groupingExpr:$e")().toAttribute
}
......@@ -114,16 +111,14 @@ case class Aggregate(
* A map of substitutions that are used to insert the aggregate expressions and grouping
* expression into the final result expression.
*/
@transient
private[this] lazy val resultMap =
private[this] val resultMap =
(computedAggregates.map { agg => agg.unbound -> agg.resultAttribute } ++ namedGroups).toMap
/**
* Substituted version of aggregateExpressions expressions which are used to compute final
* output rows given a group and the result of all aggregate computations.
*/
@transient
private[this] lazy val resultExpressions = aggregateExpressions.map { agg =>
private[this] val resultExpressions = aggregateExpressions.map { agg =>
agg.transform {
case e: Expression if resultMap.contains(e) => resultMap(e)
}
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment