-
- Downloads
[SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset
## What changes were proposed in this pull request? Likewise [DataSet.scala](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L156) KeyValueGroupedDataset should mark the queryExecution as transient. As mentioned in the Jira ticket, without transient we saw serialization issues like ``` Caused by: java.io.NotSerializableException: org.apache.spark.sql.execution.QueryExecution Serialization stack: - object not serializable (class: org.apache.spark.sql.execution.QueryExecution, value: == ``` ## How was this patch tested? Run the query which is specified in the Jira ticket before and after: ``` val a = spark.createDataFrame(sc.parallelize(Seq((1,2),(3,4)))).as[(Int,Int)] val grouped = a.groupByKey( {x:(Int,Int)=>x._1} ) val mappedGroups = grouped.mapGroups((k,x)=> {(k,1)} ) val yyy = sc.broadcast(1) val last = mappedGroups.rdd.map(xx=> { val simpley = yyy.value 1 } ) ``` Author: Ergin Seyfe <eseyfe@fb.com> Closes #15706 from seyfe/keyvaluegrouped_serialization.
Showing
- repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala 17 additions, 0 deletions...2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala
- sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala 1 addition, 1 deletion...n/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
Please register or sign in to comment