Commit 699a4dfd authored by Subhobrata Dey, committed by Reynold Xin

[SPARK-14632] randomSplit method fails on dataframes with maps in schema

## What changes were proposed in this pull request?

This patch fixes an issue where the randomSplit method is unable to split DataFrames that have maps in their schema. The bug was introduced in Spark 1.6.1.
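
For context, here is a minimal sketch of the scenario the fix targets: calling `randomSplit` on a DataFrame whose schema contains a map column. This is not code from the patch; the session setup, column names, and seed are illustrative assumptions.

```scala
// Hypothetical reproduction sketch for SPARK-14632 (names and setup are assumptions).
import org.apache.spark.sql.SparkSession

object RandomSplitMapSchemaRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("randomSplit-map-schema")
      .getOrCreate()
    import spark.implicits._

    // DataFrame whose schema contains a MapType column ("attrs").
    val df = Seq(
      (1, Map("a" -> 1)),
      (2, Map("b" -> 2)),
      (3, Map("c" -> 3))
    ).toDF("id", "attrs")

    // Before this patch, randomSplit sorted every output column to make the
    // per-partition ordering deterministic; map columns are not sortable, so
    // this call failed. With the fix, map columns are excluded from the sort.
    val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed = 42L)
    println(s"train=${train.count()}, test=${test.count()}")

    spark.stop()
  }
}
```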

## How was this patch tested?

Tested with unit tests.


Author: Subhobrata Dey <sbcd90@gmail.com>

Closes #12438 from sbcd90/randomSplitIssue.
parent 8a87f7d5
@@ -1502,7 +1502,9 @@ class Dataset[T] private[sql](
     // constituent partitions each time a split is materialized which could result in
     // overlapping splits. To prevent this, we explicitly sort each input partition to make the
     // ordering deterministic.
-    val sorted = Sort(logicalPlan.output.map(SortOrder(_, Ascending)), global = false, logicalPlan)
+    // MapType cannot be sorted.
+    val sorted = Sort(logicalPlan.output.filterNot(_.dataType.isInstanceOf[MapType])
+      .map(SortOrder(_, Ascending)), global = false, logicalPlan)
     val sum = weights.sum
     val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _)
     normalizedCumWeights.sliding(2).map { x =>
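
As a follow-up illustration of the idea behind the fix, the sketch below applies the same `filterNot(_.dataType.isInstanceOf[MapType])` pattern to a standalone schema: map-typed columns are excluded from the set of columns eligible for the deterministic sort, since `MapType` values have no ordering in Spark SQL. The schema and field names here are illustrative assumptions, not code from the patch.

```scala
// Sketch of the filterNot idea from the patch, applied to a standalone schema
// (the schema and field names here are illustrative assumptions).
import org.apache.spark.sql.types.{IntegerType, MapType, StringType, StructField, StructType}

object SortableColumnsSketch {
  def main(args: Array[String]): Unit = {
    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("attrs", MapType(StringType, IntegerType))
    ))

    // Only non-map columns are eligible to participate in the deterministic sort
    // that randomSplit uses to stabilize partition ordering.
    val sortable = schema.fields.filterNot(_.dataType.isInstanceOf[MapType])
    println(sortable.map(_.name).mkString(", ")) // prints: id
  }
}
```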