Commit b0f2fb5b authored 8 years ago by Dongjoon Hyun; committed by Shivaram Venkataraman 8 years ago

[SPARK-16053][R] Add `spark_partition_id` in SparkR

## What changes were proposed in this pull request?

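This PR adds the `spark_partition_id` virtual column function to SparkR for API parity with the Scala and Python APIs. The column evaluates to the ID of the partition that holds each row.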

The following example illustrates SparkR usage on a partitioned Parquet table created by `spark.range(10).write.mode("overwrite").parquet("/tmp/t1")`.
```r
> collect(select(read.parquet('/tmp/t1'), c('id', spark_partition_id())))
   id SPARK_PARTITION_ID()
1   3                    0
2   4                    0
3   8                    1
4   9                    1
5   0                    2
6   1                    3
7   2                    4
8   5                    5
9   6                    6
10  7                    7
```
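A common use of the new column is checking how rows are spread across partitions. The sketch below is illustrative rather than taken from this PR. It assumes Spark 2.0+ with SparkR installed, and it uses a hypothetical ten-row data frame split into four partitions. It tags each row with `spark_partition_id()`, then counts the rows in each partition.

```r
library(SparkR)
sparkR.session()

# Hypothetical input: ids 0..9 spread over 4 partitions (illustrative choice)
df <- repartition(createDataFrame(data.frame(id = 0:9)), numPartitions = 4L)

# Tag each row with the ID of the partition that currently holds it
tagged <- withColumn(df, "pid", spark_partition_id())

# Count rows per partition and bring the small result back to R
per_partition <- collect(count(groupBy(tagged, "pid")))
print(per_partition)

sparkR.session.stop()
```

The exact row-to-partition assignment depends on how Spark distributes the data, so only the totals are stable across runs.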

## How was this patch tested?

Passes the Jenkins tests (including a new test case).

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13768 from dongjoon-hyun/SPARK-16053.

parent aee1420e

Showing with 27 additions and 0 deletions