Commit 2e861df9 authored by Reynold Xin

[DOC] bucketing is applicable to all file-based data sources

## What changes were proposed in this pull request?
Starting with Spark 2.1.0, the bucketing feature is available for all file-based data sources. This patch fixes some function docs that had not yet been updated to reflect that.

## How was this patch tested?
N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #16349 from rxin/ds-doc.
parent 7c5b7b3a
@@ -150,7 +150,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
    * predicates on the partitioned columns. In order for partitioning to work well, the number
    * of distinct values in each column should typically be less than tens of thousands.
    *
-   * This was initially applicable for Parquet but in 1.5+ covers JSON, text, ORC and avro as well.
+   * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting Spark 2.1.0.
    *
    * @since 1.4.0
    */
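
For reference, a partitioned write looks like the following minimal sketch; the session setup, input path, and the `date`/`country` columns are hypothetical, not part of this patch:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical setup: a local session and an input whose rows
// carry `date` and `country` columns.
val spark = SparkSession.builder().appName("partition-sketch").master("local[*]").getOrCreate()
val df = spark.read.json("/tmp/events.json")

// Lays files out as /tmp/events/date=.../country=.../part-*,
// so filters on `date` or `country` can skip whole directories.
df.write
  .partitionBy("date", "country")
  .parquet("/tmp/events")
```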
@@ -164,7 +164,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
    * Buckets the output by the given columns. If specified, the output is laid out on the file
    * system similar to Hive's bucketing scheme.
    *
-   * This is applicable for Parquet, JSON and ORC.
+   * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting Spark 2.1.0.
    *
    * @since 2.0
    */
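
A bucketed write, as a sketch reusing the hypothetical `df` above; the bucket count and `user_id` column are assumptions. Bucketing metadata is recorded in the catalog, so the write has to go through `saveAsTable` rather than a plain path-based `save`:

```scala
// Hash rows into 8 buckets by `user_id`, mirroring Hive's bucketing scheme.
// Path-based writes (e.g. .parquet(path)) reject bucketBy in this version.
df.write
  .bucketBy(8, "user_id")
  .format("parquet")
  .saveAsTable("events_bucketed")
```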
@@ -178,7 +178,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
   /**
    * Sorts the output in each bucket by the given columns.
    *
-   * This is applicable for Parquet, JSON and ORC.
+   * This is applicable for all file-based data sources (e.g. Parquet, JSON) starting Spark 2.1.0.
    *
    * @since 2.0
    */
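
Bucket-local sorting, again as a sketch with hypothetical column names; `sortBy` is only accepted in combination with `bucketBy`:

```scala
// Keeps each bucket file sorted by `timestamp`; calling sortBy without
// bucketBy fails with an AnalysisException.
df.write
  .bucketBy(8, "user_id")
  .sortBy("timestamp")
  .format("parquet")
  .saveAsTable("events_bucketed_sorted")
```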