Commit 63f077fb authored by Zheng RuiFeng, committed by Xiao Li

[SPARK-20041][DOC] Update docs for NaN handling in approxQuantile

## What changes were proposed in this pull request?
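Update the R, Scala, and Python docs for NaN handling in approxQuantile and `QuantileDiscretizer`: null/NA values are now documented as ignored in numerical columns (rather than whole rows being removed), and a column containing only null/NA values returns an empty list.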

## How was this patch tested?
Existing tests.

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #17369 from zhengruifeng/doc_quantiles_nan.
parent 14865d7f
@@ -149,7 +149,8 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character"),
 #' This method implements a variation of the Greenwald-Khanna algorithm (with some speed
 #' optimizations). The algorithm was first present in [[http://dx.doi.org/10.1145/375663.375670
 #' Space-efficient Online Computation of Quantile Summaries]] by Greenwald and Khanna.
-#' Note that rows containing any NA values will be removed before calculation.
+#' Note that NA values will be ignored in numerical columns before calculation. For
+#' columns only containing NA values, an empty list is returned.
 #'
 #' @param x A SparkDataFrame.
 #' @param cols A single column name, or a list of names for multiple columns.
......
@@ -93,8 +93,8 @@ private[feature] trait QuantileDiscretizerBase extends Params
  * are too few distinct values of the input to create enough distinct quantiles.
  *
  * NaN handling:
- * NaN values will be removed from the column during `QuantileDiscretizer` fitting. This will
- * produce a `Bucketizer` model for making predictions. During the transformation,
+ * null and NaN values will be ignored from the column during `QuantileDiscretizer` fitting. This
+ * will produce a `Bucketizer` model for making predictions. During the transformation,
  * `Bucketizer` will raise an error when it finds NaN values in the dataset, but the user can
  * also choose to either keep or remove NaN values within the dataset by setting `handleInvalid`.
  * If the user chooses to keep NaN values, they will be handled specially and placed into their own
......
@@ -1384,7 +1384,8 @@ class DataFrame(object):
 Space-efficient Online Computation of Quantile Summaries]]
 by Greenwald and Khanna.
-Note that rows containing any null values will be removed before calculation.
+Note that null values will be ignored in numerical columns before calculation.
+For columns only containing null values, an empty list is returned.
 :param col: str, list.
 Can be a single column name, or a list of names for multiple columns.
......
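The documented contract (missing values are ignored per column, and a column with nothing left yields an empty list) can be illustrated with a small pure-Python sketch. This is not Spark code: `quantiles_ignoring_missing` is a hypothetical helper, it computes exact quantiles by sorting rather than using the Greenwald-Khanna summary, and treating NaN the same as `None` is an assumption made for the illustration.

```python
import math


def quantiles_ignoring_missing(values, probabilities):
    """Exact quantiles of `values`, skipping None and NaN entries.

    Hypothetical helper, not part of Spark: it only mimics the documented
    contract (missing values ignored, empty list when nothing is left).
    """
    kept = sorted(
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    if not kept:
        # Column contained only missing values.
        return []
    n = len(kept)
    # Rank ceil(p * n), clamped to [1, n], always picks an actual sample element.
    return [kept[min(n - 1, max(0, math.ceil(p * n) - 1))] for p in probabilities]


print(quantiles_ignoring_missing([1.0, None, 3.0, float("nan"), 2.0], [0.0, 0.5, 1.0]))
print(quantiles_ignoring_missing([None, None], [0.5]))
```

The first call prints `[1.0, 2.0, 3.0]` because only the three valid values take part in the calculation; the second prints `[]` because no valid values remain.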