Commit 5693ac8e authored by Xiangrui Meng

[SPARK-18793][SPARK-18794][R] add spark.randomForest/spark.gbt to vignettes


## What changes were proposed in this pull request?

Mention `spark.randomForest` and `spark.gbt` in vignettes. Keep the content minimal since users can type `?spark.randomForest` to see the full doc.

cc: jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #16264 from mengxr/SPARK-18793.

(cherry picked from commit 594b14f1)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
parent 25b97589
@@ -449,6 +449,10 @@ SparkR supports the following machine learning models and algorithms.
* Generalized Linear Model (GLM)
* Random Forest
* Gradient-Boosted Trees (GBT)
* Naive Bayes Model
* $k$-means Clustering
@@ -526,6 +530,34 @@ gaussianFitted <- predict(gaussianGLM, carsDF)
head(select(gaussianFitted, "model", "prediction", "mpg", "wt", "hp"))
```
#### Random Forest
`spark.randomForest` fits a [random forest](https://en.wikipedia.org/wiki/Random_forest) classification or regression model on a `SparkDataFrame`.
Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
In the following example, we use the `longley` dataset to train a random forest and make predictions:
```{r, warning=FALSE}
df <- createDataFrame(longley)
rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", maxDepth = 2, numTrees = 2)
summary(rfModel)
predictions <- predict(rfModel, df)
```
#### Gradient-Boosted Trees
`spark.gbt` fits a [gradient-boosted tree](https://en.wikipedia.org/wiki/Gradient_boosting) classification or regression model on a `SparkDataFrame`.
Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
Similar to the random forest example above, we use the `longley` dataset to train a gradient-boosted tree and make predictions:
```{r, warning=FALSE}
df <- createDataFrame(longley)
gbtModel <- spark.gbt(df, Employed ~ ., type = "regression", maxDepth = 2, maxIter = 2)
summary(gbtModel)
predictions <- predict(gbtModel, df)
```
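Both sections above mention `write.ml`/`read.ml` without showing them. The following is a minimal sketch of a save/load round trip for the random forest model, assuming a local Spark session and a temporary directory as the model location (`modelPath` is a hypothetical name chosen for illustration, not part of this commit); the same pattern should apply to a model fitted with `spark.gbt`:

```{r, warning=FALSE}
library(SparkR)
# Reuses the current session if one is already running.
sparkR.session(master = "local[1]")

df <- createDataFrame(longley)
rfModel <- spark.randomForest(df, Employed ~ ., type = "regression", maxDepth = 2, numTrees = 2)

# Assumption: a fresh temporary path that Spark can write to.
modelPath <- tempfile(pattern = "spark-rf", fileext = ".tmp")
write.ml(rfModel, modelPath)

# Load the saved model back and use it like the original one.
savedModel <- read.ml(modelPath)
head(predict(savedModel, df))

# Clean up the saved model directory.
unlink(modelPath, recursive = TRUE)
```

`write.ml` refuses to write to an existing path unless `overwrite = TRUE` is passed, which is why the sketch uses a fresh temporary path.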
#### Naive Bayes Model
The naive Bayes model assumes independence among the features. `spark.naiveBayes` fits a [Bernoulli naive Bayes model](https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Bernoulli_naive_Bayes) against a `SparkDataFrame`. All features should be categorical. These models are often used for document classification.
......