rfsrc.fast.Rd
Fast approximate random forests using subsampling with forest options set to encourage computational speed. Applies to all families.
Model to be fit. If missing, unsupervised splitting is implemented.
Data frame containing the y-outcome and x-variables.
Number of trees.
Non-negative integer value specifying number of random split points used to split a node (deterministic splitting corresponds to the value zero and can be slower).
Bootstrap protocol used in growing a tree.
Function specifying size of subsampled data. Can also be a number.
Type of bootstrap used.
Bootstrap specification when "by.user" is used.
Integer value used for survival to constrain ensemble calculations to a grid of ntime time points.
Save key forest values? Turn this on if you want prediction on test data.
Save memory? Setting this to FALSE stores terminal node quantities used for prediction on test data. This yields rapid prediction but can be memory intensive for big data, especially competing risks and survival models.
Further arguments to be passed to rfsrc.
Calls rfsrc with options (such as subsampling) chosen to encourage computational speed. This provides a good approximation, but results will not be as good as those obtained with the default settings of rfsrc.
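To make the subsampling idea concrete, the following base R sketch shows how a subsample-size function maps the number of rows to a subsample size, and how a single subsample could be drawn without replacement. The rule `size_rule` is a hypothetical example written for illustration; it is not taken from this page and should not be read as the package default.

```r
## Hypothetical subsample-size rule (an assumption for illustration):
## use about 63.2% of the rows, but for large n cap the size at
## max(150, n^(3/4)) so that each tree sees far fewer cases.
size_rule <- function(n) min(n * 0.632, max(150, n^(3/4)))

## Subsample sizes implied by the rule for a few data set sizes
n <- c(100, 1000, 100000)
sizes <- sapply(n, size_rule)
print(data.frame(n = n, subsample = round(sizes)))

## Draw one subsample (without replacement) of row indices
## for a data set with 1000 rows
set.seed(1)
idx <- sample(1000, size = round(size_rule(1000)))
length(idx)
```

A function of this shape, or simply a fixed number, is the kind of value the subsample-size argument described above accepts; the smaller the subsample relative to n, the faster each tree is grown.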
An object of class (rfsrc, grow).
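As a minimal sketch of how the returned object might be used for prediction on test data, the example below grows a fast forest with the forest saved and then calls predict on held-out rows. It assumes the randomForestSRC package is installed, uses the built-in mtcars data purely for illustration, and assumes the subsample-size argument is named sampsize, as in rfsrc; the train/test split and the values of ntree and sampsize are arbitrary choices.

```r
## Runs only if randomForestSRC is available
if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  library(randomForestSRC)

  ## arbitrary train/test split of the built-in mtcars data
  set.seed(1)
  trn <- sample(nrow(mtcars), 20)
  train <- mtcars[trn, ]
  test <- mtcars[-trn, ]

  ## forest = TRUE saves the forest so that prediction is possible;
  ## sampsize is given as a number here rather than as a function
  o <- rfsrc.fast(mpg ~ ., train, ntree = 50, sampsize = 15, forest = TRUE)

  ## predict on the held-out rows
  pred <- predict(o, test)
  print(head(pred$predicted))
}
```

If the forest is not saved, the fitted object is smaller but cannot be used to predict on new data, so the choice is a trade-off between memory and the need for test-set prediction.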
# \donttest{
## ------------------------------------------------------------
## regression
## ------------------------------------------------------------
## load the Iowa housing data
data(housing, package = "randomForestSRC")
## do quick and *dirty* imputation
housing <- impute(SalePrice ~ ., housing,
                  ntree = 50, nimpute = 1, splitrule = "random")
## grow a fast forest
o1 <- rfsrc.fast(SalePrice ~ ., housing)
o2 <- rfsrc.fast(SalePrice ~ ., housing, nodesize = 1)
print(o1)
print(o2)
## grow a fast bivariate forest
o3 <- rfsrc.fast(cbind(SalePrice,Overall.Qual) ~ ., housing)
print(o3)
## ------------------------------------------------------------
## classification
## ------------------------------------------------------------
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)
o <- rfsrc.fast(quality ~ ., wine)
print(o)
## ------------------------------------------------------------
## grow fast random survival forests without C-calculation
## use brier score to assess model performance
## compare pure random splitting to logrank splitting
## ------------------------------------------------------------
data(peakVO2, package = "randomForestSRC")
f <- as.formula(Surv(ttodead, died)~.)
o1 <- rfsrc.fast(f, peakVO2, perf.type = "none")
o2 <- rfsrc.fast(f, peakVO2, perf.type = "none", splitrule = "random")
bs1 <- get.brier.survival(o1, cens.model = "km")
bs2 <- get.brier.survival(o2, cens.model = "km")
plot(bs2$brier.score, type = "s", col = 2)
lines(bs1$brier.score, type = "s", col = 4)
legend("bottomright", legend = c("random", "logrank"), fill = c(2,4))
## ------------------------------------------------------------
## competing risks
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
o <- rfsrc.fast(Surv(time, status) ~ ., wihs)
print(o)
## ------------------------------------------------------------
## class imbalanced data using gmean performance
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
f <- as.formula(status ~ .)
o <- rfsrc.fast(f, breast, perf.type = "gmean")
print(o)
## ------------------------------------------------------------
## class imbalanced data using random forests quantile-classifier (RFQ)
## fast=TRUE => rfsrc.fast
## see imbalanced function for further details
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
f <- as.formula(status ~ .)
o <- imbalanced(f, breast, fast = TRUE)
print(o)
# }