mtry and nodesize
tune.rfsrc.Rd
Finds the optimal mtry and nodesize for a random forest
using out-of-bag (OOB) error. Two search strategies are supported: a
grid-based search and a golden-section search with noise control. Works
for all response families supported by rfsrc.fast.
tune(formula, data,
  mtry.start = ncol(data) / 2,
  nodesize.try = c(1:9, seq(10, 100, by = 5)), ntree.try = 100,
  sampsize = function(x) { min(x * .632, max(150, x^(3/4))) },
  nsplit = 1, step.factor = 1.25, improve = 1e-3, strikeout = 3, max.iter = 25,
  method = c("grid", "golden"),
  final.window = 5, reps.initial = 2, reps.final = 3,
  trace = FALSE, do.best = TRUE, seed = NULL, ...)
tune.nodesize(formula, data,
  nodesize.try = c(1:9, seq(10, 150, by = 5)), ntree.try = 100,
  sampsize = function(x) { min(x * .632, max(150, x^(4/5))) },
  nsplit = 1, method = c("grid", "golden"),
  final.window = 5, reps.initial = 2, reps.final = 3, max.iter = 50,
  trace = TRUE, seed = NULL, ...)
formula: A model formula.
data: A data frame with response and predictors.
mtry.start: Initial mtry for tune.
nodesize.try: Candidate nodesize values. Only values \(\le\) floor(sampsize(n)/2) are used.
ntree.try: Number of trees grown at each tuning evaluation.
sampsize: Function or numeric giving the per-tree subsample size. During tuning a single numeric size ssize is computed and passed to rfsrc.fast. If a vector is supplied (e.g., class specific), its total is used for ssize.
nsplit: Number of random split points to consider at each node.
step.factor: Multiplicative step-out factor over mtry for grid search in tune.
improve: Minimum relative improvement required to continue a search step in tune.
strikeout: Maximum number of consecutive non-improving steps allowed in tune.
max.iter: Maximum number of iterations for the step-out search in tune or the coordinate loop when method = "golden".
method: Search strategy, either "grid" (default) or "golden".
final.window: For golden search, the terminal bracket width for the one-dimensional line search.
reps.initial: Replicates averaged at interior evaluations during golden iterations.
reps.final: Replicates averaged for each candidate during the final local sweep in golden search.
trace: If TRUE, prints progress.
do.best: If TRUE, tune fits and returns a forest at the optimal pair.
seed: Optional integer for reproducible tuning. The holdout split (when used) and all tuning fits become deterministic for a given seed.
...: Additional arguments passed to rfsrc.fast. Arguments that control tuning itself (perf.type, forest, save.memory, ntree, mtry, nodesize, sampsize, nsplit) are managed internally.
Error estimate. If 2 * ssize < n, a disjoint holdout of
size ssize is used for evaluation; otherwise OOB error is
used.
Subsample used during tuning. Both functions derive a single
integer ssize from sampsize and pass it to
rfsrc.fast for all tuning fits. This improves stability
and comparability across candidates. When do.best = TRUE in
tune, the final forest is fit with the user-supplied
sampsize exactly as provided.
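As a rough illustration of the two paragraphs above, the following base-R sketch derives a single ssize from a sampsize specification, applies the 2 * ssize < n rule, and caps the nodesize candidates at floor(ssize / 2). The helper names derive_ssize and error_source are hypothetical, and truncating with floor() is an assumption; the package may round differently.

```r
## Hypothetical helpers for illustration; not part of randomForestSRC.

## Default sampsize function taken from the tune() usage.
default_sampsize <- function(x) min(x * .632, max(150, x^(3/4)))

## Collapse a function, a scalar, or a vector into one integer size.
## Assumption: truncation via floor(); a vector is reduced to its total.
derive_ssize <- function(sampsize, n) {
  s <- if (is.function(sampsize)) sampsize(n) else sum(sampsize)
  as.integer(floor(s))
}

## Disjoint holdout if 2 * ssize < n, otherwise OOB error.
error_source <- function(ssize, n) if (2 * ssize < n) "holdout" else "oob"

n <- 1000                                   # made-up sample size
ssize <- derive_ssize(default_sampsize, n)
error_source(ssize, n)

## Only nodesize candidates up to floor(ssize / 2) are kept.
nodesize.try <- c(1:9, seq(10, 100, by = 5))
nodesize.used <- nodesize.try[nodesize.try <= floor(ssize / 2)]
```

With a small data set the subsample is a large share of the rows, so the same rule falls back to OOB error.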
Grid search. tune performs a step-out search over
mtry for each nodesize in nodesize.try, using
step.factor, improve, strikeout, and
max.iter. tune.nodesize evaluates the supplied
nodesize.try grid directly.
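The step-out idea can be sketched in base R as follows. This is a simplified reading of the description above, not the package's internal code: the function step_out, the toy error curve toy_err, and the exact rounding of candidate mtry values are all assumptions made for illustration. In tune the error would come from a forest fit at a fixed nodesize rather than from a formula.

```r
## Hypothetical sketch of a step-out search over mtry (not package code).
step_out <- function(err_fun, start, p, step.factor = 1.25, improve = 1e-3,
                     strikeout = 3, max.iter = 25) {
  best_m <- max(1, min(p, floor(start)))
  best_err <- err_fun(best_m)
  visited <- data.frame(mtry = best_m, err = best_err)
  for (direction in c("left", "right")) {
    m <- best_m          # each sweep starts from the current best value
    strikes <- 0
    for (iter in seq_len(max.iter)) {
      m_new <- if (direction == "left") {
        max(1, floor(m / step.factor))
      } else {
        min(p, ceiling(m * step.factor))
      }
      if (m_new == m) break                 # hit the boundary 1 or p
      m <- m_new
      e <- err_fun(m)
      visited <- rbind(visited, data.frame(mtry = m, err = e))
      if ((best_err - e) / best_err >= improve) {
        best_m <- m; best_err <- e; strikes <- 0
      } else {
        strikes <- strikes + 1              # consecutive non-improving step
        if (strikes >= strikeout) break
      }
    }
  }
  list(mtry = best_m, err = best_err, visited = visited)
}

## Toy, noise-free error curve with its minimum at mtry = 6.
toy_err <- function(m) 0.2 + (m - 6)^2 / 100
res <- step_out(toy_err, start = 10, p = 20)
res$mtry
```

The strikeout counter lets the search step past a few non-improving candidates before giving up, which matters when the error estimates are noisy.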
Golden search. Uses a guarded golden-section line search with
noise control. For each one-dimensional search (over nodesize or
mtry), the routine probes a small left-anchor grid 1:9,
iterates golden shrinkage until the bracket width is at most
final.window, then runs a short local sweep with
reps.final replicates. In tune the searches over
nodesize and mtry alternate in a simple coordinate loop,
with improve and strikeout as stopping controls.
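A one-dimensional version of this search can be sketched in base R. The function golden_int and the toy error toy_err are hypothetical, and the way the 1:9 anchor grid is folded into the final comparison is an assumption; the package's guards and noise handling are likely more elaborate. The coordinate loop used by tune is not shown.

```r
## Hypothetical sketch of an integer golden-section search (not package code).
golden_int <- function(err_fun, lo, hi, final.window = 5,
                       reps.initial = 2, reps.final = 3, max.iter = 50) {
  ## Average several evaluations to damp noise in the error estimate.
  avg_err <- function(x, reps) mean(replicate(reps, err_fun(x)))
  phi <- (sqrt(5) - 1) / 2

  ## Probe the small left-anchor grid and remember its best value.
  anchors <- intersect(1:9, lo:hi)
  best_anchor <- if (length(anchors) > 0) {
    anchors[which.min(vapply(anchors, avg_err, numeric(1), reps = reps.initial))]
  } else {
    integer(0)
  }

  ## Golden shrinkage until the bracket is at most final.window wide.
  for (iter in seq_len(max.iter)) {
    if (hi - lo <= final.window) break
    x1 <- round(hi - phi * (hi - lo))
    x2 <- round(lo + phi * (hi - lo))
    if (x1 >= x2) break
    if (avg_err(x1, reps.initial) <= avg_err(x2, reps.initial)) {
      hi <- x2
    } else {
      lo <- x1
    }
  }

  ## Final local sweep with more replicates per candidate.
  cand <- union(lo:hi, best_anchor)
  errs <- vapply(cand, avg_err, numeric(1), reps = reps.final)
  cand[which.min(errs)]
}

## Toy, noise-free error with its minimum at 37.
toy_err <- function(x) (x - 37)^2
best <- golden_int(toy_err, lo = 1, hi = 100)
best
```

With a noisy error function the replicate averaging and the final sweep make the result less sensitive to a single unlucky comparison, at the cost of extra fits.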
For tune:
results: matrix with columns nodesize, mtry, err.
optimal: named numeric vector c(nodesize = ..., mtry = ...).
rf: fitted forest at the optimum if do.best = TRUE.
For tune.nodesize:
nsize.opt: optimal nodesize.
err: data frame with columns nodesize and err.
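To show how the returned pieces relate, the sketch below builds a made-up results matrix with the column layout listed above and recovers the optimal pair from it. The numbers are invented for illustration and do not come from any tuning run.

```r
## Made-up results matrix with columns nodesize, mtry, err.
results <- cbind(nodesize = c(1, 1, 5, 5, 10, 10),
                 mtry     = c(3, 6, 3, 6, 3, 6),
                 err      = c(0.41, 0.38, 0.36, 0.33, 0.37, 0.35))

## Row with the smallest error, reduced to the (nodesize, mtry) pair.
best <- results[which.min(results[, "err"]), ]
optimal <- best[c("nodesize", "mtry")]
optimal

## Smallest error seen for each nodesize.
err_by_nodesize <- tapply(results[, "err"], results[, "nodesize"], min)
```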
# \donttest{
## ------------------------------------------------------------
## White wine classification example
## ------------------------------------------------------------
data(wine, package = "randomForestSRC")
wine$quality <- factor(wine$quality)
## Fixed seed makes tuning reproducible
set.seed(1)
## Full tuner over nodesize and mtry (grid)
o1 <- tune(quality ~ ., wine, sampsize = 100, method = "grid")
print(o1$optimal)
## Golden search alternative
o2 <- tune(quality ~ ., wine, sampsize = 100, method = "golden",
           reps.initial = 2, reps.final = 3, seed = 1)
print(o2$optimal)
## visualize the nodesize/mtry surface
if (library("interp", logical.return = TRUE)) {
  plot.tune <- function(o, linear = TRUE) {
    x <- o$results[, 1]
    y <- o$results[, 2]
    z <- o$results[, 3]
    so <- interp(x = x, y = y, z = z, linear = linear)
    idx <- which.min(z)
    x0 <- x[idx]; y0 <- y[idx]
    filled.contour(x = so$x, y = so$y, z = so$z,
                   xlim = range(so$x, finite = TRUE) + c(-2, 2),
                   ylim = range(so$y, finite = TRUE) + c(-2, 2),
                   color.palette = colorRampPalette(c("yellow", "red")),
                   xlab = "nodesize", ylab = "mtry",
                   main = "error rate for nodesize and mtry",
                   key.title = title(main = "OOB error", cex.main = 1),
                   plot.axes = {
                     axis(1); axis(2)
                     points(x0, y0, pch = "x", cex = 1, font = 2)
                     points(x, y, pch = 16, cex = .25)
                   })
  }
  plot.tune(o1)
  plot.tune(o2)
}
## ------------------------------------------------------------
## nodesize only: grid vs golden
## ------------------------------------------------------------
o3 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "grid",
                    trace = TRUE, seed = 1)
o4 <- tune.nodesize(quality ~ ., wine, sampsize = 100, method = "golden",
                    reps.initial = 2, reps.final = 3, trace = TRUE, seed = 1)
plot(o3$err, type = "s", xlab = "nodesize", ylab = "error")
## ------------------------------------------------------------
## Tuning for class imbalance (rfq with geometric mean performance)
## ------------------------------------------------------------
data(breast, package = "randomForestSRC")
breast <- na.omit(breast)
o5 <- tune(status ~ ., data = breast, rfq = TRUE, perf.type = "gmean",
           method = "golden", seed = 1)
print(o5$optimal)
## ------------------------------------------------------------
## Competing risks example (nodesize only)
## ------------------------------------------------------------
data(wihs, package = "randomForestSRC")
plot(tune.nodesize(Surv(time, status) ~ ., wihs, trace = TRUE)$err, type = "s")
# }