stat.split.rfsrc.Rd
Extract split statistic information from the forest. The function
returns a list of length ntree, in which each element corresponds to a
tree. The element [[b]] is itself a list whose length equals the number
of x-variables (xvar.names), with each entry identified by its
x-variable name. Each element [[b]]$xvar contains the complete list of
splits on xvar in tree b, with associated identifying information. The
information is as follows:
treeID Tree identifier.
nodeID Node identifier.
parmID Variable identifier.
contPT Value at which the node was split, in the case of a continuous variable.
mwcpSZ Size of the multi-word complementary pair in the case of a factor split.
dpthID Zero (0) based depth of split.
spltTY Split type for parent node:
bit 1 | bit 0 | meaning |
----- | ----- | ------- |
0 | 0 | 0 = both daughters have valid splits |
0 | 1 | 1 = only the right daughter is terminal |
1 | 0 | 2 = only the left daughter is terminal |
1 | 1 | 3 = both daughters are terminal |
spltEC End cut statistic for real valued variables. It takes values in [0, 0.5] and is small when the split is towards the edge and large when the split is towards the middle. Subtracting this value from 0.5 yields the end cut statistic studied in Ishwaran (2015), which is a way to identify end cut preference (ECP) behavior.
spltST Split statistic:
For objects of class (rfsrc, grow), this is the split statistic that resulted in the variable being chosen for the split.
For objects of class (rfsrc, predict), this is the variance of the response within the node for the test data. This value is relevant only for real valued responses; it is not relevant in classification or survival settings.
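The spltTY and spltEC fields described above can be decoded with a few lines of base R. The following is a minimal sketch, not part of the package: the helper name decode_split_fields and the toy data frame are hypothetical, and only the column names spltTY and spltEC are taken from the list above.

```r
## Hypothetical helper (not part of randomForestSRC): decode the two-bit
## spltTY code and convert spltEC to the end cut statistic 0.5 - spltEC.
decode_split_fields <- function(stat) {
  ty <- as.integer(stat$spltTY)
  ## bit 1 set: left daughter is terminal; bit 0 set: right daughter is terminal
  stat$leftTerminal  <- bitwAnd(ty, 2L) != 0L
  stat$rightTerminal <- bitwAnd(ty, 1L) != 0L
  ## values near 0 indicate a split near the middle, values near 0.5 an edge split
  stat$ecp <- 0.5 - stat$spltEC
  stat
}

## toy input using only the documented column names (values are made up)
toy <- data.frame(spltTY = c(0, 1, 2, 3), spltEC = c(0.5, 0.25, 0.1, 0))
decoded <- decode_split_fields(toy)
```

In practice the input would be the table of splits for one variable, for example the data frame returned invisibly by a wrapper such as get.split in the examples.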
# S3 method for rfsrc
stat.split(object, ...)
An object of class (rfsrc, grow), (rfsrc, synthetic) or (rfsrc, predict).
Further arguments passed to or from other methods.
Invisibly, a list with the following components:
...
Ishwaran H. (2015). The effect of splitting on random forests. Machine Learning, 99:75-118.
# \donttest{
## load the package, run a forest, then make a call to stat.split
library(randomForestSRC)
grow.obj <- rfsrc(mpg ~ ., data = mtcars, membership = TRUE, statistics = TRUE)
stat.obj <- stat.split(grow.obj)
## convenience wrapper to extract the split statistics for a desired variable;
## for continuous variables, it also plots the ECP data
get.split <- function(splitObj, xvar, inches = 0.1, ...) {
  which.var <- which(names(splitObj[[1]]) == xvar)
  ntree <- length(splitObj)
  ## stack the splits on xvar from every tree into one data frame
  stat <- data.frame(do.call(rbind, sapply(1:ntree, function(b) {
    splitObj[[b]][which.var]})))
  dpth <- stat$dpthID
  ## end cut statistic: values near 0.5 indicate a split close to the edge
  ecp <- 1/2 - stat$spltEC
  sp <- stat$contPT
  if (!all(is.na(sp))) {
    ## map ecp to a color code: 1 = low, 4 = medium, 2 = high
    fgC <- function(x) {
      as.numeric(as.character(cut(x, breaks = c(-1, 0.2, 0.35, 0.5),
                                  labels = c(1, 4, 2))))
    }
    ## bubble plot of split value against node depth, sized and colored by ecp
    symbols(jitter(sp), jitter(dpth), ecp, inches = inches, bg = fgC(ecp),
            xlab = xvar, ylab = "node depth", ...)
    legend("topleft", legend = c("low ecp", "med ecp", "high ecp"),
           fill = c(1, 4, 2))
  }
  invisible(stat)
}
## use get.split to investigate ECP behavior of variables
get.split(stat.obj, "disp")
# }
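Beyond plotting a single variable, the per-tree, per-variable layout of the returned list lends itself to simple summaries. The sketch below is an illustration only: avg_split_depth is a hypothetical helper, not a package function, and it assumes each per-variable element is a table carrying the dpthID column documented above. It is run here on a made-up mock object of that shape so that it does not depend on a fitted forest.

```r
## Hypothetical helper (not part of randomForestSRC): average split depth
## per x-variable, pooled over all trees of a stat.split-style object.
avg_split_depth <- function(splitObj) {
  xvars <- names(splitObj[[1]])
  sapply(xvars, function(v) {
    ## collect the dpthID column of variable v from every tree
    d <- unlist(lapply(splitObj, function(tree) {
      s <- tree[[v]]
      if (is.null(s)) NULL else as.data.frame(s)$dpthID
    }))
    ## a variable that was never split on gets NA
    if (length(d) == 0) NA_real_ else mean(d)
  })
}

## mock object with the documented shape: one list element per tree,
## each holding one table of splits per x-variable (values are made up)
mock <- list(
  list(disp = data.frame(dpthID = c(0, 2)), wt = data.frame(dpthID = 1)),
  list(disp = data.frame(dpthID = 1),       wt = data.frame(dpthID = 3))
)
depth <- avg_split_depth(mock)
```

Under the same assumption about the element layout, the helper could be applied to real stat.split output in place of the mock object.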