partial.rfsrc.Rd
Direct, fast interface for partial effect of a variable. Works for all families.
partial.rfsrc(object, oob = TRUE,
partial.type = NULL, partial.xvar = NULL, partial.values = NULL,
partial.xvar2 = NULL, partial.values2 = NULL,
partial.time = NULL, get.tree = NULL, seed = NULL, do.trace = FALSE, ...)
An object of class (rfsrc, grow).
By default, out-of-bag (OOB) values are returned, but inbag values can be requested by setting this option to FALSE.
Character vector specifying type of predicted value requested. See details below.
Character value specifying the single primary partial x-variable to be used.
Vector of values that the primary partial x-variable will assume.
Vector of character values specifying the second order x-variables to be used.
Vector of values that the second order x-variables will assume. Each second order x-variable can only assume a single value, so partial.xvar2 and partial.values2 must have the same length. In addition, the user must do the appropriate conversion for factors and represent each value as a numeric element.
For survival families, the time at which the predicted survival value is evaluated (depends on partial.type).
Vector of integer(s) identifying the trees over which the partial values are calculated. By default, all trees in the forest are used.
Negative integer specifying seed for the random number generator.
Number of seconds between updates to the user on approximate time to completion.
Further arguments passed to or from other methods.
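Because each second order x-variable takes a single value per call, sweeping over several second order values means one call per combination. The following is a minimal base R sketch of building those combinations. The variable names X2 and X3 and the grid values are assumptions chosen for illustration, and the call to partial itself is shown only as a comment.

```r
## hypothetical second order variables and candidate values (illustration only)
xvar2 <- c("X2", "X3")
grid <- expand.grid(X2 = c(-1, 0, 1), X3 = c(-2, 2))

## each row of the grid becomes one partial.values2 vector
values2.list <- lapply(seq_len(nrow(grid)), function(i) {
  v <- unlist(grid[i, xvar2], use.names = FALSE)
  ## partial.xvar2 and partial.values2 must have the same length
  stopifnot(length(v) == length(xvar2))
  v
})

## each element would be supplied as partial.values2 in a separate call, e.g.
## partial(o, partial.xvar = "X1", partial.values = x1,
##         partial.xvar2 = xvar2, partial.values2 = values2.list[[i]])
length(values2.list)
```

Keeping the combinations in a list makes it straightforward to bind the per-call results together afterwards, as the multi-variable examples do with do.call(rbind, ...).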
Provides a direct, efficient call for obtaining partial plot effects. This function is intended primarily for experts.
Out-of-bag (OOB) values are returned by default.
For factors, the partial value should be encoded as a positive integer reflecting the level number of the factor. The actual label of the factor should not be used.
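As a small illustration of the factor encoding described above, the base R sketch below converts factor labels to level numbers. The factor trt and its labels are hypothetical, and the sketch assumes the level order matches that of the factor in the data used to grow the forest.

```r
## hypothetical factor x-variable (illustration only)
trt <- factor(c("placebo", "drug", "drug", "placebo"),
              levels = c("placebo", "drug"))

## level numbers, not labels, are what a factor's partial values should hold
trt.codes <- as.integer(trt)

## look up the level number for a single label
drug.code <- match("drug", levels(trt))

## map codes back to labels, e.g. when annotating a plot
levels(trt)[trt.codes]
```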
The utility function get.partial.plot.data is supplied for processing returned raw partial effects into a format more convenient for plotting. Options are specified as in plot.variable. See the examples for illustration.
Raw partial plot effects data is returned either as an array or a list of length equal to the number of outcomes (length is one for univariate families) with entries depending on the underlying family:
For regression, partial plot data is returned as a list in regrOutput with dim [n] x [length(partial.values)].
For classification, partial plot data is returned as a list in classOutput of dim [n] x [1 + yvar.nlevels[.]] x [length(partial.values)].
For mixed multivariate regression, values are returned in list format both in regrOutput and classOutput.
For survival, values are returned as either a matrix or array in survOutput. Depending on the partial type specified, this can be:
For partial type surv, returns the survival function of dim [n] x [length(partial.time)] x [length(partial.values)].
For partial type mort, returns mortality of dim [n] x [length(partial.values)].
For partial type chf, returns the cumulative hazard function of dim [n] x [length(partial.time)] x [length(partial.values)].
For competing risks, values are returned as either a matrix or array in survOutput. Depending on the options specified, this can be:
For partial type years.lost, returns the expected number of life years lost of dim [n] x [length(event.info$event.type)] x [length(partial.values)].
For partial type cif, returns the cumulative incidence function of dim [n] x [length(partial.time)] x [length(event.info$event.type)] x [length(partial.values)].
For partial type chf, returns the cumulative hazard function of dim [n] x [length(partial.time)] x [length(event.info$event.type)] x [length(partial.values)].
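To make the dimensions above concrete, the base R sketch below averages simulated arrays over the observation dimension, which is the kind of summary a partial plot displays. The arrays are random stand-ins and not output from partial, and the NA row is an assumption used only to show why na.rm = TRUE can be a sensible guard.

```r
set.seed(1)
n <- 5; n.time <- 4; n.val <- 3

## simulated stand-in for a regression result: [n] x [length(partial.values)]
regr <- matrix(rnorm(n * n.val), nrow = n, ncol = n.val)
regr[2, ] <- NA  ## assumed case with no available prediction
regr.curve <- colMeans(regr, na.rm = TRUE)  ## one value per partial value

## simulated stand-in for a survival result:
## [n] x [length(partial.time)] x [length(partial.values)]
surv <- array(runif(n * n.time * n.val), dim = c(n, n.time, n.val))
surv.curve <- apply(surv, c(2, 3), mean, na.rm = TRUE)  ## [time] x [values]

length(regr.curve)
dim(surv.curve)
```

In practice get.partial.plot.data handles this processing; the sketch is only meant to show how the array dimensions map onto a plotted curve.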
Ishwaran H., Kogalur U.B. (2007). Random survival forests for R, Rnews, 7(2):25-31.
Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests, Ann. App. Statist., 2:841-860.
# \donttest{
## ------------------------------------------------------------
##
## regression
##
## ------------------------------------------------------------
airq.obj <- rfsrc(Ozone ~ ., data = airquality)
## partial effect for wind
partial.obj <- partial(airq.obj,
partial.xvar = "Wind",
partial.values = airq.obj$xvar$Wind)
pdta <- get.partial.plot.data(partial.obj)
## plot partial values
plot(pdta$x, pdta$yhat, type = "b", pch = 16,
xlab = "wind", ylab = "partial effect of wind")
## example where we display all the partial effects
## instead of averaging - use the granule=TRUE option
pdta <- get.partial.plot.data(partial.obj, granule = TRUE)
boxplot(pdta$yhat ~ pdta$x, xlab = "Wind", ylab = "partial effect")
## ------------------------------------------------------------
##
## regression: partial effects for two variables simultaneously
##
## ------------------------------------------------------------
airq.obj <- rfsrc(Ozone ~ ., data = airquality)
## specify wind and temperature values of interest
wind <- sort(unique(airq.obj$xvar$Wind))
temp <- sort(unique(airq.obj$xvar$Temp))
## partial effect for wind, for a given temp
pdta <- do.call(rbind, lapply(temp, function(x2) {
o <- partial(airq.obj,
partial.xvar = "Wind", partial.xvar2 = "Temp",
partial.values = wind, partial.values2 = x2)
cbind(wind, x2, get.partial.plot.data(o)$yhat)
}))
pdta <- data.frame(pdta)
colnames(pdta) <- c("wind", "temp", "effectSize")
## coplot of partial effect of wind and temp
coplot(effectSize ~ wind|temp, pdta, pch = 16, overlap = 0)
## ------------------------------------------------------------
##
## regression: partial effects for three variables simultaneously
## (can be slow, so modify accordingly)
##
## ------------------------------------------------------------
n <- 1000
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] + x[, 1] * x[, 2] + x[, 1] * x[, 2] * x[, 3]
o <- rfsrc(y ~ ., data = data.frame(y = y, x))
## define target x values
x1 <- seq(-3, 3, length = 40)
x2 <- x3 <- seq(-3, 3, length = 10)
## extract second order partial effects
pdta <- do.call(rbind,
lapply(x3, function(x3v) {
cat("outer loop x3 = ", x3v, "\n")
do.call(rbind,lapply(x2, function(x2v) {
o <- partial(o,
partial.xvar = "X1",
partial.values = x1,
partial.xvar2 = c("X2", "X3"),
partial.values2 = c(x2v, x3v))
cbind(x1, x2v, x3v, get.partial.plot.data(o)$yhat)
}))
}))
pdta <- data.frame(pdta)
colnames(pdta) <- c("x1", "x2", "x3", "effectSize")
## coplot of partial effects
coplot(effectSize ~ x1|x2*x3, pdta, pch = 16, overlap = 0)
## ------------------------------------------------------------
##
## classification
##
## ------------------------------------------------------------
iris.obj <- rfsrc(Species ~., data = iris)
## partial effect for sepal length
partial.obj <- partial(iris.obj,
partial.xvar = "Sepal.Length",
partial.values = iris.obj$xvar$Sepal.Length)
## extract partial effects for each species outcome
pdta1 <- get.partial.plot.data(partial.obj, target = "setosa")
pdta2 <- get.partial.plot.data(partial.obj, target = "versicolor")
pdta3 <- get.partial.plot.data(partial.obj, target = "virginica")
## plot the results
par(mfrow=c(1,1))
plot(pdta1$x, pdta1$yhat, type="b", pch = 16,
xlab = "sepal length", ylab = "adjusted probability",
ylim = range(pdta1$yhat,pdta2$yhat,pdta3$yhat))
points(pdta2$x, pdta2$yhat, col = 2, type = "b", pch = 16)
points(pdta3$x, pdta3$yhat, col = 4, type = "b", pch = 16)
legend("topleft", legend=levels(iris.obj$yvar), fill = c(1, 2, 4))
## ------------------------------------------------------------
##
## survival
##
## ------------------------------------------------------------
data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time,status)~., veteran, nsplit = 10, ntree = 100)
## partial effect of age on mortality
partial.obj <- partial(v.obj,
partial.type = "mort",
partial.xvar = "age",
partial.values = v.obj$xvar$age,
partial.time = v.obj$time.interest)
pdta <- get.partial.plot.data(partial.obj)
plot(lowess(pdta$x, pdta$yhat, f = 1/3),
type = "l", xlab = "age", ylab = "adjusted mortality")
## example where x is discrete - partial effect of treatment on mortality
## we use the granule=TRUE option
partial.obj <- partial(v.obj,
partial.type = "mort",
partial.xvar = "trt",
partial.values = v.obj$xvar$trt,
partial.time = v.obj$time.interest)
pdta <- get.partial.plot.data(partial.obj, granule = TRUE)
boxplot(pdta$yhat ~ pdta$x, xlab = "treatment", ylab = "partial effect")
## partial effects of karnofsky score on survival
karno <- quantile(v.obj$xvar$karno)
partial.obj <- partial(v.obj,
partial.type = "surv",
partial.xvar = "karno",
partial.values = karno,
partial.time = v.obj$time.interest)
pdta <- get.partial.plot.data(partial.obj)
matplot(pdta$partial.time, t(pdta$yhat), type = "l", lty = 1,
xlab = "time", ylab = "karnofsky adjusted survival")
legend("topright", legend = paste0("karnofsky = ", karno), fill = 1:5)
## ------------------------------------------------------------
##
## competing risk
##
## ------------------------------------------------------------
data(follic, package = "randomForestSRC")
follic.obj <- rfsrc(Surv(time, status) ~ ., follic, nsplit = 3, ntree = 100)
## partial effect of age on years lost
partial.obj <- partial(follic.obj,
partial.type = "years.lost",
partial.xvar = "age",
partial.values = follic.obj$xvar$age,
partial.time = follic.obj$time.interest)
pdta1 <- get.partial.plot.data(partial.obj, target = 1)
pdta2 <- get.partial.plot.data(partial.obj, target = 2)
par(mfrow=c(2,2))
plot(lowess(pdta1$x, pdta1$yhat),
type = "l", xlab = "age", ylab = "adjusted years lost relapse")
plot(lowess(pdta2$x, pdta2$yhat),
type = "l", xlab = "age", ylab = "adjusted years lost death")
## partial effect of age on cif
partial.obj <- partial(follic.obj,
partial.type = "cif",
partial.xvar = "age",
partial.values = quantile(follic.obj$xvar$age),
partial.time = follic.obj$time.interest)
pdta1 <- get.partial.plot.data(partial.obj, target = 1)
pdta2 <- get.partial.plot.data(partial.obj, target = 2)
matplot(pdta1$partial.time, t(pdta1$yhat), type = "l", lty = 1,
xlab = "time", ylab = "age adjusted cif for relapse")
matplot(pdta2$partial.time, t(pdta2$yhat), type = "l", lty = 1,
xlab = "time", ylab = "age adjusted cif for death")
## ------------------------------------------------------------
##
## multivariate mixed outcomes
##
## ------------------------------------------------------------
mtcars2 <- mtcars
mtcars2$carb <- factor(mtcars2$carb)
mtcars2$cyl <- factor(mtcars2$cyl)
mtcars.mix <- rfsrc(Multivar(carb, mpg, cyl) ~ ., data = mtcars2)
## partial effect of displacement for each of the three outcomes
partial.obj <- partial(mtcars.mix,
partial.xvar = "disp",
partial.values = mtcars.mix$xvar$disp)
pdta1 <- get.partial.plot.data(partial.obj, m.target = "carb")
pdta2 <- get.partial.plot.data(partial.obj, m.target = "mpg")
pdta3 <- get.partial.plot.data(partial.obj, m.target = "cyl")
par(mfrow=c(2,2))
plot(lowess(pdta1$x, pdta1$yhat), type = "l", xlab="displacement", ylab="carb")
plot(lowess(pdta2$x, pdta2$yhat), type = "l", xlab="displacement", ylab="mpg")
plot(lowess(pdta3$x, pdta3$yhat), type = "l", xlab="displacement", ylab="cyl")
# }