Fits a binary classification model using XGBoost with inverse probability of censoring weighting (IPCW) for right-censored survival data.
Usage
ipcw_xgboost(
data,
tau,
time_var = "t",
status_var = "delta",
verbose = 0,
grid = ipcw_xgboost_default_grid(),
nrounds = 100,
early_stopping_rounds = 10,
nfold = 3,
nthread = 1
)
ipcw_xgboost_default_grid()
Arguments
- data
A data frame containing the survival data. Must include columns for the observed time and event indicator.
- tau
Numeric scalar. The time horizon at which the survival probability is to be estimated.
- time_var
Character. The name of the variable in data representing the observed time to event or censoring. Default is "t".
- status_var
Character. The name of the variable in data representing the event indicator (1 if the event occurred, 0 if censored). Default is "delta".
- verbose
Integer. Verbosity level for XGBoost training and cross-validation (default is 0).
- grid
Data frame. Grid of hyperparameters to test in cross-validation. The default is the value returned by ipcw_xgboost_default_grid().
- nrounds
Integer. Maximum number of boosting rounds for XGBoost training and cross-validation (default is 100).
- early_stopping_rounds
Integer. Number of rounds with no improvement to trigger early stopping during cross-validation (default is 10).
- nfold
Integer. Number of folds for cross-validation (default is 3).
- nthread
Integer. Number of threads to use for XGBoost training (default is 1).
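The grid argument takes a data frame of candidate hyperparameters. The base R sketch below builds a custom grid from the values listed under Details. It assumes that the columns are named after the xgboost parameters (booster, eta, max_depth) and that max_depth only matters for the tree booster, as in xgboost. Both are assumptions, so compare with names(ipcw_xgboost_default_grid()) before passing such a grid.

```r
# Hypothetical custom grid mirroring the values listed under Details.
# Assumption: column names follow xgboost parameter names; check
# names(ipcw_xgboost_default_grid()) before relying on this layout.
etas <- 1 / 10^(0:5)

# Tree booster: every combination of learning rate and tree depth
tree_grid <- expand.grid(
  booster = "gbtree",
  eta = etas,
  max_depth = c(12, 6, 3, 1),
  stringsAsFactors = FALSE
)

# Linear booster: tree depth does not apply, so it is left missing
linear_grid <- data.frame(
  booster = "gblinear",
  eta = etas,
  max_depth = NA_real_,
  stringsAsFactors = FALSE
)

# 6 x 4 tree settings plus 6 linear settings
my_grid <- rbind(tree_grid, linear_grid)
nrow(my_grid)

# The grid would then be passed as, e.g.:
# fit <- ipcw_xgboost(df, tau = 100, time_var = "time",
#                     status_var = "status", grid = my_grid)
```

A smaller grid shortens cross-validation considerably, since every row is tuned with nfold folds.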
Value
An object of class ipcwmodel.
Details
Training is performed using the xgboost package
(Chen and Guestrin 2016)
based on the "binary:logistic" objective.
Jackknife refits are computed to derive jackknife-based standard errors.
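For reference, the standard leave-one-out jackknife standard error is sqrt((n - 1) / n * sum((theta_(i) - theta_bar)^2)), where theta_(i) is the estimate with observation i removed. The sketch below applies this generic formula to a simple estimator. For ipcw_xgboost() each leave-one-out estimate would require refitting the weighted model, and the package may differ in details (for example, whether the censoring weights are re-estimated in each refit). The function name jackknife_se is hypothetical.

```r
# Hypothetical helper: leave-one-out jackknife SE of a scalar estimator.
# `estimator` is any function mapping a data vector to a single number.
jackknife_se <- function(x, estimator) {
  n <- length(x)
  # Recompute the estimate with observation i removed, for each i
  loo <- vapply(seq_len(n), function(i) estimator(x[-i]), numeric(1))
  sqrt((n - 1) / n * sum((loo - mean(loo))^2))
}

set.seed(1)
x <- rnorm(50)
jackknife_se(x, mean)
# For the sample mean the jackknife SE equals the usual sd / sqrt(n)
sd(x) / sqrt(length(x))
```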
Hyperparameter tuning is done using three-fold (nfold) cross-validation
over a grid of parameters. The best parameters are selected by the
minimum test log loss over at most 100 (nrounds) boosting rounds, with
early stopping after 10 (early_stopping_rounds) rounds without improvement.
Note that the tested hyperparameters were chosen based on our simulations
and may not be suitable for all datasets.
The tested hyperparameters include:
- booster: "gbtree" or "gblinear".
- eta: Learning rate, tested as 1 / 10^(0:5).
- For booster = "gbtree": max_depth, the maximum depth of the tree, tested as c(12, 6, 3, 1).
With the best parameters, the model is trained on the full dataset.
XGBoost does not support categorical variables directly.
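One common workaround is to one-hot encode factor covariates before fitting. The sketch below does this with base R's model.matrix() on the veteran data used in the Examples. Whether ipcw_xgboost() performs any encoding internally is not stated here, so treat this as an assumption about the required input.

```r
library(survival)

# celltype is a factor with four levels in survival::veteran
df <- veteran[, c("time", "status", "trt", "celltype")]

# One indicator column per level; "- 1" drops the intercept so that
# every level gets its own column
X <- model.matrix(~ celltype - 1, data = df)

# Replace the factor by its numeric indicator columns
df_enc <- cbind(df[, c("time", "status", "trt")], X)
str(df_enc)
```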
References
Chen T, Guestrin C (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, 785–794. ISBN 978-1-4503-4232-2, doi:10.1145/2939672.2939785, http://doi.acm.org/10.1145/2939672.2939785.
See also
ipcw_weights() for the underlying implementation of the weights,
and IPCWJK as well as Jahn-Eimermacher et al. (2025)
for more information.
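As background, a common form of IPCW weights for a binary outcome at horizon tau gives weight 0 to subjects censored before tau, 1 / G(t-) to subjects with an event at t <= tau, and 1 / G(tau) to subjects still under observation after tau, where G is the Kaplan-Meier estimate of the censoring distribution. The sketch below implements that textbook form. It is an assumption, not necessarily what ipcw_weights() computes (conventions for ties and left limits vary), and ipcw_weights_sketch is a hypothetical name.

```r
library(survival)

# Hypothetical sketch of standard IPCW weights at horizon tau;
# may differ from ipcw_weights() in ties and left-limit conventions.
ipcw_weights_sketch <- function(time, status, tau) {
  # Reverse Kaplan-Meier: censoring (status == 0) is treated as the event
  cens_fit <- survfit(Surv(time, 1 - status) ~ 1)
  # G(t): right-continuous censoring survival function
  G <- stepfun(cens_fit$time, c(1, cens_fit$surv))
  # G(t-): its left limit (intervals closed on the right)
  G_minus <- stepfun(cens_fit$time, c(1, cens_fit$surv), right = TRUE)

  w <- numeric(length(time))  # censored before tau: weight stays 0
  event_by_tau <- time <= tau & status == 1
  past_tau <- time > tau
  w[event_by_tau] <- 1 / G_minus(time[event_by_tau])
  w[past_tau] <- 1 / G(tau)
  w
}

w <- ipcw_weights_sketch(veteran$time, veteran$status, tau = 100)
summary(w)
```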
Other IPCW models:
ipcw_logistic_regression()
Examples
library(IPCWJK)
library(survival)
tau <- 100
df <- veteran[, c("time", "status", "trt")]
newdata <- data.frame(trt = c(1, 2))
fit <- ipcw_xgboost(df,
tau = tau, time_var = "time",
status_var = "status"
)
predict(fit, newdata)
#> prediction lower upper se
#> 1 0.4362208 0.3787636 0.4954429 0.02989897
#> 2 0.4047836 0.3287710 0.4856550 0.04034125
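The lower and upper columns printed above appear consistent with a 95% Wald interval built on the logit scale, using the delta-method standard error se / (p * (1 - p)) and back-transforming. This is an inference from the printed numbers, not from the package source. The sketch below recomputes both bounds from the prediction and se columns only.

```r
# Values copied from the predict() output above
p  <- c(0.4362208, 0.4047836)
se <- c(0.02989897, 0.04034125)

# Delta method: standard error on the logit scale
se_logit <- se / (p * (1 - p))
z <- qnorm(0.975)

# Symmetric interval on the logit scale, mapped back to probabilities
lower <- plogis(qlogis(p) - z * se_logit)
upper <- plogis(qlogis(p) + z * se_logit)
round(cbind(lower, upper), 4)
```

Building the interval on the logit scale keeps both bounds inside (0, 1), which a plain p ± 1.96 * se interval does not guarantee.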