Fits a binary classification model with XGBoost, using inverse probability of censoring weighting (IPCW) to account for right-censored survival data.

Usage

ipcw_xgboost(
  data,
  tau,
  time_var = "t",
  status_var = "delta",
  verbose = 0,
  grid = ipcw_xgboost_default_grid(),
  nrounds = 100,
  early_stopping_rounds = 10,
  nfold = 3,
  nthread = 1
)

ipcw_xgboost_default_grid()

Arguments

data

A data frame containing the survival data. Must include columns for the observed time and event indicator.

tau

Numeric scalar. The time horizon at which the survival probability is to be estimated.

time_var

Character. The name of the variable in data representing the observed time to event or censoring. Default is "t".

status_var

Character. The name of the variable in data representing the event indicator (1 if event occurred, 0 if censored). Default is "delta".

verbose

Integer. Verbosity level for XGBoost training and cross-validation (default is 0).

grid

Data frame. Grid of hyperparameters to test in cross-validation. The default is the grid returned by ipcw_xgboost_default_grid().

nrounds

Integer. Maximum number of boosting rounds for XGBoost training and cross-validation (default is 100).

early_stopping_rounds

Integer. Number of rounds with no improvement to trigger early stopping during cross-validation (default is 10).

nfold

Integer. Number of folds for cross-validation (default is 3).

nthread

Integer. Number of threads to use for XGBoost training (default is 1).

Value

An object of class ipcwmodel.

Details

Training is performed using the xgboost package (Chen and Guestrin 2016) based on the "binary:logistic" objective. Jackknife refits are computed to derive jackknife-based standard errors.
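
As a rough illustration of the jackknife idea, the following sketch computes a leave-one-out jackknife standard error for a generic estimator. The helper name jackknife_se is hypothetical, and the leave-one-out scheme is an assumption; this is not necessarily how the package computes its refits.

```r
# Hypothetical helper: leave-one-out jackknife standard error of a statistic.
jackknife_se <- function(x, estimator) {
  n <- length(x)
  # Re-estimate with each observation left out in turn.
  theta_loo <- vapply(seq_len(n), function(i) estimator(x[-i]), numeric(1))
  # Standard jackknife variance: (n - 1) / n times the sum of squared deviations.
  sqrt((n - 1) / n * sum((theta_loo - mean(theta_loo))^2))
}

x <- c(2, 4, 4, 5, 7, 9)
# For the sample mean, this coincides with sd(x) / sqrt(length(x)).
jackknife_se(x, mean)
```

In the model setting, the estimator would be a full refit followed by a prediction, which is why the jackknife adds noticeably to the run time.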

Hyperparameters are tuned by nfold-fold cross-validation (three folds by default) over a grid of parameters. The best parameters are those with the minimum test log loss within nrounds boosting rounds (100 by default), with early stopping after early_stopping_rounds rounds without improvement (10 by default). Note that the default grid is based on our simulations and may not be suitable for all datasets.

The tested hyperparameters include:

  • booster: "gbtree" or "gblinear".

  • eta: Learning rate, tested as 1 / 10^(0:5).

  • For booster = "gbtree":

    • max_depth: Maximum depth of the tree, tested as c(12, 6, 3, 1).
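
A grid of this shape could be assembled with expand.grid(). This is a sketch only: the column names booster, eta, and max_depth are assumed to match what ipcw_xgboost() expects in grid, and the layout actually returned by ipcw_xgboost_default_grid() may differ. It also assumes that max_depth is varied only for the tree booster, since XGBoost's linear booster has no tree depth.

```r
# Assumed layout: one row per hyperparameter combination.
tree_grid <- expand.grid(
  booster = "gbtree",
  eta = 1 / 10^(0:5),
  max_depth = c(12, 6, 3, 1),
  stringsAsFactors = FALSE
)
# The linear booster has no tree depth, so max_depth is left as NA here.
linear_grid <- data.frame(
  booster = "gblinear",
  eta = 1 / 10^(0:5),
  max_depth = NA_real_,
  stringsAsFactors = FALSE
)
grid_sketch <- rbind(tree_grid, linear_grid)
nrow(grid_sketch)  # 24 tree rows + 6 linear rows = 30
```

Before passing a custom grid to the grid argument, it is worth comparing its columns against the output of ipcw_xgboost_default_grid().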

With the best parameters, the model is trained on the full dataset.

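XGBoost does not support categorical variables directly, so categorical predictors should be converted to numeric columns (for example, by dummy coding) before fitting.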
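
One common workaround, shown here as a sketch rather than the package's prescribed approach, is to dummy-code factor columns with model.matrix() before fitting. The toy data and the column name stage are hypothetical; t and delta follow the default time_var and status_var names.

```r
# Toy data with one factor predictor (hypothetical values).
df <- data.frame(
  t = c(5, 12, 30, 41),
  delta = c(1, 0, 1, 1),
  stage = factor(c("I", "II", "III", "II"))
)
# Expand the factor into 0/1 indicator columns and drop the intercept column,
# so the first level ("I") serves as the reference category.
dummies <- model.matrix(~ stage, data = df)[, -1, drop = FALSE]
df_numeric <- cbind(df[, c("t", "delta")], dummies)
names(df_numeric)
```

The same encoding has to be applied to any newdata passed to predict(), so that the columns match those seen during training.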

Functions

  • ipcw_xgboost_default_grid(): Returns a default grid of hyperparameters.

References

Chen T, Guestrin C (2016). “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, 785–794. ISBN 978-1-4503-4232-2, doi:10.1145/2939672.2939785, http://doi.acm.org/10.1145/2939672.2939785.

See also

ipcw_weights() for the underlying implementation of the weights. See IPCWJK and Jahn-Eimermacher et al. (2025) for more information.

Other IPCW models: ipcw_logistic_regression()

Examples

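library(survival)

# Time horizon at which the prediction is made
tau <- 100
# Keep the observed time, the event indicator, and one numeric predictor
df <- veteran[, c("time", "status", "trt")]
# Predict for both treatment arms
newdata <- data.frame(trt = c(1, 2))

# The time and status columns differ from the defaults ("t" and "delta"),
# so they are named explicitly
fit <- ipcw_xgboost(df,
  tau = tau, time_var = "time",
  status_var = "status"
)
# Returns point predictions with interval bounds and standard errors
predict(fit, newdata)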
#>   prediction     lower     upper         se
#> 1  0.4362208 0.3787636 0.4954429 0.02989897
#> 2  0.4047836 0.3287710 0.4856550 0.04034125