IPCWJK: Jackknife Approach to Estimate the Prediction Uncertainty from Binary Classifiers under Right-Censoring

Provides functions for fitting binary classification models with inverse probability of censoring weights (IPCW) to estimate survival probabilities, and implements jackknife resampling methods for unbiased prediction error estimation; see Jahn-Eimermacher et al. (2025).

Details

For \(n\) individuals we observe realizations \((X_i,T_i,\delta_i)\) of the random variables for covariates, time to event, and event indicator. We assume that these are independent and identically distributed across individuals. The time to event is right-censored: with the underlying unobserved time to event \(T^*_i\) and the time to censoring \(C_i\), we observe \(T_i=\min(T^*_i,C_i)\) and \(\delta_i=\mathbf{I}(T^*_i \leq C_i)\). We assume that \(T^*_i\) and \(C_i\) are independent conditional on \(X_i\).

Prediction Target and IPCW

The goal is to predict the survival probability \(p_i\) at a time horizon \(\tau\). This probability is defined as

\[ p_i:=P(T^* > \tau | X=x_i) = P(Y=1|X=x_i).\]

The random variable \(Y:=\mathbf{I}(T^* > \tau)\) is the dichotomized outcome that a binary classifier can use as the dependent variable for predicting \(p_i\). It is unobservable for individuals who were censored before \(\tau\). Removing these individuals worsens the discrimination and calibration of the model (Reps et al. 2021; Kvamme and Borgan 2023). Inverse probability of censoring weights (IPCW) can correct for this. The calculation of these weights is implemented in ipcw_weights().
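The weighting idea can be illustrated with a minimal base R sketch. It is not the code behind ipcw_weights(); the helper names km_censoring() and ipcw_sketch() are hypothetical. As a simplifying assumption, the censoring distribution \(G\) is estimated with a Kaplan-Meier estimator that ignores covariates, and ties at \(\tau\) are not treated specially. Individuals with an observed event up to \(\tau\) receive \(1/\hat{G}(T_i-)\), individuals still under observation after \(\tau\) receive \(1/\hat{G}(\tau)\), and individuals censored before \(\tau\) receive weight zero.

```r
# Hypothetical helpers for illustration; not the package's ipcw_weights().

# Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).
km_censoring <- function(time, status) {
  cens_times <- sort(unique(time[status == 0]))
  # One factor (1 - censored at s / at risk at s) per censoring time s
  factors <- vapply(cens_times, function(s) {
    1 - sum(time == s & status == 0) / sum(time >= s)
  }, numeric(1))
  # Returns G(t), or the left limit G(t-) when left = TRUE
  function(t, left = FALSE) {
    vapply(t, function(u) {
      keep <- if (left) cens_times < u else cens_times <= u
      prod(factors[keep])
    }, numeric(1))
  }
}

ipcw_sketch <- function(time, status, tau) {
  G <- km_censoring(time, status)
  w <- numeric(length(time))            # censored before tau: weight 0
  event    <- status == 1 & time <= tau # outcome known, Y = 0
  survived <- time > tau                # outcome known, Y = 1
  w[event]    <- 1 / G(time[event], left = TRUE)
  w[survived] <- 1 / G(tau)
  w
}

# Toy data: six individuals, two of them censored before tau
time   <- c(1, 2, 3, 4, 5, 6)
status <- c(1, 0, 1, 0, 1, 1)
w <- ipcw_sketch(time, status, tau = 4.5)
```

In this toy example the two censored individuals get weight zero, and the remaining weights grow with the amount of censoring that occurred before the outcome became known.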

With a maximum-likelihood-based model, the IPCW can be applied either to an individual's contribution to the loss of the model (IPCW-GLM) or to the individual's outcome (\(Y_i\), OIPCW). In our setting, these are equivalent (Blanche et al. 2023). For many modelling algorithms, which variant is applied depends on how training weights are implemented.

In this package, the names of the variables holding the observations of \((T_i,\delta_i)\) are given in the arguments time_var and status_var. \(\tau\) is specified with tau.

Standard Error (SE) using the Delta method

In many applications, the uncertainty of predictions plays a large role. It is communicated with standard errors (SEs) or a confidence interval for the prediction. When a differentiable function is used to calculate predictions from asymptotically normally distributed random variables, the delta method can be used to calculate a standard error. We provide this functionality for trained models in deltamethod_from_model(). A more flexible interface is available with deltamethod_pred_function().

The function deltamethod_from_model() supports, for example, the IPCW-GLM logistic regression implemented in mets::logitIPCW(). From the fitted model, both the naive variance estimator and the variance estimator adjusted for the randomness of the weights can be used (Blanche et al. 2023; Holst et al. 2016; Scheike et al. 2014).

This correction of the variance estimator has only been described for GLMs. We therefore provide a model-agnostic estimation using a weighted jackknife approach.
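The delta-method step described in this section can be sketched for an ordinary logistic regression in base R. This is an illustration under simplifying assumptions, not the implementation of deltamethod_from_model(): the data are simulated without censoring, so no IPCW are involved, and the covariance matrix from vcov() corresponds to the naive estimator. For the prediction \(\hat{p}=\mathrm{expit}(x^\top\hat{\beta})\), the gradient with respect to \(\beta\) is \(\hat{p}(1-\hat{p})x\).

```r
# Illustrative sketch with simulated, uncensored data (an assumption).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial())

x_new <- c(1, 0.3)                       # intercept and covariate value
beta  <- coef(fit)
V     <- vcov(fit)                       # naive covariance of the coefficients
p_hat <- plogis(sum(x_new * beta))       # predicted probability
grad  <- p_hat * (1 - p_hat) * x_new     # gradient of p_hat w.r.t. beta
se_p  <- sqrt(drop(t(grad) %*% V %*% grad))

# Cross-check against the standard error reported by predict.glm()
pr <- predict(fit, newdata = data.frame(x = 0.3),
              type = "response", se.fit = TRUE)
```

For a GLM, predict() with type = "response" and se.fit = TRUE applies the same first-order approximation, so se_p and pr$se.fit should agree up to numerical error.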

Confidence intervals (CIs)

Wald confidence intervals (CIs) \(\hat{p}\pm z_{1-\alpha/2}SE(\hat{p})\) in their commonly used form can extend outside the \([0,1]\) range. More interpretable CIs are obtained by using the delta method to calculate the intervals on the logit scale and transforming them back to the probability scale. This assumes asymptotic normality of the prediction (Perme and Manevski 2019).

\[ \Big[\frac{\exp(LL_{logit})}{1+\exp(LL_{logit})}; \frac{\exp(UL_{logit})}{1+\exp(UL_{logit})}\Big] \]

\[ LL_{logit} / UL_{logit}= \ln(\frac{\hat{p}}{1-\hat{p}}) \pm z_{1-\alpha/2} \frac{SE(\hat{p})}{\hat{p}(1-\hat{p})} \]

This approach is used for all CIs returned by this package.
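The two formulas above translate directly into a few lines of base R. The function name logit_ci() is hypothetical and only serves as an illustration; the package computes its intervals internally.

```r
# Hypothetical helper: CI for a probability, computed on the logit scale.
logit_ci <- function(p_hat, se, alpha = 0.05) {
  z      <- qnorm(1 - alpha / 2)
  center <- qlogis(p_hat)                    # ln(p / (1 - p))
  half   <- z * se / (p_hat * (1 - p_hat))   # delta-method SE on the logit scale
  c(lower = plogis(center - half),           # back-transform with expit
    upper = plogis(center + half))
}

# A prediction close to 1 with a moderately large standard error
p_hat <- 0.95
se    <- 0.04
wald  <- p_hat + c(-1, 1) * qnorm(0.975) * se   # plain Wald interval
ci    <- logit_ci(p_hat, se)
```

In this example the upper limit of the plain Wald interval exceeds 1, whereas both limits of the logit-scale interval stay strictly between 0 and 1.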

Standard Error using a weighted jackknife Estimator

For the jackknife estimate (Efron and Hastie 2016) of the prediction standard error, \(n\) models are trained on the training data. For the \(i\)th model, the \(i\)th individual is removed from the data. The prediction of the model trained on the full data is referred to as \(\hat{p}\) and the prediction of the model trained without the \(i\)th individual as \(\hat{p}_{-i}\). The unweighted jackknife estimator is defined as:

\[\hat{Var}(\hat{p})=\frac{n-1}{n}\sum_{i=1}^n(\hat{p}-\hat{p}_{-i} )^2.\]

This estimator is used by the predict() function when the naive argument is set to TRUE. When using IPCW, the influence of an individual depends on its weight, so the weights need to be accounted for in the estimation. Instead of weighting the contribution of each \(\hat{p}_{-i}\) with \(\frac{n-1}{n}\), the contribution of each model prediction with a non-zero weight is weighted with \(1-\tilde{w}_i\). Here, the weights \(\tilde{w}_i\) are assumed to already sum to one. This weighted estimator is used by default, with naive set to FALSE.
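Both estimators can be sketched in base R. The sketch reads the weighted variant as \(\sum_{i:\tilde{w}_i>0}(1-\tilde{w}_i)(\hat{p}-\hat{p}_{-i})^2\), which is an interpretation of the description above and not taken from the package source. The helper names, the toy weights, and the choice of a weighted logistic regression as classifier are assumptions for illustration. As a further simplification, the weights are held fixed across the leave-one-out fits.

```r
# Hypothetical helpers for illustration; not the package's internal code.
jackknife_var <- function(p_hat, p_loo, w = NULL) {
  sq <- (p_hat - p_loo)^2
  if (is.null(w)) {                     # unweighted (naive) estimator
    n <- length(p_loo)
    return((n - 1) / n * sum(sq))
  }
  w_tilde <- w / sum(w)                 # normalise weights to sum to one
  nz <- w_tilde > 0                     # only individuals with non-zero weight
  sum((1 - w_tilde[nz]) * sq[nz])
}

set.seed(42)
n <- 40
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.3 + 0.8 * d$x))
d$w <- sample(c(0, 1, 1.5), n, replace = TRUE)  # toy stand-in for IPCW
new_x <- data.frame(x = 0.3)

# Weighted logistic regression; quasibinomial avoids non-integer warnings
fit_predict <- function(dat) {
  fit <- glm(y ~ x, family = quasibinomial(), data = dat, weights = w)
  unname(predict(fit, newdata = new_x, type = "response"))
}

p_hat <- fit_predict(d)                                  # full-data prediction
p_loo <- vapply(seq_len(n), function(i) fit_predict(d[-i, ]), numeric(1))

var_naive    <- jackknife_var(p_hat, p_loo)              # factor (n - 1) / n
var_weighted <- jackknife_var(p_hat, p_loo, w = d$w)     # factor 1 - w_tilde_i
se_weighted  <- sqrt(var_weighted)
```

With equal weights \(\tilde{w}_i=1/n\), the factor \(1-\tilde{w}_i\) equals \(\frac{n-1}{n}\), so the weighted estimator reduces to the unweighted one.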

Models

IPCW weighting and weighted jackknife standard error estimation are implemented in models inheriting from ipcwmodel.

References

Blanche PF, Holt A, Scheike T (2023). “On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects.” Lifetime Data Analysis, 29(2), 441–482. ISSN 1380-7870, doi:10.1007/s10985-022-09564-6.

Efron B, Hastie T (2016). Computer age statistical inference: Algorithms, evidence, and data science. Cambridge University Press.

Holst KK, Scheike TH, Hjelmborg JB (2016). “The Liability Threshold Model for Censored Twin Data.” Computational Statistics and Data Analysis, 93, 324–335. doi:10.1016/j.csda.2015.01.014.

Jahn-Eimermacher A, Klein L, Grieser G (2025). “A Jackknife Approach to Estimate the Prediction Uncertainty from Binary Classifiers under Right-Censoring.” Statistical Methods in Medical Research. doi:10.1177/09622802251393626.

Kvamme H, Borgan Ø (2023). “The Brier Score under Administrative Censoring: Problems and Solutions.” Journal of Machine Learning Research, 24, 1–26. arXiv:1912.08581.

Perme MP, Manevski D (2019). “Confidence intervals for the Mann–Whitney test.” Statistical Methods in Medical Research, 28(12), 3755–3768. ISSN 1477-0334, doi:10.1177/0962280218814556, http://www.ncbi.nlm.nih.gov/pubmed/30514179.

Reps JM, Rijnbeek P, Cuthbert A, Ryan PB, Pratt N, Schuemie M (2021). “An empirical analysis of dealing with patients who are lost to follow-up when developing prognostic models using a cohort design.” BMC Medical Informatics and Decision Making, 21(1), 43. ISSN 1472-6947, doi:10.1186/s12911-021-01408-x, http://www.ncbi.nlm.nih.gov/pubmed/33549087.

Scheike TH, Holst KK, Hjelmborg JB (2014). “Estimating heritability for cause specific mortality based on twin studies.” Lifetime Data Analysis, 20(2), 210–233. doi:10.1007/s10985-013-9244-x.

Author

Maintainer: Lukas Klein lukas.klein@h-da.de

Authors:

Antje Jahn-Eimermacher