Z-residual: Computing and Diagnosing Gaussian-like Residuals • Zresidual

Z-residual

The Zresidual package implements diagnostic residuals based on the predictive distribution of each observation. By using the full probabilistic information of the model, the package generates residuals that are approximately normally distributed, so the standard diagnostics used for Pearson residuals in ordinary least squares (OLS) regression can be applied to them.

The foundation of this approach is the Randomized Survival Probability (RSP). For a given observation y_i, the RSP is the value of the predictive survival function (one minus the cumulative distribution function, CDF) at y_i, with a randomization term that makes it continuous for discrete outcomes. It is defined as:

RSP_i(y_i \mid \theta) = S_i(y_i \mid \theta) + U_i \, p_i(y_i \mid \theta)

where S_i and p_i represent the survival function and probability mass function, respectively, derived from the predictive distribution of y_i given covariates \mathbf{x}_i and parameters \theta. U_i \sim \text{Unif}(0,1) is a random uniform variable used to handle discrete outcomes. Under a correctly specified model where the observed data y_i arises from the assumed predictive distribution, the RSP follows a \text{Unif}(0,1) distribution.
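The definition above can be checked by hand with base R. The following is a minimal sketch, not the package's own code: it assumes a Poisson predictive distribution with known means, and the variable names (`mu`, `surv`, `pmf`, `rsp`) are chosen here for illustration only.

```r
# Minimal sketch: RSPs for a Poisson model with known (true) means.
# Nothing here calls the Zresidual package; all names are illustrative.
set.seed(1)
n  <- 2000
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)   # predictive mean of each observation
y  <- rpois(n, mu)         # data generated from the assumed model

# S_i(y_i) = P(Y_i > y_i) and p_i(y_i) = P(Y_i = y_i)
surv <- ppois(y, mu, lower.tail = FALSE)
pmf  <- dpois(y, mu)

u   <- runif(n)            # U_i ~ Unif(0, 1)
rsp <- surv + u * pmf      # randomized survival probability

# Under the correct model the RSPs should look uniform on (0, 1)
summary(rsp)
ks.test(rsp, "punif")
```

Because the data are simulated from the same distribution used to compute `surv` and `pmf`, the Kolmogorov-Smirnov test would typically not reject uniformity here.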

The Z-residual is then derived by transforming the RSP via the inverse standard normal cumulative distribution function:

z_i^{RSP}(y_i \mid \theta) = -\Phi^{-1}[RSP_i(y_i \mid \theta)]

Under correct model specification, these residuals follow a standard normal N(0,1) distribution. This mapping allows researchers to assess the quality of the predictive distribution (including its mean, variance, and shape) using the standard diagnostics developed for Gaussian OLS residuals.
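To illustrate how the transformation exposes misspecification, the sketch below computes Z-residuals by hand for overdispersed counts under two candidate predictive distributions. It is an illustration under stated assumptions, not the package's implementation: the helper `z_from_rsp` is a hypothetical name, the means are treated as known, and the negative binomial `size = 2` is an arbitrary choice.

```r
# Minimal sketch: Z-residuals under a correct and a misspecified model.
set.seed(2)
n  <- 2000
mu <- exp(0.5 + 0.3 * rnorm(n))
y  <- rnbinom(n, mu = mu, size = 2)   # overdispersed counts

# Hypothetical helper: Z-residuals from survival and pmf values
z_from_rsp <- function(surv, pmf) {
  rsp <- surv + runif(length(surv)) * pmf
  -qnorm(rsp)
}

# Correct model: negative binomial with the true dispersion
z_nb <- z_from_rsp(pnbinom(y, mu = mu, size = 2, lower.tail = FALSE),
                   dnbinom(y, mu = mu, size = 2))

# Misspecified model: Poisson with the same mean (ignores overdispersion)
z_pois <- z_from_rsp(ppois(y, mu, lower.tail = FALSE), dpois(y, mu))

c(sd_nb = sd(z_nb), sd_pois = sd(z_pois))
shapiro.test(z_nb)$p.value
shapiro.test(z_pois)$p.value
```

The Poisson residuals should show a standard deviation clearly above 1, because the Poisson predictive distribution is too narrow for these data.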

In practical applications, the parameter vector \theta must be estimated. The package supports two primary paradigms for constructing these residuals:

Frequentist Models

For frequentist models (e.g., those fitted with survival or glmmTMB), the package computes residuals by plugging in the estimated parameters \hat{\theta}. For coxph models, the package also provides cross-validatory Z-residuals, in which the residuals are computed for observations held out of the fit, allowing more rigorous validation of survival models within the frequentist framework.
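The plug-in idea can be sketched with a Poisson regression fitted by `stats::glm`. This is a hand calculation for illustration only; it does not use the Zresidual functions, and the simulated data and variable names are assumptions made here.

```r
# Minimal sketch: plug-in Z-residuals for a Poisson GLM, computed by hand.
set.seed(3)
n <- 1000
x <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.5 * x))

fit    <- glm(y ~ x, family = poisson())
mu_hat <- fitted(fit)   # predictive means evaluated at theta-hat

# RSP with the estimated means plugged in, then the normal transform
rsp <- ppois(y, mu_hat, lower.tail = FALSE) + runif(n) * dpois(y, mu_hat)
z   <- -qnorm(rsp)

# Standard Gaussian diagnostics applied to the Z-residuals
qqnorm(z); qqline(z)
shapiro.test(z)
```

Since U_i is random, repeating the calculation gives slightly different residuals; it can be useful to look at several replications rather than a single one.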

Bayesian Models

For Bayesian models (e.g., brms), the package accounts for parameter uncertainty by integrating over the posterior distribution:

Posterior Z-residuals: These are obtained by averaging RSPs over the full-data posterior f(\theta \mid y). Although this approach involves a “double use of data”, since the same observations are used for both estimation and residual calculation, the Z-residuals are asymptotically distributed as N(0,1) and the test p-values are asymptotically \text{Unif}(0,1), because a “summarize first, then test” approach is used.
Importance-Sampling Cross-Validatory (ISCV) Z-residuals: To provide a more robust diagnostic, the package implements ISCV Z-residuals. This method approximates leave-one-out (LOO) RSPs using importance sampling based on the full-data posterior. As with deleted residuals in OLS, ISCV Z-residuals have variance slightly greater than 1.

The resulting posterior and ISCV Z-residuals are approximately standard normal, enabling a full range of graphical and numerical diagnostics for Bayesian models.
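The two Bayesian constructions can be sketched without fitting a model in brms by using a conjugate Poisson-Gamma example, where exact posterior draws are available. This is a rough illustration under assumptions made here, not the package's algorithm: the prior Gamma(1, 1) is arbitrary, and the importance weights are taken to be proportional to 1 / p_i(y_i \mid \theta), the usual choice for importance-sampling LOO.

```r
# Minimal sketch: posterior and ISCV Z-residuals for an i.i.d. Poisson model.
set.seed(4)
n <- 200
y <- rpois(n, 3)

# Conjugate Gamma(a, b) prior on the rate gives exact posterior draws
a <- 1; b <- 1
n_draws <- 2000
lambda  <- rgamma(n_draws, shape = a + sum(y), rate = b + n)

# n x n_draws matrices: row i is observation i, column t is posterior draw t
surv <- outer(y, lambda, function(y, l) ppois(y, l, lower.tail = FALSE))
pmf  <- outer(y, lambda, function(y, l) dpois(y, l))

u   <- runif(n)          # one U_i per observation, shared across draws
rsp <- surv + u * pmf    # RSP_i evaluated at each draw

# Posterior RSP: plain average over the full-data posterior draws
rsp_post <- rowMeans(rsp)

# ISCV RSP: self-normalized importance sampling with weights 1 / p_i(y_i | theta)
# (assumed weighting; approximates averaging over the leave-one-out posterior)
w        <- 1 / pmf
rsp_iscv <- rowSums(w * rsp) / rowSums(w)

z_post <- -qnorm(rsp_post)
z_iscv <- -qnorm(rsp_iscv)

c(sd_post = sd(z_post), sd_iscv = sd(z_iscv))
```

With only one parameter and 200 observations the two sets of residuals are nearly identical; the gap between them would be expected to grow when individual observations have more influence on the posterior.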

Supported Models

The Zresidual package provides built-in support for objects generated by several prominent R modeling packages. The current implementation includes:

survival: Support for Cox Proportional Hazards models (coxph) and parametric survival models (survreg).
glmmTMB: Support for Generalized Linear Mixed Models, including those with zero-inflation and complex covariance structures.
brms: Support for a wide array of Bayesian regression models via the brmsfit interface.
stats: Support for standard Generalized Linear Models (glm); still incomplete.
Custom Models: Support for any user-defined framework by providing log_cdf and log_pmf arguments to the generic Zresidual method; in progress.

Installation

You can install the development version of this package from GitHub with:

# install.packages("devtools")
devtools::install_github("tiw150/Zresidual")

References

Feng, C., Li, L., & Sadeghpour, A. (2020). A comparison of residual diagnosis tools for diagnosing regression models for count data. BMC Medical Research Methodology, 20, 175. https://doi.org/10.1186/s12874-020-01055-2
Li, L., Wu, T., & Feng, C. (2021). Model diagnostics for censored regression via randomized survival probabilities. Statistics in Medicine, 40, 1482–1497. https://doi.org/10.1002/sim.8852
Wu, T., Li, L., & Feng, C. (2024). Z-residual diagnostic tool for assessing covariate functional form in shared frailty models. Journal of Applied Statistics, 52(1), 28–58. https://doi.org/10.1080/02664763.2024.2355551
Wu, T., Feng, C., & Li, L. (2024). Cross-Validatory Z-Residual for Diagnosing Shared Frailty Models. The American Statistician, 79(2), 198–211. https://doi.org/10.1080/00031305.2024.2421370
