Type Package
Title ROCModels: ROC Models and AUC Estimation for
different models
Version 1.0.0
Date 2025-12-09
Encoding UTF-8
Depends R (>= 2.14)
Imports ggplot2, kedd, dplyr, survival, nleqslv,
HDInterval, MASS, doParallel, foreach, pbivnorm, nor1mix, parallel
Description The receiver operating characteristic (ROC)
curve is one of the most widely used tools for evaluating diagnostic and
prognostic biomarkers across diverse scientific fields, particularly in
medicine. Despite its ubiquity, ROC estimation and testing methods
differ substantially in their assumptions and resulting curve
properties. This package provides a unified framework for constructing,
visualizing, and comparing parametric, nonparametric, semiparametric,
and Bayesian ROC curves. ‘ROCModels’ helps researchers identify and
implement ROC inference methods most suitable for their data.
License GPL
NeedsCompilation yes
Author Ruhul Ali Khan [cre, aut] (ORCID: https://orcid.org/0000-0003-1173-8345),
Raja Nakka [aut],
Musie Ghebremichael [aut]
Maintainer Ruhul Ali Khan <rkhan23@mgh.harvard.edu>, <ruhulali.khan@gmail.com>
Repository CRAN
Date/Publication 2025-12-09 13:58:45 UTC
Contents
ROCModels-package ROCModels
Description
The receiver operating characteristic (ROC) curve is a fundamental tool for evaluating diagnostic and prognostic biomarkers, particularly in medical research. However, ROC estimation methods differ substantially in their underlying assumptions, statistical properties, and inferential objectives.
The ROCModels package offers a unified framework for
constructing, visualizing, and comparing ROC curves using a wide range
of modeling approaches:
Nonparametric Methods
Parametric Methods
Semiparametric Method
Bayesian Methods
Parametric Bayesian (Bayesian Bi-Weibull)
Semiparametric Bayesian (Dirichlet process mixture of normals)
Nonparametric Bayesian (Bayesian Bootstrap ROC Curve)
Except for the empirical and order-restricted estimators, all other methods produce smooth ROC curves. This package helps researchers identify and implement inference methods most appropriate for their data, promoting transparent, reproducible, and methodologically rigorous ROC analysis. Alonzo, T. A., and Pepe, M. S. (2002) <doi: 10.1093/biostatistics/3.3.421>, Andrews, D. F., and Herzberg, A. M. (1985) <doi: 10.1007/978-1-4612-5098-2>, Bamber, D. (1975) <doi: 10.1016/0022-2496(75)90001-2>, Cox, D. R. (1972) <doi: 10.1111/j.2517-6161.1972>, Cox, D. R. (1975) <doi: 10.1093/biomet/62.2.269>, DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988) <doi: 10.2307/2531595>, Dorfman, D. D., and Alf, E. (1969) <doi: 10.1016/0022-2496(69)90019-4>, Dorfman, D. D., Berbaum, K. S., and Metz, C. E. (1997) <doi: 10.1016/s1076-6332(97)80013-x>, Erkanli, A., Sung, L., and Stamey, J. D. (2006) <doi: 10.1002/sim.2496>, Faraggi, D., and Reiser, B. (2002) <doi: 10.1002/sim.1228>, Ghebremichael, M., and Habtemicael, S. (2018) <doi: 10.1080/02664763.2017.1420758>, Ghebremichael, M., and Michael, H. (2024) <doi: 10.1080/03610918.2022.2032159>, Ghebremichael, M., et al. (2019) <doi: 10.3844/jmssp.2019.55.64>, Gönen, M., and Heller, G. (2010) <doi: 10.1177/0272989X09360067>, Gopalakrishnan, V., et al. (2020) <doi: 10.1186/s12879-020-05458-w>, Green, D. M., and Swets, J. A. (1966) <doi: 10.117959845>, Gu, J., and Ghosal, S. (2009) <doi: 10.1016/j.jspi.2008.09.014>, Gu, Y., Ghosal, S., and Roy, A. (2008) <doi: 10.1002/sim.3366>, Guidoum, A. C. (2020) <doi: 10.32614/CRAN.package.kedd>, <doi: 10.48550/arXiv.2012.06102>, Guo, B. (2015) <doi: 10.1184/rid/d-scholarship/23590>, Hanley, J. A., and McNeil, B. J. (1982) <doi: 10.1148/radiology.143.1.7063747>, Hsieh, F., and Turnbull, B. W. (1996) <doi: 10.1214/aos/1033066197>, Hussain, E. (2012) <doi: 10.6000/1927-5129.2012.08.02.09>, Ishwaran, H., and James, L. F. (2002) <doi: 10.1198/106186002411>, Jokiel-Rokita, A., and Topolnicki, R. 
(2020) <doi: 10.1016/j.csda.2019.106820>, Krzanowski, W. J., and Hand, D. J. (2009) <doi: 10.1201/9781439800225>, Kundu, D., and Gupta, R. D. (2006) <doi: 10.1109/TR.2006.874918>, Lloyd, C. J. (1998) <doi: 10.1080/01621459.1998.10473797>, Lehmann, E. L. (1953) <doi: 10.1214/aoms/1177729080>, Metz, C. E., Herman, B. A., and Shen, J. H. (1998) <doi: 10.1002/(sici)1097-0258(19980515)17:9<1033::aid-sim784>3.0.co;2-z>, Pepe, M. S. (2003) <doi: 10.1093/oso/9780198509844.001.0001>, Pundir, S., and Amala, R. (2014) <doi: 10.22237/jmasm/1398917940>, Silverman, B. W. (2018) <doi: 10.1201/9781315140919>, Yeo, I. K., and Johnson, R. A. (2000) <doi: 10.1093/biomet/87.4.954>, Zhou, X. H., McClish, D. K., and Obuchowski, N. A. (2009) <doi: 10.1002/9780470906514>, Zou, K. H., Hall, W. J., and Shapiro, D. E. (1997) <doi: 10.1002/(sici)1097-0258(19971015)16:19<2143::aid-sim655>3.0.co;2-3>.
Details
The core functionality of the ROCModels package centers
around the AUC() function, which computes the area under
the ROC curve (AUC), its confidence interval (CI), and generates the
corresponding ROC curve.
Users can choose from a wide variety of modeling approaches, as outlined in the description above. These include parametric, nonparametric, semiparametric, and Bayesian methods. Within each modeling framework, the package supports multiple options for constructing ROC curves and selecting appropriate confidence interval techniques.
Subsequent sections of this documentation provide detailed mathematical formulations, implementation specifications, and code examples for each modeling approach and supported CI method.
This flexibility allows researchers to tailor ROC estimation and inference to the specific characteristics of their data and scientific objectives, promoting transparent, reproducible, and methodologically sound analysis.
Authors Ruhul Ali Khan, Raja Nakka, Musie Ghebremichael. Maintainer: Ruhul Ali Khan <rkhan23@mgh.harvard.edu>, <ruhulali.khan@gmail.com>
Abbreviations
The following abbreviations are employed extensively in this package:
References
Alonzo, T. A., & Pepe, M. S. (2002). Distribution-free ROC analysis using binary regression techniques. Biostatistics, 3(3), 421–432. https://doi.org/10.1093/biostatistics/3.3.421
Andrews, D. F., & Herzberg, A. M. (1985). Data: A Collection of Problems from Many Fields for the Student and Research Worker. Springer-Verlag, Berlin. https://doi.org/10.1007/978-1-4612-5098-2
Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415. https://doi.org/10.1016/0022-2496(75)90001-2
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34, 187–220. https://doi.org/10.1111/j.2517-6161.1972
Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276. https://doi.org/10.1093/biomet/62.2.269
Daniel, F., Ooi, H., Calaway, R., Microsoft, & Weston, S. (2022). doParallel: Foreach Parallel Adaptor for the ‘parallel’ Package. R package version 1.0.17. https://CRAN.R-project.org/package=doParallel
Daniel, F., Ooi, H., Calaway, R., Microsoft, & Weston, S. (2022). foreach: Provides Foreach Looping Construct for R. R package version 1.5.2. https://CRAN.R-project.org/package=foreach
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837–845. https://doi.org/10.2307/2531595
Dorfman, D. D., & Alf, E. (1969). Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals-rating method data. Journal of Mathematical Psychology, 6, 487–496. https://doi.org/10.1016/0022-2496(69)90019-4
Dorfman, D. D., Berbaum, K. S., & Metz, C. E. (1997). Proper receiver operating characteristic analysis: The bigamma model. Academic Radiology, 4, 138–149. https://doi.org/10.1016/s1076-6332(97)80013-x
Erkanli, A., Sung, L., & Stamey, J. D. (2006). Bayesian semi-parametric ROC curve estimation. Statistics in Medicine, 25, 3905–3928. https://doi.org/10.1002/sim.2496
Faraggi, D., & Reiser, B. (2002). Estimation of the area under the ROC curve. Statistics in Medicine, 21, 3093–3106. https://doi.org/10.1002/sim.1228
Ghebremichael, M., & Habtemicael, S. (2018). Effect of tuberculosis on immune restoration among HIV-infected patients receiving antiretroviral therapy. Journal of Applied Statistics, 45(13), 2357–2364. https://doi.org/10.1080/02664763.2017.1420758
Ghebremichael, M., & Michael, H. (2024). Comparison of the binormal and Lehmann receiver operating characteristic curves. Communications in Statistics—Simulation and Computation, 53(2), 772–785. https://doi.org/10.1080/03610918.2022.2032159
Ghebremichael, M., et al. (2019). Comparing the diagnostic accuracy of CD4+ T-lymphocyte count and percent as surrogate markers of pediatric HIV disease. Journal of Mathematics and Statistics, 15(1), 55–64. https://doi.org/10.3844/jmssp.2019.55.64
Gönen, M., & Heller, G. (2010). Lehmann family of ROC curves. Medical Decision Making, 30(4), 509–517. https://doi.org/10.1177/0272989X09360067
Gopalakrishnan, V., et al. (2020). Pre-HAART CD4+ T-lymphocytes as biomarkers of post-HAART immune recovery in HIV-infected children with or without TB co-infection. BMC Infectious Diseases, 20, 1–8. https://doi.org/10.1186/s12879-020-05458-w
Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics, Vol. 1. Wiley, New York. https://www.semanticscholar.org/paper/b11fa6f41f9bbc17bfe1b94e857ee76b6f0bd7f5
Gu, J., & Ghosal, S. (2009). Bayesian ROC curve estimation under binormality using a rank likelihood. Journal of Statistical Planning and Inference, 139(6), 2076–2083. https://doi.org/10.1016/j.jspi.2008.09.014
Gu, Y., Ghosal, S., & Roy, A. (2008). Bayesian bootstrap for ROC curve estimation. Bayesian Analysis, 3(3), 659–676. https://doi.org/10.1002/sim.3366
Guidoum, A. C. (2020). kedd: Kernel Estimator and Bandwidth
Selection for Density and Its Derivatives. R package.
CRAN DOI: 10.32614/CRAN.package.kedd
arXiv preprint: https://doi.org/10.48550/arXiv.2012.06102
Guo, B. (2015). On the effect of improperness of binormal ROC curves for estimating full area under the curve. PhD Thesis, University of Pittsburgh. http://d-scholarship.pitt.edu/id/eprint/23590
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747
Hasselman, B. (2022). nleqslv: Solve Systems of Nonlinear Equations. R package version 3.3.5. https://CRAN.R-project.org/package=nleqslv
Hsieh, F., & Turnbull, B. W. (1996). Nonparametric and semiparametric estimation of the ROC curve. Annals of Statistics, 24(1), 25–40. https://doi.org/10.1214/aos/1033066197
Hussain, E. (2012). The Bi-Gamma ROC Curve in a Straightforward Manner. Journal of Basic and Applied Sciences, 8(2). https://doi.org/10.6000/1927-5129.2012.08.02.09
Ishwaran, H., & James, L. F. (2002). Approximate Dirichlet process computing in finite normal mixtures. Journal of Computational and Graphical Statistics, 11(3), 508–532. https://doi.org/10.1198/106186002411
Jokiel-Rokita, A., & Topolnicki, R. (2020). Estimation of the ROC curve from the Lehmann family. Computational Statistics & Data Analysis, 142, 106820. https://doi.org/10.1016/j.csda.2019.106820
Kenkel, B., & Genz, A. (2015). pbivnorm: Vectorized Computation of the Bivariate Normal Probabilities. R package version 0.6.0. https://CRAN.R-project.org/package=pbivnorm
Krzanowski, W. J., & Hand, D. J. (2009). ROC Curves for Continuous Data. CRC Press. https://doi.org/10.1201/9781439800225
Kundu, D., & Gupta, R. D. (2006). Estimation of \(P[Y < X]\) for Weibull distributions. IEEE Transactions on Reliability, 55(2), 270–280. https://doi.org/10.1109/TR.2006.874918
Lloyd, C. J. (1998). Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. Journal of the American Statistical Association, 93(444), 1356–1364. https://doi.org/10.1080/01621459.1998.10473797
Lehmann, E. L. (1953). The power of rank tests. Annals of Mathematical Statistics, 24, 23–43. https://doi.org/10.1214/aoms/1177729080
Maechler, M. (2024). nor1mix: Normal Mixture Models with One Unknown Component. R package version 1.2-3. https://CRAN.R-project.org/package=nor1mix
Metz, C. E., Herman, B. A., & Shen, J. H. (1998). Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Statistics in Medicine, 17, 1033–1053. https://doi.org/10.1002/(sici)1097-0258(19980515)17:9%3C1033::aid-sim784%3E3.0.co;2-z
Ngumbang, J., Meredith, M., & Kruschke, J. K. (2023). HDInterval: Highest (Posterior) Density Intervals. R package version 0.2.5. https://CRAN.R-project.org/package=HDInterval
Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press. https://doi.org/10.1093/oso/9780198509844.001.0001
Pundir, S., & Amala, R. (2014). Evaluation of area under the constant shape bi-Weibull ROC curve. Journal of Modern Applied Statistical Methods, 13(1), 20. https://doi.org/10.22237/jmasm/1398917940
R Core Team (2023). parallel: Support for Parallel Computation in R. Part of R base distribution. https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
Silverman, B. W. (2018). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC. https://doi.org/10.1201/9781315140919
Therneau, T. M. (2023). A Package for Survival Analysis in R. R package version 3.5-7. https://CRAN.R-project.org/package=survival
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., Müller, K. & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.3. https://CRAN.R-project.org/package=dplyr
Yeo, I. K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954–959. https://doi.org/10.1093/biomet/87.4.954
Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2009). Statistical Methods in Diagnostic Medicine. John Wiley & Sons. https://doi.org/10.1002/9780470906514
Zou, K. H., Hall, W. J., & Shapiro, D. E. (1997). Smooth nonparametric receiver operating characteristic (ROC) curves for continuous data. Statistics in Medicine, 16, 2143–2156. https://doi.org/10.1002/(sici)1097-0258(19971015)16:19%3C2143::aid-sim655%3E3.0.co;2-3
Installation
To install the ROCModels package, ensure your R session
is connected to the internet. Then, run the following command in the R
console:
install.packages("ROCModels")
Then load the package in R:
library(ROCModels)
From this point, the examples below assume that ROCModels is loaded.
Data Format Preparing Your Dataset for Use with
ROCModels
Before using the package, it is essential to format your dataset according to the following guidelines.
The main function, AUC(), requires a data frame named
data that contains exactly two
columns:
biomarker: Numeric values representing the diagnostic marker.
status: Disease status encoded as a character or factor with two levels:
  "0" for non-diseased individuals (controls)
  "1" for diseased individuals (cases)
These column names and coding conventions must be followed precisely to ensure compatibility with the package’s functions.
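As a minimal sketch of the layout described above, the following base R code builds a conforming data frame from a hypothetical raw table. The object raw, its columns ck_level and carrier, and the values themselves are made-up illustrations, not part of the package.

```r
# Hypothetical raw data: the names ck_level and carrier and all values
# are illustrative assumptions, not names or data used by ROCModels.
raw <- data.frame(
  ck_level = c(52, 20, 28, 30, 40, 167, 104, 30, 65, 440),
  carrier  = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Build the two-column layout: a numeric biomarker and a status factor
# whose levels are exactly "0" (controls) and "1" (cases).
data <- data.frame(
  biomarker = as.numeric(raw$ck_level),
  status    = factor(ifelse(raw$carrier, "1", "0"), levels = c("0", "1"))
)

# Basic sanity checks before passing the data frame on
stopifnot(
  identical(names(data), c("biomarker", "status")),
  is.numeric(data$biomarker),
  identical(levels(data$status), c("0", "1")),
  !anyNA(data)
)
```

Fixing the factor levels explicitly avoids surprises when one group happens to be absent from a subset of the data.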
DMDmodified Default Dataset
This package includes a built-in dataset to show immediate functionality. This dataset comprises 209 records from female individuals assessed for potential carrier status of Duchenne Muscular Dystrophy (DMD). Among them, 75 are identified as carriers and 134 as non-carriers. The dataset includes demographic information and biochemical measurements from four serum markers commonly used in clinical screening, which may show elevated levels in carriers despite the absence of symptoms.
Variables
For demonstration purposes, we have filtered the original dataset to
focus specifically on the CK biomarker. In this
modified version, CK is treated as biomarker, and the
Class column serves as the status indicator with
levels "0" and "1"
("0" denotes normal individuals (non-diseased controls) and
"1" denotes carriers (diseased cases)).
This curated dataset is included in the package under the name
DMDmodified for illustration purposes. It follows
the required data format for the package.
Reference
AUC Compute the Area Under the ROC Curve and
Plot the ROC Curve
Details
The AUC() function is the central component of the
ROCModels package. It calculates the area under the ROC
curve (AUC), estimates its confidence interval (CI), and produces the
corresponding ROC plot.
Usage
Arguments
data
A data frame containing two columns:
biomarker: numeric values representing the diagnostic marker
status: character or factor with levels "0" (controls) and "1" (cases)
method
A character string specifying the ROC/AUC modeling approach. Supported
options include:
"empirical" – empirical ROC,
"order" – ROC curve under stochastic order constraints,
"norm_silver" – kernel ROC with normal kernel and Silverman bandwidth,
"norm_ucv" – kernel ROC with normal kernel and UCV bandwidth,
"bi_silver" – kernel ROC with biweight kernel and Silverman bandwidth,
"bi_ucv" – kernel ROC with biweight kernel and UCV bandwidth,
"binormal" – classical binormal ROC model,
"biweibull" – parametric bi-Weibull ROC,
"bigamma" – parametric ROC assuming gamma distributions,
"lehmann" – ROC under the Lehmann alternative,
"bayesbiweibull" – Bayesian bi-Weibull ROC (MCMC-based),
"BB" – Bayesian bootstrap ROC,
"dpm" – Dirichlet process mixture based ROC.
Method names are case-sensitive and must match exactly. Each method is described in detail in later sections.
ci
Logical. If TRUE (default), the function computes
confidence intervals for the AUC (and, in some models, credible
intervals for Bayesian methods).
ci_method
Specifies the type of interval estimation:
"delong" – DeLong’s variance-based normal approximation
"bootstrap" – nonparametric bootstrap interval
"hm" – Hanley–McNeil variance-based interval
"mle" – likelihood-based interval
"all" – computes all applicable interval types for the selected method
Not all CI methods are compatible with every model. Each method has a default CI approach, and compatibility is discussed in the corresponding documentation sections.
siglevel
Significance level \(\alpha\) for the confidence interval.
The corresponding confidence level is \(1 - \alpha\). For example, siglevel = 0.05 yields a 95% interval.
boot_iter
Number of bootstrap resamples used only when
ci_method = "bootstrap" or when "all" is
requested. Larger values give more stable intervals but increase
computation time. This option is applicable only when the confidence
interval is computed using the bootstrap method.
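To illustrate how siglevel and boot_iter interact, here is a base R sketch of a percentile bootstrap interval for the empirical AUC. The helper names emp_auc and boot_auc_ci are hypothetical, and resampling controls and cases separately is an assumption; the package's internal resampling scheme may differ.

```r
# Empirical AUC (Mann-Whitney form, ties counted as 1/2)
emp_auc <- function(x, y) {
  mean(outer(x, y, "<") + 0.5 * outer(x, y, "=="))
}

# Percentile bootstrap interval for the AUC.
# Assumption: controls (x) and cases (y) are resampled separately;
# ROCModels may resample differently.
boot_auc_ci <- function(x, y, siglevel = 0.05, boot_iter = 1000) {
  reps <- replicate(boot_iter, {
    emp_auc(sample(x, replace = TRUE), sample(y, replace = TRUE))
  })
  # Lower and upper percentiles give a (1 - siglevel) interval
  quantile(reps, probs = c(siglevel / 2, 1 - siglevel / 2), names = FALSE)
}

set.seed(1)
x <- rnorm(60, mean = 0)   # simulated controls
y <- rnorm(40, mean = 1)   # simulated cases
ci <- boot_auc_ci(x, y, siglevel = 0.05, boot_iter = 500)
ci
```

Increasing boot_iter reduces the Monte Carlo noise in the two percentiles, at a cost that grows linearly in the number of resamples.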
Value
The primary behavior of the AUC() function is to:
The exact structure of the returned object may vary depending on the chosen model. For typical usage:
AUC()$summary to access the printed output
AUC()$plot to retrieve the ROC curve visualization
Examples
# Load the built-in dataset, which is already in the required format
data(DMDmodified)
# Calculate AUC summary and ROC plot
auc <- AUC(
  data = DMDmodified,
  method = "empirical",
  ci = TRUE
)
# Get the AUC summary
auc$summary
# Get the ROC plot
auc$plot
Next we describe, at a high level, the methods invoked by the
method argument.
empirical Empirical ROC
To apply this method, set method = "empirical" in the
AUC() function. The following options are available for
ci_method:
"delong" – DeLong’s variance-based normal approximation
"bootstrap" – nonparametric bootstrap percentile method
"hm" – Hanley–McNeil variance-based interval
"all" – computes all applicable interval types for the selected method
Usage
AUC(
data = data,
method = "empirical",
ci = TRUE,
ci_method = "delong",
siglevel = 0.05,
boot_iter = 1000
)
Description
The empirical ROC method is a fully nonparametric approach that makes no assumptions about the underlying distribution of the biomarker in either group. It is based on the Mann–Whitney U statistic, including adjustments for tied values, and provides a widely accepted estimate of both the ROC curve and the AUC.
The empirical ROC curve is defined as:
\[ ROC_{\text{emp}}(t) = 1 - G_n\left(F_m^{-1}(1 - t)\right), \quad \text{for } 0 < t < 1 \]
where \(F_m\) and \(G_n\) are the empirical distribution functions of the controls and cases, respectively. The corresponding AUC estimator is:
\[ \widehat{\mathrm{AUC}}_{\text{emp}} = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n \left[ I(X_i < Y_j) + \frac{1}{2} I(X_i = Y_j) \right] \]
where \(X_1, \dots, X_m\) are biomarker values from controls and \(Y_1, \dots, Y_n\) are from cases.
This method produces a jagged, step-like ROC curve. For small datasets, the curve may appear more irregular and less stable.
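The AUC estimator above can be computed directly in base R. The sketch below also adds a Hanley–McNeil (1982) normal-approximation interval. The helper names emp_auc and hm_ci are hypothetical, the toy values are made up, and this is not the package's internal code.

```r
# Empirical AUC: proportion of (control, case) pairs that are correctly
# ordered, with ties counted as 1/2.
emp_auc <- function(x, y) {
  mean(outer(x, y, "<") + 0.5 * outer(x, y, "=="))
}

# Hanley-McNeil standard error and normal-approximation interval.
# m = number of controls, n = number of cases.
# Note: the interval is not truncated to [0, 1].
hm_ci <- function(auc, m, n, siglevel = 0.05) {
  q1 <- auc / (2 - auc)
  q2 <- 2 * auc^2 / (1 + auc)
  v  <- (auc * (1 - auc) + (n - 1) * (q1 - auc^2) +
           (m - 1) * (q2 - auc^2)) / (m * n)
  z  <- qnorm(1 - siglevel / 2)
  c(lower = auc - z * sqrt(v), upper = auc + z * sqrt(v))
}

x <- c(1, 2, 3)   # controls (toy values)
y <- c(2, 3, 4)   # cases (toy values)
a <- emp_auc(x, y)
a                 # 7 of 9 pairs when ties count as 1/2, i.e. 7/9
hm_ci(a, m = length(x), n = length(y))
```

With only three observations per group the interval is very wide, which illustrates the instability mentioned above for small datasets.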
Example
# Load the formatted dataset
data(DMDmodified)
# Compute AUC summary and ROC plot
auc <- AUC(
data = DMDmodified,
method = "empirical",
ci = TRUE,
ci_method = "delong",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display ROC plot
auc$plot
References
Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415. https://doi.org/10.1016/0022-2496(75)90001-2
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837–845. https://doi.org/10.2307/2531595
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747
Hsieh, F., & Turnbull, B. W. (1996). Nonparametric and semiparametric estimation of the ROC curve. Annals of Statistics, 24(1), 25–40. https://doi.org/10.1214/aos/1033066197
Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press. https://doi.org/10.1093/oso/9780198509844.001.0001
order Order-Restricted ROC
To apply this method, set method = "order" in the
AUC() function. The following options are available for
ci_method:
"bootstrap" – nonparametric bootstrap percentile interval (recommended default)
"delong" – DeLong’s variance-based normal approximation (large samples)
"hm" – Hanley–McNeil variance-based interval (large samples)
"all" – computes all applicable interval types for the selected method
Usage
AUC(
data = data,
method = "order",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
Description
For a useful binary classifier, the true positive rate (TPR) should be greater than or equal to the false positive rate (FPR) across all thresholds. Geometrically, this places the ROC curve on or above the diagonal, with the AUC lying between 0.5 (random allocation) and 1 (perfect classification). In practice, however, the empirical distribution functions \(F_m\) and \(G_n\) obtained from finite samples may not respect this order due to sampling variability. To address this, order-restricted ROC methods enforce the constraint \(\overline{G}_n(u) \ge \overline{F}_m(u)\), which is biologically or theoretically reasonable, leading to smoother, more stable ROC curves and more accurate AUC estimates.
Under the order restriction framework, Jokiel-Rokita and Topolnicki (2020) extended the methodology to ROC estimation. Let \(F_m\) and \(G_n\) be the empirical distribution functions of the controls and cases, respectively. Define the empirical distribution function based on the combined samples \[ P_{mn}(t) = \frac{m}{m+n} F_m(t) + \frac{n}{m+n} G_n(t), \] and the order-restricted estimators \[ F_{mn}(t) = \max\{ F_m(t), P_{mn}(t) \}, \qquad G_{mn}(t) = \min\{ G_n(t), P_{mn}(t) \}. \] The order-restricted ROC curve is then defined by \[ ROC_{\text{or}}(t) = 1 - G_{mn}\left( F_{mn}^{-1}(1 - t) \right), \quad 0 < t < 1, \] where \(F_{mn}^{-1}\) denotes the inverse of \(F_{mn}\). The area under the order-restricted ROC curve is defined as \[ \widehat{\mathrm{AUC}}_{\text{or}} = \int_0^1 ROC_{\text{or}}(t) \, dt, \] where \(ROC_{\text{or}}(t)\) is the estimated order-restricted ROC curve. Under suitable regularity conditions, the asymptotic distributions of \(\widehat{\mathrm{AUC}}_{\text{or}}\) and \(\widehat{\mathrm{AUC}}_{\text{emp}}\) are equivalent. Consequently, for large sample sizes, variance approximations developed for the empirical AUC, such as those by Hanley & McNeil or DeLong, can also be used for the order-restricted AUC.
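The definitions above translate almost line by line into base R. The following is a sketch under stated assumptions: the function name order_restricted_roc is hypothetical, the inverse is taken as the generalized inverse (the first pooled point at which \(F_{mn}\) reaches \(1 - t\)), and the integral is approximated by a midpoint rule on a grid. The package's own implementation may differ in these details.

```r
order_restricted_roc <- function(x, y, n_grid = 2000) {
  m <- length(x); n <- length(y)
  Fm <- ecdf(x)                     # empirical CDF of controls
  Gn <- ecdf(y)                     # empirical CDF of cases
  pts <- sort(unique(c(x, y)))      # jump points of the pooled sample

  P   <- (m * Fm(pts) + n * Gn(pts)) / (m + n)   # pooled CDF P_mn
  Fmn <- pmax(Fm(pts), P)                        # F_mn = max(F_m, P_mn)
  Gmn <- pmin(Gn(pts), P)                        # G_mn = min(G_n, P_mn)

  # ROC_or(t) = 1 - G_mn(F_mn^{-1}(1 - t)) on a midpoint grid of t values.
  # Assumption: generalized inverse = first point where F_mn >= 1 - t.
  t_grid <- (seq_len(n_grid) - 0.5) / n_grid
  idx <- vapply(1 - t_grid,
                function(p) which(Fmn >= p - 1e-12)[1],
                integer(1))
  roc <- 1 - Gmn[idx]

  list(t = t_grid, roc = roc,
       auc = mean(roc),             # midpoint-rule approximation
       fpr = 1 - Fmn, tpr = 1 - Gmn)
}

x <- c(1, 2, 3)   # controls (toy values)
y <- c(4, 5, 6)   # cases (toy values)
res <- order_restricted_roc(x, y)
res$auc           # equals 1 for perfectly separated groups
```

By construction \(G_{mn} \le P_{mn} \le F_{mn}\), so at every pooled threshold the restricted TPR is at least the restricted FPR.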
This nonparametric method produces a jagged, step-like ROC curve, though one that is slightly smoother than the empirical ROC curve. This method is particularly useful when:
Example
# Load the formatted dataset
data(DMDmodified)
# Compute order-restricted AUC summary and ROC plot
auc <- AUC(
data = DMDmodified,
method = "order",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display order-restricted ROC plot
auc$plot
References
norm_silver, norm_ucv, bi_silver, bi_ucv Kernel Density–Based Smooth ROC
To apply a kernel density–based ROC method, set method
to one of the following options in the AUC() function:
"norm_silver" – Gaussian kernel with Silverman bandwidth
"norm_ucv" – Gaussian kernel with unbiased cross-validation (UCV) bandwidth
"bi_silver" – Biweight kernel with Silverman bandwidth
"bi_ucv" – Biweight kernel with UCV bandwidth
Each combination defines both the kernel function \(K(\cdot)\) and the bandwidth selection rule
used to smooth the ROC curve. The following options are available for
ci_method:
"delong" – DeLong’s variance-based normal approximation
"bootstrap" – nonparametric bootstrap percentile method
"all" – computes all applicable interval types for the selected method
Usage
AUC(
data = data,
method = "norm_silver",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
Description
To obtain a smoother and more interpretable nonparametric estimate, kernel density estimation (KDE) can be used to estimate the underlying distribution functions of the marker in each group. The resulting ROC curve is continuous and differentiable, offering both interpretability and visual smoothness.
Kernel Density Estimation
Let \(X_1, \dots, X_m\) denote marker values from controls and \(Y_1, \dots, Y_n\) from cases. The kernel density estimators are given by
\[ \hat{f}(x) = \frac{1}{m h_m} \sum_{i=1}^m K\left(\frac{x - X_i}{h_m}\right), \quad \hat{g}(x) = \frac{1}{n h_n} \sum_{i=1}^n K\left(\frac{x - Y_i}{h_n}\right), \] where \(h_m\) and \(h_n\) are the bandwidths controlling smoothness, and \(K(\cdot)\) is a kernel function that integrates to one.
The corresponding cumulative distribution estimators are
\[ \hat{F}(t) = \int_{-\infty}^{t} \hat{f}(x) \, dx, \qquad \hat{G}(t) = \int_{-\infty}^{t} \hat{g}(x) \, dx. \]
Then the kernel-smoothed ROC curve is defined as
\[ \widehat{ROC}_{kde}(t) = 1 - \hat{G}\left(\hat{F}^{-1}(1 - t)\right), \quad 0 < t < 1, \] where \(\hat{F}^{-1}(t) = \inf\{x : \hat{F}(x) \ge t\}\).
Bandwidth and Kernel Selection
Choosing an appropriate bandwidth and kernel is crucial for balancing bias and variance:
Two bandwidth selection methods are available:
Available kernel functions include:
There is no universally optimal kernel, but the Gaussian and biweight kernels are widely used and perform robustly across diverse data conditions.
AUC Estimation
The area under the kernel-smoothed ROC curve is given by
\[ \widehat{\mathrm{AUC}}_{kde} = \int_0^1 \widehat{ROC}_{kde}(t) \, dt, \] which is evaluated numerically using the trapezoidal rule.
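A base R sketch of the kernel-smoothed ROC curve and its trapezoidal AUC is shown below for the Gaussian kernel, for which the smoothed CDF is an average of normal CDFs. The helper names kde_cdf and kde_roc are hypothetical, and using bw.nrd0() from the stats package as the Silverman rule-of-thumb bandwidth is an assumption; the package relies on kedd and may compute bandwidths differently.

```r
# Smoothed CDF for a Gaussian kernel: integrating the kernel density
# estimate gives an average of normal CDFs centred at the observations.
kde_cdf <- function(t, obs, h) {
  vapply(t, function(u) mean(pnorm((u - obs) / h)), numeric(1))
}

kde_roc <- function(x, y, n_grid = 2001) {
  hx <- bw.nrd0(x)   # rule-of-thumb bandwidth for controls (assumption)
  hy <- bw.nrd0(y)   # rule-of-thumb bandwidth for cases (assumption)
  pad <- 6 * max(hx, hy)
  # Thresholds run from high to low so that FPR increases from 0 to 1
  thr <- seq(max(x, y) + pad, min(x, y) - pad, length.out = n_grid)
  fpr <- 1 - kde_cdf(thr, x, hx)
  tpr <- 1 - kde_cdf(thr, y, hy)
  # Trapezoidal rule over the (FPR, TPR) points
  auc <- sum(diff(fpr) * (tpr[-1] + tpr[-n_grid]) / 2)
  list(fpr = fpr, tpr = tpr, auc = auc)
}

set.seed(42)
x <- rnorm(100, mean = 0)     # simulated controls
y <- rnorm(100, mean = 1.5)   # simulated cases
fit <- kde_roc(x, y)
fit$auc
# plot(fit$fpr, fit$tpr, type = "l") draws the smooth curve
```

Evaluating both smoothed survival functions on a common threshold grid avoids inverting \(\hat{F}\) explicitly, since each threshold yields one (FPR, TPR) point on the curve.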
The variance of \(\widehat{\mathrm{AUC}}_{kde}\) can be estimated using bootstrap resampling, which accounts for uncertainty in both the kernel estimation and the sampling process. Zou et al. (1997) noted that the smoothing introduced by KDE has a negligible effect on the first-order variance of \(\widehat{\mathrm{AUC}}_{kde}\). To keep confidence intervals within the \((0,1)\) range, a log transformation is recommended: \[ -\log(1 - \widehat{\mathrm{AUC}}_{kde}), \] with intervals constructed on this transformed scale and then back-transformed for interpretation.
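One way to carry out the transform-and-back-transform step is sketched below in base R. The function name transformed_ci is hypothetical, and estimating the standard error on the transformed scale from bootstrap replicates is an assumption about how the pieces fit together, not a description of the package's internals. The replicate values are made up.

```r
# Interval for the AUC built on the scale psi = -log(1 - AUC).
# auc_hat:  point estimate of the AUC
# auc_boot: bootstrap replicates of the AUC (all assumed to be below 1,
#           otherwise the transform is infinite)
transformed_ci <- function(auc_hat, auc_boot, siglevel = 0.05) {
  z       <- qnorm(1 - siglevel / 2)
  psi_hat <- -log(1 - auc_hat)
  se_psi  <- sd(-log(1 - auc_boot))       # spread on the transformed scale
  lims    <- psi_hat + c(-1, 1) * z * se_psi
  1 - exp(-lims)                          # back-transform to the AUC scale
}

# Toy bootstrap replicates (made-up values for illustration)
auc_boot <- c(0.90, 0.92, 0.95, 0.93, 0.91)
ci <- transformed_ci(auc_hat = 0.92, auc_boot = auc_boot)
ci
```

Because \(1 - e^{-u} < 1\) for every finite \(u\), the back-transformed upper limit cannot reach 1, which is the main practical benefit when the AUC is close to its upper bound.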
As a nonparametric method, kernel-smoothed ROC curve provides several advantages:
While flexible, kernel-based ROC methods also have several limitations:
Example
# Load formatted dataset
data(DMDmodified)
# Compute smooth ROC using Gaussian kernel and Silverman bandwidth
auc <- AUC(
data = DMDmodified,
method = "norm_silver",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display smooth ROC plot
auc$plot
References
Zou, K. H., Hall, W. J., & Shapiro, D. E. (1997). Smooth nonparametric receiver operating characteristic (ROC) curves for continuous data. Statistics in Medicine, 16, 2143–2156. https://doi.org/10.1002/(sici)1097-0258(19971015)16:19%3C2143::aid-sim655%3E3.0.co;2-3
Lloyd, C. J. (1998). Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. Journal of the American Statistical Association, 93(444), 1356–1364. https://doi.org/10.1080/01621459.1998.10473797
Silverman, B. W. (2018). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC. https://doi.org/10.1201/9781315140919
binormal Binormal ROC Curve
To apply this method, set method = "binormal" in the
AUC() function. The following options are available for
ci_method:
"mle" – likelihood-based interval
"bootstrap" – parametric bootstrap percentile interval
"all" – computes both likelihood-based and bootstrap intervals
Usage
AUC(
data = data,
method = "binormal",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
Description
The bi-normal ROC model is one of the most widely used parametric approaches to ROC analysis. It assumes that biomarker values for both the non-diseased \(F\) and diseased \(G\) populations follow normal distributions, but with potentially different means and variances. The model assumes: \[ F(x) = \Phi\left( \frac{x - \mu_0}{\sigma_0} \right), \quad G(y) = \Phi\left( \frac{y - \mu_1}{\sigma_1} \right), \] where \(\Phi(\cdot)\) is the standard normal CDF, and \(\mu_0, \sigma_0^2, \mu_1, \sigma_1^2\) are the means and variances for the two groups.
Defining \[ a = \frac{\mu_1 - \mu_0}{\sigma_1}, \qquad b = \frac{\sigma_0}{\sigma_1}, \] the bi-normal ROC curve can be expressed as
\[ ROC_{\text{Bin}}(t) = \Phi\left( a + b\,\Phi^{-1}(t) \right), \] where \(\Phi^{-1}(\cdot)\) is the quantile function of the standard normal distribution. The corresponding AUC is given by \[ AUC_{\text{Bin}} = \Phi\left( \frac{a}{\sqrt{1 + b^2}} \right). \]
The parameters \(\mu_0, \sigma_0, \mu_1, \sigma_1\) are estimated using maximum likelihood estimation (MLE), assuming normality. MLE provides asymptotically efficient estimates and allows for likelihood-based confidence intervals on the AUC.
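The binormal formulas can be checked with a few lines of base R. In this sketch the function name binormal_fit is hypothetical, the toy values are made up, and the normal MLEs are the sample means and the standard deviations with divisor equal to the sample size. It illustrates the formulas rather than reproducing the package's code.

```r
binormal_fit <- function(x, y) {
  # Normal MLEs: sample means and divisor-n standard deviations
  mu0 <- mean(x); s0 <- sqrt(mean((x - mu0)^2))   # controls
  mu1 <- mean(y); s1 <- sqrt(mean((y - mu1)^2))   # cases
  a <- (mu1 - mu0) / s1
  b <- s0 / s1
  list(
    a = a, b = b,
    roc = function(t) pnorm(a + b * qnorm(t)),    # ROC_Bin(t)
    auc = pnorm(a / sqrt(1 + b^2))                # closed-form AUC_Bin
  )
}

x <- c(-1, 1)   # controls (toy values): mean 0, MLE sd 1
y <- c(0, 2)    # cases (toy values): mean 1, MLE sd 1
fit <- binormal_fit(x, y)
c(a = fit$a, b = fit$b, auc = fit$auc)

# The closed form agrees with numerical integration of the ROC curve
integrate(fit$roc, lower = 0, upper = 1)$value
```

Here \(a = 1\) and \(b = 1\), so the AUC equals \(\Phi(1/\sqrt{2})\), and the numerical integral of the curve matches it up to integration error.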
Because the AUC has a closed-form expression, confidence intervals can be obtained using either:
likelihood-based intervals (ci_method = "mle") derived from the delta method (via MLE covariance estimates), or
parametric bootstrap intervals (ci_method = "bootstrap") for more robust inference under small samples or mild deviations from normality.
Example
# Load formatted dataset
data(DMDmodified)
# Compute bi-normal AUC summary and ROC plot
auc <- AUC(
data = DMDmodified,
method = "binormal",
ci = TRUE,
ci_method = "mle",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display bi-normal ROC plot
auc$plot
References
Dorfman, D. D., & Alf, E. (1969). Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals—rating method data. Journal of Mathematical Psychology, 6, 487–496. https://doi.org/10.1016/0022-2496(69)90019-4
Hsieh, F., & Turnbull, B. W. (1996). Nonparametric and semiparametric estimation of the ROC curve. Annals of Statistics, 24(1), 25–40. https://doi.org/10.1214/aos/1033066197
Metz, C. E., Herman, B. A., & Shen, J. H. (1998). Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Statistics in Medicine, 17, 1033–1053. https://doi.org/10.1002/(sici)1097-0258(19980515)17:9%3C1033::aid-sim784%3E3.0.co;2-z
Faraggi, D., & Reiser, B. (2002). Estimation of the area under the ROC curve. Statistics in Medicine, 21, 3093–3106. https://doi.org/10.1002/sim.1228
Yeo, I. K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954–959. https://doi.org/10.1093/biomet/87.4.954
biweibull Constant-shape bi-Weibull ROC Curve
To apply this method, set method = "biweibull" in the AUC() function. The following options are available for ci_method:
"mle" – likelihood-based interval
"bootstrap" – parametric bootstrap percentile interval
"all" – computes both likelihood-based and bootstrap intervals
Usage
AUC(
data = data,
method = "biweibull",
ci = TRUE,
ci_method = "mle",
siglevel = 0.05,
boot_iter = 1000
)
Description
The constant-shape bi-Weibull model is a flexible parametric model for ROC analysis. Let \(X\) and \(Y\) denote biomarker values for non-diseased and diseased subjects, respectively. Assume that both follow Weibull distributions with a common shape parameter \(\alpha\), but possibly different scale parameters \(\theta_0\) and \(\theta_1\). Then the bi-Weibull ROC curve is
\[ ROC_{\text{Biw}}(t) = t^{\frac{\theta_0}{\theta_1}}, \quad t \in (0,1). \]
The corresponding AUC has a simple closed-form expression which is given by
\[ AUC_{\text{Biw}} = \frac{\theta_1}{\theta_0 + \theta_1}. \]
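The closed forms above can be checked numerically. The sketch below assumes the Weibull survivor function \(\exp(-x^{\alpha}/\theta)\), which corresponds to scale = \(\theta^{1/\alpha}\) in R's Weibull functions; the parameter values are arbitrary and chosen only for illustration.

```r
alpha  <- 1.5   # common shape
theta0 <- 1     # scale parameter, non-diseased
theta1 <- 3     # scale parameter, diseased

# Assumed parameterization: survivor function exp(-x^alpha / theta),
# i.e. R's Weibull scale equals theta^(1 / alpha)
sc0 <- theta0^(1 / alpha)
sc1 <- theta1^(1 / alpha)

t <- seq(0.01, 0.99, by = 0.01)

# ROC(t) = P(Y > c_t), where c_t is the threshold with P(X > c_t) = t
thr <- qweibull(t, shape = alpha, scale = sc0, lower.tail = FALSE)
roc_direct <- pweibull(thr, shape = alpha, scale = sc1, lower.tail = FALSE)

# Closed-form ROC curve and the largest discrepancy on the grid
roc_closed <- t^(theta0 / theta1)
max_abs_diff <- max(abs(roc_direct - roc_closed))

# Closed-form AUC versus numerical integration of the ROC curve
auc_closed <- theta1 / (theta0 + theta1)
auc_num <- integrate(function(u) u^(theta0 / theta1), 0, 1)$value
c(max_abs_diff = max_abs_diff, auc_closed = auc_closed, auc_num = auc_num)
```

Note that the common shape \(\alpha\) cancels out of both the ROC curve and the AUC, so only the ratio of the scale parameters matters.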
The model parameters \(\alpha, \theta_0, \theta_1\) are typically estimated via maximum likelihood estimation (MLE), which provides consistent and efficient estimators under the Weibull assumption. Confidence intervals are then calculated using either
an asymptotic interval (ci_method = "mle") based on the asymptotic normality of the MLEs, or
a bootstrap interval (ci_method = "bootstrap") for more robust inference under small samples.
The bi-Weibull ROC curve assumes biomarker values for the non-diseased and diseased populations each follow Weibull distributions. Owing to its adaptable shape parameter, the Weibull family can approximate several common distributions (including the exponential, Rayleigh, and even log-normal-like forms), making it particularly effective for modeling skewed or heavy-tailed biomedical data, where symmetric distributions such as the normal may perform poorly. When the Weibull distributional assumption holds, this parametric model also performs well for small sample sizes.
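A minimal sketch of maximum likelihood estimation for this model follows, assuming the density \((\alpha/\theta)\,x^{\alpha-1}\exp(-x^{\alpha}/\theta)\). For a fixed shape, the scale MLEs are the means of \(x^{\alpha}\) and \(y^{\alpha}\), so the likelihood can be profiled over \(\alpha\) alone. The simulated data and the function name `profile_loglik` are illustrative assumptions and do not reflect how the package fits the model internally.

```r
# Simulated data with true alpha = 2, theta0 = 1, theta1 = 3 (AUC = 0.75)
set.seed(1)
x <- rweibull(200, shape = 2, scale = 1^(1 / 2))  # non-diseased
y <- rweibull(200, shape = 2, scale = 3^(1 / 2))  # diseased
m <- length(x); n <- length(y)

# Profile log-likelihood in alpha: for fixed alpha the scale MLEs are
# theta0 = mean(x^alpha) and theta1 = mean(y^alpha)
profile_loglik <- function(a) {
  t0 <- mean(x^a)
  t1 <- mean(y^a)
  (m + n) * log(a) - m * log(t0) - n * log(t1) +
    (a - 1) * (sum(log(x)) + sum(log(y))) - (m + n)
}

fit <- optimize(profile_loglik, interval = c(0.01, 20), maximum = TRUE)
alpha_hat  <- fit$maximum
theta0_hat <- mean(x^alpha_hat)
theta1_hat <- mean(y^alpha_hat)

# Plug-in estimate of the bi-Weibull AUC
auc_hat <- theta1_hat / (theta0_hat + theta1_hat)
c(alpha = alpha_hat, theta0 = theta0_hat, theta1 = theta1_hat, auc = auc_hat)
```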
Example
# Load formatted dataset
data(DMDmodified)
# Compute bi-Weibull AUC summary and ROC plot
auc <- AUC(
data = DMDmodified,
method = "biweibull",
ci = TRUE,
ci_method = "mle",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display bi-Weibull ROC plot
auc$plot
References
Pundir, S., & Amala, R. (2014). Evaluation of area under the constant shape bi-Weibull ROC curve. Journal of Modern Applied Statistical Methods, 13(1), 20. https://doi.org/10.22237/jmasm/1398917940
Kundu, D., & Gupta, R. D. (2006). Estimation of \(P[Y < X]\) for Weibull distributions. IEEE Transactions on Reliability, 55(2), 270–280. https://doi.org/10.1109/TR.2006.874918
Khan, R. A., & Ghebremichael, M. (2025). Comparing estimation methods for the area under the bi‐Weibull ROC curve. Pharmaceutical Statistics, 24(5), e70038. https://doi.org/10.1002/pst.70038
bigamma Bi-Gamma ROC Curve
To apply this method, set method = "bigamma" in the AUC() function. The option ci_method = "bootstrap" refers to the computation of the parametric bootstrap percentile interval, which is the only available option for the bi-Gamma model.
Usage
AUC(
data = data,
method = "bigamma",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
Description
In the bi-Gamma ROC model, both populations are assumed to follow independent Gamma distributions but with potentially different shape and scale parameters, allowing for flexible modeling of skewness and dispersion. Let
\[ X \sim \mathrm{Gamma}(k_1, \theta_1), \qquad Y \sim \mathrm{Gamma}(k_2, \theta_2), \]
denote the biomarker values in the non-diseased and diseased groups, respectively, where \(k_1, k_2\) are shape parameters and \(\theta_1, \theta_2\) are scale parameters. The probability density functions (PDFs) are:
\[ f(x; k_1, \theta_1) = \frac{1}{\Gamma(k_1)\theta_1^{k_1}}x^{k_1-1} e^{-x / \theta_1}, \quad x > 0, \] \[ g(y; k_2, \theta_2) = \frac{1}{\Gamma(k_2)\theta_2^{k_2}}y^{k_2-1} e^{-y / \theta_2}, \quad y > 0. \]
The bi-Gamma ROC curve is given by
\[ ROC_{\text{Big}}(t) = 1 - \frac{\gamma\left(k_2, \frac{\theta_1}{\theta_2}\gamma^{-1}(k_1, 1 - t)\right)}{\Gamma(k_2)}, \] where \(\gamma^{-1}(a, \cdot)\) is the inverse of the regularized lower incomplete Gamma function \(\gamma(a, \cdot)/\Gamma(a)\).
The area under the bi-Gamma ROC curve \((AUC_{\text{Big}})\) can be expressed as
\[ AUC_{\text{Big}} = F_F\left(\frac{k_2 \theta_2}{k_1 \theta_1}; \, 2k_1, 2k_2\right), \] where \(F_F(\cdot; 2k_1, 2k_2)\) denotes the CDF of an F-distributed random variable with \(2k_1\) and \(2k_2\) degrees of freedom.
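The F-distribution expression for the AUC can be checked against a direct numerical integration of the ROC curve. In the sketch below the ROC curve is written through R's Gamma quantile and survivor functions (threshold with false positive rate \(t\), then the diseased survivor function at that threshold); the parameter values and the name `roc_big` are arbitrary choices for illustration.

```r
k1 <- 2; theta1 <- 1.0   # non-diseased: shape, scale
k2 <- 3; theta2 <- 1.5   # diseased: shape, scale

# ROC(t) = P(Y > c_t), where c_t satisfies P(X > c_t) = t
roc_big <- function(t) {
  thr <- qgamma(t, shape = k1, scale = theta1, lower.tail = FALSE)
  pgamma(thr, shape = k2, scale = theta2, lower.tail = FALSE)
}

# AUC from the F-distribution CDF with (2 * k1, 2 * k2) degrees of freedom
auc_F <- pf(k2 * theta2 / (k1 * theta1), df1 = 2 * k1, df2 = 2 * k2)

# AUC from numerical integration of the ROC curve
auc_num <- integrate(roc_big, lower = 0, upper = 1)$value
c(F_cdf = auc_F, numerical = auc_num)
```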
The model parameters \((k_1, \theta_1, k_2, \theta_2)\) are estimated by maximum likelihood estimation (MLE) based on independent samples from the non-diseased and diseased groups. The parametric percentile bootstrap (ci_method = "bootstrap") is recommended for constructing percentile-based confidence intervals, especially when sample sizes are small to moderate or data deviate from ideal Gamma assumptions.
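The parametric percentile bootstrap described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the data are simulated, the helper names `fit_gamma` and `auc_bigamma` are hypothetical, and the Gamma MLE is obtained by profiling the log-likelihood over the shape parameter.

```r
set.seed(2024)
x <- rgamma(80, shape = 2, scale = 1.0)   # non-diseased (hypothetical data)
y <- rgamma(80, shape = 3, scale = 1.5)   # diseased (hypothetical data)

# Gamma MLE via the profile log-likelihood in the shape k
# (for fixed k, the MLE of the scale is mean(z) / k)
fit_gamma <- function(z) {
  prof <- function(k) {
    (k - 1) * mean(log(z)) - k - k * log(mean(z) / k) - lgamma(k)
  }
  k <- optimize(prof, interval = c(0.01, 100), maximum = TRUE)$maximum
  c(shape = k, scale = mean(z) / k)
}

# Closed-form bi-Gamma AUC from fitted (shape, scale) pairs
auc_bigamma <- function(f0, f1) {
  pf(f1[["shape"]] * f1[["scale"]] / (f0[["shape"]] * f0[["scale"]]),
     df1 = 2 * f0[["shape"]], df2 = 2 * f1[["shape"]])
}

f0 <- fit_gamma(x)
f1 <- fit_gamma(y)
auc_hat <- auc_bigamma(f0, f1)

# Parametric bootstrap: simulate from the fitted model, refit, recompute AUC
B <- 500          # plays the role of boot_iter
siglevel <- 0.05
auc_boot <- replicate(B, {
  xb <- rgamma(length(x), shape = f0[["shape"]], scale = f0[["scale"]])
  yb <- rgamma(length(y), shape = f1[["shape"]], scale = f1[["scale"]])
  auc_bigamma(fit_gamma(xb), fit_gamma(yb))
})
ci <- quantile(auc_boot, c(siglevel / 2, 1 - siglevel / 2))
list(auc = auc_hat, ci = ci)
```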
The bi-Gamma ROC model is a parametric ROC framework and is suitable for data that are positively skewed or have heavy right tails, characteristics commonly observed in biomedical and reliability studies.
Example
# Load formatted dataset
data(DMDmodified)
# Compute bi-Gamma AUC summary and ROC plot
auc <- AUC(
data = DMDmodified,
method = "bigamma",
ci = TRUE,
ci_method = "bootstrap",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display bi-Gamma ROC plot
auc$plot
References
Dorfman, D. D., Berbaum, K. S., & Metz, C. E. (1997). Proper receiver operating characteristic analysis: The bigamma model. Academic Radiology, 4, 138–149. https://doi.org/10.1016/s1076-6332(97)80013-x
Hussain, E. (2012). The Bi-Gamma ROC Curve in a Straightforward Manner. Journal of Basic and Applied Sciences, 8(2). http://dx.doi.org/10.6000/1927-5129.2012.08.02.09
Guo, B. (2015). On the effect of improperness of binormal ROC curves for estimating full area under the curve. PhD Thesis, University of Pittsburgh. http://d-scholarship.pitt.edu/id/eprint/23590
lehmann Semiparametric ROC Curve under the Lehmann Model
The option ci_method = "ple" refers to the confidence interval computed from the partial likelihood (via the proportional hazards model), which is the only available option for the Lehmann model.
Usage
AUC(
data = data,
method = "lehmann",
ci = TRUE,
ci_method = "ple",
siglevel = 0.05,
boot_iter = 1000
)
Description
The Lehmann model provides a semiparametric framework for ROC curve estimation that assumes a simple power relationship between the survivor functions of the diseased and non-diseased populations:
\[ \overline{G}(t) = [\overline{F}(t)]^{\delta}, \qquad 0 < \delta \le 1, \] where \(\overline{F}(t)\) and \(\overline{G}(t)\) are the survivor functions of the biomarker for the non-diseased and diseased groups, respectively, and \(\delta\) is a single diagnostic accuracy parameter. Smaller values of \(\delta\) correspond to stronger discriminatory ability of the biomarker.
Under this assumption, the ROC curve and its corresponding area have simple analytical forms:
\[ ROC_{\text{le}}(t) = t^{\delta}, \qquad t \in [0, 1], \] and \[ AUC_{\text{le}} = \int_0^1 t^{\delta} dt = \frac{1}{1 + \delta}. \]
This produces a smooth, monotonic ROC curve that is both interpretable and computationally efficient. The single parameter \(\delta\) controls the shape of the ROC and directly determines the AUC.
The Lehmann assumption is equivalent to the proportional hazards (PH) formulation in survival analysis, where the ratio of hazard functions for the diseased and non-diseased groups is constant:
\[ \frac{h_Y(t)}{h_X(t)} = e^{\beta}. \]
Here, the Lehmann parameter and PH coefficient are linked by \(\delta = e^{\beta}\). Thus, \(\hat{\beta}\) is estimated by fitting a Cox proportional hazards model using the survival package, and consequently \[ \hat{\delta} = e^{\hat{\beta}}, \quad \widehat{AUC}_{\text{le}} = \frac{1}{1 + \hat{\delta}}. \]
The confidence interval for \(AUC_{\text{le}}\) can then be derived using the delta method, based on the estimated variance of \(\hat{\beta}\).
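The delta-method step can be made concrete. Since \(AUC_{\text{le}} = 1/(1 + e^{\beta})\), its derivative with respect to \(\beta\) is \(-AUC_{\text{le}}(1 - AUC_{\text{le}})\), which gives the standard error below. The function name `lehmann_auc_ci` and the input values are hypothetical; in practice \(\hat{\beta}\) and its standard error would come from a fitted Cox model, and the package's own interval may differ in detail (for example, by using a transformed scale).

```r
# Delta-method interval for the Lehmann AUC, given beta_hat and its SE
lehmann_auc_ci <- function(beta_hat, se_beta, siglevel = 0.05) {
  delta_hat <- exp(beta_hat)
  auc_hat <- 1 / (1 + delta_hat)
  # d AUC / d beta = -exp(beta) / (1 + exp(beta))^2 = -AUC * (1 - AUC)
  se_auc <- auc_hat * (1 - auc_hat) * se_beta
  z <- qnorm(1 - siglevel / 2)
  c(auc = auc_hat,
    lower = auc_hat - z * se_auc,
    upper = auc_hat + z * se_auc)
}

# Hypothetical inputs, chosen only to illustrate the calculation
res <- lehmann_auc_ci(beta_hat = -1.2, se_beta = 0.25)
res
```

A negative \(\hat{\beta}\) corresponds to \(\hat{\delta} < 1\) and hence an AUC above 0.5, consistent with the statement that smaller \(\delta\) means stronger discrimination.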
This method naturally accommodates covariates through the Cox proportional hazards framework, allowing for adjusted ROC analysis that accounts for additional variables. The parameter \(\delta = e^{\beta}\) offers direct clinical interpretability as a hazard ratio, making the results meaningful in applied biomedical contexts. It is computationally simple, relying on standard Cox regression routines without requiring complex optimization procedures or Bayesian sampling. The approach is also robust, maintaining statistical efficiency without imposing distributional assumptions on the biomarker data.
The Lehmann ROC model bridges parametric and nonparametric approaches by imposing a simple, interpretable relationship between sensitivity and specificity while leaving the biomarker distributions unspecified. This balance of robustness, flexibility, and efficiency makes it particularly suitable for heterogeneous biomedical datasets, especially where biomarkers are influenced by covariates or measured repeatedly over time.
Example
# Load formatted dataset
data(DMDmodified)
# Compute semiparametric ROC under the Lehmann assumption
auc <- AUC(
data = DMDmodified,
method = "lehmann",
ci = TRUE,
ci_method = "ple",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary
auc$summary
# Display Lehmann ROC plot
auc$plot
References
Lehmann, E. L. (1953). The power of rank tests. Annals of Mathematical Statistics, 24, 23–43. https://doi.org/10.1214/aoms/1177729080
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34, 187–220. https://www.jstor.org/stable/2985181
Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276. https://doi.org/10.1093/biomet/62.2.269
Gönen, M., & Heller, G. (2010). Lehmann family of ROC curves. Medical Decision Making, 30(4), 509–517. https://doi.org/10.1177/0272989X09360067
Ghebremichael, M., & Habtemicael, S. (2018). Effect of tuberculosis on immune restoration among HIV-infected patients receiving antiretroviral therapy. Journal of Applied Statistics, 45(13), 2357–2364. https://doi.org/10.1080/02664763.2017.1420758
Ghebremichael, M., & Michael, H. (2024). Comparison of the binormal and Lehmann receiver operating characteristic curves. Communications in Statistics—Simulation and Computation, 53(2), 772–785. https://doi.org/10.1080/03610918.2022.2032159
Ghebremichael, M., et al. (2019). Comparing the diagnostic accuracy of CD4+ T-lymphocyte count and percent as surrogate markers of pediatric HIV disease. Journal of Mathematics and Statistics, 15(1), 55–64. https://doi.org/10.3844/jmssp.2019.55.64
Jokiel-Rokita, A., & Topolnicki, R. (2020). Estimation of the ROC curve from the Lehmann family. Computational Statistics & Data Analysis, 142, 106820. https://doi.org/10.1016/j.csda.2019.106820
bayesbiweibull Bayesian Bi-Weibull ROC Curve
To apply this method, set method = "bayesbiweibull" in the AUC() function. The option ci_method = "mcmc" refers to the MCMC-based highest posterior density (HPD) credible interval, which is the only available option for this method. The boot_iter option is inactive for this method, as the number of MCMC iterations is fixed at 11,000 (comprising 1,000 burn-in and 10,000 retained samples).
Usage
AUC(
data = data,
method = "bayesbiweibull",
ci = TRUE,
ci_method = "mcmc",
siglevel = 0.05,
boot_iter = 1000
)
Description
The Bayesian Bi-Weibull ROC curve is a parametric Bayesian extension of the constant-shape Bi-Weibull model. In the Bayesian paradigm, the unknown model parameters are treated as random variables with prior distributions that reflect prior knowledge or beliefs about their possible values. These priors are updated with the observed data through Bayes’ theorem, yielding posterior distributions for both the ROC curve and its area under the curve (AUC). Posterior summaries (such as the posterior mean or credible intervals) serve as Bayesian estimates of ROC quantities.
As described in the frequentist Bi-Weibull section, we assume the biomarker values for the non-diseased and diseased populations follow Weibull distributions with a shared shape parameter \(\alpha\), but distinct scale parameters \(\theta_0\) and \(\theta_1\). The parameters \(\theta_0\), \(\theta_1\), and \(\alpha\) are treated as random variables with the following prior distributions:
\[ \theta_j \sim \mathrm{IG}(a_j, b_j), \quad j = 0, 1, \] \[ \alpha \sim \mathrm{Gamma}(k, \beta), \] where \(a_j, b_j, k, \beta > 0\). Here, \(\mathrm{IG}(a, b)\) denotes the inverse-gamma distribution with density \[ \pi_{1j}(\theta_j) = \frac{b_j^{a_j}}{\Gamma(a_j)} \, \theta_j^{-(a_j + 1)} e^{-b_j / \theta_j}, \] and all priors are assumed independent.
Given data \(x_1, \dots, x_m\) (controls) and \(y_1, \dots, y_n\) (cases), the likelihood function is:
\[ L(\alpha, \theta_0, \theta_1 \mid \text{data}) \propto \alpha^{m+n} \, \theta_0^{-m} \, \theta_1^{-n} \left( \prod_{i=1}^{m} x_i^{\alpha - 1} \right) \left( \prod_{j=1}^{n} y_j^{\alpha - 1} \right) \exp\left(-\frac{\sum_{i=1}^{m} x_i^{\alpha}}{\theta_0}\right) \exp\left(-\frac{\sum_{j=1}^{n} y_j^{\alpha}}{\theta_1}\right). \]
The posterior distribution is proportional to the product of the likelihood and priors:
\[ p(\alpha, \theta_0, \theta_1 \mid \text{data}) \propto L(\alpha, \theta_0, \theta_1 \mid \text{data}) \, \pi_{10}(\theta_0)\pi_{11}(\theta_1)\pi(\alpha). \]
Because the posterior distribution cannot be evaluated analytically, parameter estimation is performed using Markov Chain Monte Carlo (MCMC), typically through Gibbs sampling with a Metropolis–Hastings step for \(\alpha\). At each iteration \(s\), new samples \(\theta_0^{(s)}\), \(\theta_1^{(s)}\), and \(\alpha^{(s)}\) are drawn from their respective conditional posterior distributions.
From these samples, the AUC for iteration \(s\) is computed as:
\[ AUC^{(s)} = \frac{\theta_1^{(s)}}{\theta_0^{(s)} + \theta_1^{(s)}}. \]
The posterior mean AUC and its 95% highest posterior density (HPD) credible interval are then estimated as:
\[ \widehat{AUC}_{\text{Biw}}^{\text{Bayes}} = \frac{1}{S} \sum_{s=1}^{S} AUC^{(s)}, \quad CI_{95\%} = \mathrm{HPD}_{0.95}\{AUC^{(1)}, \dots, AUC^{(S)}\}. \]
For this implementation, 11,000 MCMC iterations were performed, discarding the first 1,000 iterations as burn-in and retaining the remaining 10,000 samples for posterior inference. The AUC and its 95% HPD credible interval were computed using non-informative priors, with \(a_0 = a_1 = b_0 = b_1 = 0\) and \(\alpha \sim \text{Gamma}(0.1, 1)\). Note that the priors for \(\theta_0\) and \(\theta_1\) are improper, meaning they do not integrate to one, but they still yield a proper posterior when combined with the likelihood.
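The sampling scheme described above can be sketched in base R. This is a simplified illustration, not the package's sampler. It assumes that Gamma(0.1, 1) means shape 0.1 and rate 1, uses a random-walk Metropolis–Hastings proposal on \(\log\alpha\) with an arbitrary step size, and replaces the HDInterval call with a hand-written shortest-interval function (`hpd`, a hypothetical helper). With \(a_j = b_j = 0\), the conditional posterior of each \(\theta_j\) is inverse-gamma with shape equal to the group size and scale equal to the sum of the observations raised to the power \(\alpha\).

```r
set.seed(7)
# Simulated data: alpha = 2, theta0 = 1, theta1 = 3 (true AUC = 0.75)
x <- rweibull(200, shape = 2, scale = 1^(1 / 2))  # controls
y <- rweibull(200, shape = 2, scale = 3^(1 / 2))  # cases
m <- length(x); n <- length(y)
sum_log <- sum(log(x)) + sum(log(y))

# Log full conditional of alpha (up to a constant); the Gamma prior is
# assumed to be parameterized as shape = 0.1, rate = 1
log_cond_alpha <- function(a, th0, th1) {
  (m + n) * log(a) + (a - 1) * sum_log -
    sum(x^a) / th0 - sum(y^a) / th1 +
    dgamma(a, shape = 0.1, rate = 1, log = TRUE)
}

n_iter <- 11000; burn_in <- 1000
alpha <- 1
auc_draws <- numeric(n_iter)
for (s in seq_len(n_iter)) {
  # Gibbs steps: theta_j | alpha ~ IG(group size, sum of z^alpha)
  theta0 <- 1 / rgamma(1, shape = m, rate = sum(x^alpha))
  theta1 <- 1 / rgamma(1, shape = n, rate = sum(y^alpha))
  # Metropolis-Hastings step: random walk on log(alpha), with Jacobian term
  prop <- alpha * exp(0.1 * rnorm(1))
  log_ratio <- log_cond_alpha(prop, theta0, theta1) -
    log_cond_alpha(alpha, theta0, theta1) + log(prop) - log(alpha)
  if (log(runif(1)) < log_ratio) alpha <- prop
  auc_draws[s] <- theta1 / (theta0 + theta1)
}
auc_post <- auc_draws[-seq_len(burn_in)]
auc_mean <- mean(auc_post)

# Shortest interval containing 95% of the draws (empirical HPD)
hpd <- function(draws, prob = 0.95) {
  z <- sort(draws); S <- length(z)
  w <- ceiling(prob * S)
  width <- z[w:S] - z[1:(S - w + 1)]
  i <- which.min(width)
  c(lower = z[i], upper = z[i + w - 1])
}
auc_hpd <- hpd(auc_post)
list(mean = auc_mean, hpd = auc_hpd)
```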
The Bayesian Bi-Weibull ROC approach offers a flexible and robust framework for ROC analysis by combining prior knowledge with observed data. It generates full posterior distributions for the ROC curve and AUC through MCMC simulation, providing direct quantification of uncertainty without relying on asymptotic approximations. Averaging over posterior draws yields smooth and stable ROC estimates, even in small samples. The 95% highest posterior density (HPD) credible intervals are computed using the HDInterval package in R.
Example
# Load formatted dataset
data(DMDmodified)
# Bayesian estimation of the Bi-Weibull AUC and ROC
auc <- AUC(
data = DMDmodified,
method = "bayesbiweibull",
ci = TRUE,
ci_method = "mcmc",
siglevel = 0.05,
boot_iter = 1000
)
# Display Bayesian AUC summary
auc$summary
# Display posterior ROC plot
auc$plot
References
Kundu, D., & Gupta, R. D. (2006). Estimation of \(P[Y < X]\) for Weibull distributions. IEEE Transactions on Reliability, 55(2), 270–280. https://doi.org/10.1109/TR.2006.874918
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2020). Bayesian Data Analysis (4th ed.). CRC Press. https://doi.org/10.1201/9780429258480
Meredith, M., & Kruschke, J. K. (2022). HDInterval: Highest (Posterior) Density Intervals. R package version 0.2.4. Available at: https://CRAN.R-project.org/package=HDInterval
dpm Bayesian Semiparametric ROC (Dirichlet Process Mixture of Normals)
To apply this method, set method = "dpm" in the AUC() function. The option ci_method = "dpm" refers to the posterior credible interval computed from the MCMC samples of the DPM model, which is the only available option for this method. The boot_iter option is inactive for this method, as the number of MCMC iterations is fixed at 500 (comprising 100 burn-in and 400 retained samples).
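Usage
AUC(
data = data,
method = "dpm",
ci = TRUE,
ci_method = "dpm",
siglevel = 0.05,
boot_iter = 1000
)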
Description
The Bayesian semiparametric ROC approach combines the interpretability of parametric models with the flexibility of nonparametric inference by assigning infinite-dimensional priors, such as Dirichlet processes (DPs), to the biomarker distributions. This allows the model to capture complex data features — including multimodality, skewness, and heterogeneity — that cannot be adequately represented by simple single-component parametric models.
A key implementation of this framework is the Dirichlet Process Mixture (DPM) of normal distributions proposed by Erkanli et al. (2006). In this model, the biomarker distributions for the non-diseased and diseased populations are represented as mixtures of normal components, with random mixing distributions drawn from independent DPs.
Let \(\{X_i\}_{i=1}^m \sim F\) and \(\{Y_j\}_{j=1}^n \sim G\) denote biomarker measurements from the non-diseased and diseased groups, respectively. Then,
\[ F(x) = \sum_{l=1}^{L} p_l \Phi(x \mid \mu_l, \sigma_l^2), \qquad G(y) = \sum_{l=1}^{L'} p_l' \Phi(y \mid \mu_l', \sigma_l'^2), \] where \(\Phi(\cdot \mid \mu, \sigma^2)\) is the normal cumulative distribution function, and \(L, L'\) are truncation levels that approximate the infinite Dirichlet process mixture.
The mixture weights \(\{p_l\}\) follow a stick-breaking process:
\[ p_l = \begin{cases} R_1, & l = 1, \\ R_l \prod_{r=1}^{l-1}(1 - R_r), & l = 2, \dots, L - 1, \\ \prod_{r=1}^{L-1}(1 - R_r), & l = L, \end{cases} \] with a similar construction for \(\{p_l'\}\) in the diseased group. The priors are specified as follows:
\[ R_r \sim \mathrm{Beta}(1, \alpha), \quad \alpha \sim \mathrm{Gamma}(a, b), \] \[ \mu_l \sim N(m_0, S_0), \quad \sigma_l^{-2} \sim \mathrm{Gamma}(c, d), \] where \(\alpha\) controls the number of mixture components and the model complexity. This finite truncation approach, following Ishwaran and James (2002), yields a computationally tractable approximation to the Dirichlet process.
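The truncated stick-breaking construction can be written in a few lines of base R. The function name `stick_breaking` is a hypothetical helper for this example; it draws one set of weights for a fixed concentration parameter and is not taken from the package.

```r
# Truncated stick-breaking weights p_1, ..., p_L with R_r ~ Beta(1, alpha)
stick_breaking <- function(L, alpha) {
  stopifnot(L >= 2, alpha > 0)
  R <- rbeta(L - 1, 1, alpha)
  remaining <- cumprod(1 - R)   # remaining[l] = prod_{r <= l} (1 - R_r)
  c(R[1],                               # l = 1
    R[-1] * remaining[-(L - 1)],        # l = 2, ..., L - 1
    remaining[L - 1])                   # l = L takes the leftover mass
}

set.seed(10)
p <- stick_breaking(L = 10, alpha = 1)
round(p, 4)
sum(p)   # the weights sum to one by construction
```

Smaller values of `alpha` concentrate the mass on the first few components, while larger values spread it across more components.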
Given \(F\) and \(G\), the ROC curve is defined as
\[ ROC_{\text{DPM}}(p) = 1 - G(F^{-1}(1 - p)), \qquad 0 < p < 1, \] and the corresponding AUC is
\[ AUC_{\text{DPM}} = \int_0^1 [1 - G(F^{-1}(1 - p))] \, dp. \]
Both quantities are evaluated numerically at each iteration of the MCMC algorithm using the current mixture parameter draws.
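For a single set of mixture parameters, the numerical evaluation might look like the sketch below. The two-component mixtures are hypothetical stand-ins for one MCMC draw. The ROC curve is traced by sweeping a threshold grid, and the AUC is obtained by the trapezoidal rule. As a check, the result is compared with the exact value of \(P(Y > X)\) for normal mixtures, which is a weighted sum of normal probabilities over all component pairs.

```r
# Hypothetical mixture parameters standing in for one MCMC draw
p0 <- c(0.7, 0.3); mu0 <- c(0, 2);   sd0 <- c(1, 0.5)   # non-diseased (F)
p1 <- c(0.4, 0.6); mu1 <- c(1.5, 3); sd1 <- c(1, 0.8)   # diseased (G)

# Mixture CDF: sum_l p_l * Phi((u - mu_l) / sd_l), vectorized over u
mix_cdf <- function(u, p, mu, sd) {
  out <- numeric(length(u))
  for (l in seq_along(p)) out <- out + p[l] * pnorm(u, mu[l], sd[l])
  out
}

# Sweep thresholds; reverse so the false positive rate is increasing
thr <- seq(-10, 12, length.out = 5001)
fpr <- rev(1 - mix_cdf(thr, p0, mu0, sd0))
tpr <- rev(1 - mix_cdf(thr, p1, mu1, sd1))

# Trapezoidal rule for the area under the (fpr, tpr) curve
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Exact P(Y > X) for normal mixtures, summed over component pairs
auc_exact <- sum(outer(seq_along(p0), seq_along(p1), function(l, k) {
  p0[l] * p1[k] * pnorm((mu1[k] - mu0[l]) / sqrt(sd0[l]^2 + sd1[k]^2))
}))
c(trapezoid = auc_trap, exact = auc_exact)
```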
Posterior inference for the ROC curve and AUC is obtained using Gibbs sampling with Metropolis–Hastings updates when needed. At each iteration, the algorithm updates mixture component parameters and stick-breaking weights for both groups, computes the corresponding ROC curve and AUC, and stores these posterior samples. The posterior mean ROC curve is then obtained by averaging across the retained MCMC iterations, and 95% credible intervals are computed from the same posterior samples.
In this implementation, the semiparametric Bayesian estimator \(\widehat{\text{AUC}}_{\text{DPM}}\) is computed using posterior samples from the Dirichlet process mixture model. Weakly informative priors are used to ensure flexibility and stability: \(\alpha \sim \text{Gamma}(1, 1)\), \(\mu_l \sim \mathcal{N}(0, 100)\), and \(\tau_l = \sigma_l^{-2} \sim \text{Gamma}(0.1, 0.1)\), providing vague yet regularized estimates. Stick-breaking variables follow \(R_r \sim \text{Beta}(1, \alpha)\), and both groups are modeled identically. The truncation level is fixed at \(L = L' = 10\), which balances computational efficiency with representational power. Posterior inference was based on 500 MCMC iterations (100 burn-in and 400 retained samples), yielding stable AUC estimates across replications.
Although \(\widehat{\text{AUC}}_{\text{DPM}}\) often produced narrower credible intervals than competing methods, it was somewhat sensitive to prior choices in small-sample scenarios. Despite a higher computational cost, this approach remains highly flexible and robust—particularly valuable when the biomarker distributions are skewed, heavy-tailed, or multimodal.
Example
# Load formatted dataset
data(DMDmodified)
# Bayesian semiparametric ROC using Dirichlet process mixture of normals
auc <- AUC(
data = DMDmodified,
method = "dpm",
ci = TRUE,
ci_method = "dpm",
siglevel = 0.05,
boot_iter = 1000
)
# Display AUC summary (posterior mean and credible interval)
auc$summary
# Display DPM-based ROC plot (posterior mean ROC with bands)
auc$plot
References
Erkanli, A., Sung, L., & Stamey, J. D. (2006). Bayesian semi-parametric ROC curve estimation. Statistics in Medicine, 25, 3905–3928. https://doi.org/10.1002/sim.2496
Ishwaran, H., & James, L. F. (2002). Approximate Dirichlet process computing in finite normal mixtures. Journal of Computational and Graphical Statistics, 11(3), 508–532. https://doi.org/10.1198/106186002411
BB Bayesian Bootstrap ROC Curve
To apply this method, set method = "BB" in the AUC() function. The option ci_method = "bb" refers to the computation of the Bayesian Bootstrap credible interval, which is the only available option for Bayesian Bootstrap inference. The boot_iter option sets the number of bootstrap replications.
Usage
AUC(
data = data,
method = "BB",
ci = TRUE,
ci_method = "bb",
siglevel = 0.05,
boot_iter = 1000
)
Description
The Bayesian Bootstrap (BB), introduced by Rubin (1981), provides a fully nonparametric Bayesian method for estimating smooth ROC curves and AUC values. Unlike classical bootstrapping, which resamples data points, BB assigns random Dirichlet weights to observed data, generating a posterior distribution over ROC curves that reflects uncertainty without relying on large-sample approximations or bandwidth selection.
In empirical ROC estimation, each observation contributes equally (weights of \(1/m\) for controls and \(1/n\) for cases). BB replaces these fixed weights with random draws from a Dirichlet(1, …, 1) distribution. Averaging across replicates yields a smooth posterior mean ROC curve, and variation among replicates quantifies uncertainty in AUC.
Let \(X = (X_1, \dots, X_m)\) be controls and \(Y = (Y_1, \dots, Y_n)\) be cases. For each bootstrap replicate \(b = 1, \dots, B\):
1. Draw \((p_1, \dots, p_m) \sim \text{Dirichlet}(1, \dots, 1)\), or equivalently \(p_i = w_i / \sum_j w_j\) with \(w_i \sim \text{Exponential}(1)\).
2. Define the weighted empirical CDF: \[ F^{(b)}(u) = \sum_{i=1}^{m} p_i \mathbf{1}(X_i \le u). \]
3. Compute the placement values: \[ U_j^{(b)} = 1 - F^{(b)}(Y_j), \quad j = 1, \dots, n. \]
4. Draw \((q_1, \dots, q_n) \sim \text{Dirichlet}(1, \dots, 1)\).
5. Construct the ROC curve: \[ ROC_{m,n}^{(b)}(t) = \sum_{j=1}^{n} q_j \mathbf{1}(U_j^{(b)} \le t). \]
6. Estimate the AUC numerically: \[ AUC^{(b)} = \int_0^1 ROC_{m,n}^{(b)}(t) \, dt. \]
Combine the results from all \(B\) replicates to produce the posterior mean estimates: \[ \widehat{\text{ROC}}_{\text{BB}}(t) = \frac{1}{B} \sum_{b=1}^B \text{ROC}_{m,n}^{(b)}(t), \quad \widehat{\text{AUC}}_{\text{BB}} = \frac{1}{B} \sum_{b=1}^B \text{AUC}^{(b)}. \]
Posterior variance:
\[
\text{Var}(\widehat{AUC}_{\text{BB}}) = \frac{1}{B - 1} \sum_{b=1}^{B}
\left(AUC^{(b)} - \widehat{AUC}_{\text{BB}}\right)^2
\]
A \(100(1-\alpha)\%\) credible interval for the AUC is obtained by taking the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the empirical distribution \(\{\text{AUC}^{(1)}, \dots, \text{AUC}^{(B)}\}\).
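The Bayesian Bootstrap procedure can be sketched in base R as follows. The data are simulated and the names `rdirichlet1` and `ind` are hypothetical helpers, so this should be read as an illustration of the algorithm and not as the package's code. One simplification is used: because each replicate's ROC curve is a step function with jumps \(q_j\) at the placement values \(U_j^{(b)}\), its integral over \([0, 1]\) equals \(\sum_j q_j (1 - U_j^{(b)})\), so no numerical integration is needed here.

```r
set.seed(42)
x <- rnorm(200, mean = 0, sd = 1)  # controls (hypothetical data)
y <- rnorm(200, mean = 1, sd = 1)  # cases (hypothetical data)
m <- length(x); n <- length(y)
B <- 1000          # plays the role of boot_iter
siglevel <- 0.05

# Dirichlet(1, ..., 1) weights via normalized Exponential(1) draws
rdirichlet1 <- function(k) { w <- rexp(k); w / sum(w) }

# ind[j, i] = 1 if X_i <= Y_j, else 0
ind <- outer(y, x, function(yj, xi) as.numeric(xi <= yj))

auc_b <- replicate(B, {
  p <- rdirichlet1(m)
  F_at_y <- as.vector(ind %*% p)   # weighted empirical CDF at each Y_j
  U <- 1 - F_at_y                  # placement values
  q <- rdirichlet1(n)
  sum(q * (1 - U))                 # exact integral of the step-function ROC
})

auc_bb <- mean(auc_b)                                         # posterior mean
var_bb <- var(auc_b)                                          # divisor B - 1
ci_bb <- quantile(auc_b, c(siglevel / 2, 1 - siglevel / 2))   # credible interval
list(auc = auc_bb, var = var_bb, ci = ci_bb)
```

For these simulated normal samples the population AUC is \(\Phi(1/\sqrt{2})\), which gives a reference point for judging the estimate.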
The Bayesian Bootstrap generates smooth ROC curves by averaging over random weighted distributions, avoiding kernel smoothing or parametric assumptions. It is especially useful for small or irregular samples, offering robust, data-driven inference with direct posterior uncertainty quantification. The method is also computationally efficient, relying on simple resampling rather than full MCMC.
Example
# Load formatted dataset
data(DMDmodified)
# Bayesian Bootstrap ROC and AUC estimation
auc <- AUC(
data = DMDmodified,
method = "BB",
ci = TRUE,
ci_method = "bb",
siglevel = 0.05,
boot_iter = 1000
)
# Display posterior AUC summary and credible interval
auc$summary
# Display smooth Bayesian Bootstrap ROC plot
auc$plot
References
Rubin, D. B. (1981). The Bayesian bootstrap. Annals of Statistics, 9, 130–134. https://doi.org/10.1214/aos/1176345338
Gu, Y., Ghosal, S., & Roy, A. (2008). Bayesian bootstrap for ROC curve estimation. Bayesian Analysis, 3(3), 659–676. https://doi.org/10.1002/sim.3366