Package ‘ROCModels’

December 09, 2025

Type Package
Title ROCModels: ROC Models and AUC Estimation for Different Models
Version 1.0.0
Date 2025-12-09
Encoding UTF-8
Depends R (>= 2.14)
Imports ggplot2, kedd, dplyr, survival, nleqslv, HDInterval, MASS, doParallel, foreach, pbivnorm, nor1mix, parallel
Description The receiver operating characteristic (ROC) curve is one of the most widely used tools for evaluating diagnostic and prognostic biomarkers across diverse scientific fields, particularly in medicine. Despite its ubiquity, ROC estimation and testing methods differ substantially in their assumptions and resulting curve properties. This package provides a unified framework for constructing, visualizing, and comparing parametric, nonparametric, semiparametric, and Bayesian ROC curves. ‘ROCModels’ helps researchers identify and implement ROC inference methods most suitable for their data.
License GPL
NeedsCompilation yes
Author Ruhul Ali Khan [cre, aut] (ORCID: https://orcid.org/0000-0003-1173-8345),
    Raja Nakka [aut],
    Musie Ghebremichael [aut]
Maintainer Ruhul Ali Khan <rkhan23@mgh.harvard.edu>, <ruhulali.khan@gmail.com>
Repository CRAN
Date/Publication 2025-12-09 13:58:45 UTC

Contents


ROCModels-package          ROCModels


Description

The receiver operating characteristic (ROC) curve is a fundamental tool for evaluating diagnostic and prognostic biomarkers, particularly in medical research. However, ROC estimation methods differ substantially in their underlying assumptions, statistical properties, and inferential objectives.

The ROCModels package offers a unified framework for constructing, visualizing, and comparing ROC curves using a wide range of modeling approaches:

Nonparametric Methods

Parametric Methods

Semiparametric Method

Bayesian Methods

Except for the empirical and order-restricted estimators, all other methods produce smooth ROC curves. This package helps researchers identify and implement inference methods most appropriate for their data, promoting transparent, reproducible, and methodologically rigorous ROC analysis. Alonzo, T. A., and Pepe, M. S. (2002) <doi: 10.1093/biostatistics/3.3.421>, Andrews, D. F., and Herzberg, A. M. (1985) <doi: 10.1007/978-1-4612-5098-2>, Bamber, D. (1975) <doi: 10.1016/0022-2496(75)90001-2>, Cox, D. R. (1972) <doi: 10.1111/j.2517-6161.1972>, Cox, D. R. (1975) <doi: 10.1093/biomet/62.2.269>, DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988) <doi: 10.2307/2531595>, Dorfman, D. D., and Alf, E. (1969) <doi: 10.1016/0022-2496(69)90019-4>, Dorfman, D. D., Berbaum, K. S., and Metz, C. E. (1997) <doi: 10.1016/s1076-6332(97)80013-x>, Erkanli, A., Sung, L., and Stamey, J. D. (2006) <doi: 10.1002/sim.2496>, Faraggi, D., and Reiser, B. (2002) <doi: 10.1002/sim.1228>, Ghebremichael, M., and Habtemicael, S. (2018) <doi: 10.1080/02664763.2017.1420758>, Ghebremichael, M., and Michael, H. (2024) <doi: 10.1080/03610918.2022.2032159>, Ghebremichael, M., et al. (2019) <doi: 10.3844/jmssp.2019.55.64>, Gönen, M., and Heller, G. (2010) <doi: 10.1177/0272989X09360067>, Gopalakrishnan, V., et al. (2020) <doi: 10.1186/s12879-020-05458-w>, Green, D. M., and Swets, J. A. (1966) <doi: 10.117959845>, Gu, J., and Ghosal, S. (2009) <doi: 10.1016/j.jspi.2008.09.014>, Gu, Y., Ghosal, S., and Roy, A. (2008) <doi: 10.1002/sim.3366>, Guidoum, A. C. (2020) <doi: 10.32614/CRAN.package.kedd>, <doi: 10.48550/arXiv.2012.06102>, Guo, B. (2015) <doi: 10.1184/rid/d-scholarship/23590>, Hanley, J. A., and McNeil, B. J. (1982) <doi: 10.1148/radiology.143.1.7063747>, Hsieh, F., and Turnbull, B. W. (1996) <doi: 10.1214/aos/1033066197>, Hussain, E. (2012) <doi: 10.6000/1927-5129.2012.08.02.09>, Ishwaran, H., and James, L. F. (2002) <doi: 10.1198/106186002411>, Jokiel-Rokita, A., and Topolnicki, R. 
(2020) <doi: 10.1016/j.csda.2019.106820>, Krzanowski, W. J., and Hand, D. J. (2009) <doi: 10.1201/9781439800225>, Kundu, D., and Gupta, R. D. (2006) <doi: 10.1109/TR.2006.874918>, Lloyd, C. J. (1998) <doi: 10.1080/01621459.1998.10473797>, Lehmann, E. L. (1953) <doi: 10.1214/aoms/1177729080>, Metz, C. E., Herman, B. A., and Shen, J. H. (1998) <doi: 10.1002/(sici)1097-0258(19980515)17:9<1033::aid-sim784>3.0.co;2-z>, Pepe, M. S. (2003) <doi: 10.1093/oso/9780198509844.001.0001>, Pundir, S., and Amala, R. (2014) <doi: 10.22237/jmasm/1398917940>, Silverman, B. W. (2018) <doi: 10.1201/9781315140919>, Yeo, I. K., and Johnson, R. A. (2000) <doi: 10.1093/biomet/87.4.954>, Zhou, X. H., McClish, D. K., and Obuchowski, N. A. (2009) <doi: 10.1002/9780470906514>, Zou, K. H., Hall, W. J., and Shapiro, D. E. (1997) <doi: 10.1002/(sici)1097-0258(19971015)16:19<2143::aid-sim655>3.0.co;2-3>.

Details

The core functionality of the ROCModels package centers around the AUC() function, which computes the area under the ROC curve (AUC), its confidence interval (CI), and generates the corresponding ROC curve.

Users can choose from a wide variety of modeling approaches, as outlined in the description above. These include parametric, nonparametric, semiparametric, and Bayesian methods. Within each modeling framework, the package supports multiple options for constructing ROC curves and selecting appropriate confidence interval techniques.

Subsequent sections of this documentation provide detailed mathematical formulations, implementation specifications, and code examples for each modeling approach and supported CI method.

This flexibility allows researchers to tailor ROC estimation and inference to the specific characteristics of their data and scientific objectives, promoting transparent, reproducible, and methodologically sound analysis.

Authors Ruhul Ali Khan, Raja Nakka, Musie Ghebremichael. Maintainer: Ruhul Ali Khan <rkhan23@mgh.harvard.edu>, <ruhulali.khan@gmail.com>

Abbreviations

The following abbreviations are employed extensively in this package:

References

Alonzo, T. A., & Pepe, M. S. (2002). Distribution-free ROC analysis using binary regression techniques. Biostatistics, 3(3), 421–432. https://doi.org/10.1093/biostatistics/3.3.421

Andrews, D. F., & Herzberg, A. M. (1985). Data: A Collection of Problems from Many Fields for the Student and Research Worker. Springer-Verlag, Berlin. https://doi.org/10.1007/978-1-4612-5098-2

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415. https://doi.org/10.1016/0022-2496(75)90001-2

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B, 34, 187–220. https://doi.org/10.1111/j.2517-6161.1972

Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276. https://doi.org/10.1093/biomet/62.2.269

Daniel, F., Ooi, H., Calaway, R., Microsoft, & Weston, S. (2022). doParallel: Foreach Parallel Adaptor for the ‘parallel’ Package. R package version 1.0.17. https://CRAN.R-project.org/package=doParallel

Daniel, F., Ooi, H., Calaway, R., Microsoft, & Weston, S. (2022). foreach: Provides Foreach Looping Construct for R. R package version 1.5.2. https://CRAN.R-project.org/package=foreach

DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837–845. https://doi.org/10.2307/2531595

Dorfman, D. D., & Alf, E. (1969). Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals-rating method data. Journal of Mathematical Psychology, 6, 487–496. https://doi.org/10.1016/0022-2496(69)90019-4

Dorfman, D. D., Berbaum, K. S., & Metz, C. E. (1997). Proper receiver operating characteristic analysis: The bigamma model. Academic Radiology, 4, 138–149. https://doi.org/10.1016/s1076-6332(97)80013-x

Erkanli, A., Sung, L., & Stamey, J. D. (2006). Bayesian semi-parametric ROC curve estimation. Statistics in Medicine, 25, 3905–3928. https://doi.org/10.1002/sim.2496

Faraggi, D., & Reiser, B. (2002). Estimation of the area under the ROC curve. Statistics in Medicine, 21, 3093–3106. https://doi.org/10.1002/sim.1228

Ghebremichael, M., & Habtemicael, S. (2018). Effect of tuberculosis on immune restoration among HIV-infected patients receiving antiretroviral therapy. Journal of Applied Statistics, 45(13), 2357–2364. https://doi.org/10.1080/02664763.2017.1420758

Ghebremichael, M., & Michael, H. (2024). Comparison of the binormal and Lehmann receiver operating characteristic curves. Communications in Statistics—Simulation and Computation, 53(2), 772–785. https://doi.org/10.1080/03610918.2022.2032159

Ghebremichael, M., et al. (2019). Comparing the diagnostic accuracy of CD4+ T-lymphocyte count and percent as surrogate markers of pediatric HIV disease. Journal of Mathematics and Statistics, 15(1), 55–64. https://doi.org/10.3844/jmssp.2019.55.64

Gönen, M., & Heller, G. (2010). Lehmann family of ROC curves. Medical Decision Making, 30(4), 509–517. https://doi.org/10.1177/0272989X09360067

Gopalakrishnan, V., et al. (2020). Pre-HAART CD4+ T-lymphocytes as biomarkers of post-HAART immune recovery in HIV-infected children with or without TB co-infection. BMC Infectious Diseases, 20, 1–8. https://doi.org/10.1186/s12879-020-05458-w

Green, D. M., & Swets, J. A. (1966). Signal Detection Theory and Psychophysics, Vol. 1. Wiley, New York. https://www.semanticscholar.org/paper/b11fa6f41f9bbc17bfe1b94e857ee76b6f0bd7f5

Gu, J., & Ghosal, S. (2009). Bayesian ROC curve estimation under binormality using a rank likelihood. Journal of Statistical Planning and Inference, 139(6), 2076–2083. https://doi.org/10.1016/j.jspi.2008.09.014

Gu, Y., Ghosal, S., & Roy, A. (2008). Bayesian bootstrap for ROC curve estimation. Bayesian Analysis, 3(3), 659–676. https://doi.org/10.1002/sim.3366

Guidoum, A. C. (2020). kedd: Kernel Estimator and Bandwidth Selection for Density and Its Derivatives. R package.
CRAN DOI: 10.32614/CRAN.package.kedd
arXiv preprint: https://doi.org/10.48550/arXiv.2012.06102

Guo, B. (2015). On the effect of improperness of binormal ROC curves for estimating full area under the curve. PhD Thesis, University of Pittsburgh. http://d-scholarship.pitt.edu/id/eprint/23590

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747

Hasselman, B. (2022). nleqslv: Solve Systems of Nonlinear Equations. R package version 3.3.5. https://CRAN.R-project.org/package=nleqslv

Hsieh, F., & Turnbull, B. W. (1996). Nonparametric and semiparametric estimation of the ROC curve. Annals of Statistics, 24(1), 25–40. https://doi.org/10.1214/aos/1033066197

Hussain, E. (2012). The Bi-Gamma ROC Curve in a Straightforward Manner. Journal of Basic and Applied Sciences, 8(2). https://doi.org/10.6000/1927-5129.2012.08.02.09

Ishwaran, H., & James, L. F. (2002). Approximate Dirichlet process computing in finite normal mixtures. Journal of Computational and Graphical Statistics, 11(3), 508–532. https://doi.org/10.1198/106186002411

Jokiel-Rokita, A., & Topolnicki, R. (2020). Estimation of the ROC curve from the Lehmann family. Computational Statistics & Data Analysis, 142, 106820. https://doi.org/10.1016/j.csda.2019.106820

Kenkel. B., Genz, A. (2015). pbivnorm: Vectorized Computation of the Bivariate Normal Probabilities. R package version 0.6.0. https://CRAN.R-project.org/package=pbivnorm

Krzanowski, W. J., & Hand, D. J. (2009). ROC Curves for Continuous Data. CRC Press. https://doi.org/10.1201/9781439800225

Kundu, D., & Gupta, R. D. (2006). Estimation of \(P[Y < X]\) for Weibull distributions. IEEE Transactions on Reliability, 55(2), 270–280. https://doi.org/10.1109/TR.2006.874918

Lloyd, C. J. (1998). Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. Journal of the American Statistical Association, 93(444), 1356–1364. https://doi.org/10.1080/01621459.1998.10473797

Lehmann, E. L. (1953). The power of rank tests. Annals of Mathematical Statistics, 24, 23–43. https://doi.org/10.1214/aoms/1177729080

Maechler, M. (2024). nor1mix: Normal Mixture Models with One Unknown Component. R package version 1.2-3. https://CRAN.R-project.org/package=nor1mix

Metz, C. E., Herman, B. A., & Shen, J. H. (1998). Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Statistics in Medicine, 17, 1033–1053. https://doi.org/10.1002/(sici)1097-0258(19980515)17:9%3C1033::aid-sim784%3E3.0.co;2-z

Ngumbang, J., Meredith, M., & Kruschke, J. K. (2023). HDInterval: Highest (Posterior) Density Intervals. R package version 0.2.5. https://CRAN.R-project.org/package=HDInterval

Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press. https://doi.org/10.1093/oso/9780198509844.001.0001

Pundir, S., & Amala, R. (2014). Evaluation of area under the constant shape bi-Weibull ROC curve. Journal of Modern Applied Statistical Methods, 13(1), 20. https://doi.org/10.22237/jmasm/1398917940

R Core Team (2023). parallel: Support for Parallel Computation in R. Part of R base distribution. https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

Silverman, B. W. (2018). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC. https://doi.org/10.1201/9781315140919

Therneau, T. M. (2023). A Package for Survival Analysis in R. R package version 3.5-7. https://CRAN.R-project.org/package=survival

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org

Wickham, H., François, R., Henry, L., Müller, K. & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. R package version 1.1.3. https://CRAN.R-project.org/package=dplyr

Yeo, I. K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954–959. https://doi.org/10.1093/biomet/87.4.954

Zhou, X. H., McClish, D. K., & Obuchowski, N. A. (2009). Statistical Methods in Diagnostic Medicine. John Wiley & Sons. https://doi.org/10.1002/9780470906514

Zou, K. H., Hall, W. J., & Shapiro, D. E. (1997). Smooth nonparametric receiver operating characteristic (ROC) curves for continuous data. Statistics in Medicine, 16, 2143–2156. https://doi.org/10.1002/(sici)1097-0258(19971015)16:19%3C2143::aid-sim655%3E3.0.co;2-3

Installation

To install the ROCModels package, ensure your R session is connected to the internet. Then, run the following command in the R console:

install.packages("ROCModels") 

Then load the package in R:

library(ROCModels)

From this point, the examples below assume that ROCModels is loaded.


Data Format          Preparing Your Dataset for Use with ROCModels


Before using the package, it is essential to format your dataset according to the following guidelines.

The main function, AUC(), requires a data frame named data that contains exactly two columns:

These column names and coding conventions must be followed precisely to ensure compatibility with the package’s functions.


DMDmodified          Default Dataset


This package includes a built-in dataset so that its functionality can be tried immediately. The dataset comprises 209 records from female individuals assessed for potential carrier status of Duchenne Muscular Dystrophy (DMD). Among them, 75 are identified as carriers and 134 as non-carriers. The dataset includes demographic information and biochemical measurements from four serum markers commonly used in clinical screening, which may show elevated levels in carriers despite the absence of symptoms.

Variables

For demonstration purposes, we have filtered the original dataset to focus specifically on the CK biomarker. In this modified version, CK is treated as the biomarker, and the Class column serves as the status indicator with levels "0" and "1": "0" denotes normal subjects (non-diseased, or controls) and "1" denotes carriers (diseased, or cases).

This curated dataset is included in the package under the name DMDmodified for illustration purposes. It follows the data format required by the package.

Reference


AUC          Compute the Area Under the ROC Curve and Plot the ROC Curve


Details

The AUC() function is the central component of the ROCModels package. It calculates the area under the ROC curve (AUC), estimates its confidence interval (CI), and produces the corresponding ROC plot.

Usage

AUC(
  data,
  method,
  ci        = TRUE,
  ci_method = "delong",
  siglevel  = 0.05,
  boot_iter = 1000
)

Arguments

Method names are case-sensitive and must match exactly. Each method is described in detail in later sections.

Not all CI methods are compatible with every model. Each method has a default CI approach, and compatibility will be discussed in the corresponding documentation sections.

Value

The primary behavior of the AUC() function is to:

  1. Display the AUC estimate
  2. Print one or more confidence intervals
  3. Return a ggplot object visualizing the ROC curve for the selected method

The exact structure of the returned object may vary depending on the chosen model. For typical usage:

Examples

# Load the well-formatted dataset
data(DMDmodified)
# Compute the AUC summary and ROC plot
auc <- AUC(
  data   = DMDmodified,
  method = "empirical",
  ci     = TRUE
)
# Get the AUC summary
auc$summary
# Get the ROC plot
auc$plot

Next we describe, at a high level, the methods invoked by the method argument.


empirical          Empirical ROC


To apply this method, set method = "empirical" in the AUC() function. The following options are available for ci_method:

Usage

AUC(
  data      = data,
  method    = "empirical",
  ci        = TRUE,
  ci_method = "delong",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The empirical ROC method is a fully nonparametric approach that makes no assumptions about the underlying distribution of the biomarker in either group. It is based on the Mann–Whitney U statistic, including adjustments for tied values, and provides a widely accepted estimate of both the ROC curve and the AUC.

The empirical ROC curve is defined as:

\[ ROC_{\text{emp}}(t) = 1 - G_n\left(F_m^{-1}(1 - t)\right), \quad \text{for } 0 < t < 1 \]

where \(F_m\) and \(G_n\) are the empirical distribution functions of the controls and cases, respectively. The corresponding AUC estimator is:

\[ \widehat{\mathrm{AUC}}_{\text{emp}} = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n \left[ I(X_i < Y_j) + \frac{1}{2} I(X_i = Y_j) \right] \]

where \(X_1, \dots, X_m\) are biomarker values from controls and \(Y_1, \dots, Y_n\) are from cases.

This method produces a jagged, step-like ROC curve. For small datasets, the curve may appear more irregular and less stable.
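The AUC estimator above can be checked by hand on a small example. The following base R sketch uses hypothetical toy values (not package data) and does not call any ROCModels function; it evaluates the double sum directly and compares it with the Mann-Whitney statistic reported by wilcox.test().

```r
# Hypothetical toy data (not from the package)
x <- c(1.2, 2.0, 2.0, 3.1, 4.5)   # controls X_1, ..., X_m
y <- c(2.0, 3.5, 4.5, 5.0)        # cases    Y_1, ..., Y_n

# Pairwise kernel: 1 if X_i < Y_j, 1/2 if X_i = Y_j, 0 otherwise
kernel <- outer(x, y, function(xi, yj) (xi < yj) + 0.5 * (xi == yj))

# Empirical AUC: average of the kernel over all m * n pairs
auc_emp <- mean(kernel)

# Cross-check: the Mann-Whitney statistic divided by m * n
# (ties trigger a harmless warning about exact p-values, so it is suppressed)
u_stat <- unname(suppressWarnings(wilcox.test(y, x)$statistic))
auc_mw <- u_stat / (length(x) * length(y))

print(c(direct = auc_emp, mann_whitney = auc_mw))
```

Both numbers agree because the statistic returned by wilcox.test() counts the pairs in which the case value exceeds the control value, with tied pairs counted as one half.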

Example

# Load the formatted dataset
data(DMDmodified)

# Compute AUC summary and ROC plot
auc <- AUC(
  data      = DMDmodified,
  method    = "empirical",
  ci        = TRUE,
  ci_method = "delong",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display ROC plot
auc$plot

References


order          Order-Restricted ROC


To apply this method, set method = "order" in the AUC() function. The following options are available for ci_method:

Usage

AUC(
  data      = data,
  method    = "order",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

For a useful binary classifier, the true positive rate (TPR) should be greater than or equal to the false positive rate (FPR) at every threshold. Geometrically, this places the ROC curve on or above the diagonal, with the AUC lying between 0.5 (random allocation) and 1 (perfect classification). In practice, however, the empirical distribution functions \(F_m\) and \(G_n\) obtained from finite samples may not respect this order because of sampling variability. To address this, order-restricted ROC methods enforce the constraint \(\overline{G}_n(u) \ge \overline{F}_m(u)\), which is biologically or theoretically reasonable, leading to smoother, more stable ROC curves and more accurate AUC estimates.

Under the order restriction framework, Jokiel-Rokita and Topolnicki (2020) extended the methodology to ROC estimation. Let \(F_m\) and \(G_n\) be the empirical distribution functions of the controls and cases, respectively. Define the empirical distribution function of the combined sample, \[ P_{mn}(t) = \frac{m}{m+n} F_m(t) + \frac{n}{m+n} G_n(t), \] and the order-restricted estimators \[ F_{mn}(t) = \max\{ F_m(t), P_{mn}(t) \}, \qquad G_{mn}(t) = \min\{ G_n(t), P_{mn}(t) \}. \] The order-restricted ROC curve is then defined by \[ ROC_{\text{or}}(t) = 1 - G_{mn}\left( F_{mn}^{-1}(1 - t) \right), \quad 0 < t < 1, \] where \(F_{mn}^{-1}\) denotes the inverse of \(F_{mn}\). The area under the order-restricted ROC curve is \[ \widehat{\mathrm{AUC}}_{\text{or}} = \int_0^1 ROC_{\text{or}}(t) \, dt. \] Under suitable regularity conditions, the asymptotic distributions of \(\widehat{\mathrm{AUC}}_{\text{or}}\) and \(\widehat{\mathrm{AUC}}_{\text{emp}}\) are equivalent. Consequently, for large sample sizes, variance approximations developed for the empirical AUC, such as those by Hanley & McNeil or DeLong, can also be used for the order-restricted AUC.
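The construction of \(P_{mn}\), \(F_{mn}\), and \(G_{mn}\) can be illustrated in a few lines of base R. This is only a sketch on simulated data, not the package's implementation: the generalized inverse is taken on the pooled sample points, and the integral is approximated by a midpoint rule on a grid chosen here for illustration.

```r
set.seed(1)
x <- rnorm(30, mean = 0)     # controls (simulated, illustrative)
y <- rnorm(25, mean = 0.5)   # cases (simulated, illustrative)
m <- length(x)
n <- length(y)

Fm <- ecdf(x)                         # empirical CDF of controls
Gn <- ecdf(y)                         # empirical CDF of cases
grid <- sort(unique(c(x, y)))         # jump points of all step functions

# Pooled empirical CDF and the order-restricted estimators on the grid
P_mn <- (m / (m + n)) * Fm(grid) + (n / (m + n)) * Gn(grid)
F_mn <- pmax(Fm(grid), P_mn)
G_mn <- pmin(Gn(grid), P_mn)

# Generalized inverse of F_mn: index of the smallest grid point with F_mn >= p
F_mn_inv_index <- function(p) which(F_mn >= p - 1e-12)[1]

# Order-restricted ROC curve: 1 - G_mn(F_mn^{-1}(1 - t))
roc_or <- function(t) {
  idx <- vapply(1 - t, F_mn_inv_index, integer(1))
  1 - G_mn[idx]
}

# Midpoint-rule approximation of the area under the curve
t_grid <- seq(0.0005, 0.9995, by = 0.001)
auc_or <- mean(roc_or(t_grid))
print(auc_or)
```

By construction \(G_{mn} \le P_{mn} \le F_{mn}\) at every point, which is the ordering the method is designed to enforce.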

This nonparametric method produces a jagged, step-like ROC curve that is slightly smoother than the empirical ROC curve. It is particularly useful when:

Example

# Load the formatted dataset
data(DMDmodified)

# Compute order-restricted AUC summary and ROC plot
auc <- AUC(
  data      = DMDmodified,
  method    = "order",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display order-restricted ROC plot
auc$plot

References


norm_silver, norm_ucv, bi_silver, bi_ucv       Kernel Density–Based Smooth ROC


To apply a kernel density–based ROC method, set method to one of the following options in the AUC() function:

Each combination defines both the kernel function \(K(\cdot)\) and the bandwidth selection rule used to smooth the ROC curve. The following options are available for ci_method:

Usage

AUC(
  data      = data,
  method    = "norm_silver",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

To obtain a smoother and more interpretable nonparametric estimate, kernel density estimation (KDE) can be used to estimate the underlying distribution functions of the marker in each group. The resulting ROC curve is continuous and differentiable.

Kernel Density Estimation

Let \(X_1, \dots, X_m\) denote marker values from controls and \(Y_1, \dots, Y_n\) from cases. The kernel density estimators are given by

\[ \hat{f}(x) = \frac{1}{m h_m} \sum_{i=1}^m K\left(\frac{x - X_i}{h_m}\right), \quad \hat{g}(x) = \frac{1}{n h_n} \sum_{i=1}^n K\left(\frac{x - Y_i}{h_n}\right), \] where \(h_m\) and \(h_n\) are the bandwidths controlling smoothness, and \(K(\cdot)\) is a kernel function that integrates to one.

The corresponding cumulative distribution estimators are

\[ \hat{F}(t) = \int_{-\infty}^{t} \hat{f}(x) \, dx, \qquad \hat{G}(t) = \int_{-\infty}^{t} \hat{g}(x) \, dx. \]

Then the kernel-smoothed ROC curve is defined as

\[ \widehat{ROC}_{kde}(t) = 1 - \hat{G}\left(\hat{F}^{-1}(1 - t)\right), \quad 0 < t < 1, \] where \(\hat{F}^{-1}(t) = \inf\{x : \hat{F}(x) \ge t\}\).

Bandwidth and Kernel Selection

Choosing an appropriate bandwidth and kernel is crucial for balancing bias and variance:

Two bandwidth selection methods are available:

Available kernel functions include:

There is no universally optimal kernel, but the Gaussian and biweight kernels are widely used and perform robustly across diverse data conditions.

AUC Estimation

The area under the kernel-smoothed ROC curve is given by

\[ \widehat{\mathrm{AUC}}_{kde} = \int_0^1 \widehat{ROC}_{kde}(t) \, dt, \] which is evaluated numerically using the trapezoidal rule.
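The kernel-smoothed curve and its area can be sketched in base R as follows. This is an illustration on simulated data rather than the package's internal code. It assumes a Gaussian kernel, uses stats::bw.nrd0() (Silverman's rule of thumb) for the bandwidths, and inverts \(\hat{F}\) numerically with uniroot(); the search interval and the grid of false positive rates are choices made for this example.

```r
set.seed(2)
x <- rnorm(60, mean = 0, sd = 1)     # controls (simulated)
y <- rnorm(50, mean = 1, sd = 1.2)   # cases (simulated)

h_m <- bw.nrd0(x)   # Silverman's rule-of-thumb bandwidth for controls
h_n <- bw.nrd0(y)   # Silverman's rule-of-thumb bandwidth for cases

# With a Gaussian kernel, the integrated density is an average of normal CDFs
F_hat <- function(t) vapply(t, function(u) mean(pnorm((u - x) / h_m)), numeric(1))
G_hat <- function(t) vapply(t, function(u) mean(pnorm((u - y) / h_n)), numeric(1))

# Numerical inverse of F_hat by root finding on a wide interval
F_hat_inv <- function(p) {
  lower <- min(x) - 10 * h_m
  upper <- max(x) + 10 * h_m
  vapply(p, function(q) {
    uniroot(function(u) F_hat(u) - q, c(lower, upper), tol = 1e-10)$root
  }, numeric(1))
}

# Kernel-smoothed ROC curve: 1 - G_hat(F_hat^{-1}(1 - t))
roc_kde <- function(t) 1 - G_hat(F_hat_inv(1 - t))

# Trapezoidal rule on a grid of false positive rates, with endpoints (0,0) and (1,1)
t_grid   <- seq(0.001, 0.999, length.out = 999)
t_all    <- c(0, t_grid, 1)
roc_vals <- c(0, roc_kde(t_grid), 1)
auc_kde  <- sum(diff(t_all) * (head(roc_vals, -1) + tail(roc_vals, -1)) / 2)
print(auc_kde)
```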

The variance of \(\widehat{\mathrm{AUC}}_{kde}\) can be estimated using bootstrap resampling, which accounts for uncertainty in both the kernel estimation and the sampling process. Zou et al. (1997) noted that the smoothing introduced by KDE has a negligible effect on the first-order variance of \(\widehat{\mathrm{AUC}}_{kde}\). To keep confidence intervals within the \((0,1)\) range, the transformation \[ -\log(1 - \widehat{\mathrm{AUC}}_{kde}) \] is recommended: intervals are constructed on this transformed scale and then back-transformed for interpretation.
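A small numerical illustration shows why the transformed scale helps. The values below are hypothetical, and the standard error on the transformed scale is obtained with the delta method, which is an assumption of this sketch rather than a documented detail of the package.

```r
auc_hat  <- 0.95   # hypothetical AUC estimate
se_auc   <- 0.04   # hypothetical bootstrap standard error of the AUC
siglevel <- 0.05
z <- qnorm(1 - siglevel / 2)

# Wald interval on the original scale: the upper limit can exceed 1
naive_ci <- auc_hat + c(-1, 1) * z * se_auc

# Interval on the scale psi = -log(1 - AUC)
psi_hat <- -log(1 - auc_hat)
se_psi  <- se_auc / (1 - auc_hat)      # delta method: d psi / d AUC = 1 / (1 - AUC)
psi_ci  <- psi_hat + c(-1, 1) * z * se_psi

# Back-transform to the AUC scale: AUC = 1 - exp(-psi)
transformed_ci <- 1 - exp(-psi_ci)

print(rbind(naive = naive_ci, transformed = transformed_ci))
```

In this example the untransformed upper limit is above 1, while the back-transformed upper limit stays below 1 because \(1 - e^{-\psi} < 1\) for every finite \(\psi\).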

As a nonparametric method, the kernel-smoothed ROC curve provides several advantages:

While flexible, kernel-based ROC methods also have several limitations:

  1. Boundary bias: KDE performs poorly near FPR values close to 0 or 1.
  2. Bandwidth sensitivity: Requires careful tuning of \(h_m\) and \(h_n\).
  3. Computational cost: Bootstrapping smooth ROC curves can be intensive for large datasets.

Example

# Load formatted dataset
data(DMDmodified)

# Compute smooth ROC using Gaussian kernel and Silverman bandwidth
auc <- AUC(
  data      = DMDmodified,
  method    = "norm_silver",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display smooth ROC plot
auc$plot

References


binormal       Binormal ROC Curve


To apply this method, set method = "binormal" in the AUC() function. The following options are available for ci_method:

Usage

AUC(
  data      = data,
  method    = "binormal",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The bi-normal ROC model is one of the most widely used parametric approaches to ROC analysis. It assumes that biomarker values for both the non-diseased \(F\) and diseased \(G\) populations follow normal distributions, but with potentially different means and variances. The model assumes: \[ F(x) = \Phi\left( \frac{x - \mu_0}{\sigma_0} \right), \quad G(y) = \Phi\left( \frac{y - \mu_1}{\sigma_1} \right), \] where \(\Phi(\cdot)\) is the standard normal CDF, and \(\mu_0, \sigma_0^2, \mu_1, \sigma_1^2\) are the means and variances for the two groups.

Defining \[ a = \frac{\mu_1 - \mu_0}{\sigma_1}, \qquad b = \frac{\sigma_0}{\sigma_1}, \] the bi-normal ROC curve can be expressed as

\[ ROC_{\text{Bin}}(t) = \Phi\left( a + b\,\Phi^{-1}(t) \right), \] where \(\Phi^{-1}(\cdot)\) is the quantile function of the standard normal distribution. The corresponding AUC is given by \[ AUC_{\text{Bin}} = \Phi\left( \frac{a}{\sqrt{1 + b^2}} \right). \]

The parameters \(\mu_0, \sigma_0, \mu_1, \sigma_1\) are estimated by maximum likelihood estimation (MLE) under the normality assumption. MLE provides asymptotically efficient estimates and allows for likelihood-based confidence intervals on the AUC.
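The quantities \(a\), \(b\), the binormal ROC curve, and its AUC can be computed directly from the normal MLEs. The sketch below uses simulated data and base R only; it is not the package's code, and the numerical integration is included purely as a check of the closed-form AUC.

```r
set.seed(3)
x <- rnorm(80, mean = 10, sd = 2)   # non-diseased (simulated)
y <- rnorm(70, mean = 13, sd = 3)   # diseased (simulated)

# Normal MLEs: sample mean and standard deviation with divisor n (not n - 1)
mu0 <- mean(x)
sigma0 <- sqrt(mean((x - mu0)^2))
mu1 <- mean(y)
sigma1 <- sqrt(mean((y - mu1)^2))

# Binormal parameters
a <- (mu1 - mu0) / sigma1
b <- sigma0 / sigma1

# Binormal ROC curve and closed-form AUC
roc_bin <- function(t) pnorm(a + b * qnorm(t))
auc_bin <- pnorm(a / sqrt(1 + b^2))

# Numerical check: integrate the ROC curve over (0, 1)
auc_num <- integrate(roc_bin, lower = 0, upper = 1)$value
print(c(closed_form = auc_bin, numerical = auc_num))
```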

Because the AUC has a closed-form expression, confidence intervals can be obtained using either:

Example

# Load formatted dataset
data(DMDmodified)

# Compute bi-normal AUC summary and ROC plot
auc <- AUC(
  data      = DMDmodified,
  method    = "binormal",
  ci        = TRUE,
  ci_method = "mle",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display bi-normal ROC plot
auc$plot

References


biweibull       Constant-shape bi-Weibull ROC Curve


To apply this method, set method = "biweibull" in the AUC() function. The following options are available for ci_method:

Usage

AUC(
  data      = data,
  method    = "biweibull",
  ci        = TRUE,
  ci_method = "mle",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The constant-shape bi-Weibull model is a flexible parametric model for ROC analysis. Let \(X\) and \(Y\) denote biomarker values for non-diseased and diseased subjects, respectively. Assume that both follow Weibull distributions with a common shape parameter \(\alpha\), but possibly different scale parameters \(\theta_0\) and \(\theta_1\). Then the bi-Weibull ROC curve is

\[ ROC_{\text{Biw}}(t) = t^{\frac{\theta_0}{\theta_1}}, \quad t \in (0,1). \]

The corresponding AUC has a simple closed-form expression which is given by

\[ AUC_{\text{Biw}} = \frac{\theta_1}{\theta_0 + \theta_1}. \]
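The two closed forms can be verified numerically. The sketch below assumes the Weibull survivor function is parametrized as \(\exp(-x^{\alpha}/\theta)\), which is the form under which the ROC curve reduces to \(t^{\theta_0/\theta_1}\); under this assumption, the scale argument used by R's pweibull() and qweibull() is \(\theta^{1/\alpha}\). All parameter values are hypothetical.

```r
alpha  <- 1.5   # common shape (hypothetical value)
theta0 <- 2     # non-diseased parameter (hypothetical value)
theta1 <- 6     # diseased parameter (hypothetical value)

# Assumed parametrization: S(x) = exp(-x^alpha / theta), so R's scale is theta^(1 / alpha)
scale0 <- theta0^(1 / alpha)
scale1 <- theta1^(1 / alpha)

# Closed-form ROC curve
roc_closed <- function(t) t^(theta0 / theta1)

# Same curve from first principles: 1 - G(F^{-1}(1 - t))
roc_direct <- function(t) {
  cutoff <- qweibull(1 - t, shape = alpha, scale = scale0)
  1 - pweibull(cutoff, shape = alpha, scale = scale1)
}

t_grid  <- seq(0.01, 0.99, by = 0.01)
max_gap <- max(abs(roc_closed(t_grid) - roc_direct(t_grid)))

# Closed-form AUC versus numerical integration of the ROC curve
auc_closed <- theta1 / (theta0 + theta1)
auc_num    <- integrate(roc_closed, lower = 0, upper = 1)$value
print(c(max_gap = max_gap, closed_form = auc_closed, numerical = auc_num))
```

Note that the common shape \(\alpha\) cancels: neither the ROC curve nor the AUC depends on it.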

The model parameters \(\alpha, \theta_0, \theta_1\) are typically estimated via maximum likelihood estimation (MLE), which provides consistent and efficient estimators under the Weibull assumption. Then, confidence intervals are calculated using

The bi-Weibull ROC curve assumes biomarker values for the non-diseased and diseased populations each follow Weibull distributions. Owing to its adaptable shape parameter, the Weibull family can approximate several common distributions (including the exponential, Rayleigh, and even log-normal-like forms), making it particularly effective for modeling skewed or heavy-tailed biomedical data, where symmetric distributions such as the normal may perform poorly. When the Weibull assumption holds, this parametric model also performs well with small sample sizes.

Example

# Load formatted dataset
data(DMDmodified)

# Compute bi-Weibull AUC summary and ROC plot
auc <- AUC(
  data      = DMDmodified,
  method    = "biweibull",
  ci        = TRUE,
  ci_method = "mle",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display bi-Weibull ROC plot
auc$plot

References


bigamma       Bi-Gamma ROC Curve


To apply this method, set method = "bigamma" in the AUC() function. The option ci_method = "bootstrap" computes the parametric bootstrap percentile interval, which is the only available option for the bi-Gamma model.

Usage

AUC(
  data      = data,
  method    = "bigamma",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

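In the bi-Gamma ROC model, both populations are assumed to follow independent Gamma distributions but with potentially different shape and scale parameters, allowing for flexible modeling of skewness and dispersion. Let \(X\) and \(Y\) denote the biomarker values of the non-diseased and diseased groups, with \[ X \sim \text{Gamma}(k_1, \theta_1), \qquad Y \sim \text{Gamma}(k_2, \theta_2), \]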

where \(k_1, k_2\) are shape parameters and \(\theta_1, \theta_2\) are scale parameters. The probability density functions (PDFs) are:

\[ f(x; k_1, \theta_1) = \frac{1}{\Gamma(k_1)\theta_1^{k_1}}x^{k_1-1} e^{-x / \theta_1}, \quad x > 0, \] \[ g(y; k_2, \theta_2) = \frac{1}{\Gamma(k_2)\theta_2^{k_2}}y^{k_2-1} e^{-y / \theta_2}, \quad y > 0. \]

The bi-Gamma ROC curve is given by

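\[ ROC_{\text{Big}}(t) = 1 - \frac{\gamma\left(k_2, \frac{\theta_1}{\theta_2}\gamma^{-1}(k_1, 1 - t)\right)}{\Gamma(k_2)}, \] where \(\gamma(a, \cdot)\) is the lower incomplete Gamma function and \(\gamma^{-1}(a, \cdot)\) is its inverse. The factor \(\theta_1/\theta_2\) rescales the quantile of the non-diseased distribution to the scale of the diseased distribution.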

The area under the bi-Gamma ROC curve \((AUC_{\text{Big}})\) can be expressed as

\[ AUC_{\text{Big}} = F_F\left(\frac{k_2 \theta_2}{k_1 \theta_1};\, 2k_1, 2k_2\right), \] where \(F_F(\cdot; 2k_1, 2k_2)\) denotes the CDF of an F-distributed random variable with \(2k_1\) and \(2k_2\) degrees of freedom.
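The F-distribution expression for the AUC can be checked against a direct numerical integration of the ROC curve. In this base R sketch the ROC curve is computed from first principles as \(1 - G(F^{-1}(1 - t))\) with pgamma() and qgamma(); the parameter values are hypothetical and no ROCModels function is called.

```r
k1 <- 2.0; theta1 <- 1.5   # non-diseased shape and scale (hypothetical values)
k2 <- 3.0; theta2 <- 2.0   # diseased shape and scale (hypothetical values)

# ROC curve from the Gamma quantile and distribution functions
roc_big <- function(t) {
  cutoff <- qgamma(1 - t, shape = k1, scale = theta1)
  1 - pgamma(cutoff, shape = k2, scale = theta2)
}

# Closed-form AUC via the F distribution with 2*k1 and 2*k2 degrees of freedom
auc_f <- pf((k2 * theta2) / (k1 * theta1), df1 = 2 * k1, df2 = 2 * k2)

# Numerical check: area under the ROC curve
auc_num <- integrate(roc_big, lower = 0, upper = 1)$value
print(c(f_distribution = auc_f, numerical = auc_num))
```

The identity follows because \(X/(k_1\theta_1)\) divided by \(Y/(k_2\theta_2)\) has an F distribution with \(2k_1\) and \(2k_2\) degrees of freedom, so \(P(X < Y)\) is that distribution's CDF evaluated at \(k_2\theta_2/(k_1\theta_1)\).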

The model parameters \((k_1, \theta_1, k_2, \theta_2)\) are estimated by maximum likelihood estimation (MLE) based on independent samples from the non-diseased and diseased groups. The parametric percentile bootstrap (ci_method = "bootstrap") is recommended for constructing confidence intervals, especially when sample sizes are small to moderate or the data deviate from ideal Gamma assumptions.

The bi-Gamma ROC model is a parametric ROC framework and is suitable for data that are positively skewed or have heavy right tails, characteristics commonly observed in biomedical and reliability studies.

Example

# Load formatted dataset
data(DMDmodified)

# Compute bi-Gamma AUC summary and ROC plot
auc <- AUC(
  data      = DMDmodified,
  method    = "bigamma",
  ci        = TRUE,
  ci_method = "bootstrap",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display bi-Gamma ROC plot
auc$plot

References


lehmann       Semiparametric ROC Curve under the Lehmann Model


To apply this method, set method = "lehmann" in the AUC() function. The option ci_method = "ple" computes the confidence interval using the partial likelihood-based method (via the proportional hazards model), which is the only available option for the Lehmann model.

Usage

AUC(
  data      = data,
  method    = "lehmann",
  ci        = TRUE,
  ci_method = "ple",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The Lehmann model provides a semiparametric framework for ROC curve estimation that assumes a simple power relationship between the survivor functions of the diseased and non-diseased populations:

\[ \overline{G}(t) = [\overline{F}(t)]^{\delta}, \qquad 0 < \delta \le 1, \] where \(\overline{F}(t)\) and \(\overline{G}(t)\) are the survivor functions of the biomarker for the non-diseased and diseased groups, respectively, and \(\delta\) is a single diagnostic accuracy parameter. Smaller values of \(\delta\) correspond to stronger discriminatory ability of the biomarker.

Under this assumption, the ROC curve and its corresponding area have simple analytical forms:

\[ ROC_{\text{le}}(t) = t^{\delta}, \qquad t \in [0, 1], \] and \[ AUC_{\text{le}} = \int_0^1 t^{\delta} dt = \frac{1}{1 + \delta}. \]

This produces a smooth, monotonic ROC curve that is both interpretable and computationally efficient. The single parameter \(\delta\) controls the shape of the ROC and directly determines the AUC.

The Lehmann assumption is equivalent to the proportional hazards (PH) formulation in survival analysis, where the ratio of hazard functions for the diseased and non-diseased groups is constant:

\[ \frac{h_Y(t)}{h_X(t)} = e^{\beta}. \]

Here, the Lehmann parameter and the PH coefficient are linked by \(\delta = e^{\beta}\). Thus, \(\hat{\beta}\) is estimated by fitting a Cox proportional hazards model using the survival package, and consequently \[ \hat{\delta} = e^{\hat{\beta}}, \quad \widehat{AUC}_{\text{le}} = \frac{1}{1 + \hat{\delta}}. \]

The confidence interval for \(AUC_{\text{le}}\) can then be derived using the delta method, based on the estimated variance of \(\hat{\beta}\).
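The estimation steps can be sketched with survival::coxph() as shown here. This is a minimal illustration under stated assumptions, not the package's internal code: the data are simulated to satisfy the Lehmann relationship with \(\delta = 0.4\), every biomarker value is treated as an observed event, and a Wald-type interval on the AUC scale is used, with the delta-method derivative \(dAUC/d\beta = -\delta/(1+\delta)^2\).

```r
library(survival)

set.seed(42)
# Survivor function of X is exp(-x); that of Y is exp(-0.4 * y) = [exp(-y)]^0.4
x <- rexp(80, rate = 1)     # simulated non-diseased values
y <- rexp(70, rate = 0.4)   # simulated diseased values

dat <- data.frame(
  marker = c(x, y),
  event  = 1,                                      # no censoring
  group  = rep(c(0, 1), c(length(x), length(y)))   # 1 = diseased
)

# Cox model: the coefficient of 'group' is beta = log(delta)
fit <- coxph(Surv(marker, event) ~ group, data = dat)

beta_hat  <- unname(coef(fit))
se_beta   <- sqrt(vcov(fit)[1, 1])
delta_hat <- exp(beta_hat)
auc_hat   <- 1 / (1 + delta_hat)

# Delta method: SE(AUC) is approximately |dAUC/dbeta| * SE(beta)
se_auc <- delta_hat / (1 + delta_hat)^2 * se_beta
ci <- auc_hat + c(-1, 1) * qnorm(0.975) * se_auc

c(delta = delta_hat, AUC = auc_hat, lower = ci[1], upper = ci[2])
```

For the simulated data the true AUC is \(1/(1 + 0.4) \approx 0.714\), so the estimate can be compared against a known target.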

This method naturally accommodates covariates through the Cox proportional hazards framework, allowing for adjusted ROC analysis that accounts for additional variables. The parameter \(\delta = e^{\beta}\) offers direct clinical interpretability as a hazard ratio, making the results meaningful in applied biomedical contexts. It is computationally simple, relying on standard Cox regression routines without requiring complex optimization procedures or Bayesian sampling. The approach is also robust, maintaining statistical efficiency without imposing distributional assumptions on the biomarker data.

The Lehmann ROC model bridges parametric and nonparametric approaches by imposing a simple, interpretable relationship between sensitivity and specificity while leaving the biomarker distributions unspecified. This balance of robustness, flexibility, and efficiency makes it particularly suitable for heterogeneous biomedical datasets, especially where biomarkers are influenced by covariates or measured repeatedly over time.

Example

# Load formatted dataset
data(DMDmodified)

# Compute semiparametric ROC under the Lehmann assumption
auc <- AUC(
  data      = DMDmodified,
  method    = "lehmann",
  ci        = TRUE,
  ci_method = "ple",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary
auc$summary

# Display Lehmann ROC plot
auc$plot

References


bayesbiweibull       Bayesian Bi-Weibull ROC Curve


To apply this method, set method = "bayesbiweibull" in the AUC() function. The option ci_method = "mcmc" computes the credible interval from the MCMC posterior samples, which is the only available option for this method. The boot_iter option is inactive for this method, as the number of MCMC iterations is fixed at 11,000 (comprising 1,000 burn-in and 10,000 retained samples).

Usage

AUC(
  data      = data,
  method    = "bayesbiweibull",
  ci        = TRUE,
  ci_method = "mcmc",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The Bayesian Bi-Weibull ROC curve is a parametric Bayesian extension of the constant-shape Bi-Weibull model. In the Bayesian paradigm, the unknown model parameters are treated as random variables with prior distributions that reflect prior knowledge or beliefs about their possible values. These priors are updated with the observed data through Bayes’ theorem, yielding posterior distributions for both the ROC curve and its area under the curve (AUC). Posterior summaries (such as the posterior mean or credible intervals) serve as Bayesian estimates of ROC quantities.

As described in the frequentist Bi-Weibull section, we assume the biomarker values for the non-diseased and diseased populations follow Weibull distributions with a shared shape parameter \(\alpha\), but distinct scale parameters \(\theta_0\) and \(\theta_1\). The parameters \(\theta_0\), \(\theta_1\), and \(\alpha\) are treated as random variables with the following prior distributions:

\[ \theta_j \sim \mathrm{IG}(a_j, b_j), \quad j = 0, 1, \] \[ \alpha \sim \mathrm{Gamma}(k, \beta), \] where \(a_j, b_j, k, \beta > 0\). Here, \(\mathrm{IG}(a, b)\) denotes the inverse-gamma distribution with density \[ \pi_{1j}(\theta_j) = \frac{b_j^{a_j}}{\Gamma(a_j)} \, \theta_j^{-(a_j + 1)} e^{-b_j / \theta_j}, \] and all priors are assumed independent.

Given data \(x_1, \dots, x_m\) (controls) and \(y_1, \dots, y_n\) (cases), the likelihood function is:

\[ L(\alpha, \theta_0, \theta_1 \mid \text{data}) \propto \alpha^{m+n} \, \theta_0^{-m} \, \theta_1^{-n} \left( \prod_{i=1}^{m} x_i^{\alpha - 1} \right) \left( \prod_{j=1}^{n} y_j^{\alpha - 1} \right) \exp\left(-\frac{\sum_{i=1}^{m} x_i^{\alpha}}{\theta_0}\right) \exp\left(-\frac{\sum_{j=1}^{n} y_j^{\alpha}}{\theta_1}\right). \]

The posterior distribution is proportional to the product of the likelihood and priors:

\[ p(\alpha, \theta_0, \theta_1 \mid \text{data}) \propto L(\alpha, \theta_0, \theta_1 \mid \text{data}) \, \pi_{10}(\theta_0)\pi_{11}(\theta_1)\pi(\alpha). \]

Because the posterior distribution cannot be evaluated analytically, parameter estimation is performed using Markov Chain Monte Carlo (MCMC), typically through Gibbs sampling with a Metropolis–Hastings step for \(\alpha\). At each iteration \(s\), new samples \(\theta_0^{(s)}\), \(\theta_1^{(s)}\), and \(\alpha^{(s)}\) are drawn from their respective conditional posterior distributions.

From these samples, the AUC for iteration \(s\) is computed as:

\[ AUC^{(s)} = \frac{\theta_1^{(s)}}{\theta_0^{(s)} + \theta_1^{(s)}}. \]

The posterior mean AUC and its 95% highest posterior density (HPD) credible interval are then estimated as:

\[ \widehat{AUC}_{\text{Biw}}^{\text{Bayes}} = \frac{1}{S} \sum_{s=1}^{S} AUC^{(s)}, \quad CI_{95\%} = \mathrm{HPD}_{0.95}\{AUC^{(1)}, \dots, AUC^{(S)}\}. \]

In this implementation, 11,000 MCMC iterations are run; the first 1,000 are discarded as burn-in and the remaining 10,000 samples are retained for posterior inference. The AUC and its 95% HPD credible interval are computed using non-informative priors, with \(a_0 = a_1 = b_0 = b_1 = 0\) and \(\alpha \sim \text{Gamma}(0.1, 1)\). Note that with these settings the priors for \(\theta_0\) and \(\theta_1\) are improper, meaning they do not integrate to one, but they still yield a proper posterior when combined with the likelihood.
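The sampling scheme described in this section can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's sampler. It assumes that all inverse-gamma hyperparameters are zero, that Gamma(0.1, 1) is parameterized by shape and rate, and that \(\alpha\) is updated by a random walk on the log scale. The chain length is shortened for speed, and the helper hpd() is a hypothetical stand-in for the interval returned by the HDInterval package.

```r
set.seed(123)
# Simulated data with survivor function exp(-z^alpha / theta), common shape 1.5
alpha_true <- 1.5
x <- rweibull(60, shape = alpha_true, scale = 1^(1 / alpha_true))  # theta0 = 1
y <- rweibull(50, shape = alpha_true, scale = 3^(1 / alpha_true))  # theta1 = 3
m <- length(x); n <- length(y)
sum_log <- sum(log(x)) + sum(log(y))

# Log conditional posterior of alpha, up to an additive constant
log_post_alpha <- function(a, th0, th1, k = 0.1, beta = 1) {
  (m + n) * log(a) + (a - 1) * sum_log -
    sum(x^a) / th0 - sum(y^a) / th1 +
    (k - 1) * log(a) - beta * a
}

S <- 3000; burn <- 500          # shortened chain (the text uses 11,000 and 1,000)
alpha <- 1
auc_draws <- numeric(S)
for (s in seq_len(S)) {
  # Gibbs step: with a_j = b_j = 0, theta_j | alpha is inverse-gamma
  th0 <- 1 / rgamma(1, shape = m, rate = sum(x^alpha))
  th1 <- 1 / rgamma(1, shape = n, rate = sum(y^alpha))
  # Metropolis-Hastings step: random walk on log(alpha);
  # log(prop) - log(alpha) is the Jacobian of the log transform
  prop <- alpha * exp(rnorm(1, sd = 0.1))
  log_ratio <- log_post_alpha(prop, th0, th1) -
    log_post_alpha(alpha, th0, th1) + log(prop) - log(alpha)
  if (log(runif(1)) < log_ratio) alpha <- prop
  auc_draws[s] <- th1 / (th0 + th1)
}
keep <- auc_draws[(burn + 1):S]

# Shortest interval containing about 95% of the retained draws
hpd <- function(z, level = 0.95) {
  z <- sort(z); N <- length(z); w <- floor(level * N)
  lo <- z[1:(N - w)]; hi <- z[(w + 1):N]
  i <- which.min(hi - lo)
  c(lower = lo[i], upper = hi[i])
}

auc_mean <- mean(keep)
auc_hpd  <- hpd(keep)
auc_mean
auc_hpd
```

For the simulated data the true AUC is \(3/(1 + 3) = 0.75\), which gives a reference point for the posterior mean.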

The Bayesian Bi-Weibull ROC approach offers a flexible and robust framework for ROC analysis by combining prior knowledge with observed data. It generates full posterior distributions for the ROC curve and AUC through MCMC simulation, providing direct quantification of uncertainty without relying on asymptotic approximations. Averaging over posterior draws yields smooth and stable ROC estimates, even in small samples. The 95% highest posterior density (HPD) credible intervals are computed using the HDInterval package in R.

Example

# Load formatted dataset
data(DMDmodified)

# Bayesian estimation of the Bi-Weibull AUC and ROC
auc <- AUC(
  data      = DMDmodified,
  method    = "bayesbiweibull",
  ci        = TRUE,
  ci_method = "mcmc",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display Bayesian AUC summary
auc$summary

# Display posterior ROC plot
auc$plot

References


dpm       Bayesian Semiparametric ROC (Dirichlet Process Mixture of Normals)


To apply this method, set method = "dpm" in the AUC() function. The option ci_method = "dpm" computes the credible interval from the posterior samples of the Dirichlet process mixture model, which is the only available option for this method. The boot_iter option is inactive for this method, as the number of MCMC iterations is fixed at 500 (comprising 100 burn-in and 400 retained samples).

Usage

AUC(
  data      = data,
  method    = "dpm",
  ci        = TRUE,
  ci_method = "dpm",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The Bayesian semiparametric ROC approach combines the interpretability of parametric models with the flexibility of nonparametric inference by assigning infinite-dimensional priors, such as Dirichlet processes (DPs), to the biomarker distributions. This allows the model to capture complex data features — including multimodality, skewness, and heterogeneity — that cannot be adequately represented by simple single-component parametric models.

A key implementation of this framework is the Dirichlet Process Mixture (DPM) of normal distributions proposed by Erkanli et al. (2006). In this model, the biomarker distributions for the non-diseased and diseased populations are represented as mixtures of normal components, with random mixing distributions drawn from independent DPs.

Let \(\{X_i\}_{i=1}^m \sim F\) and \(\{Y_j\}_{j=1}^n \sim G\) denote biomarker measurements from the non-diseased and diseased groups, respectively. Then,

\[ F(x) = \sum_{l=1}^{L} p_l \Phi(x \mid \mu_l, \sigma_l^2), \qquad G(y) = \sum_{l=1}^{L'} p_l' \Phi(y \mid \mu_l', \sigma_l'^2), \] where \(\Phi(\cdot \mid \mu, \sigma^2)\) is the normal cumulative distribution function, and \(L, L'\) are truncation levels that approximate the infinite Dirichlet process mixture.

The mixture weights \(\{p_l\}\) follow a stick-breaking process:

\[ p_l = \begin{cases} R_1, & l = 1, \\ R_l \prod_{r=1}^{l-1}(1 - R_r), & l = 2, \dots, L - 1, \\ \prod_{r=1}^{L-1}(1 - R_r), & l = L, \end{cases} \] with a similar construction for \(\{p_l'\}\) in the diseased group. The priors are specified as follows:

\[ R_r \sim \mathrm{Beta}(1, \alpha), \quad \alpha \sim \mathrm{Gamma}(a, b), \] \[ \mu_l \sim N(m_0, S_0), \quad \sigma_l^{-2} \sim \mathrm{Gamma}(c, d), \] where \(\alpha\) controls the number of mixture components and the model complexity. This finite truncation approach, following Ishwaran and James (2002), yields a computationally tractable approximation to the Dirichlet process.
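The truncated stick-breaking construction can be illustrated with a few lines of base R. This is only a sketch of a single prior draw of the weights, with hypothetical values for the truncation level and concentration parameter; it is not the package's sampler.

```r
set.seed(7)
L     <- 10   # truncation level (illustrative)
alpha <- 1    # concentration parameter (illustrative)

# Stick-breaking fractions R_1, ..., R_{L-1} ~ Beta(1, alpha)
R <- rbeta(L - 1, shape1 = 1, shape2 = alpha)

# p_l = R_l * prod_{r < l} (1 - R_r) for l < L; the last weight takes the remainder
p <- c(R, 1) * c(1, cumprod(1 - R))

round(p, 4)
sum(p)   # the weights sum to one by construction
```

Smaller values of alpha concentrate most of the mass on the first few components, while larger values spread it over more components.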

Given \(F\) and \(G\), the ROC curve is defined as

\[ ROC_{\text{DPM}}(p) = 1 - G(F^{-1}(1 - p)), \qquad 0 < p < 1, \] and the corresponding AUC is

\[ AUC_{\text{DPM}} = \int_0^1 [1 - G(F^{-1}(1 - p))] \, dp. \]

Both quantities are evaluated numerically at each iteration of the MCMC algorithm using the current mixture parameter draws.
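For a single set of mixture parameters, the numerical evaluation can be sketched in base R as follows. The mixture values are hypothetical stand-ins for one MCMC draw, and the helper names mix_cdf(), mix_quantile(), and roc_dpm() are illustrative rather than part of the package. The numerical AUC is compared with the closed form for \(P(Y > X)\) that holds for mixtures of normals.

```r
# Hypothetical mixture parameters for one iteration (weights, means, SDs)
p_x <- c(0.6, 0.4); mu_x <- c(0, 2); s_x <- c(1.0, 0.5)   # non-diseased (F)
p_y <- c(0.5, 0.5); mu_y <- c(2, 4); s_y <- c(1.0, 1.0)   # diseased (G)

# Mixture CDF evaluated at each element of q
mix_cdf <- function(q, p, mu, s) {
  vapply(q, function(v) sum(p * pnorm(v, mu, s)), numeric(1))
}

# Mixture quantile function by root finding on the CDF
mix_quantile <- function(u, p, mu, s) {
  lo <- min(mu - 10 * s); hi <- max(mu + 10 * s)
  vapply(u, function(ui) {
    uniroot(function(q) mix_cdf(q, p, mu, s) - ui, c(lo, hi), tol = 1e-10)$root
  }, numeric(1))
}

# ROC(t) = 1 - G(F^{-1}(1 - t))
roc_dpm <- function(t) {
  1 - mix_cdf(mix_quantile(1 - t, p_x, mu_x, s_x), p_y, mu_y, s_y)
}

# AUC by the midpoint rule on 1000 equal cells of (0, 1)
grid <- seq(0.0005, 0.9995, length.out = 1000)
auc_numeric <- mean(roc_dpm(grid))

# Closed form: sum over component pairs of p_l * p'_l' * P(Y_l' > X_l)
auc_closed <- sum(outer(seq_along(p_x), seq_along(p_y), function(i, j) {
  p_x[i] * p_y[j] * pnorm((mu_y[j] - mu_x[i]) / sqrt(s_x[i]^2 + s_y[j]^2))
}))

c(numeric = auc_numeric, closed_form = auc_closed)
```

Repeating this computation for every retained draw yields the posterior sample of ROC curves and AUC values.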

Posterior inference for the ROC curve and AUC is obtained using Gibbs sampling with Metropolis–Hastings updates when needed. At each iteration, the algorithm updates the mixture component parameters and stick-breaking weights for both groups, computes the corresponding ROC curve and AUC, and stores these posterior samples. The posterior mean ROC curve is then obtained by averaging across the retained MCMC iterations, and the 95% credible intervals are derived from the same posterior samples.

In this implementation, the semiparametric Bayesian estimator \(\widehat{\text{AUC}}_{\text{DPM}}\) is computed using posterior samples from the Dirichlet process mixture model. Weakly informative priors are used to ensure flexibility and stability: \(\alpha \sim \text{Gamma}(1, 1)\), \(\mu_l \sim \mathcal{N}(0, 100)\), and \(\tau_l = \sigma_l^{-2} \sim \text{Gamma}(0.1, 0.1)\), providing vague yet regularized estimates. Stick-breaking variables follow \(R_r \sim \text{Beta}(1, \alpha)\), and both groups are modeled identically. The truncation level is fixed at \(L = L' = 10\), which balances computational efficiency with representational power. Posterior inference was based on 500 MCMC iterations (100 burn-in and 400 retained samples), yielding stable AUC estimates across replications.

Although \(\widehat{\text{AUC}}_{\text{DPM}}\) often produced narrower credible intervals than competing methods, it was somewhat sensitive to prior choices in small-sample scenarios. Despite a higher computational cost, this approach remains highly flexible and robust—particularly valuable when the biomarker distributions are skewed, heavy-tailed, or multimodal.

Example

# Load formatted dataset
data(DMDmodified)

# Bayesian semiparametric ROC using Dirichlet process mixture of normals
auc <- AUC(
  data      = DMDmodified,
  method    = "dpm",
  ci        = TRUE,
  ci_method = "dpm",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display AUC summary (posterior mean and credible interval)
auc$summary

# Display DPM-based ROC plot (posterior mean ROC with bands)
auc$plot

References


BB       Bayesian Bootstrap ROC Curve


To apply this method, set method = "BB" in the AUC() function. The option ci_method = "bb" computes the Bayesian Bootstrap credible interval, which is the only available option for this method. The boot_iter option sets the number of bootstrap replications.

Usage

AUC(
  data      = data,
  method    = "BB",
  ci        = TRUE,
  ci_method = "bb",
  siglevel  = 0.05,
  boot_iter = 1000
)

Description

The Bayesian Bootstrap (BB), introduced by Rubin (1981), provides a fully nonparametric Bayesian method for estimating smooth ROC curves and AUC values. Unlike classical bootstrapping, which resamples data points, BB assigns random Dirichlet weights to observed data, generating a posterior distribution over ROC curves that reflects uncertainty without relying on large-sample approximations or bandwidth selection.

In empirical ROC estimation, each observation contributes equally (weights of \(1/m\) for controls and \(1/n\) for cases). BB replaces these fixed weights with random draws from a Dirichlet(1, …, 1) distribution. Averaging across replicates yields a smooth posterior mean ROC curve, and variation among replicates quantifies uncertainty in AUC.

Let \(X = (X_1, \dots, X_m)\) be controls and \(Y = (Y_1, \dots, Y_n)\) be cases. For each bootstrap replicate \(b = 1, \dots, B\):

  1. Draw \((p_1, \dots, p_m) \sim \text{Dirichlet}(1, \dots, 1)\), or equivalently \(p_i = w_i / \sum_j w_j\) with \(w_i \sim \text{Exponential}(1)\).
    Define weighted empirical CDF:
    \[ F^{(b)}(u) = \sum_{i=1}^{m} p_i \mathbf{1}(X_i \le u) \]
    Compute placement values:
    \[ U_j^{(b)} = 1 - F^{(b)}(Y_j), \quad j = 1, \dots, n \]

  2. Draw \((q_1, \dots, q_n) \sim \text{Dirichlet}(1, \dots, 1)\)
    Construct ROC curve:
    \[ ROC_{m,n}^{(b)}(t) = \sum_{j=1}^{n} q_j \mathbf{1}(U_j^{(b)} \le t) \]
    Estimate AUC numerically:
    \[ AUC^{(b)} = \int_0^1 ROC_{m,n}^{(b)}(t) \, dt \]

  3. Combine the results from all \(B\) replicates to produce the posterior mean estimates: \[ \widehat{\text{ROC}}_{\text{BB}}(t) = \frac{1}{B} \sum_{b=1}^B \text{ROC}_{m,n}^{(b)}(t), \quad \widehat{\text{AUC}}_{\text{BB}} = \frac{1}{B} \sum_{b=1}^B \text{AUC}^{(b)}. \]

    Posterior variance:
    \[ \text{Var}(\widehat{AUC}_{\text{BB}}) = \frac{1}{B - 1} \sum_{b=1}^{B} \left(AUC^{(b)} - \widehat{AUC}_{\text{BB}}\right)^2 \]

    A \(100(1-\alpha)\%\) credible interval for the AUC is obtained by taking the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the empirical distribution \(\{\text{AUC}^{(1)}, \dots, \text{AUC}^{(B)}\}\).
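The three steps above can be sketched in base R as follows. This is a minimal illustration on simulated data rather than the package's code; the helper rdirichlet1() is hypothetical and draws Dirichlet(1, …, 1) weights through normalized Exponential(1) variables. Because \(\int_0^1 \mathbf{1}(U_j \le t)\, dt = 1 - U_j\), the AUC of each replicate reduces to a weighted sum.

```r
set.seed(2025)
x <- rnorm(40, mean = 0, sd = 1)   # simulated controls
y <- rnorm(35, mean = 1, sd = 1)   # simulated cases
B <- 1000                          # number of Bayesian bootstrap replicates

# Dirichlet(1, ..., 1) draw via normalized Exponential(1) variables
rdirichlet1 <- function(k) {
  w <- rexp(k, rate = 1)
  w / sum(w)
}

auc_bb <- replicate(B, {
  p <- rdirichlet1(length(x))      # weights for controls
  q <- rdirichlet1(length(y))      # weights for cases
  # Placement values U_j = 1 - F^(b)(Y_j) using the weighted empirical CDF
  U <- vapply(y, function(yj) 1 - sum(p[x <= yj]), numeric(1))
  # AUC^(b) = integral of sum_j q_j 1(U_j <= t) dt = sum_j q_j (1 - U_j)
  sum(q * (1 - U))
})

auc_hat <- mean(auc_bb)                     # posterior mean AUC
auc_var <- var(auc_bb)                      # posterior variance (divisor B - 1)
ci      <- quantile(auc_bb, c(0.025, 0.975))  # 95% credible interval

auc_hat
ci
```

The full posterior mean ROC curve can be obtained in the same loop by evaluating the weighted step function on a grid of \(t\) values and averaging over replicates.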

The Bayesian Bootstrap generates smooth ROC curves by averaging over random weighted distributions, avoiding kernel smoothing or parametric assumptions. It is especially useful for small or irregular samples, offering robust, data-driven inference with direct posterior uncertainty quantification. The method is also computationally efficient, relying on simple resampling rather than full MCMC.

Example

# Load formatted dataset
data(DMDmodified)

# Bayesian Bootstrap ROC and AUC estimation
auc <- AUC(
  data      = DMDmodified,
  method    = "BB",
  ci        = TRUE,
  ci_method = "bb",
  siglevel  = 0.05,
  boot_iter = 1000
)

# Display posterior AUC summary and credible interval
auc$summary

# Display smooth Bayesian Bootstrap ROC plot
auc$plot

References