Federated Computation of ROC and PR Curves

Xuefeng Xu and Graham Cormode
University of Warwick
Preprint 2025

TL;DR: Privacy-preserving ROC/PR curve approximation for federated learning.

Introduction

Federated Learning allows multiple clients to collaboratively train models without sharing raw data. However, evaluation is often limited to simple aggregate metrics such as accuracy or loss, which give an incomplete picture of model performance.

We propose a method to approximate Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in federated settings, without accessing raw client data. Our approach supports both Secure Aggregation and Differential Privacy, providing provable error guarantees and low communication cost.

Methods Overview

Our method consists of five main steps:

  1. Local Histograms: Clients build histograms for positive/negative scores.
  2. Aggregation: The server collects and combines histograms.
  3. Quantile Estimation: Quantiles are estimated from bin counts.
  4. ECDF Approximation: Interpolate quantiles to reconstruct ECDFs.
  5. Curve Construction: Use ECDFs to compute ROC and PR curves.
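Steps 1 and 2 can be sketched as follows. This is a minimal NumPy version that assumes equal-width bin edges shared by all clients; `local_histograms` and the two-client loop are illustrative and stand in for the paper's hierarchical construction:

```python
import numpy as np

def local_histograms(scores, labels, num_bins=8, lo=0.0, hi=1.0):
    # Step 1: each client buckets its positive/negative scores into
    # equal-width bins; shared edges let the server sum counts directly.
    edges = np.linspace(lo, hi, num_bins + 1)
    pos, _ = np.histogram(scores[labels == 1], bins=edges)
    neg, _ = np.histogram(scores[labels == 0], bins=edges)
    return pos, neg

# Step 2: the server aggregates by summing per-client counts
# (under Secure Aggregation it would see only these sums).
rng = np.random.default_rng(0)
clients = [(rng.random(100), rng.integers(0, 2, 100)) for _ in range(2)]
pos_total = np.zeros(8, dtype=int)
neg_total = np.zeros(8, dtype=int)
for scores, labels in clients:
    pos, neg = local_histograms(scores, labels)
    pos_total += pos
    neg_total += neg
```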

Quantile Estimation via Histograms

To estimate quantiles, each client builds a hierarchical histogram (Figure 1) by recursively dividing the score range into equal-width bins and counting examples in each. The server aggregates these histograms and computes global quantiles based on the combined bin counts and boundaries.

Figure 1: Hierarchical histogram structure.
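Given the aggregated counts, the server can read off quantiles by walking the cumulative histogram. The sketch below simplifies to a flat (non-hierarchical) histogram with a uniform-within-bin assumption; `quantiles_from_histogram` is an illustrative name, not the paper's API:

```python
import numpy as np

def quantiles_from_histogram(counts, edges, num_q):
    # Cumulative distribution over bins, normalized to [0, 1].
    cdf = np.cumsum(counts) / counts.sum()
    # Interior evenly spaced quantile levels, e.g. 0.25, 0.5, 0.75 for num_q=4.
    levels = np.linspace(0, 1, num_q + 1)[1:-1]
    out = []
    for q in levels:
        i = np.searchsorted(cdf, q)          # first bin whose CDF reaches q
        c_lo = cdf[i - 1] if i > 0 else 0.0
        frac = (q - c_lo) / (cdf[i] - c_lo)  # linear interpolation in the bin
        out.append(edges[i] + frac * (edges[i + 1] - edges[i]))
    return np.array(out)
```

With uniform counts over [0, 1] the estimated median lands at 0.5, as expected.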

To protect clients’ privacy, we consider two mechanisms:

  • Secure Aggregation: The server sees only the summed histogram, not any individual client’s bins.
  • Differential Privacy: Clients add independent noise to each bin before sending.
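The Differential Privacy mechanism can be sketched with the standard Laplace mechanism: adding or removing one example changes a single bin of a flat histogram by 1, so the L1 sensitivity is 1 and Laplace noise with scale 1/ε per bin suffices. This is a simplified sketch; the paper's distributed-DP variant uses discrete noise compatible with Secure Aggregation, and a hierarchical histogram's sensitivity scales with its depth:

```python
import numpy as np

def privatize_bins(counts, epsilon, rng):
    # Independent Laplace(1/epsilon) noise per bin gives epsilon-DP for a
    # flat histogram with L1 sensitivity 1 (one example touches one bin).
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    return counts + noise
```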

Curve Approximation via Quantiles

Let \Phi^-(s) and \Phi^+(s) be the ECDFs of prediction score distributions for negative and positive examples. We estimate Q evenly spaced quantiles (Q=6 in Figure 2), and apply monotone piecewise cubic polynomial interpolation (PCHIP) to approximate the full ECDFs.

Figure 2: ECDFs reconstructed from Q quantiles for both classes.
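The reconstruction step can be sketched with SciPy's `PchipInterpolator`, which implements monotone piecewise cubic (PCHIP) interpolation; the quantile values below are toy numbers standing in for the protocol's output:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Evenly spaced quantile levels (Q = 6 gives 7 knots including endpoints)
# and the estimated score at each level for one class (toy values).
levels = np.linspace(0, 1, 7)
q_vals = np.array([0.0, 0.1, 0.25, 0.45, 0.6, 0.8, 1.0])

# Interpolating (score, level) pairs yields a monotone ECDF approximation
# that passes through every knot and stays within [0, 1].
ecdf = PchipInterpolator(q_vals, levels)
```

Monotonicity is the key property here: a non-monotone interpolant (e.g. an unconstrained cubic spline) could produce an invalid CDF.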

For the ROC curve, we then compute:

\begin{equation*} T(s)=1-\Phi^+(s), \tag{1} \end{equation*}

\begin{equation*} F(s)=1-\Phi^-(s), \tag{2} \end{equation*}

where T(s) and F(s) denote the true positive rate (TPR) and false positive rate (FPR).

For the PR curve, recall is identical to TPR, and precision is computed as:

\begin{equation*} P(s)=\frac{T(s)n^+}{T(s)n^+ + F(s)n^-}. \tag{3} \end{equation*}

Here, n^+ and n^- are the number of positive and negative examples. Figure 3 shows the resulting approximated ROC and PR curves.

Figure 3: ROC and PR curves approximated from ECDFs.
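Equations (1)–(3) translate directly into code once the two ECDFs are available as callables; `roc_pr_points` is an illustrative helper, evaluated on a grid of thresholds:

```python
import numpy as np

def roc_pr_points(phi_pos, phi_neg, n_pos, n_neg, grid):
    # Eq. (1)-(2): complements of the class-conditional ECDFs.
    tpr = 1.0 - phi_pos(grid)   # T(s)
    fpr = 1.0 - phi_neg(grid)   # F(s)
    # Eq. (3): precision; define it as 1 when no example is predicted positive.
    denom = tpr * n_pos + fpr * n_neg
    prec = np.where(denom > 0, tpr * n_pos / np.maximum(denom, 1e-12), 1.0)
    return fpr, tpr, prec
```

With identical score distributions for both classes and a balanced dataset, precision stays at the positive-class prevalence of 0.5, a useful sanity check.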

Theoretical Guarantees

To quantify approximation quality, we define the Area Error (AE) as:

Definition 1 AE is the integral of the absolute difference between the true and estimated curves: \begin{equation*} \text{AE}_\text{ROC} = \int_0^1 |T(f) - \hat{T}(f)|\, df, \tag{4} \end{equation*}

\begin{equation*} \text{AE}_\text{PR} = \int_0^1 |P(t) - \hat{P}(t)|\, dt, \tag{5} \end{equation*}

where T(f) = T(F^{-1}(f)) and P(t) = \frac{tn^+}{tn^+ + F(T^{-1}(t))n^-} are the true ROC and PR curves, and \hat{T}(f) = \hat{T}(\hat{F}^{-1}(f)) and \hat{P}(t) = \frac{tn^+}{tn^+ + \hat{F}(\hat{T}^{-1}(t))n^-} are their estimates.
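Definition 1 can be checked numerically with a simple trapezoidal rule over a grid of operating points; `area_error` is an illustrative helper, not the paper's evaluation code:

```python
import numpy as np

def area_error(curve_true, curve_est, num=1000):
    # Approximate the AE integral of Definition 1 by the trapezoidal rule
    # on an evenly spaced grid over [0, 1].
    x = np.linspace(0.0, 1.0, num)
    y = np.abs(curve_true(x) - curve_est(x))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))
```

For example, the AE between the curve T(f) = f and the constant 0 is exactly the area under the diagonal, 0.5.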

Assuming Lipschitz continuity of score distributions \Phi^-(s) and \Phi^+(s), we bound the AE as follows:

Theorem 1 Let Q be the number of quantiles used. Then:

  • Under Secure Aggregation: \text{AE}_\text{ROC}\le O(1/Q) and \text{AE}_\text{PR}\le\tilde{O}(1/Q).
  • Under \varepsilon-Differential Privacy: \text{AE}\le\tilde{O}\left(\frac{1}{Q} + \frac{1}{n\varepsilon}\right), where n is the number of examples.

Empirical Evaluation

We evaluate the method with an XGBoost classifier on the Adult dataset (Figure 4). We test both Secure Aggregation (SA) and Distributed Differential Privacy (DDP), varying Q from 4 to 1024 and \varepsilon\in\{0.1,0.3,1\}.

Figure 4: Area Error of ROC and PR curves vs. number of quantiles.

Key observations are as follows:

  1. The Area Error decreases as the number of quantiles Q grows.
  2. The PR curve incurs slightly higher error than the ROC curve.
  3. Under DP, the error plateaus as Q grows and increases as \varepsilon decreases (stronger privacy).

Citation

@misc{Xu2025fedcurve,
  title={Federated Computation of ROC and PR Curves},
  author={Xuefeng Xu and Graham Cormode},
  year={2025},
  eprint={2510.04979},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.04979},
}