From Effect Size to Real-World Predictive Utility

E2P Simulator estimates the real-world predictive utility of research findings by accounting for measurement reliability and outcome base rate. It is designed to help researchers interpret findings and plan studies across biomedical and behavioral sciences, particularly in individual differences research, biomarker development, predictive modelling, and precision medicine.

See the Get Started guide for more details.

Developed by Povilas Karvelis

[Interactive simulator: Outcomes and Effects views showing True and Observed values of each effect size measure (Cohen's d, Cohen's U3, Odds Ratio, Log Odds Ratio, point-biserial r_pb, η² (r_pb²)), with adjustable predictor reliability (ICC1, ICC2), outcome reliability (κ), and base rate (φ).]

[Resulting predictive performance metrics: Accuracy, Sensitivity, Specificity, Balanced Accuracy (BA), PPV, NPV, F1, MCC; additional metrics: LR+, LR−, DOR, P(D|+), P(D|−), G-Mean, J, Cohen's κ.]
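
As a rough illustration of the logic behind these metrics (a sketch, not the simulator's actual code), a true effect size can be attenuated by predictor reliability and then converted into base-rate-dependent metrics under a simple binormal model; outcome reliability (κ) is omitted here for brevity:

```python
# Sketch: attenuate a true standardized effect by predictor reliability,
# then convert the observed effect into base-rate-dependent metrics
# under a binormal (two equal-variance normal distributions) model.
from scipy.stats import norm

def observed_d(true_d, icc):
    # Classical attenuation: measurement error shrinks the observed effect.
    return true_d * icc ** 0.5

def binary_metrics(d, base_rate, threshold=None):
    # By default, place the decision threshold halfway between the group means.
    if threshold is None:
        threshold = d / 2
    sens = 1 - norm.cdf(threshold - d)    # cases above the threshold
    spec = norm.cdf(threshold)            # non-cases below the threshold
    ppv = sens * base_rate / (sens * base_rate + (1 - spec) * (1 - base_rate))
    npv = spec * (1 - base_rate) / (spec * (1 - base_rate) + (1 - sens) * base_rate)
    auc = norm.cdf(d / 2 ** 0.5)          # threshold-free discrimination
    return {"Sensitivity": sens, "Specificity": spec,
            "PPV": ppv, "NPV": npv, "ROC-AUC": auc}

# Example: a conventionally large true effect (d = 0.8) measured with
# ICC = 0.6, predicting an outcome with a 10% base rate.
print(binary_metrics(observed_d(0.8, 0.6), base_rate=0.10))
```

Even with a large true effect, the combination of imperfect reliability and a low base rate keeps PPV modest, which is the kind of gap between effect sizes and real-world utility that the simulator is designed to make visible.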

From Single to Multiple Predictors

While the interactive graph above explores a single predictor, here you can estimate the combined effect of multiple predictors and determine how many are needed to achieve a desired level of predictive performance. The combined effect is estimated by first computing Mahalanobis D, a multivariate generalization of Cohen's d, which is then converted directly to ROC-AUC, and to PR-AUC by additionally accounting for the base rate. For simplicity, the estimation assumes that all predictors have the same effect size, uniform collinearity, and no interaction effects.
See the Get Started guide for more details.

A simplified formula for converting the combined effect of multiple predictors into ROC-AUC:

\[\text{ROC-AUC} = \Phi\left(d \sqrt{\frac{p}{2(1 + (p-1)r_{ij})}}\right)\]
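
A minimal Python sketch of this formula (an illustration under the stated assumptions, not the simulator's code; d, r_ij, and p correspond to the inputs below):

```python
# Direct implementation of the simplified formula above, assuming equal
# per-predictor effect sizes, uniform collinearity, and no interactions.
from scipy.stats import norm

def combined_roc_auc(d, p, r_ij):
    # Mahalanobis D for p equicorrelated predictors with identical d,
    # then ROC-AUC = Phi(D / sqrt(2)).
    big_d = d * (p / (1 + (p - 1) * r_ij)) ** 0.5
    return norm.cdf(big_d / 2 ** 0.5)

# Example: five predictors, each with d = 0.3 and pairwise correlation 0.2.
print(round(combined_roc_auc(d=0.3, p=5, r_ij=0.2), 2))  # ~0.64
```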
Target ROC-AUC
Target PR-AUC
Base rate

Effect size of each predictor (d)
Collinearity among predictors (rij)
Number of predictors (p)

Required Sample Size for Multivariable Models

Determining the right sample size is crucial for developing reliable multivariable prediction models. Too small a sample risks overfitting, unstable estimates, and poor generalizability; too large a sample wastes resources. A common rule of thumb is to ensure a minimum number of events per variable (EPV). However, more principled criteria based on desired model performance, such as minimizing overfitting or prediction error, can provide more accurate estimates.
See the Get Started guide for more details.

Number of predictors (p)
Base rate (\(\phi\))

EPV criterion (rule of thumb)

\[N = \dfrac{EPV \cdot p}{\phi}\]
EPV
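
For example, with an illustrative minimum of 10 events per variable (a sketch of the formula above, not prescriptive guidance):

```python
def n_epv(epv, p, base_rate):
    # Total sample size so that N * base_rate events provide `epv` events
    # per candidate predictor.
    return epv * p / base_rate

print(n_epv(epv=10, p=10, base_rate=0.1))  # 1000.0 participants (100 events)
```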

Shrinkage criterion (minimizes overfitting)

\[N = \dfrac{p}{(S-1)\,\ln(1- R^2_{CS}/S)}\]
Anticipated R²cs
Shrinkage factor (S)
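
A sketch of the shrinkage criterion, using an illustrative target shrinkage factor of S = 0.9:

```python
import math

def n_shrinkage(p, r2_cs, s=0.9):
    # Sample size at which the expected uniform shrinkage factor equals S;
    # a smaller anticipated R^2_CS or a stricter S pushes N up.
    return p / ((s - 1) * math.log(1 - r2_cs / s))

print(round(n_shrinkage(p=10, r2_cs=0.1)))  # ~849
```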

Mean Absolute Prediction Error (MAPE) criterion (minimizes prediction error; applicable for p ≤ 30)

\[N = \exp\left(\dfrac{-0.508 + 0.259\ln(\phi) + 0.504\ln(p) - \ln(\delta)}{0.544}\right)\]
Target MAPE (δ)
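
And the MAPE criterion solved for N, again with illustrative values:

```python
import math

def n_mape(base_rate, p, delta):
    # Sample size at which the expected mean absolute prediction error
    # reaches the target delta (formula applicable for p <= 30).
    return math.exp((-0.508 + 0.259 * math.log(base_rate)
                     + 0.504 * math.log(p) - math.log(delta)) / 0.544)

print(round(n_mape(base_rate=0.1, p=10, delta=0.05)))  # ~273
```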

Calibration: From Test Set to Deployment (beta version)

Models trained and validated on test sets often encounter different conditions when deployed in real-world settings. This module explores how differences in measurement reliability and outcome base rates between test and deployment environments affect model calibration. A well-calibrated model's predicted probabilities should match observed outcome frequencies in the deployment set.

Test Set

True Effect Size (d)
Predictor reliability, ICC1
Predictor reliability, ICC2
Outcome reliability, κ
Base rate, φ

Deployment Set

True Effect Size (d)
Predictor reliability, ICC1
Predictor reliability, ICC2
Outcome reliability, κ
Base rate, φ

What to look for: Perfect calibration appears as points along the diagonal line. Deviations indicate miscalibration: points above the line mean the model underestimates risk, points below mean it overestimates risk.
[Calibration metrics: Brier score, expected calibration error (ECE), calibration slope, calibration intercept.]
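
As a rough illustration (a sketch, not the simulator's code), these four metrics can be computed from predicted probabilities and observed outcomes as follows. Note that the calibration intercept and slope are fitted jointly here, whereas the intercept is sometimes estimated with the slope fixed at 1:

```python
# Sketch of the calibration metrics listed above, computed from
# predicted probabilities p_pred and binary outcomes y.
import numpy as np
from scipy.optimize import minimize

def calibration_metrics(p_pred, y, n_bins=10):
    p_pred = np.clip(np.asarray(p_pred, float), 1e-6, 1 - 1e-6)
    y = np.asarray(y, float)

    # Brier score: mean squared error of the predicted probabilities.
    brier = np.mean((p_pred - y) ** 2)

    # Expected calibration error: per-bin gap between observed outcome
    # frequency and mean predicted probability, weighted by bin size.
    bins = np.minimum((p_pred * n_bins).astype(int), n_bins - 1)
    ece = sum(np.mean(bins == b) * abs(y[bins == b].mean() - p_pred[bins == b].mean())
              for b in range(n_bins) if np.any(bins == b))

    # Calibration intercept and slope: logistic regression of the outcome
    # on the logit of the predicted probability (ideal: intercept 0, slope 1).
    logit = np.log(p_pred / (1 - p_pred))

    def nll(ab):
        z = ab[0] + ab[1] * logit
        return np.sum(np.log1p(np.exp(z)) - y * z)

    intercept, slope = minimize(nll, x0=[0.0, 1.0]).x
    return {"Brier": brier, "ECE": ece, "Slope": slope, "Intercept": intercept}

# Example with simulated, deliberately overconfident predictions:
# true outcome probabilities are pulled toward 0.5 relative to p_pred,
# so the fitted calibration slope comes out below 1.
rng = np.random.default_rng(0)
p_pred = rng.uniform(0.05, 0.95, size=500)
y = rng.binomial(1, 0.5 + 0.5 * (p_pred - 0.5))
print(calibration_metrics(p_pred, y))
```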