class weatherbench2.metrics.CRPS(ensemble_dim='realization')

Continuous Ranked Probability Score, averaged over space and time.

Given a scalar ground-truth random variable Y and two iid predictions X, X’, the Continuous Ranked Probability Score (CRPS) is defined as

CRPS = E|X - Y| - 0.5 * E|X - X’|

where E is mathematical expectation, and |⋅| is absolute value. CRPS has a unique minimum when X is distributed the same as Y.
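As a rough numerical check of the definition above, the following NumPy sketch estimates both expectations by Monte Carlo. The function name `crps_from_samples` and the Gaussian test distributions are assumptions made for illustration; they are not part of weatherbench2.

```python
import numpy as np


def crps_from_samples(x, x_prime, y):
    """Sample estimate of E|X - Y| - 0.5 * E|X - X'|."""
    skill = np.mean(np.abs(x - y))         # estimates E|X - Y|
    spread = np.mean(np.abs(x - x_prime))  # estimates E|X - X'|
    return skill - 0.5 * spread


rng = np.random.default_rng(0)
n = 200_000
y = rng.normal(0.0, 1.0, n)  # ground truth Y ~ N(0, 1)

# Calibrated forecast: X and X' are drawn from the same distribution as Y.
crps_calibrated = crps_from_samples(
    rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n), y)

# Biased forecast: correct spread, shifted mean.
crps_biased = crps_from_samples(
    rng.normal(1.0, 1.0, n), rng.normal(1.0, 1.0, n), y)

# Over-dispersed forecast: correct mean, too much spread.
crps_wide = crps_from_samples(
    rng.normal(0.0, 3.0, n), rng.normal(0.0, 3.0, n), y)

print(crps_calibrated, crps_biased, crps_wide)
```

Under these assumptions the calibrated forecast should score lowest, which is consistent with the minimum being attained when X is distributed the same as Y.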

The associated spread/skill ratio is

SS(CRPS) = E|X - X’| / E|X - Y|.

Assuming Y is non-constant, SS(CRPS) = 0 only when X is constant. Since X, X’ are independent, |X - X’| < |X - Y| + |X’ - Y|, and thus 0 ≤ SS(CRPS) < 2. If X has the same distribution as Y, SS(CRPS) = 1. Caution: it is possible for SS(CRPS) = 1 even when X and Y have different distributions.
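The behaviour of the ratio can be illustrated with a small Monte Carlo sketch. The helper `spread_skill_ratio` and the Gaussian forecasts below are hypothetical choices for this example only.

```python
import numpy as np


def spread_skill_ratio(x, x_prime, y):
    """Sample estimate of E|X - X'| / E|X - Y|."""
    return np.mean(np.abs(x - x_prime)) / np.mean(np.abs(x - y))


rng = np.random.default_rng(1)
n = 200_000
y = rng.normal(0.0, 1.0, n)  # non-constant ground truth

# Same distribution as Y: the ratio should be close to 1.
ss_calibrated = spread_skill_ratio(
    rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n), y)

# Under-dispersed forecast: ratio below 1.
ss_narrow = spread_skill_ratio(
    rng.normal(0.0, 0.5, n), rng.normal(0.0, 0.5, n), y)

# Over-dispersed forecast: ratio above 1, but still below 2.
ss_wide = spread_skill_ratio(
    rng.normal(0.0, 3.0, n), rng.normal(0.0, 3.0, n), y)

# Constant forecast: zero spread, so the ratio is exactly 0.
ss_constant = spread_skill_ratio(np.zeros(n), np.zeros(n), y)

print(ss_calibrated, ss_narrow, ss_wide, ss_constant)
```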

CRPS for multi-dimensional random variables is computed as a weighted average over components. The minimum is achieved by any prediction X with the correct marginals.

In our case, each prediction is conditioned on the start time t. Given T different start times, this class estimates time- and ensemble-averaged quantities for each tendency “V”, producing entries

V_spread := (1 / T) Σₜ ‖Xₜ - Xₜ’‖

V_skill := (1 / T) Σₜ ‖Xₜ - Yₜ‖

V_score := V_skill - 0.5 * V_spread

‖⋅‖ is the area-averaged L1 norm. Estimation is done separately for each tendency, level, and lag time.
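The three entries can be sketched for a single tendency, level, and lag time as follows. This is an illustration only: the array layout `(time, lat, lon)`, the cos(latitude) area weights, and the helper `area_l1` are assumptions, not the library's implementation.

```python
import numpy as np


def area_l1(a, b, lat_deg):
    """Area-weighted mean of |a - b| over (lat, lon), one value per time."""
    # Assumed weights: proportional to cos(latitude), normalized to mean 1.
    w = np.cos(np.deg2rad(lat_deg))
    w = w / w.mean()
    return (np.abs(a - b) * w[:, None]).mean(axis=(-2, -1))


rng = np.random.default_rng(2)
T, n_lat, n_lon = 50, 8, 16
lat = np.linspace(-78.75, 78.75, n_lat)

truth = rng.normal(size=(T, n_lat, n_lon))     # Y_t
member_a = rng.normal(size=(T, n_lat, n_lon))  # X_t
member_b = rng.normal(size=(T, n_lat, n_lon))  # X_t'

# Average the per-time norms over the T start times.
v_spread = area_l1(member_a, member_b, lat).mean()
v_skill = area_l1(member_a, truth, lat).mean()
v_score = v_skill - 0.5 * v_spread

print(v_spread, v_skill, v_score)
```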

If N ensemble members are available, the ensemble mean is taken using the probability weighted moment (PWM) method from [Zamo & Naveau, 2018].
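For a single scalar observation, the PWM estimator can be sketched as below. This is a standalone NumPy illustration based on the formula in [Zamo & Naveau, 2018], not the weatherbench2 code; the names `crps_pwm` and `crps_fair` are hypothetical. The pairwise "fair" form is included only to show that the two expressions agree.

```python
import numpy as np


def crps_pwm(members, obs):
    """PWM estimate of CRPS from N >= 2 ensemble members and one observation."""
    x = np.sort(np.asarray(members, dtype=float))
    n = x.size
    if n < 2:
        raise ValueError("at least two ensemble members are required")
    skill = np.mean(np.abs(x - obs))
    beta0 = x.mean()
    # beta1 weights the i-th smallest member (0-based rank i) by i / (n - 1).
    beta1 = np.sum(np.arange(n) * x) / (n * (n - 1))
    return skill + beta0 - 2.0 * beta1


def crps_fair(members, obs):
    """Equivalent pairwise form: mean|x_i - y| - sum|x_i - x_j| / (2N(N-1))."""
    x = np.asarray(members, dtype=float)
    n = x.size
    skill = np.mean(np.abs(x - obs))
    pair_sum = np.abs(x[:, None] - x[None, :]).sum()
    return skill - pair_sum / (2.0 * n * (n - 1))


members = [1.0, 3.0, 2.0, 5.0]
obs = 2.5
print(crps_pwm(members, obs), crps_fair(members, obs))
```

The PWM form needs a sort rather than all N² pairwise differences, which is one reason it may be preferred for larger ensembles.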

So long as two or more ensemble members are used, the estimates of spread, skill, and CRPS are unbiased at each time. Therefore, assuming some ergodicity, averaging over many time points yields accurate estimates.
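The unbiasedness claim can be checked numerically with only two members per forecast. The sketch below is an assumption-laden illustration: it uses standard normal forecasts and truth (for which the exact CRPS is 1/√π), and contrasts the unbiased pairwise normalization N(N-1) with a naive N² normalization that is not used by this class.

```python
import numpy as np

rng = np.random.default_rng(3)
trials, n_members = 200_000, 2

x = rng.normal(size=(trials, n_members))  # two-member ensembles
y = rng.normal(size=trials)               # matching ground truth

skill = np.abs(x - y[:, None]).mean(axis=1)
# Sum of |x_i - x_j| over all ordered pairs (i, j), per trial.
pair_sum = np.abs(x[:, :, None] - x[:, None, :]).sum(axis=(1, 2))

# Unbiased estimate: divide the pair sum by 2 * N * (N - 1).
fair = skill - pair_sum / (2 * n_members * (n_members - 1))
# Naive estimate: divide by 2 * N**2, which underestimates the spread term.
naive = skill - pair_sum / (2 * n_members**2)

exact = 1.0 / np.sqrt(np.pi)
print(fair.mean(), naive.mean(), exact)
```

Averaged over many trials, the unbiased estimate should approach the exact value while the naive one stays noticeably above it.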

NaN values propagate through and result in NaN in the corresponding output position.

References:

[Gneiting & Raftery, 2012], Strictly Proper Scoring Rules, Prediction, and Estimation.

[Zamo & Naveau, 2018], Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts.


ensemble_dim (str) – Name of the ensemble dimension. Defaults to 'realization'.

compute(forecast, truth[, region])

Evaluate this metric on datasets with full temporal coverage.

compute_chunk(forecast, truth[, region])

CRPS, averaged over space, for a time chunk of data.