import xarray as xr

WeatherBench 2 Data Guide

A core part of WeatherBench 2 is its collection of ready-to-use, cloud-based datasets. This page lists and describes all available datasets.

The datasets are stored in this public Google Cloud bucket: gs://weatherbench2/datasets.

Please also check the LICENSE files for each dataset in the respective GCS buckets. Some datasets allow commercial use. Others only permit research use.

A note on resolutions

We provide the datasets at different resolutions. All filenames include the number of longitude x latitude grid points, e.g. 64x32. For the WeatherBench 2 paper, all evaluation was done at 240x121 (1.5 degree) resolution. All datasets were regridded using first-order conservative regridding, i.e., with weights proportional to the area of overlap between grid cells on the original and desired grids.
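To illustrate what "weights proportional to the area of overlap" means, here is a minimal one-dimensional sketch in which the overlap is a length rather than an area on the sphere. The function names and the toy grids are hypothetical and are not taken from the WeatherBench 2 code; the actual regridding operates on two-dimensional latitude/longitude cells.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the overlap between intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def conservative_regrid_1d(values, src_edges, dst_edges):
    """Regrid cell averages from src_edges to dst_edges (first-order conservative).

    Each target cell becomes the overlap-weighted mean of the source cells
    it intersects, so the integral over the domain is preserved.
    """
    out = []
    for j in range(len(dst_edges) - 1):
        weights = [
            overlap(src_edges[i], src_edges[i + 1], dst_edges[j], dst_edges[j + 1])
            for i in range(len(values))
        ]
        total = sum(weights)
        if total == 0:
            raise ValueError("target cell does not overlap the source grid")
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out


# Four source cells of width 1 regridded onto two target cells of width 2.
coarse = conservative_regrid_1d([1.0, 3.0, 5.0, 7.0], [0, 1, 2, 3, 4], [0, 2, 4])
print(coarse)  # [2.0, 6.0]
```

In this toy case the integral is unchanged: 1 + 3 + 5 + 7 = 16 on the fine grid and 2 * 2 + 6 * 2 = 16 on the coarse grid.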

The 1440x721 (= 0.25 degrees) and 240x121 files contain the poles, i.e. -90 and 90 degree latitude, denoted with with_poles. 64x32 files do not contain the poles to ensure equal spacing.
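The relation between grid size, spacing and the poles can be checked with a few lines of arithmetic. The sketch below assumes that grids with poles place the first and last latitude exactly at -90 and 90 degrees, and that grids without poles place latitudes at cell centres, offset by half a grid step; `equiangular_grid` is a hypothetical helper, and the latitude ordering in the actual files may differ from the ascending order used here.

```python
def equiangular_grid(n_lon, n_lat, with_poles):
    """Return (lons, lats, lon_step, lat_step) in degrees for an equiangular grid."""
    lon_step = 360.0 / n_lon
    lons = [i * lon_step for i in range(n_lon)]
    if with_poles:
        # n_lat points span 180 degrees including both end points.
        lat_step = 180.0 / (n_lat - 1)
        lats = [-90.0 + j * lat_step for j in range(n_lat)]
    else:
        # n_lat cell centres, offset by half a step from the poles.
        lat_step = 180.0 / n_lat
        lats = [-90.0 + (j + 0.5) * lat_step for j in range(n_lat)]
    return lons, lats, lon_step, lat_step


for n_lon, n_lat, poles in [(1440, 721, True), (240, 121, True), (64, 32, False)]:
    lons, lats, dlon, dlat = equiangular_grid(n_lon, n_lat, poles)
    print(f"{n_lon}x{n_lat}: dlon={dlon}, dlat={dlat}, lat range=({lats[0]}, {lats[-1]})")
```

Under these assumptions the 64x32 grid has the same 5.625 degree spacing in both directions. Including the poles with only 32 latitude points would instead give a latitude spacing of 180 / 31 degrees, which no longer matches the longitude spacing.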

Ground-truth datasets

ERA5

Our ERA5 datasets were downloaded from the Copernicus Climate Data Store and have a time range from 1959 to 2023 (incl.). The data here have been downsampled to 6h and 13 levels. A raw hourly dataset with 37 levels is also available at gs://weatherbench2/datasets/era5/1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr

Location: gs://weatherbench2/datasets/era5/

Files:

  • 1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr

  • 1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr

  • 1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr

  • 1959-2023_01_10-6h-64x32_equiangular_conservative.zarr

Note: Older versions of the ERA5 files remain in the bucket to ensure continuity.

See the output below for a list of variables. The file also contains several derived variables, which were computed using these methods.

xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr')
<xarray.Dataset>
Dimensions:                                           (time: 93544,
                                                       latitude: 721,
                                                       longitude: 1440,
                                                       level: 13)
Coordinates:
  * latitude                                          (latitude) float32 90.0...
  * level                                             (level) int64 50 ... 1000
  * longitude                                         (longitude) float32 0.0...
  * time                                              (time) datetime64[ns] 1...
Data variables: (12/62)
    10m_u_component_of_wind                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_wind_speed                                    (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_dewpoint_temperature                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_temperature                                    (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    above_ground                                      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    ...                                                ...
    volumetric_soil_water_layer_1                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_2                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_3                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_4                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    vorticity                                         (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                                        (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
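Because the store is opened lazily, nothing beyond metadata is read until values are requested. The sketch below shows one way to pull a single field and to estimate the uncompressed size of one chunk from the chunk sizes in the output above. The helper names are hypothetical, and reading from gs:// paths is assumed to require the zarr, dask and gcsfs packages alongside xarray.

```python
import math

ERA5_PATH = (
    "gs://weatherbench2/datasets/era5/"
    "1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr"
)


def chunk_megabytes(chunk_shape, bytes_per_value=4):
    """Uncompressed size of one chunk in MiB (float32 uses 4 bytes per value)."""
    return math.prod(chunk_shape) * bytes_per_value / 2**20


def load_2m_temperature(time, path=ERA5_PATH):
    """Open the store lazily and load a single 2m_temperature field."""
    import xarray as xr  # imported here so chunk_megabytes works without xarray

    ds = xr.open_zarr(path)  # lazy: only metadata is read at this point
    field = ds["2m_temperature"].sel(time=time)  # still lazy (dask-backed)
    return field.compute()  # fetches only the chunk(s) needed for this field


print(round(chunk_megabytes((1, 721, 1440)), 2))      # one surface-variable chunk
print(round(chunk_megabytes((1, 13, 721, 1440)), 2))  # one pressure-level chunk
```

A call such as `load_2m_temperature("2020-01-01T00:00")` (the timestamp is only an example) would then transfer a single time step rather than the whole archive.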

ERA5 Climatology

A climatology is used, for example, to compute anomaly metrics such as the ACC. For WeatherBench 2, the climatology was computed using a running window for smoothing (see paper and script) for each day of the year and every sixth hour of the day (0, 6, 12 and 18). We have computed climatologies for 1990-2017 and 1990-2019.
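As a rough illustration of how a climatology enters an anomaly metric, the following sketch computes an unweighted anomaly correlation coefficient over a handful of paired values. It is a simplified stand-in, not the benchmark's metric code, which may differ (for instance by weighting grid points by area); the function name and the numbers are made up for the example.

```python
import math


def acc(forecast, truth, climatology):
    """Unweighted anomaly correlation coefficient over paired values."""
    # Anomalies are departures from the climatological value at each point.
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    t_anom = [t - c for t, c in zip(truth, climatology)]
    num = sum(f * t for f, t in zip(f_anom, t_anom))
    den = math.sqrt(sum(f * f for f in f_anom) * sum(t * t for t in t_anom))
    return num / den


climatology = [280.0, 285.0, 290.0]
truth = [282.0, 284.0, 291.0]
forecast = [281.0, 284.5, 290.5]  # anomalies are exactly half the true anomalies
print(acc(forecast, truth, climatology))
```

Here the forecast anomalies have the correct sign and pattern but only half the amplitude, so the correlation is still perfect; the ACC measures pattern agreement, not magnitude.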

Location: gs://weatherbench2/datasets/era5-hourly-climatology/

Files:

  • 1990-2017_6h_1440x721.zarr

  • 1990-2017_6h_512x256_equiangular_conservative.zarr

  • 1990-2017_6h_240x121_equiangular_with_poles_conservative.zarr

  • 1990-2017_6h_64x32_equiangular_conservative.zarr

  • 1990-2019_6h_1440x721.zarr

  • 1990-2019_6h_512x256_equiangular_conservative.zarr

  • 1990-2019_6h_240x121_equiangular_with_poles_conservative.zarr

  • 1990-2019_6h_64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/era5-hourly-climatology/1990-2019_6h_1440x721.zarr')
<xarray.Dataset>
Dimensions:                                         (hour: 4, dayofyear: 366,
                                                     latitude: 721,
                                                     longitude: 1440, level: 13)
Coordinates:
  * dayofyear                                       (dayofyear) int64 1 ... 366
  * hour                                            (hour) int64 0 6 12 18
  * latitude                                        (latitude) float32 90.0 ....
  * level                                           (level) int64 50 ... 1000
  * longitude                                       (longitude) float32 0.0 ....
Data variables: (12/52)
    10m_u_component_of_wind                         (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind                         (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    10m_wind_speed                                  (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    2m_dewpoint_temperature                         (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    2m_temperature                                  (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    ageostrophic_wind_speed                         (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    ...                                              ...
    volumetric_soil_water_layer_1                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_2                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_3                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_4                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    vorticity                                       (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    wind_speed                                      (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
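The climatology is indexed by `dayofyear` and `hour` rather than by a time axis, so a timestamp has to be mapped to those two labels before the matching field can be selected. A minimal sketch, assuming `dayofyear` follows the calendar day of the year (1 to 366); `climatology_key` is a hypothetical helper.

```python
from datetime import datetime


def climatology_key(valid_time):
    """Map a timestamp to the (dayofyear, hour) labels of the climatology."""
    if valid_time.hour % 6 != 0:
        raise ValueError("the climatology shown here only has hours 0, 6, 12, 18")
    return valid_time.timetuple().tm_yday, valid_time.hour


print(climatology_key(datetime(2019, 3, 1, 6)))     # (60, 6)
print(climatology_key(datetime(2020, 12, 31, 18)))  # (366, 18)
```

With the dataset opened as `clim`, the corresponding field could then be selected with `clim.sel(dayofyear=doy, hour=hour)`.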

IFS HRES t=0 “Analysis”

To evaluate IFS forecasts, we use the IFS analysis as the ground truth. Note that here we use the initial conditions of the HRES forecasts, i.e. the forecasts at lead time zero, as the analysis. This is not exactly the same as the analysis dataset provided by ECMWF (see paper for details).

Location: gs://weatherbench2/datasets/hres_t0/

Files:

  • 2016-2022-6h-1440x721.zarr

  • 2016-2022-6h-512x256_equiangular_conservative.zarr

  • 2016-2022-6h-240x121_equiangular_with_poles_conservative.zarr

  • 2016-2022-6h-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/hres_t0/2016-2022-6h-1440x721.zarr')
<xarray.Dataset>
Dimensions:                  (time: 10268, latitude: 721, longitude: 1440,
                              level: 13)
Coordinates:
  * latitude                 (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                    (level) int32 50 100 150 200 ... 700 850 925 1000
  * longitude                (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * time                     (time) datetime64[ns] 2016-01-01 ... 2023-01-10T...
Data variables: (12/14)
    10m_u_component_of_wind  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_wind_speed           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_temperature           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    geopotential             (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    ...                       ...
    temperature              (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    u_component_of_wind      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity        (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    wind_speed               (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
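The length of the time axis can be sanity-checked against the 6-hourly spacing in the filename. The sketch below assumes a regular 6-hourly axis with no gaps, starting at the first timestamp shown in the output above; `last_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def last_timestamp(start, n_steps, step=timedelta(hours=6)):
    """Final timestamp of a regular axis with n_steps entries starting at start."""
    return start + (n_steps - 1) * step


print(last_timestamp(datetime(2016, 1, 1), 10268))  # 2023-01-10 18:00:00
```

Under that assumption, 10268 steps end on 2023-01-10, consistent with the truncated end date in the time coordinate.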

Forecast datasets

IFS HRES

Here, we provide the 00 and 12 UTC initializations of HRES.

Location: gs://weatherbench2/datasets/hres/

Files:

  • 2016-2022-0012-1440x721.zarr

  • 2016-2022-0012-512x256_equiangular_conservative.zarr

  • 2016-2022-0012-240x121_equiangular_with_poles_conservative.zarr

  • 2016-2022-0012-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/hres/2016-2022-0012-1440x721.zarr')
<xarray.Dataset>
Dimensions:                   (time: 5134, prediction_timedelta: 41,
                               latitude: 721, longitude: 1440, level: 13)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 50 100 150 200 ... 700 850 925 1000
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2016-01-01 ... 2023-01-10...
Data variables: (12/16)
    10m_u_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation_24hr  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity         (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
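In the forecast datasets, `time` is the initialization time and `prediction_timedelta` is the lead time, so the time a forecast is valid for is their sum. The sketch below reproduces the axis lengths from the output above under two assumptions that are not visible in the truncated repr: initializations are exactly 12 hours apart, and the 41 lead times are spaced 6 hours apart starting at zero. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def valid_time(init_time, lead):
    """Time at which a forecast started at init_time with lead time lead is valid."""
    return init_time + lead


def init_times(first, n, step=timedelta(hours=12)):
    """n initialization times, assuming a regular 00/12 UTC schedule."""
    return [first + i * step for i in range(n)]


inits = init_times(datetime(2016, 1, 1), 5134)
leads = [timedelta(hours=6 * k) for k in range(41)]  # assumed 6-hourly lead times

print(inits[-1])                       # 2023-01-10 12:00:00
print(valid_time(inits[0], leads[3]))  # 2016-01-01 18:00:00
print(leads[-1] == timedelta(days=10))
```

To score a forecast, the ground-truth field at `valid_time(init, lead)` is the one to compare against, for example via `truth.sel(time=...)` on one of the ground-truth datasets.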