import xarray as xr

WeatherBench 2 Data Guide

A core part of WeatherBench 2 is its collection of ready-to-use, cloud-based datasets. This page lists and describes all available datasets.

The datasets are stored in this public Google Cloud bucket: gs://weatherbench2/datasets. Please also check the LICENSE file for each dataset in the respective GCS bucket: some datasets allow commercial use, while others permit research use only.

A note on resolutions

We provide the datasets at different resolutions. All file names contain the number of longitude x latitude grid points, e.g. 64x32. For the WeatherBench 2 paper, all evaluation was done at 240x121, i.e. 1.5 degree resolution. All datasets were regridded using first-order conservative regridding, i.e., with weights proportional to the area of overlap between grid cells on the original and desired grids.
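
As an illustration of what "weights proportional to the area of overlap" means, here is a minimal one-dimensional sketch along latitude. It is not the WeatherBench 2 regridding code: the function names (band_area, conservative_weights, regrid) and the toy 15-degree and 45-degree bands are assumptions made for this example only.

```python
import math


def band_area(lat_lo, lat_hi):
    """Area of a latitude band on the unit sphere, up to a constant factor."""
    return math.sin(math.radians(lat_hi)) - math.sin(math.radians(lat_lo))


def conservative_weights(src_edges, dst_edges):
    """weights[i][j] = share of destination band i that is covered by source band j."""
    weights = []
    for i in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[i], dst_edges[i + 1]
        row = []
        for j in range(len(src_edges) - 1):
            lo = max(d_lo, src_edges[j])
            hi = min(d_hi, src_edges[j + 1])
            overlap = band_area(lo, hi) if hi > lo else 0.0
            row.append(overlap / band_area(d_lo, d_hi))
        weights.append(row)
    return weights


def regrid(values, weights):
    """Apply the weights: each destination value is an area-weighted source mean."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Toy grids (hypothetical): 12 source bands of 15 degrees, 4 destination bands of 45 degrees.
src_edges = [-90 + 15 * k for k in range(13)]
dst_edges = [-90 + 45 * k for k in range(5)]
weights = conservative_weights(src_edges, dst_edges)
src_values = [float(k) for k in range(12)]
dst_values = regrid(src_values, weights)
```

Because every row of weights sums to one and the weights are area fractions, a constant field stays constant and the area-weighted global mean is the same before and after regridding.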

The ERA5 resolution files (1440x721 = 0.25 degrees) contain the poles, i.e. -90 and 90 degree latitude. Most regridded files also contain the poles, which is denoted by with_poles in the file name. The 512x256 files do not contain the pole grid points.
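
The relation between the grid size in a file name and the grid spacing can be sketched as follows. The helper grid_spacing is hypothetical and not part of WeatherBench 2. It assumes that grids with poles place the first and last latitude exactly at -90 and 90 degrees, and that grids without poles use cell centres offset by half a grid spacing from the poles; the second assumption is not stated on this page.

```python
import re


def grid_spacing(file_name):
    """Return (lon_spacing, lat_spacing) in degrees for a file name such as '...240x121...'."""
    match = re.search(r"(\d+)x(\d+)", file_name)
    if match is None:
        raise ValueError(f"no grid size found in {file_name!r}")
    n_lon, n_lat = int(match.group(1)), int(match.group(2))
    # The native 1440x721 files contain the poles although their names lack 'with_poles'.
    has_poles = "with_poles" in file_name or (n_lon, n_lat) == (1440, 721)
    lon_spacing = 360.0 / n_lon
    # Assumption: without poles, latitudes are cell centres, so n_lat cells span 180 degrees.
    lat_spacing = 180.0 / (n_lat - 1) if has_poles else 180.0 / n_lat
    return lon_spacing, lat_spacing


paper_grid = grid_spacing("1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr")
native_grid = grid_spacing("1959-2022-6h-1440x721.zarr")
no_poles_grid = grid_spacing("1959-2022-6h-512x256_equiangular_conservative.zarr")
```

With these assumptions, the 240x121 files come out at 1.5 degrees and the 1440x721 files at 0.25 degrees in both directions, matching the numbers quoted above.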

Ground-truth datasets

ERA5

Our ERA5 datasets were downloaded from the Copernicus Climate Data Store and cover the years 1959 to 2022 (inclusive). The data here have been downsampled to 6-hourly resolution and 13 pressure levels; a raw hourly dataset with 37 levels is also available at gs://weatherbench2/datasets/era5/1959-2022-full_37-1h-0p25deg-chunk-1.zarr-v2

Location: gs://weatherbench2/datasets/era5/

Files:

  • 1959-2022-6h-1440x721.zarr

  • 1959-2022-6h-512x256_equiangular_conservative.zarr

  • 1959-2022-6h-240x121_equiangular_with_poles_conservative.zarr

  • 1959-2022-6h-128x64_equiangular_with_poles_conservative.zarr

  • 1959-2022-6h-64x32_equiangular_with_poles_conservative.zarr

See output below for a list of variables. Wind speed was derived using this method.

xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2022-6h-1440x721.zarr')
<xarray.Dataset>
Dimensions:                                           (time: 92044,
                                                       latitude: 721,
                                                       longitude: 1440,
                                                       level: 13)
Coordinates:
  * latitude                                          (latitude) float32 90.0...
  * level                                             (level) int64 50 ... 1000
  * longitude                                         (longitude) float32 0.0...
  * time                                              (time) datetime64[ns] 1...
Data variables: (12/38)
    10m_u_component_of_wind                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_wind_speed                                    (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_temperature                                    (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    angle_of_sub_gridscale_orography                  (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    anisotropy_of_sub_gridscale_orography             (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    ...                                                ...
    type_of_high_vegetation                           (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    type_of_low_vegetation                            (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    u_component_of_wind                               (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind                               (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity                                 (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                                        (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>

ERA5 Climatology

A climatology is used, for example, to compute anomaly metrics such as the ACC. For WeatherBench 2, the climatology was computed for each day of year and every sixth hour of the day, using a running window for smoothing (see paper and script). We have computed climatologies for 1990-2017 and 1990-2019.

Location: gs://weatherbench2/datasets/era5-hourly-climatology/

Files:

  • 1990-2017_6h_1440x721.zarr

  • 1990-2017_6h_512x256_equiangular_conservative.zarr

  • 1990-2017_6h_240x121_equiangular_with_poles_conservative.zarr

  • 1990-2017_6h_128x64_equiangular_with_poles_conservative.zarr

  • 1990-2017_6h_64x32_equiangular_with_poles_conservative.zarr

  • 1990-2019_6h_1440x721.zarr

  • 1990-2019_6h_512x256_equiangular_conservative.zarr

  • 1990-2019_6h_240x121_equiangular_with_poles_conservative.zarr

  • 1990-2019_6h_128x64_equiangular_with_poles_conservative.zarr

  • 1990-2019_6h_64x32_equiangular_with_poles_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/era5-hourly-climatology/1990-2019_6h_1440x721.zarr')
<xarray.Dataset>
Dimensions:                                      (hour: 4, dayofyear: 366,
                                                  latitude: 721,
                                                  longitude: 1440, level: 13)
Coordinates:
  * dayofyear                                    (dayofyear) int64 1 2 ... 366
  * hour                                         (hour) int64 0 6 12 18
  * latitude                                     (latitude) float32 90.0 ... ...
  * level                                        (level) int64 50 100 ... 1000
  * longitude                                    (longitude) float32 0.0 ... ...
Data variables: (12/28)
    10m_u_component_of_wind                      (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind                      (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    10m_wind_speed                               (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    2m_temperature                               (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    geopotential                                 (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure                      (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    ...                                           ...
    total_precipitation_6hr_seeps_dry_fraction   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr_seeps_threshold      (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    u_component_of_wind                          (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    v_component_of_wind                          (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    vertical_velocity                            (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    wind_speed                                   (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
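
Since the climatology is indexed by hour and dayofyear, forming an anomaly means looking up the climatological value that matches each valid time. The sketch below shows this lookup and a simplified anomaly correlation for a flat list of samples. It is an assumption-laden illustration, not the WeatherBench 2 metric code: it ignores latitude weighting, uses a plain dictionary in place of the Zarr dataset, and all names and numbers are made up.

```python
import math
from datetime import datetime


def climatology_value(climatology, valid_time):
    """Look up the climatological value by (hour, dayofyear) of the valid time."""
    return climatology[(valid_time.hour, valid_time.timetuple().tm_yday)]


def anomaly_correlation(forecasts, truths, climatology, valid_times):
    """Unweighted, uncentered correlation between forecast and truth anomalies."""
    f_anom = [f - climatology_value(climatology, t) for f, t in zip(forecasts, valid_times)]
    o_anom = [o - climatology_value(climatology, t) for o, t in zip(truths, valid_times)]
    numerator = sum(f * o for f, o in zip(f_anom, o_anom))
    denominator = math.sqrt(sum(f * f for f in f_anom) * sum(o * o for o in o_anom))
    return numerator / denominator


# Toy 2 m temperature values in kelvin (hypothetical numbers).
valid_times = [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 6), datetime(2020, 1, 2, 0)]
climatology = {(0, 1): 280.0, (6, 1): 282.0, (0, 2): 281.0}
truths = [281.0, 284.0, 280.0]
forecasts = [282.0, 286.0, 279.0]
acc = anomaly_correlation(forecasts, truths, climatology, valid_times)
```

In this toy case the forecast anomalies are exactly twice the observed anomalies, so the correlation is one even though the forecast itself is not perfect.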

IFS HRES t=0 “Analysis”

To evaluate IFS forecasts, we use the IFS analysis as the ground truth. Note that we use the initial conditions of the HRES forecasts, i.e. the forecasts at lead time zero, as the analysis. This is not exactly the same as the analysis dataset provided by ECMWF (see paper for details).

Location: gs://weatherbench2/datasets/hres_t0/

Files:

  • 2016-2022-6h-1440x721.zarr

  • 2016-2022-6h-512x256_equiangular_conservative.zarr

  • 2016-2022-6h-240x121_equiangular_with_poles_conservative.zarr

  • 2016-2022-6h-128x64_equiangular_with_poles_conservative.zarr

  • 2016-2022-6h-64x32_equiangular_with_poles_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/hres_t0/2016-2022-6h-1440x721.zarr')
<xarray.Dataset>
Dimensions:                  (time: 10268, latitude: 721, longitude: 1440,
                              level: 13)
Coordinates:
  * latitude                 (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                    (level) int32 50 100 150 200 ... 700 850 925 1000
  * longitude                (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * time                     (time) datetime64[ns] 2016-01-01 ... 2023-01-10T...
Data variables: (12/14)
    10m_u_component_of_wind  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_wind_speed           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_temperature           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    geopotential             (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    ...                       ...
    temperature              (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    u_component_of_wind      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity        (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    wind_speed               (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>

Forecast datasets

IFS HRES

Here, we provide the 00 and 12 UTC initializations of HRES.

Location: gs://weatherbench2/datasets/hres/

Files:

  • 2016-2022-0012-1440x721.zarr

  • 2016-2022-0012-512x256_equiangular_conservative.zarr

  • 2016-2022-0012-240x121_equiangular_with_poles_conservative.zarr

  • 2016-2022-0012-128x64_equiangular_with_poles_conservative.zarr

  • 2016-2022-0012-64x32_equiangular_with_poles_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/hres/2016-2022-0012-1440x721.zarr')
<xarray.Dataset>
Dimensions:                   (time: 5134, prediction_timedelta: 41,
                               latitude: 721, longitude: 1440, level: 13)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 50 100 150 200 ... 700 850 925 1000
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2016-01-01 ... 2023-01-10...
Data variables: (12/16)
    10m_u_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation_24hr  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity         (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
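
In the forecast datasets, time is the initialization time and prediction_timedelta is the lead time, so the valid time of a forecast is their sum. The sketch below computes the valid times for one initialization. The 6-hour lead-time step is an assumption (the truncated output above does not show the individual lead times), and the helper name valid_times is hypothetical.

```python
from datetime import datetime, timedelta


def valid_times(init_time, n_leads, step_hours=6):
    """Valid time = initialization time + lead time, for n_leads equally spaced leads."""
    return [init_time + timedelta(hours=step_hours * k) for k in range(n_leads)]


# One 12 UTC initialization with 41 lead times, as in the prediction_timedelta dimension.
init = datetime(2020, 1, 1, 12)
leads = valid_times(init, n_leads=41)
first_valid, last_valid = leads[0], leads[-1]
```

Under the 6-hour assumption, 41 lead times span 0 to 240 hours, i.e. ten days. These valid times are the timestamps to look up in a 6-hourly ground-truth dataset when evaluating a forecast.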

IFS ENS

Downloading the full ensemble takes a very long time. We downloaded ensemble data from the TIGGE archive for 2018 and 2020. More years are still being downloaded.

All data from the TIGGE archive can only be used for research purposes. Please check the license for more specific constraints.

Location: gs://weatherbench2/datasets/ens/

Files:

  • 2018-1440x721.zarr

  • 2018-512x256_equiangular_conservative.zarr

  • 2018-240x121_equiangular_with_poles_conservative.zarr

  • 2018-128x64_equiangular_with_poles_conservative.zarr

  • 2018-64x32_equiangular_with_poles_conservative.zarr

  • 2020-1440x721.zarr

  • 2020-512x256_equiangular_conservative.zarr

  • 2020-240x121_equiangular_with_poles_conservative.zarr

  • 2020-128x64_equiangular_with_poles_conservative.zarr

  • 2020-64x32_equiangular_with_poles_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/ens/2020-1440x721.zarr')
<xarray.Dataset>
Dimensions:                   (time: 732, number: 50, prediction_timedelta: 61,
                               latitude: 721, longitude: 1440, level: 3)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 500 700 850
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * number                    (number) int32 1 2 3 4 5 6 7 ... 45 46 47 48 49 50
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2020-01-01 ... 2020-12-31...
Data variables: (12/14)
    10m_u_component_of_wind   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation       (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_24hr  (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
    wind_speed                (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>

IFS ENS mean

We also compute the ensemble mean and save it as a separate file, e.g. to use it as a deterministic baseline.

Location: gs://weatherbench2/datasets/ens/

Files:

  • 2018-1440x721_mean.zarr

  • 2018-512x256_equiangular_conservative_mean.zarr

  • 2018-240x121_equiangular_with_poles_conservative_mean.zarr

  • 2018-128x64_equiangular_with_poles_conservative_mean.zarr

  • 2018-64x32_equiangular_with_poles_conservative_mean.zarr

  • 2020-1440x721_mean.zarr

  • 2020-512x256_equiangular_conservative_mean.zarr

  • 2020-240x121_equiangular_with_poles_conservative_mean.zarr

  • 2020-128x64_equiangular_with_poles_conservative_mean.zarr

  • 2020-64x32_equiangular_with_poles_conservative_mean.zarr

xr.open_zarr('gs://weatherbench2/datasets/ens/2020-1440x721_mean.zarr')
<xarray.Dataset>
Dimensions:                   (time: 732, prediction_timedelta: 61,
                               latitude: 721, longitude: 1440, level: 3)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 500 700 850
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2020-01-01 ... 2020-12-31...
Data variables: (12/14)
    10m_u_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation       (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_24hr  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
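
Conceptually, the ensemble mean is the average over the number dimension at every grid point, time and lead time; in xarray this corresponds to taking the mean over 'number'. The pure-Python sketch below illustrates the operation on made-up values for three members and two grid points. The function name ensemble_mean is hypothetical, and this is not the script used to produce the files.

```python
from statistics import fmean


def ensemble_mean(members):
    """Average equally long per-member value lists point by point."""
    return [fmean(values) for values in zip(*members)]


# Three hypothetical ensemble members, each with values at two grid points.
members = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 9.0],
]
mean_field = ensemble_mean(members)
```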