import xarray as xr

WeatherBench 2 Data Guide

A core part of WeatherBench 2 is its collection of ready-to-use, cloud-based datasets. This page lists and describes all available datasets.

The datasets are stored in this public Google Cloud bucket: gs://weatherbench2/datasets.

Please also check the LICENSE file for each dataset in its respective GCS directory. Some datasets allow commercial use; others permit research use only.

A note on resolutions

We provide the datasets at different resolutions. Most filenames contain the grid size as the number of longitude x latitude grid points, e.g. 64x32; some 0.25-degree files use 0p25 instead. For the WeatherBench 2 paper, all evaluation was done at 240x121, i.e. 1.5-degree resolution. All datasets were regridded using first-order conservative regridding, i.e. with weights proportional to the area of overlap between grid cells on the original and target grids.
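To make the overlap weighting concrete, the sketch below computes first-order conservative weights in one dimension, along latitude, where the area of a latitude band is proportional to the difference of sin(latitude) at its edges. conservative_weights is a hypothetical helper for illustration, not the WeatherBench 2 regridding code.

```python
import numpy as np


def conservative_weights(src_edges, dst_edges):
    """Illustrative first-order conservative weights along latitude.

    src_edges, dst_edges: ascending cell boundaries in degrees.
    Returns W with shape (n_dst, n_src) such that dst = W @ src.
    Overlap is measured in sin(latitude), which is proportional to the
    area of a latitude band on the sphere. Assumes every destination cell
    overlaps at least one source cell.
    """
    src = np.sin(np.deg2rad(np.asarray(src_edges, dtype=float)))
    dst = np.sin(np.deg2rad(np.asarray(dst_edges, dtype=float)))
    # Overlap between every destination cell (rows) and source cell (columns).
    lower = np.maximum(dst[:-1, None], src[None, :-1])
    upper = np.minimum(dst[1:, None], src[None, 1:])
    overlap = np.clip(upper - lower, 0.0, None)
    # Normalise each row, so each target value is an area-weighted mean.
    return overlap / overlap.sum(axis=1, keepdims=True)


# Regrid four 45-degree latitude bands onto two 90-degree bands.
W = conservative_weights([-90, -45, 0, 45, 90], [-90, 0, 90])
print(W @ np.array([1.0, 2.0, 3.0, 4.0]))  # approx. [1.707 3.293]
```

On a regular latitude-longitude grid, the 2D weights factor into a latitude matrix like this one and an analogous longitude matrix, where overlap is measured directly in degrees.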

The 1440x721 (0.25-degree) and 240x121 grids include the poles, i.e. latitudes -90 and 90 degrees; the 240x121 filenames mark this with with_poles. The 64x32 files exclude the poles so that the grid spacing is equal in latitude and longitude (5.625 degrees).
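The sketch below generates the 1D coordinates of both kinds of grid. It shows that the 64x32 layout gives 5.625-degree spacing in both directions, whereas 32 latitudes including both poles would be spaced 180/31 ≈ 5.81 degrees apart. equiangular_grid is a hypothetical helper that returns latitudes in ascending order; some of the stores listed below use descending latitudes instead.

```python
import numpy as np


def equiangular_grid(n_lon, n_lat, with_poles):
    """Illustrative 1D longitude/latitude coordinates in degrees (latitude ascending)."""
    d_lon = 360.0 / n_lon
    lon = np.arange(n_lon) * d_lon
    if with_poles:
        # First and last rows sit exactly on the poles.
        lat = np.linspace(-90.0, 90.0, n_lat)
    else:
        # Cell centres, offset by half a cell from each pole.
        d_lat = 180.0 / n_lat
        lat = -90.0 + d_lat / 2 + np.arange(n_lat) * d_lat
    return lon, lat


for n_lon, n_lat, poles in [(240, 121, True), (64, 32, False)]:
    lon, lat = equiangular_grid(n_lon, n_lat, poles)
    print(n_lon, n_lat, lon[1] - lon[0], lat[1] - lat[0], lat[0], lat[-1])
```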

Ground-truth datasets

ERA5

Our ERA5 datasets were downloaded from the Copernicus Climate Data Store and cover 1959 up to 10 January 2023 (the 2023_01_10 in the filenames). The data here have been downsampled to 6-hourly time steps and 13 pressure levels; a raw hourly dataset with 37 levels is also available at gs://weatherbench2/datasets/era5/1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr
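A minimal sketch of what this downsampling amounts to, on a plain NumPy array with dimensions (time, level, ...). It assumes the 13 levels are the standard WeatherBench 2 set shown in the listings (50 to 1000 hPa) and that the 6-hourly steps are 00, 06, 12 and 18 UTC. The function subsample is hypothetical and is not the code used to produce these files.

```python
import numpy as np

# The 37 ERA5 pressure levels (hPa).
LEVELS_37 = np.array([1, 2, 3, 5, 7, 10, 20, 30, 50, 70, 100, 125, 150, 175,
                      200, 225, 250, 300, 350, 400, 450, 500, 550, 600, 650,
                      700, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975,
                      1000])
# Assumed 13-level subset, matching the listings (50 ... 1000 hPa).
LEVELS_13 = np.array([50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850,
                      925, 1000])


def subsample(data, times, levels, step_hours=6, keep_levels=LEVELS_13):
    """Keep time steps whose hour of day is a multiple of step_hours, and the chosen levels.

    data: array of shape (time, level, ...); times: datetime64 array.
    """
    # Hour of day for each time stamp.
    hours = (times - times.astype("datetime64[D]")).astype("timedelta64[h]").astype(int)
    t_mask = hours % step_hours == 0
    l_mask = np.isin(levels, keep_levels)
    return data[t_mask][:, l_mask], times[t_mask], levels[l_mask]


# One day of hourly data on 37 levels and a tiny 2x3 grid.
times = np.arange("2020-01-01T00", "2020-01-02T00", dtype="datetime64[h]")
data = np.zeros((times.size, LEVELS_37.size, 2, 3), dtype=np.float32)
out, t6, lev13 = subsample(data, times, LEVELS_37)
print(out.shape, t6, lev13)
```

With xarray, the same selection could be written as ds.sel(level=LEVELS_13) followed by ds.isel(time=slice(None, None, 6)) for an hourly dataset that starts at 00 UTC.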

Location: gs://weatherbench2/datasets/era5/

Files:

  • 1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr

  • 1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr

  • 1959-2023_01_10-6h-240x121_equiangular_with_poles_conservative.zarr

  • 1959-2023_01_10-6h-64x32_equiangular_conservative.zarr

Note: Older versions of the ERA5 files remain in the bucket for continuity.

See the output below for a list of variables. The file also contains several derived variables, which were computed using these methods.

xr.open_zarr('gs://weatherbench2/datasets/era5/1959-2023_01_10-wb13-6h-1440x721_with_derived_variables.zarr')
<xarray.Dataset>
Dimensions:                                           (time: 93544,
                                                       latitude: 721,
                                                       longitude: 1440,
                                                       level: 13)
Coordinates:
  * latitude                                          (latitude) float32 90.0...
  * level                                             (level) int64 50 ... 1000
  * longitude                                         (longitude) float32 0.0...
  * time                                              (time) datetime64[ns] 1...
Data variables: (12/62)
    10m_u_component_of_wind                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_wind_speed                                    (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_dewpoint_temperature                           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_temperature                                    (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    above_ground                                      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    ...                                                ...
    volumetric_soil_water_layer_1                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_2                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_3                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_4                     (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    vorticity                                         (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                                        (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
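The time dimension in the listing above can be sanity-checked with a little arithmetic. The sketch below assumes the 6-hourly archive runs from 1959-01-01 00 UTC to 2023-01-10 18 UTC, which is our reading of the 2023_01_10 in the filename. Under that assumption the count matches time: 93544. The helper n_steps is hypothetical and not part of WeatherBench 2.

```python
from datetime import datetime, timedelta


def n_steps(start, end, step_hours):
    """Number of time steps from start to end, both inclusive."""
    return (end - start) // timedelta(hours=step_hours) + 1


# Assumed end point: 18 UTC on 10 January 2023.
print(n_steps(datetime(1959, 1, 1), datetime(2023, 1, 10, 18), 6))  # 93544
```

The same helper reproduces time: 10268 for the HRES t=0 dataset below if that dataset runs from 2016-01-01 00 UTC to the same end point.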

ERA5 Climatology

A climatology is needed to compute anomaly-based metrics such as the anomaly correlation coefficient (ACC). For WeatherBench 2, the climatology was computed for each day of year and each sixth hour of the day, using a running window for smoothing (see the paper and script for details). Climatologies are available for 1990-2017 and 1990-2019.
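A minimal sketch of running-window smoothing over day of year, applied to raw per-day means for one hour of day. The window is circular, so late December and early January are smoothed together. It uses a uniform 61-day window purely as an example; the actual window length and weighting used for WeatherBench 2 are described in the paper and script. smooth_dayofyear is a hypothetical helper.

```python
import numpy as np


def smooth_dayofyear(clim, window=61):
    """Circular running mean along the first (dayofyear) axis.

    clim: array of shape (366, ...), one raw mean per day of year.
    window: odd window length in days (61 is only an example value).
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    half = window // 2
    # Wrap around the year so late December and early January are neighbours.
    pad = [(half, half)] + [(0, 0)] * (clim.ndim - 1)
    padded = np.pad(clim, pad, mode="wrap")
    kernel = np.ones(window) / window
    # A "valid" convolution returns exactly one value per original day.
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="valid"), 0, padded
    )


# A single spike on day 1 spreads over 30 days on either side, across New Year.
raw = np.zeros((366, 2))
raw[0] = 61.0
smoothed = smooth_dayofyear(raw, window=61)
print(smoothed[:3, 0], smoothed[-3:, 0])
```

For the WeatherBench 2 layout, this would be applied separately for each hour (0, 6, 12 and 18 UTC).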

Location: gs://weatherbench2/datasets/era5-hourly-climatology/

Files:

  • 1990-2017_6h_1440x721.zarr

  • 1990-2017_6h_512x256_equiangular_conservative.zarr

  • 1990-2017_6h_240x121_equiangular_with_poles_conservative.zarr

  • 1990-2017_6h_64x32_equiangular_conservative.zarr

  • 1990-2019_6h_1440x721.zarr

  • 1990-2019_6h_512x256_equiangular_conservative.zarr

  • 1990-2019_6h_240x121_equiangular_with_poles_conservative.zarr

  • 1990-2019_6h_64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/era5-hourly-climatology/1990-2019_6h_1440x721.zarr')
<xarray.Dataset>
Dimensions:                                         (hour: 4, dayofyear: 366,
                                                     latitude: 721,
                                                     longitude: 1440, level: 13)
Coordinates:
  * dayofyear                                       (dayofyear) int64 1 ... 366
  * hour                                            (hour) int64 0 6 12 18
  * latitude                                        (latitude) float32 90.0 ....
  * level                                           (level) int64 50 ... 1000
  * longitude                                       (longitude) float32 0.0 ....
Data variables: (12/52)
    10m_u_component_of_wind                         (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind                         (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    10m_wind_speed                                  (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    2m_dewpoint_temperature                         (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    2m_temperature                                  (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    ageostrophic_wind_speed                         (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    ...                                              ...
    volumetric_soil_water_layer_1                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_2                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_3                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    volumetric_soil_water_layer_4                   (hour, dayofyear, latitude, longitude) float32 dask.array<chunksize=(3, 3, 721, 1440), meta=np.ndarray>
    vorticity                                       (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
    wind_speed                                      (hour, dayofyear, level, latitude, longitude) float32 dask.array<chunksize=(3, 3, 1, 721, 1440), meta=np.ndarray>
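As an example of how the climatology is used, here is a sketch of a latitude-weighted anomaly correlation coefficient for a single 2D field. Anomalies are taken relative to the climatology entry for the forecast's valid hour and day of year, which could be selected with, e.g., clim.sel(hour=12, dayofyear=1). This is one common, uncentered form of the ACC with cos(latitude) weights; see the paper for the exact definition used in WeatherBench 2. acc is a hypothetical helper.

```python
import numpy as np


def acc(forecast, truth, climatology, lat):
    """Latitude-weighted anomaly correlation coefficient for one 2D field.

    forecast, truth, climatology: arrays of shape (latitude, longitude).
    lat: latitudes in degrees, shape (latitude,).
    """
    w = np.cos(np.deg2rad(lat))[:, None]  # area weight for each grid row
    fa = forecast - climatology  # forecast anomaly
    ta = truth - climatology  # observed anomaly
    num = np.sum(w * fa * ta)
    den = np.sqrt(np.sum(w * fa**2) * np.sum(w * ta**2))
    return num / den


lat = np.linspace(-90, 90, 121)
rng = np.random.default_rng(0)
clim = rng.normal(size=(121, 240))
truth = clim + rng.normal(size=clim.shape)
print(acc(truth, truth, clim, lat))  # close to 1.0 for a perfect forecast
```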

IFS HRES t=0 “Analysis”

To evaluate IFS forecasts, we use the IFS analysis as the ground truth. Specifically, we use the initial conditions of the HRES forecasts, i.e. the forecasts at lead time zero, as the analysis. This is not exactly the same as the analysis dataset provided by ECMWF (see the paper for details).
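In array terms, using lead time zero as the analysis means taking the prediction_timedelta = 0 slice of a forecast, whose valid time is simply the initialization time. A NumPy sketch follows, with lead_zero_analysis as a hypothetical helper. With xarray, ds.isel(prediction_timedelta=0) does the same when the first lead time is 00:00:00, as in the HRES listing below.

```python
import numpy as np


def lead_zero_analysis(forecast, init_times, leads):
    """Return the lead-time-zero slice of a forecast and its valid times.

    forecast: array of shape (init_time, lead, ...).
    leads: timedelta64 array of lead times.
    At lead zero the valid time equals the initialization time.
    """
    (idx,) = np.nonzero(leads == np.timedelta64(0, "h"))
    if idx.size != 1:
        raise ValueError("expected exactly one zero lead time")
    return forecast[:, idx[0]], init_times


init = np.array(["2020-01-01T00", "2020-01-01T12"], dtype="datetime64[h]")
leads = np.arange(0, 24, 6).astype("timedelta64[h]")
fc = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
analysis, valid = lead_zero_analysis(fc, init, leads)
print(analysis.shape, valid)
```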

Location: gs://weatherbench2/datasets/hres_t0/

Files:

  • 2016-2022-6h-1440x721.zarr

  • 2016-2022-6h-512x256_equiangular_conservative.zarr

  • 2016-2022-6h-240x121_equiangular_with_poles_conservative.zarr

  • 2016-2022-6h-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/hres_t0/2016-2022-6h-1440x721.zarr')
<xarray.Dataset>
Dimensions:                  (time: 10268, latitude: 721, longitude: 1440,
                              level: 13)
Coordinates:
  * latitude                 (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                    (level) int32 50 100 150 200 ... 700 850 925 1000
  * longitude                (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * time                     (time) datetime64[ns] 2016-01-01 ... 2023-01-10T...
Data variables: (12/14)
    10m_u_component_of_wind  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    10m_wind_speed           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    2m_temperature           (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    geopotential             (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    ...                       ...
    temperature              (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr  (time, latitude, longitude) float32 dask.array<chunksize=(1, 721, 1440), meta=np.ndarray>
    u_component_of_wind      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind      (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity        (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>
    wind_speed               (time, level, latitude, longitude) float32 dask.array<chunksize=(1, 13, 721, 1440), meta=np.ndarray>

Forecast datasets

IFS HRES

Here, we provide the 00 and 12 UTC initializations of HRES.

Location: gs://weatherbench2/datasets/hres/

Files:

  • 2016-2022-0012-1440x721.zarr

  • 2016-2022-0012-512x256_equiangular_conservative.zarr

  • 2016-2022-0012-240x121_equiangular_with_poles_conservative.zarr

  • 2016-2022-0012-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/hres/2016-2022-0012-1440x721.zarr')
<xarray.Dataset>
Dimensions:                   (time: 5134, prediction_timedelta: 41,
                               latitude: 721, longitude: 1440, level: 13)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 50 100 150 200 ... 700 850 925 1000
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2016-01-01 ... 2023-01-10...
Data variables: (12/16)
    10m_u_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation_24hr  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity         (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
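Each forecast value is valid at its initialization time plus its lead time. The NumPy sketch below builds that valid-time grid and checks which valid times exist in a ground-truth time axis such as the 6-hourly HRES t=0 data. It assumes 41 lead times spaced 6 hours apart (0 to 240 h), which is consistent with prediction_timedelta: 41 above but not stated explicitly. valid_times and has_truth are hypothetical helpers.

```python
import numpy as np


def valid_times(init_times, leads):
    """Valid time of every (init, lead) pair, shape (n_init, n_lead)."""
    return init_times[:, None] + leads[None, :]


def has_truth(valid, truth_times):
    """Boolean mask: which valid times exist in the ground-truth time axis."""
    return np.isin(valid, truth_times)


init = np.array(["2022-12-31T00", "2022-12-31T12"], dtype="datetime64[h]")
leads = np.arange(0, 241, 6).astype("timedelta64[h]")  # assumed 0..240 h
truth = np.arange(np.datetime64("2016-01-01T00", "h"),
                  np.datetime64("2023-01-11T00", "h"),
                  np.timedelta64(6, "h"))
v = valid_times(init, leads)
print(v.shape, v[1, -1], has_truth(v, truth).all())
```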

IFS ENS

Downloading the full ensemble takes a very long time, so we downloaded ensemble data from the TIGGE archive for 2018 to 2022 (inclusive) only.

Data from the TIGGE archive may be used for research purposes only. Please check the license for more specific constraints.

Location: gs://weatherbench2/datasets/ifs_ens/

Files:

  • 2018-2022-1440x721.zarr

  • 2018-2022-240x121_equiangular_with_poles_conservative.zarr

  • 2018-2022-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/ifs_ens/2018-2022-1440x721.zarr')
<xarray.Dataset>
Dimensions:                   (time: 3652, number: 50,
                               prediction_timedelta: 61, latitude: 721,
                               longitude: 1440, level: 3)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 500 700 850
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * number                    (number) int32 1 2 3 4 5 6 7 ... 45 46 47 48 49 50
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2018-01-01 ... 2022-12-31...
Data variables: (12/15)
    10m_u_component_of_wind   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation       (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_24hr  (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, number, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
    wind_speed                (time, number, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 50, 1, 3, 721, 1440), meta=np.ndarray>
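To show why the full ensemble is expensive to download, the sketch below estimates the uncompressed size of a single variable from the dimensions in the listing above, assuming 4-byte float32 values. The stores are compressed, so actual on-disk sizes will differ. uncompressed_bytes is a hypothetical helper.

```python
def uncompressed_bytes(*dims, bytes_per_value=4):
    """Uncompressed size in bytes of an array with the given dimension sizes."""
    size = bytes_per_value
    for d in dims:
        size *= d
    return size


# Dimensions from the IFS ENS listing: time, number, prediction_timedelta,
# (level,) latitude, longitude.
single_level = uncompressed_bytes(3652, 50, 61, 721, 1440)
pressure_level = uncompressed_bytes(3652, 50, 61, 3, 721, 1440)
print(f"single-level variable: {single_level / 1e12:.1f} TB")  # 46.3 TB
print(f"3-level variable: {pressure_level / 1e12:.1f} TB")  # 138.8 TB
```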

IFS ENS mean

We also computed the ensemble mean and saved it as a separate file, e.g. for use as a deterministic baseline.
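Conceptually, the mean files are the IFS ENS arrays averaged over the number (ensemble member) dimension, which turns the (time, number, prediction_timedelta, ...) layout into (time, prediction_timedelta, ...). A NumPy sketch on a toy array follows. ensemble_mean is illustrative; with xarray, the equivalent is ds.mean(dim="number").

```python
import numpy as np


def ensemble_mean(ens, member_axis=1):
    """Mean over ensemble members, dropping the member axis.

    ens: array of shape (time, number, prediction_timedelta, ...),
    as in the IFS ENS layout.
    """
    return np.mean(ens, axis=member_axis)


# Toy ensemble: 2 init times, 50 members, 3 lead times, 4x8 grid.
rng = np.random.default_rng(0)
ens = rng.normal(size=(2, 50, 3, 4, 8)).astype(np.float32)
mean = ensemble_mean(ens)
print(mean.shape)  # (2, 3, 4, 8)
```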

Location: gs://weatherbench2/datasets/ifs_ens/

Files:

  • 2018-2022-1440x721_mean.zarr

  • 2018-2022-240x121_equiangular_with_poles_conservative_mean.zarr

  • 2018-2022-64x32_equiangular_conservative_mean.zarr

xr.open_zarr('gs://weatherbench2/datasets/ifs_ens/2018-2022-1440x721_mean.zarr')
<xarray.Dataset>
Dimensions:                   (time: 3652, prediction_timedelta: 61,
                               latitude: 721, longitude: 1440, level: 3)
Coordinates:
  * latitude                  (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                     (level) int32 500 700 850
  * longitude                 (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 00:00:00...
  * time                      (time) datetime64[ns] 2018-01-01 ... 2022-12-31...
Data variables: (12/15)
    10m_u_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation       (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_24hr  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>

ERA5 forecast

As an apples-to-apples baseline for ML forecasts trained and evaluated with ERA5, we downloaded the ERA5 forecasts: a set of experimental forecasts that ECMWF ran with the IFS version used to produce ERA5, starting from ERA5 initial conditions. We downloaded data for 2018 and 2020.

Location: gs://weatherbench2/datasets/era5-forecasts/

Files:

  • 2018-1440x721.zarr

  • 2018-240x121_equiangular_with_poles_conservative.zarr

  • 2018-64x32_equiangular_conservative.zarr

  • 2020-1440x721.zarr

  • 2020-240x121_equiangular_with_poles_conservative.zarr

  • 2020-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/era5-forecasts/2020-1440x721.zarr')
<xarray.Dataset>
Dimensions:                  (time: 732, prediction_timedelta: 31,
                              latitude: 721, longitude: 1440, level: 3)
Coordinates:
  * latitude                 (latitude) float32 -90.0 -89.75 ... 89.75 90.0
  * level                    (level) int32 500 700 850
  * longitude                (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta     (prediction_timedelta) timedelta64[ns] 00:00:00 ...
  * time                     (time) datetime64[ns] 2020-01-01 ... 2020-12-31T...
Data variables:
    10m_u_component_of_wind  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed           (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature           (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential             (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    specific_humidity        (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    temperature              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    u_component_of_wind      (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    v_component_of_wind      (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    vertical_velocity        (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>
    wind_speed               (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 3, 721, 1440), meta=np.ndarray>

Keisler (2022)

Ryan Keisler provided us with forecasts from the graph neural network described in his 2022 paper.

Location: gs://weatherbench2/datasets/keisler/

Files:

  • 2020-360x181.zarr

  • 2020-240x121_equiangular_with_poles_conservative.zarr

  • 2020-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/keisler/2020-360x181.zarr')
<xarray.Dataset>
Dimensions:               (level: 3, time: 732, prediction_timedelta: 41,
                           latitude: 181, longitude: 360)
Coordinates:
  * latitude              (latitude) float64 90.0 89.0 88.0 ... -89.0 -90.0
  * level                 (level) int64 500 700 850
  * longitude             (longitude) float64 0.0 1.0 2.0 ... 357.0 358.0 359.0
  * prediction_timedelta  (prediction_timedelta) timedelta64[ns] 00:00:00 ......
  * time                  (time) datetime64[ns] 2020-01-01 ... 2020-12-31T12:...
Data variables:
    geopotential          (level, time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 41, 181, 360), meta=np.ndarray>
    specific_humidity     (level, time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 41, 181, 360), meta=np.ndarray>
    temperature           (level, time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 41, 181, 360), meta=np.ndarray>
    u_component_of_wind   (level, time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 41, 181, 360), meta=np.ndarray>
    v_component_of_wind   (level, time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 41, 181, 360), meta=np.ndarray>
    wind_speed            (level, time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 41, 181, 360), meta=np.ndarray>
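The Keisler store orders its dimensions as (level, time, prediction_timedelta, latitude, longitude), while the other forecast datasets use (time, prediction_timedelta, level, latitude, longitude); its latitude also runs from 90 to -90. Below is a minimal NumPy sketch of bringing an array into the common dimension order. The helper reorder and the dimension tuples are illustrative. In xarray, ds.transpose(*COMMON_DIMS) should do the same for a whole dataset.

```python
import numpy as np

KEISLER_DIMS = ("level", "time", "prediction_timedelta", "latitude", "longitude")
COMMON_DIMS = ("time", "prediction_timedelta", "level", "latitude", "longitude")


def reorder(data, src_dims, dst_dims):
    """Transpose data from the src_dims axis order to the dst_dims order."""
    return np.transpose(data, [src_dims.index(d) for d in dst_dims])


# Toy array: 3 levels, 2 init times, 41 lead times, 5x6 grid.
a = np.random.default_rng(0).normal(size=(3, 2, 41, 5, 6))
b = reorder(a, KEISLER_DIMS, COMMON_DIMS)
print(b.shape)  # (2, 41, 3, 5, 6)
```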

Pangu-Weather

We ran the Pangu model using the code available on GitHub.

Location: gs://weatherbench2/datasets/pangu/

Files:

  • 2018-2022_0012_0p25.zarr

  • 2018-2022_0012_240x121_equiangular_with_poles_conservative.zarr

  • 2018-2022_0012_64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/pangu/2018-2022_0012_0p25.zarr')
<xarray.Dataset>
Dimensions:                  (time: 3652, prediction_timedelta: 40,
                              latitude: 721, longitude: 1440, level: 13)
Coordinates:
  * latitude                 (latitude) float32 90.0 89.75 89.5 ... -89.75 -90.0
  * level                    (level) int64 1000 925 850 700 ... 200 150 100 50
  * longitude                (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta     (prediction_timedelta) timedelta64[ns] 06:00:00 ...
  * time                     (time) datetime64[ns] 2018-01-01 ... 2022-12-31T...
Data variables:
    10m_u_component_of_wind  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed           (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature           (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential             (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    specific_humidity        (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    temperature              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    u_component_of_wind      (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind      (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    wind_speed               (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
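In this store, latitude runs from 90 to -90 and level from 1000 down to 50, the reverse of the HRES files, and prediction_timedelta starts at 06:00:00 rather than at zero. Below is a NumPy sketch of reordering an axis so that its coordinate is ascending. sort_axis is a hypothetical helper. For a whole xarray dataset, ds.sortby(["latitude", "level"]) achieves the same.

```python
import numpy as np


def sort_axis(data, coord, axis):
    """Reorder data along axis so that coord becomes ascending."""
    order = np.argsort(coord)
    return np.take(data, order, axis=axis), coord[order]


# Toy (level, latitude, longitude) field stored in Pangu-style descending order.
lat = np.array([90.0, 45.0, 0.0, -45.0, -90.0])
lev = np.array([1000, 850, 500])
data = lev[:, None, None] + lat[None, :, None] + np.zeros((3, 5, 4))
d1, lat_sorted = sort_axis(data, lat, axis=1)
d2, lev_sorted = sort_axis(d1, lev, axis=0)
print(lat_sorted, lev_sorted)
```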

Pangu-Weather (operational)

We also ran Pangu in a quasi-operational setup with IFS HRES initial conditions.

Location: gs://weatherbench2/datasets/pangu_hres_init/

Files:

  • 2020_0012_0p25.zarr

  • 2020_0012_240x121_equiangular_with_poles_conservative.zarr

  • 2020_0012_64x32_equiangular_conservative.zarr

  • 2021_0012_0p25.zarr

  • 2021_0012_240x121_equiangular_with_poles_conservative.zarr

  • 2021_0012_64x32_equiangular_conservative.zarr

  • 2022_0012_0p25.zarr

  • 2022_0012_240x121_equiangular_with_poles_conservative.zarr

  • 2022_0012_64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/pangu_hres_init/2020_0012_0p25.zarr')
<xarray.Dataset>
Dimensions:                  (time: 732, prediction_timedelta: 40,
                              latitude: 721, longitude: 1440, level: 13)
Coordinates:
  * latitude                 (latitude) float32 90.0 89.75 89.5 ... -89.75 -90.0
  * level                    (level) int64 1000 925 850 700 ... 200 150 100 50
  * longitude                (longitude) float32 0.0 0.25 0.5 ... 359.5 359.8
  * prediction_timedelta     (prediction_timedelta) timedelta64[ns] 06:00:00 ...
  * time                     (time) datetime64[ns] 2020-01-01 ... 2020-12-31T...
Data variables:
    10m_u_component_of_wind  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed           (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature           (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential             (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    specific_humidity        (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    temperature              (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    u_component_of_wind      (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind      (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    wind_speed               (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
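Because this dataset is split into one store per year, a small helper can list the paths for a range of years at one resolution. pangu_hres_init_paths and the resolution keys are hypothetical; the file names are those listed above. Each path could then be opened with xr.open_zarr and the yearly datasets combined along time, e.g. with xr.concat(datasets, dim="time").

```python
BASE = "gs://weatherbench2/datasets/pangu_hres_init"

# Hypothetical resolution keys mapped to the filename suffixes listed above.
SUFFIXES = {
    "0p25": "0p25.zarr",
    "240x121": "240x121_equiangular_with_poles_conservative.zarr",
    "64x32": "64x32_equiangular_conservative.zarr",
}


def pangu_hres_init_paths(years, resolution):
    """Store paths for the per-year Pangu-Weather (operational) files."""
    return [f"{BASE}/{year}_0012_{SUFFIXES[resolution]}" for year in years]


for path in pangu_hres_init_paths(range(2020, 2023), "240x121"):
    print(path)
```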

GraphCast

GraphCast forecasts are available for 2018 and 2020. As described in the paper, the 2018 forecasts were created with a model trained on data up to and including 2017, while the 2020 forecasts were created with a model trained on data up to and including 2019. These forecasts are initialized from ERA5.

Location: gs://weatherbench2/datasets/graphcast/

Files (see the directory above for the exact file names for each year):

  • date_range_YYYY-11-16_XXXX-02-01_12_hours_derived.zarr

  • date_range_YYYY-11-16_XXXX-02-01_12_hours-240x121_equiangular_with_poles_conservative.zarr

  • date_range_YYYY-11-16_XXXX-02-01_12_hours-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/graphcast/2020/date_range_2019-11-16_2021-02-01_12_hours_derived.zarr')
<xarray.Dataset>
Dimensions:                   (time: 886, prediction_timedelta: 40, lat: 721,
                               lon: 1440, level: 37)
Coordinates:
  * lat                       (lat) float32 -90.0 -89.75 -89.5 ... 89.75 90.0
  * level                     (level) int64 1 2 3 5 7 ... 900 925 950 975 1000
  * lon                       (lon) float32 0.0 0.25 0.5 ... 359.2 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 06:00:00...
  * time                      (time) datetime64[ns] 2019-11-16 ... 2021-01-31...
Data variables: (12/14)
    10m_u_component_of_wind   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 37, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation_24hr  (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 37, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 37, 721, 1440), meta=np.ndarray>
    vertical_velocity         (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 37, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 37, 721, 1440), meta=np.ndarray>
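The sizes of `time` and `prediction_timedelta` in the repr above can be checked with plain date arithmetic. This is a minimal sketch that assumes 12-hourly initializations starting at 00 UTC on 2019-11-16, treats the end date in the file name as exclusive, and assumes 6-hourly lead times starting at +6 h (the repr shows `06:00:00` as the first value). `init_times` is a hypothetical helper, not part of WeatherBench 2:

```python
from datetime import datetime, timedelta


def init_times(start: datetime, end_exclusive: datetime, step: timedelta) -> list:
    """Return evenly spaced initialization times in [start, end_exclusive)."""
    times = []
    t = start
    while t < end_exclusive:
        times.append(t)
        t += step
    return times


# Assumption: 12-hourly inits from 2019-11-16 00 UTC; file-name end date is exclusive.
inits = init_times(datetime(2019, 11, 16), datetime(2021, 2, 1), timedelta(hours=12))
print(len(inits), inits[0], inits[-1])

# Assumption: 40 lead times spaced 6 hours apart, the first one at +6 h.
lead_times = [timedelta(hours=6 * (i + 1)) for i in range(40)]
print(lead_times[-1])

# The valid time of a forecast is its initialization time plus its lead time.
valid = inits[0] + lead_times[-1]
print(valid)
```

Under these assumptions there are 886 initializations (443 days at two per day) and the longest lead time is 10 days. In xarray the same valid-time arithmetic can be done on the coordinates directly, e.g. `ds.time + ds.prediction_timedelta`, which broadcasts to a 2-D array over both dimensions.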

GraphCast (Operational)

These GraphCast forecasts are initialized from operational IFS HRES analyses. The model details differ from those of the ERA5-initialized version above; see graphcast_operational in the GraphCast GitHub repository.

Location: gs://weatherbench2/datasets/graphcast_hres_init/2020/

Files:

  • date_range_2019-11-16_2021-02-01_12_hours_derived.zarr

  • date_range_2019-11-16_2021-02-01_12_hours-240x121_equiangular_with_poles_conservative.zarr

  • date_range_2019-11-16_2021-02-01_12_hours-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/graphcast_hres_init/2020/date_range_2019-11-16_2021-02-01_12_hours_derived.zarr')
<xarray.Dataset>
Dimensions:                   (time: 732, prediction_timedelta: 40, lat: 721,
                               lon: 1440, level: 13)
Coordinates:
  * lat                       (lat) float32 -90.0 -89.75 -89.5 ... 89.75 90.0
  * level                     (level) int32 50 100 150 200 ... 700 850 925 1000
  * lon                       (lon) float32 0.0 0.25 0.5 ... 359.2 359.5 359.8
  * prediction_timedelta      (prediction_timedelta) timedelta64[ns] 06:00:00...
  * time                      (time) datetime64[ns] 2020-01-01 ... 2020-12-31...
Data variables: (12/14)
    10m_u_component_of_wind   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    10m_wind_speed            (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    2m_temperature            (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    geopotential              (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    ...                        ...
    total_precipitation_24hr  (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr   (time, prediction_timedelta, lat, lon) float32 dask.array<chunksize=(1, 1, 721, 1440), meta=np.ndarray>
    u_component_of_wind       (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    v_component_of_wind       (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    vertical_velocity         (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
    wind_speed                (time, prediction_timedelta, level, lat, lon) float32 dask.array<chunksize=(1, 1, 13, 721, 1440), meta=np.ndarray>
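The ERA5-initialized GraphCast files above have 37 pressure levels, while these HRES-initialized files have 13, so a like-for-like comparison needs to restrict both to the levels they share. A minimal sketch in plain Python; the full level lists are assumptions (the standard ERA5 37-level set and the WeatherBench 2 13-level set), because both reprs elide the middle values:

```python
# Assumption: standard ERA5 37 pressure levels (hPa) in the ERA5-initialized GraphCast files.
LEVELS_37 = [
    1, 2, 3, 5, 7, 10, 20, 30, 50, 70, 100, 125, 150, 175, 200, 225, 250,
    300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 775, 800, 825, 850,
    875, 900, 925, 950, 975, 1000,
]
# Assumption: standard WeatherBench 2 13 pressure levels (hPa) in the HRES-initialized files.
LEVELS_13 = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

# Levels present in both datasets, in ascending order.
shared = sorted(set(LEVELS_37) & set(LEVELS_13))
print(len(LEVELS_37), len(LEVELS_13), len(shared))
```

If the assumed lists are right, every one of the 13 levels is also among the 37, so the shared list could be passed to `Dataset.sel(level=shared)` on each dataset before comparing them.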

Spherical CNN

Forecasts using a Spherical CNN are available for 2020.

Location: gs://weatherbench2/datasets/sphericalcnn/

Files:

  • 2020-240x121_equiangular_with_poles.zarr

  • 2020-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/sphericalcnn/2020-240x121_equiangular_with_poles.zarr')
<xarray.Dataset>
Dimensions:               (time: 178, prediction_timedelta: 40, level: 13,
                           longitude: 240, latitude: 121)
Coordinates:
  * latitude              (latitude) float64 -90.0 -88.5 -87.0 ... 88.5 90.0
  * level                 (level) int64 50 100 150 200 250 ... 700 850 925 1000
  * longitude             (longitude) float64 0.0 1.5 3.0 ... 355.5 357.0 358.5
  * prediction_timedelta  (prediction_timedelta) timedelta64[ns] 0 days 06:00...
  * time                  (time) datetime64[ns] 2020-01-01 ... 2020-12-20
Data variables:
    geopotential          (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(10, 40, 13, 240, 121), meta=np.ndarray>
    specific_humidity     (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(10, 40, 13, 240, 121), meta=np.ndarray>
    temperature           (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(10, 40, 13, 240, 121), meta=np.ndarray>
    u_component_of_wind   (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(10, 40, 13, 240, 121), meta=np.ndarray>
    v_component_of_wind   (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(10, 40, 13, 240, 121), meta=np.ndarray>
    wind_speed            (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(10, 40, 13, 240, 121), meta=np.ndarray>
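Note that the GraphCast files name their horizontal coordinates `lat`/`lon` and store variables as (lat, lon), whereas the Spherical CNN file uses `latitude`/`longitude` and stores variables as (longitude, latitude). A sketch of bringing both layouts to one convention with xarray's `Dataset.rename` and `Dataset.transpose`, using small synthetic datasets as stand-ins; `harmonize` is a hypothetical helper, and WeatherBench 2's own evaluation code may handle this differently:

```python
import numpy as np
import xarray as xr

# Hypothetical synthetic stand-ins mimicking the two layouts.
graphcast_like = xr.Dataset(
    {"z": (("lat", "lon"), np.zeros((3, 4), dtype=np.float32))},
    coords={"lat": [-1.5, 0.0, 1.5], "lon": [0.0, 1.5, 3.0, 4.5]},
)
sphericalcnn_like = xr.Dataset(
    {"z": (("longitude", "latitude"), np.ones((4, 3), dtype=np.float32))},
    coords={"longitude": [0.0, 1.5, 3.0, 4.5], "latitude": [-1.5, 0.0, 1.5]},
)


def harmonize(ds: xr.Dataset) -> xr.Dataset:
    """Rename lat/lon to latitude/longitude and order latitude before longitude."""
    mapping = {"lat": "latitude", "lon": "longitude"}
    ds = ds.rename({old: new for old, new in mapping.items() if old in ds.sizes})
    # "..." keeps any other dimensions (time, level, ...) in their existing order.
    return ds.transpose(..., "latitude", "longitude")


a = harmonize(graphcast_like)
b = harmonize(sphericalcnn_like)
print(a["z"].dims, b["z"].dims)

# Arithmetic between the harmonized arrays aligns by coordinate labels.
diff = b["z"] - a["z"]
```

Renaming matters more than transposing: xarray aligns arithmetic by dimension name, so once both use `latitude`/`longitude` the storage order no longer affects results, but a consistent order avoids surprises when converting to NumPy arrays.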

FuXi

Forecasts from the FuXi model are available for 2020.

Location: gs://weatherbench2/datasets/fuxi/

Files:

  • 2020-1440x721.zarr

  • 2020-240x121_equiangular_with_poles_conservative.zarr

  • 2020-64x32_equiangular_conservative.zarr

xr.open_zarr('gs://weatherbench2/datasets/fuxi/2020-1440x721.zarr')
<xarray.Dataset>
Dimensions:                            (time: 702, prediction_timedelta: 60,
                                        latitude: 721, longitude: 1440, level: 2)
Coordinates:
  * latitude                           (latitude) float64 -90.0 -89.75 ... 90.0
  * level                              (level) int32 500 850
  * longitude                          (longitude) float64 0.0 0.25 ... 359.8
  * prediction_timedelta               (prediction_timedelta) timedelta64[ns] ...
  * time                               (time) datetime64[ns] 2020-01-01 ... 2...
Data variables:
    10m_u_component_of_wind            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    10m_v_component_of_wind            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    10m_wind_speed                     (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    2m_temperature                     (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    geopotential                       (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 60, 2, 721, 1440), meta=np.ndarray>
    mean_sea_level_pressure            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    temperature                        (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 60, 2, 721, 1440), meta=np.ndarray>
    total_precipitation_24hr_from_6hr  (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    total_precipitation_6hr            (time, prediction_timedelta, latitude, longitude) float32 dask.array<chunksize=(1, 60, 721, 1440), meta=np.ndarray>
    u_component_of_wind                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 60, 2, 721, 1440), meta=np.ndarray>
    v_component_of_wind                (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 60, 2, 721, 1440), meta=np.ndarray>
    wind_speed                         (time, prediction_timedelta, level, latitude, longitude) float32 dask.array<chunksize=(1, 60, 2, 721, 1440), meta=np.ndarray>
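The variable name `total_precipitation_24hr_from_6hr` suggests 24-hour totals built from the 6-hour accumulations in `total_precipitation_6hr`. A plain-Python sketch of that construction, assuming each 24-hour value is the trailing sum of four consecutive 6-hour accumulations at 6-hourly lead times; how the actual FuXi files treat the first lead times is not stated here, and `accumulate_24h_from_6h` is a hypothetical helper:

```python
from typing import List, Optional


def accumulate_24h_from_6h(precip_6h: List[float]) -> List[Optional[float]]:
    """Trailing 24-hour totals from consecutive 6-hour accumulations.

    Entry i is the sum of 6-hour accumulations i-3..i (four intervals = 24 h).
    The first three entries are None because fewer than four intervals exist.
    """
    totals: List[Optional[float]] = []
    for i in range(len(precip_6h)):
        if i < 3:
            totals.append(None)
        else:
            totals.append(sum(precip_6h[i - 3 : i + 1]))
    return totals


# Illustrative (made-up) 6-hour accumulations at one grid point, one per lead time.
six_hourly = [1.0, 0.0, 2.0, 1.0, 0.5, 0.0]
print(accumulate_24h_from_6h(six_hourly))  # [None, None, None, 4.0, 3.5, 3.5]
```

On an xarray `DataArray`, `da.rolling(prediction_timedelta=4).sum()` applies the same trailing window and returns NaN where the window is incomplete.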

NeuralGCM

Forecasts made with the Neural General Circulation Model (NeuralGCM) are available for 2020. The deterministic model has a native resolution of 0.7 degrees.

Location: gs://weatherbench2/datasets/neuralgcm_deterministic/

Files:

  • 2020-512x256.zarr

  • 2020-240x121_equiangular_with_poles_conservative.zarr

  • 2020-64x32_equiangular_conservative.zarr
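The native 512x256 grid corresponds to the quoted 0.7 degrees, since 360 / 512 ≈ 0.703. A small helper that reads the nominal longitude spacing off a file name in this guide; `nominal_resolution_deg` is hypothetical, and it uses only the longitude count because the latitude spacing depends on whether a grid includes the poles (and the native NeuralGCM grid may not have equally spaced latitudes):

```python
import re


def nominal_resolution_deg(filename: str) -> float:
    """Longitude spacing in degrees implied by the 'NxM' grid size in a file name."""
    # The first 'digits x digits' group is the grid size (e.g. '512x256').
    match = re.search(r"(\d+)x(\d+)", filename)
    if match is None:
        raise ValueError(f"no NxM grid size in {filename!r}")
    n_lon = int(match.group(1))
    return 360.0 / n_lon


for name in [
    "2020-512x256.zarr",
    "2020-240x121_equiangular_with_poles_conservative.zarr",
    "2020-64x32_equiangular_conservative.zarr",
]:
    print(f"{name}: {nominal_resolution_deg(name):.3f} deg")
```

The same helper applies to the other file names in this guide, e.g. the 1440x721 files give 0.25 degrees.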

xr.open_zarr('gs://weatherbench2/datasets/neuralgcm_deterministic/2020-240x121_equiangular_with_poles_conservative.zarr')
<xarray.Dataset>
Dimensions:                              (time: 732, prediction_timedelta: 31,
                                          longitude: 240, latitude: 121,
                                          level: 37)
Coordinates:
  * latitude                             (latitude) float64 -90.0 -88.5 ... 90.0
  * level                                (level) int64 1 2 3 5 ... 950 975 1000
  * longitude                            (longitude) float64 0.0 1.5 ... 358.5
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] ...
  * time                                 (time) datetime64[ns] 2020-01-01 ......
Data variables:
    P_minus_E_cumulative                 (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(1, 8, 240, 121), meta=np.ndarray>
    geopotential                         (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    specific_cloud_ice_water_content     (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    specific_cloud_liquid_water_content  (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    specific_humidity                    (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    temperature                          (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    u_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    v_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
    wind_speed                           (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(1, 8, 37, 240, 121), meta=np.ndarray>
Attributes:
    experiment_id:  67001173
    worker_id:      1
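The name `P_minus_E_cumulative` suggests precipitation minus evaporation accumulated from initialization up to each lead time. Under that assumption, per-interval values follow from differencing consecutive lead times. A plain-Python sketch with made-up numbers; `interval_totals` is a hypothetical helper, and the assumption that the accumulation starts at zero at initialization is labeled in the code:

```python
def interval_totals(cumulative: list) -> list:
    """Convert values accumulated since initialization into per-interval totals."""
    # Assumption: the accumulation is zero at initialization, so the first
    # interval total equals the first cumulative value.
    previous = 0.0
    totals = []
    for value in cumulative:
        totals.append(value - previous)
        previous = value
    return totals


# Illustrative (made-up) cumulative P - E at one grid point for four lead times.
print(interval_totals([0.5, 1.25, 1.0, 2.0]))  # [0.5, 0.75, -0.25, 1.0]
```

A negative interval total means evaporation exceeded precipitation over that interval. On the dataset itself, `ds["P_minus_E_cumulative"].diff("prediction_timedelta")` would give the same differences, without the first interval.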

The ensemble version has a native resolution of 1.4 degrees and was run to produce 50 members. We also provide precomputed ensemble means (the files ending in _mean.zarr).

Location: gs://weatherbench2/datasets/neuralgcm_ens/

Files:

  • 2020-256x128.zarr

  • 2020-240x121_equiangular_with_poles_conservative.zarr

  • 2020-240x121_equiangular_with_poles_conservative_mean.zarr

  • 2020-64x32_equiangular_conservative.zarr

  • 2020-64x32_equiangular_conservative_mean.zarr

xr.open_zarr('gs://weatherbench2/datasets/neuralgcm_ens/2020-240x121_equiangular_with_poles_conservative.zarr')
<xarray.Dataset>
Dimensions:                              (realization: 50, time: 732,
                                          prediction_timedelta: 32, level: 37,
                                          longitude: 240, latitude: 121)
Coordinates:
  * latitude                             (latitude) float64 -90.0 -88.5 ... 90.0
  * level                                (level) int64 1 2 3 5 ... 950 975 1000
  * longitude                            (longitude) float64 0.0 1.5 ... 358.5
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] ...
  * realization                          (realization) int64 0 1 2 ... 47 48 49
  * time                                 (time) datetime64[ns] 2020-01-01 ......
Data variables:
    geopotential                         (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    specific_cloud_ice_water_content     (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    specific_cloud_liquid_water_content  (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    specific_humidity                    (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    temperature                          (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    u_component_of_wind                  (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    v_component_of_wind                  (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
    wind_speed                           (realization, time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(50, 1, 2, 1, 240, 121), meta=np.ndarray>
Attributes:
    experiment_id:  73974210
    worker_id:      3
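The `_mean` files hold ensemble means; a per-point mean and spread across `realization` can be sketched as below. The five member values are made up (the real files have 50 members), and reporting spread as the population standard deviation is a choice for this sketch, not necessarily what WeatherBench 2 uses:

```python
import statistics

# Made-up temperatures (K) from five ensemble members at one grid point,
# level, initialization time and lead time.
members = [270.0, 271.0, 272.0, 273.0, 274.0]

# Ensemble mean: average over members (the `realization` dimension).
ensemble_mean = statistics.fmean(members)
# Ensemble spread: population standard deviation across members (ddof=0).
ensemble_spread = statistics.pstdev(members)

print(ensemble_mean, round(ensemble_spread, 3))  # 272.0 1.414
```

With xarray, `ds.mean("realization")` and `ds.std("realization")` compute the same quantities at every grid point; the precomputed `_mean` files presumably correspond to the former.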