{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "db1a246a-0cfa-4baf-a7a3-ca5bc521c734",
"metadata": {},
"source": [
"\n",
"# WeatherBench 2 Evaluation Quickstart\n",
"\n",
"In this notebook, we will cover the basic functionality of the WeatherBench evaluation framework.\n",
"\n",
"The WeatherBench evaluation framework takes two datasets, a forecast and a ground truth (called obs, even though reanalysis datasets like ERA5 are not observations), then computes and saves the specified metrics.\n",
"\n",
"Here, we will evaluate ECMWF's HRES forecast against ERA5."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f361a7ba-fa4f-49a8-a3cc-5113f7eb3430",
"metadata": {},
"outputs": [],
"source": [
"# Pip might complain about the Pandas version. The notebook should still work as expected.\n",
"!pip install git+https://github.com/google-research/weatherbench2.git"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "70b4fd7a-d599-4d82-986f-4ec51743e98c",
"metadata": {},
"outputs": [],
"source": [
"import apache_beam # Needs to be imported separately to avoid TypingError\n",
"import weatherbench2\n",
"import xarray as xr"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f6b59ae2-4ca9-4b93-8e9f-6ac2304bad50",
"metadata": {},
"outputs": [],
"source": [
"# Run the code below to access cloud data on Colab!\n",
"# from google.colab import auth\n",
"# auth.authenticate_user()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "959f9b8f-00d2-405a-b07d-5593c3314e3f",
"metadata": {},
"source": [
"### Specify input datasets\n",
"\n",
"Let's take a look at the datasets. Currently, the WeatherBench pipeline requires all input datasets to be stored as Zarr files."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4b6ec0cb-928f-4e50-a077-802b5a51e469",
"metadata": {},
"outputs": [],
"source": [
"forecast_path = 'gs://weatherbench2/datasets/hres/2016-2022-0012-64x32_equiangular_conservative.zarr'\n",
"obs_path = 'gs://weatherbench2/datasets/era5/1959-2022-6h-64x32_equiangular_conservative.zarr'\n",
"climatology_path = 'gs://weatherbench2/datasets/era5-hourly-climatology/1990-2019_6h_64x32_equiangular_conservative.zarr'"
]
},
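{
"attachments": {},
"cell_type": "markdown",
"id": "added-zarr-conversion-note",
"metadata": {},
"source": [
"The Zarr requirement also applies to any forecast you want to evaluate yourself. As a minimal sketch, assuming a small synthetic dataset and a hypothetical local path `my_forecast.zarr` (neither is part of WeatherBench 2), an in-memory xarray dataset can be written to Zarr and reopened like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-zarr-conversion-code",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Synthetic stand-in for a forecast; the values and the time range are illustrative only.\n",
"ds = xr.Dataset(\n",
"    {'2m_temperature': (('time', 'longitude', 'latitude'), np.zeros((2, 64, 32), dtype='float32'))},\n",
"    coords={\n",
"        'time': np.array(['2020-01-01', '2020-01-02'], dtype='datetime64[ns]'),\n",
"        'longitude': np.arange(64) * 5.625,\n",
"        'latitude': np.linspace(-87.1875, 87.1875, 32),\n",
"    },\n",
")\n",
"\n",
"# Hypothetical output location inside a temporary directory.\n",
"example_path = os.path.join(tempfile.mkdtemp(), 'my_forecast.zarr')\n",
"ds.to_zarr(example_path, mode='w')\n",
"\n",
"# Reopen the store; arrays are loaded lazily (dask-backed when dask is installed).\n",
"xr.open_zarr(example_path)"
]
},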
{
"attachments": {},
"cell_type": "markdown",
"id": "8abcc6c7-1766-4a2a-a99f-bbd062500562",
"metadata": {},
"source": [
"Generally, we follow ECMWF's naming conventions for the input files.\n",
"\n",
"* `time` [np.datetime64]: Time at which the forecast is initialized\n",
"* `lead_time` or `prediction_timedelta` [np.timedelta64]: Lead time\n",
"* `latitude` [float]: Latitudes from -90 to 90\n",
"* `longitude` [float]: Longitudes from 0 to 360\n",
"* `level` [hPa]: Pressure levels (optional)\n",
"\n",
"We don't actually need to open the forecast and obs datasets at this point, but we will do so here to see their structure."
]
},
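{
"attachments": {},
"cell_type": "markdown",
"id": "added-naming-conventions-note",
"metadata": {},
"source": [
"Before looking at the real data, here is a minimal sketch of how a dataset with different names could be brought in line with the conventions listed above. The input names `init_time`, `step`, `lat`, `lon` and `t2m` are hypothetical, as are the tiny grid and the values. The sketch only uses standard xarray calls (`rename`, `assign_coords`, `sortby`) and is not part of the WeatherBench 2 API."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-naming-conventions-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# Hypothetical forecast with non-conforming names and longitudes in [-180, 180).\n",
"raw = xr.Dataset(\n",
"    {'t2m': (('init_time', 'step', 'lat', 'lon'), np.zeros((1, 2, 3, 4), dtype='float32'))},\n",
"    coords={\n",
"        'init_time': np.array(['2020-01-01'], dtype='datetime64[ns]'),\n",
"        'step': np.array([0, 6], dtype='timedelta64[h]').astype('timedelta64[ns]'),\n",
"        'lat': [-45.0, 0.0, 45.0],\n",
"        'lon': [-180.0, -90.0, 0.0, 90.0],\n",
"    },\n",
")\n",
"\n",
"# Rename dimensions, coordinates and the variable to the expected names.\n",
"conforming = raw.rename({\n",
"    'init_time': 'time',\n",
"    'step': 'prediction_timedelta',\n",
"    'lat': 'latitude',\n",
"    'lon': 'longitude',\n",
"    't2m': '2m_temperature',\n",
"})\n",
"\n",
"# Map longitudes into [0, 360) and restore ascending order.\n",
"conforming = conforming.assign_coords(longitude=conforming.longitude % 360).sortby('longitude')\n",
"conforming"
]
},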
{
"cell_type": "code",
"execution_count": 5,
"id": "56de9923-acc3-49e8-90e9-a3286ff45d86",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<xarray.Dataset>\n",
"Dimensions: (time: 5114, prediction_timedelta: 41,\n",
" longitude: 64, latitude: 32, level: 13)\n",
"Coordinates:\n",
" * latitude (latitude) float64 -87.19 -81.56 ... 81.56 87.19\n",
" * level (level) int32 50 100 150 200 ... 700 850 925 1000\n",
" * longitude (longitude) float64 0.0 5.625 ... 348.8 354.4\n",
" * prediction_timedelta (prediction_timedelta) timedelta64[ns] 00:00:00...\n",
" * time (time) datetime64[ns] 2016-01-01 ... 2022-12-31...\n",
"Data variables: (12/16)\n",
" 10m_u_component_of_wind (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" 10m_v_component_of_wind (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" 10m_wind_speed (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" 2m_temperature (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" geopotential (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(4, 1, 13, 64, 32), meta=np.ndarray>\n",
" mean_sea_level_pressure (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" ... ...\n",
" total_precipitation_24hr (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" total_precipitation_6hr (time, prediction_timedelta, longitude, latitude) float32 dask.array<chunksize=(4, 1, 64, 32), meta=np.ndarray>\n",
" u_component_of_wind (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(4, 1, 13, 64, 32), meta=np.ndarray>\n",
" v_component_of_wind (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(4, 1, 13, 64, 32), meta=np.ndarray>\n",
" vertical_velocity (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(4, 1, 13, 64, 32), meta=np.ndarray>\n",
" wind_speed (time, prediction_timedelta, level, longitude, latitude) float32 dask.array<chunksize=(4, 1, 13, 64, 32), meta=np.ndarray>\n",
"\n",
"<xarray.Dataset>\n",
"Dimensions: (time: 92044,\n",
" longitude: 64,\n",
" latitude: 32, level: 13)\n",
"Coordinates:\n",
" * latitude (latitude) float64 -87....\n",
" * level (level) int64 50 ... 1000\n",
" * longitude (longitude) float64 0.0...\n",
" * time (time) datetime64[ns] 1...\n",
"Data variables: (12/38)\n",
" 10m_u_component_of_wind (time, longitude, latitude) float32 dask.array<chunksize=(100, 64, 32), meta=np.ndarray>\n",
" 10m_v_component_of_wind (time, longitude, latitude) float32 dask.array<chunksize=(100, 64, 32), meta=np.ndarray>\n",
" 10m_wind_speed (time, longitude, latitude) float32 dask.array<chunksize=(100, 64, 32), meta=np.ndarray>\n",
" 2m_temperature (time, longitude, latitude) float32 dask.array<chunksize=(100, 64, 32), meta=np.ndarray>\n",
" angle_of_sub_gridscale_orography (longitude, latitude) float32 dask.array<chunksize=(64, 32), meta=np.ndarray>\n",
" anisotropy_of_sub_gridscale_orography (longitude, latitude) float32 dask.array<chunksize=(64, 32), meta=np.ndarray>\n",
" ... ...\n",
" type_of_high_vegetation (longitude, latitude) float32 dask.array<chunksize=(64, 32), meta=np.ndarray>\n",
" type_of_low_vegetation (longitude, latitude) float32 dask.array<chunksize=(64, 32), meta=np.ndarray>\n",
" u_component_of_wind (time, level, longitude, latitude) float32 dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>\n",
" v_component_of_wind (time, level, longitude, latitude) float32 dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>\n",
" vertical_velocity (time, level, longitude, latitude) float32 dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>\n",
" wind_speed (time, level, longitude, latitude) float32 dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>\n",
"\n",
"<xarray.Dataset>\n",
"Dimensions: (hour: 4, dayofyear: 366,\n",
" longitude: 64, latitude: 32,\n",
" level: 13)\n",
"Coordinates:\n",
" * dayofyear (dayofyear) int64 1 2 ... 366\n",
" * hour (hour) int64 0 6 12 18\n",
" * latitude (latitude) float64 -87.19 .....\n",
" * level (level) int64 50 100 ... 1000\n",
" * longitude (longitude) float64 0.0 ... ...\n",
"Data variables: (12/28)\n",
" 10m_u_component_of_wind (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" 10m_v_component_of_wind (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" 10m_wind_speed (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" 2m_temperature (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" geopotential (hour, dayofyear, level, longitude, latitude) float32 dask.array<chunksize=(4, 366, 13, 64, 32), meta=np.ndarray>\n",
" mean_sea_level_pressure (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" ... ...\n",
" total_precipitation_6hr_seeps_dry_fraction (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" total_precipitation_6hr_seeps_threshold (hour, dayofyear, longitude, latitude) float32 dask.array<chunksize=(4, 366, 64, 32), meta=np.ndarray>\n",
" u_component_of_wind (hour, dayofyear, level, longitude, latitude) float32 dask.array<chunksize=(4, 366, 13, 64, 32), meta=np.ndarray>\n",
" v_component_of_wind (hour, dayofyear, level, longitude, latitude) float32 dask.array<chunksize=(4, 366, 13, 64, 32), meta=np.ndarray>\n",
" vertical_velocity (hour, dayofyear, level, longitude, latitude) float32 dask.array<chunksize=(4, 366, 13, 64, 32), meta=np.ndarray>\n",
" wind_speed (hour, dayofyear, level, longitude, latitude) float32 dask.array<chunksize=(4, 366, 13, 64, 32), meta=np.ndarray>\n",
"\n",
"<xarray.Dataset>\n",
"Dimensions: (lead_time: 41, region: 3, level: 3, metric: 2)\n",
"Coordinates:\n",
" * lead_time (lead_time) timedelta64[ns] 0 days 00:00:00 ... 10 days 0...\n",
" * region (region) object 'global' 'tropics' 'extra-tropics'\n",
" * level (level) int32 500 700 850\n",
" * metric (metric) object 'acc' 'mse'\n",
"Data variables:\n",
" geopotential (metric, region, lead_time, level) float64 ...\n",
" 2m_temperature (metric, region, lead_time) float64 ...