Accessing ERA5 Data
NCI, NIWA and the Met Office have each downloaded a local copy of the ERA5 data at 5.625-degree resolution.
The data locations are listed below; uncomment the line for the organisation you have access to.
If you work outside these organisations, follow the instructions in the
Downloading_ERA5.ipynb notebook to download a local copy.
Assumptions:
You have already downloaded a full or partial copy of ERA5 at 5.625-degree resolution.
You have checked out the PyEarthTools monorepo and have a working PyEarthTools environment into which you can install new packages.
This notebook works through creating a new PyEarthTools package that interfaces with the low-resolution ERA5 dataset, referred to hereafter as “ERA5lowres” for convenience and naming.
This notebook presents two things:
The quick install-and-use demo.
How it was done, slowly and carefully, so you can do the same for new data.
[ ]:
# Uncomment the organisation you have access to.
# wbench_data_dir = '/g/data/wb00/NCI-Weatherbench/5.625deg' # NCI
# wbench_data_dir = "/nesi/nobackup/niwa00004/riom/weatherbench/5.625deg/" # NIWA HPC
wbench_data_dir = "/data/users/infolab/weatherbench/5.625deg" # Met Office
# List the contents of the data directory.
!ls $wbench_data_dir
10m_u_component_of_wind geopotential_500 total_cloud_cover
10m_v_component_of_wind potential_vorticity total_precipitation
2m_temperature relative_humidity u_component_of_wind
baselines specific_humidity v_component_of_wind
constants temperature vorticity
geopotential temperature_850
Geopotential toa_incident_solar_radiation
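If you move between these systems, a small helper can pick whichever known location exists instead of editing comments by hand. This is a minimal sketch: `find_data_dir` and `detected_dir` are hypothetical names introduced here, and the candidate paths are copied from the cell above.

```python
import os

# Candidate locations copied from the cell above; edit to match your site.
CANDIDATE_DIRS = [
    "/g/data/wb00/NCI-Weatherbench/5.625deg",               # NCI
    "/nesi/nobackup/niwa00004/riom/weatherbench/5.625deg/",  # NIWA HPC
    "/data/users/infolab/weatherbench/5.625deg",             # Met Office
]


def find_data_dir(candidates):
    """Return the first candidate path that exists as a directory, or None (hypothetical helper)."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


detected_dir = find_data_dir(CANDIDATE_DIRS)
if detected_dir is None:
    print("No known ERA5 5.625deg location found; see Downloading_ERA5.ipynb.")
```

Using a separate name (`detected_dir`) avoids silently overwriting the `wbench_data_dir` you chose manually.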
[ ]:
# List the first file in the 2m_temperature directory.
!ls $wbench_data_dir/2m_temperature/ | head -n 1
2m_temperature_1979_5.625deg.nc
[ ]:
# List all 2m_temperature files matching the *198* wildcard, i.e. all files from the 1980s.
!ls $wbench_data_dir/2m_temperature/*198*
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1980_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1981_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1982_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1983_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1984_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1985_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1986_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1987_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1988_5.625deg.nc
/data/users/infolab/weatherbench/5.625deg/2m_temperature/2m_temperature_1989_5.625deg.nc
Here we see that the data is stored in subdirectories with human-readable names, and within those subdirectories the files are named with the variable, the year and the resolution of the data (e.g. 2m_temperature_1980_5.625deg.nc).
This layout differs from that of the full ERA5 dataset as downloaded directly from the Copernicus Climate Data Store (CDS), so we need to tell PyEarthTools how to handle the difference.
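The naming scheme seen in the `ls` output can be expressed as a path pattern. The following is a minimal sketch using only `pathlib`; `list_files` and `year_of` are hypothetical helpers, and the `_5.625deg.nc` suffix is assumed from the filenames listed above.

```python
from pathlib import Path


def list_files(root, variable_dir, year_glob="*"):
    """Return sorted paths matching <root>/<variable_dir>/<variable_dir>_<year_glob>_5.625deg.nc."""
    pattern = f"{variable_dir}_{year_glob}_5.625deg.nc"
    return sorted(Path(root, variable_dir).glob(pattern))


def year_of(path):
    """Extract the year from a name such as '2m_temperature_1980_5.625deg.nc'."""
    # Path.stem drops '.nc'; rsplit from the right keeps underscores inside the variable name intact.
    _, year, _ = Path(path).stem.rsplit("_", 2)
    return int(year)
```

For example, `list_files(wbench_data_dir, "2m_temperature", "198*")` roughly mirrors the `ls $wbench_data_dir/2m_temperature/*198*` call above.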
The challenge for PyEarthTools is to understand this directory structure and to work out the shorthand variable names actually used inside the files (for example, total_cloud_cover is called “tcc” inside the netCDF file). It must then index the whole archive by shorthand variable name and date, interpreting metadata such as the variable's units and applying any renaming needed to standardise naming conventions.
This requires some configuration, and much of PyEarthTools’ code is dedicated to this kind of dataset comprehension.
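To make the indexing problem concrete, here is a toy sketch of the two lookups involved: shorthand name to directory, and shorthand name to a standardised name. This is not how PyEarthTools implements it. `SHORT_TO_DIR` and `locate` are hypothetical; only the `tcc` entry comes from the text, the other entries are assumptions to verify against your copy, and `RENAME` copies the Rename mapping that the era5lowres archive reports in its transform list.

```python
from pathlib import Path

# Hypothetical, partial lookup: shorthand name inside the files -> directory name.
# 'tcc' is stated in the text; the other entries are assumptions to check against your data.
SHORT_TO_DIR = {
    "tcc": "total_cloud_cover",
    "t2m": "2m_temperature",
    "u": "u_component_of_wind",
    "v": "v_component_of_wind",
}

# Standardising renames, copied from the era5lowres Rename transform.
RENAME = {"t2m": "2t", "u10": "10u", "v10": "10v", "siconc": "ci"}


def locate(root, short_name, year):
    """Return (file path, standardised name) for a shorthand variable and year."""
    try:
        directory = SHORT_TO_DIR[short_name]
    except KeyError:
        raise KeyError(f"Unknown variable {short_name!r}; known: {sorted(SHORT_TO_DIR)}") from None
    path = Path(root) / directory / f"{directory}_{year}_5.625deg.nc"
    return path, RENAME.get(short_name, short_name)
```

A real indexer also has to read units and other metadata from the files, which this sketch ignores.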
Let’s summarise the easy way:
1. Install the era5lowres Python module (part of the tutorial package).
2. Set an environment variable called ERA5LOWRES to the path of the data directory.
3. Import the era5lowres Python module.
Assuming you have already installed the era5lowres module, let’s do steps 2 and 3.
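If you are unsure whether the install step has been done, you can check that the modules are importable without actually importing them. A minimal sketch using the standard library's `importlib.util.find_spec`; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found on the import path (hypothetical helper)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name (e.g. 'pyearthtools') is missing.
        return False


for name in ("pyearthtools.data", "pyearthtools.tutorial"):
    print(name, is_installed(name))
```

Note that `find_spec` on a dotted name imports the parent package, which is why the missing-parent case is caught.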
[ ]:
# Create an environment variable to store the path to the ERA5 data.
%env ERA5LOWRES=$wbench_data_dir
env: ERA5LOWRES=/data/users/infolab/weatherbench/5.625deg
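`%env` is an IPython magic, so it is not available in a plain Python script. A minimal equivalent using `os.environ`, assuming the Met Office path; set the variable before importing `pyearthtools.tutorial`, matching the order used in this notebook.

```python
import os

# Standalone-script equivalent of `%env ERA5LOWRES=$wbench_data_dir`.
wbench_data_dir = "/data/users/infolab/weatherbench/5.625deg"  # Met Office path; change for your site
os.environ["ERA5LOWRES"] = wbench_data_dir
```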
[ ]:
# Import pyearthtools packages.
import pyearthtools.data
import pyearthtools.tutorial
[ ]:
# Display information about era5lowres.
pyearthtools.data.archive.era5lowres?
Init signature:
pyearthtools.data.archive.era5lowres(
variables: 'list[str] | str',
*,
level_value: 'int | float | list[int | float] | tuple[list | int, ...] | None' = None,
transforms: 'Transform | TransformCollection | None' = None,
)
Docstring: ECWMF ReAnalysis v5
Init docstring:
Setup ERA5 Low-Res Indexer
Args:
variables (list[str] | str):
Data variables to retrieve
resolution (Literal[ERA_RES], optional):
Resolution of data, must be one of 'monthly-averaged','monthly-averaged-by-hour', 'reanalysis'.
Defaults to 'reanalysis'.
level_value: (int, optional):
Level value to select if data contains levels. Defaults to None.
transforms (Transform | TransformCollection, optional):
Base Transforms to apply.
Defaults to TransformCollection().
File: ~/Projects/PyEarthTools/packages/tutorial/src/pyearthtools/tutorial/ERA5DataClass.py
Type: ABCMeta
Subclasses:
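Since there is not yet a dedicated function for listing the variables in the archive, a stopgap is to list the variable directories and suggest close matches for a mistyped name. This is an illustrative sketch with `difflib`, not PyEarthTools' own "did you mean" logic; `list_variable_dirs`, `suggest` and `NON_VARIABLE_DIRS` are hypothetical, and it returns directory names rather than the shorthand names stored inside the files.

```python
import difflib
import os

# Subdirectories that do not follow the one-variable-per-directory layout
# (an assumption based on their names in the `ls` output).
NON_VARIABLE_DIRS = {"baselines", "constants"}


def list_variable_dirs(root):
    """Return the sorted variable subdirectory names under the archive root."""
    return sorted(
        entry
        for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry)) and entry not in NON_VARIABLE_DIRS
    )


def suggest(name, available, n=3):
    """Return up to n close matches for a mistyped name, in the spirit of a 'did you mean' prompt."""
    return difflib.get_close_matches(name, available, n=n)
```

For example, `suggest("temperatur", list_variable_dirs(wbench_data_dir))` should include `"temperature"` among its matches.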
[27]:
var = ['u', 'v']  # Note: there is currently no straightforward way to list the variables in the archive.
# However, a mismatched name causes PyEarthTools to list what's available with a "did you mean" prompt.
# A dedicated listing function should be added in future.
UandV = pyearthtools.data.archive.era5lowres(var)
UandV
[27]:
ERA5LowResIndex
Description ECWMF ReAnalysis v5
range '1970-current'
Documentation 'https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation'
Initialisation
level_value None
variables ['u', 'v']
Transforms
StandardCoordinateNames {'latitude': "['lat', 'Latitude', 'yt_ocean', 'yt']", 'longitude': "['lon', 'Longitude', 'xt_ocean', 'xt']", 'replacement_dictionary': 'None', 'time': "['Time']"}
Rename {'names': {'t2m': "'2t'", 'u10': "'10u'", 'v10': "'10v'", 'siconc': "'ci'"}}
[ ]:
# Inspect the data for a specific date as an Xarray dataset.
# Note that the Data Variables include 'u' and 'v' as expected.
data = UandV['1984-01-01']
data
<xarray.Dataset> Size: 5MB
Dimensions: (latitude: 32, longitude: 64, level: 13, time: 24)
Coordinates:
* latitude (latitude) float64 256B -87.19 -81.56 -75.94 ... 81.56 87.19
* longitude (longitude) float64 512B 0.0 5.625 11.25 ... 343.1 348.8 354.4
* level (level) int32 52B 50 100 150 200 250 300 ... 600 700 850 925 1000
* time (time) datetime64[ns] 192B 1984-01-01 ... 1984-01-01T23:00:00
Data variables:
u (time, level, latitude, longitude) float32 3MB dask.array<chunksize=(24, 8, 19, 39), meta=np.ndarray>
v (time, level, latitude, longitude) float32 3MB dask.array<chunksize=(24, 8, 19, 39), meta=np.ndarray>
Attributes:
Conventions: CF-1.6
[ ]:
# Inspect the 'u' variable in the dataset.
data.u
<xarray.DataArray 'u' (time: 24, level: 13, latitude: 32, longitude: 64)> Size: 3MB
dask.array<getitem, shape=(24, 13, 32, 64), dtype=float32, chunksize=(24, 8, 19, 39), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float64 256B -87.19 -81.56 -75.94 ... 81.56 87.19
* longitude (longitude) float64 512B 0.0 5.625 11.25 ... 343.1 348.8 354.4
* level (level) int32 52B 50 100 150 200 250 300 ... 600 700 850 925 1000
* time (time) datetime64[ns] 192B 1984-01-01 ... 1984-01-01T23:00:00
Attributes:
units: m s**-1
long_name: U component of wind
standard_name: eastward_wind
[ ]:
# Inspect the time dimension of the 'u' variable.
data.u.time
<xarray.DataArray 'time' (time: 24)> Size: 192B
array(['1984-01-01T00:00:00.000000000', '1984-01-01T01:00:00.000000000',
'1984-01-01T02:00:00.000000000', '1984-01-01T03:00:00.000000000',
'1984-01-01T04:00:00.000000000', '1984-01-01T05:00:00.000000000',
'1984-01-01T06:00:00.000000000', '1984-01-01T07:00:00.000000000',
'1984-01-01T08:00:00.000000000', '1984-01-01T09:00:00.000000000',
'1984-01-01T10:00:00.000000000', '1984-01-01T11:00:00.000000000',
'1984-01-01T12:00:00.000000000', '1984-01-01T13:00:00.000000000',
'1984-01-01T14:00:00.000000000', '1984-01-01T15:00:00.000000000',
'1984-01-01T16:00:00.000000000', '1984-01-01T17:00:00.000000000',
'1984-01-01T18:00:00.000000000', '1984-01-01T19:00:00.000000000',
'1984-01-01T20:00:00.000000000', '1984-01-01T21:00:00.000000000',
'1984-01-01T22:00:00.000000000', '1984-01-01T23:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 192B 1984-01-01 ... 1984-01-01T23:00:00
Attributes:
long_name: time
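The sizes in the reprs above can be checked by hand from the printed shapes and dtypes. This short calculation shows that one day of hourly data (24 time steps) at 13 levels on a 32 × 64 grid accounts for the reported sizes, and why the grid is described as 5.625-degree.

```python
import math

# Shape of data.u as printed above: (time, level, latitude, longitude).
shape = (24, 13, 32, 64)
FLOAT32_BYTES = 4  # each float32 value occupies 4 bytes

n_values = math.prod(shape)         # 638,976 values
u_bytes = n_values * FLOAT32_BYTES  # 2,555,904 bytes, roughly 2.6 MB

# 'u' and 'v' share this shape; coordinate sizes are taken from the repr (256 + 512 + 52 + 192 bytes).
coord_bytes = 256 + 512 + 52 + 192
dataset_bytes = 2 * u_bytes + coord_bytes  # 5,112,820 bytes, roughly 5.1 MB

# Grid spacing implied by the number of longitude and latitude points.
lon_spacing = 360 / shape[3]  # 5.625 degrees
lat_spacing = 180 / shape[2]  # 5.625 degrees
print(n_values, u_bytes, dataset_bytes, lon_spacing, lat_spacing)
```

These totals are consistent with the 3MB and 5MB figures that xarray displays for `data.u` and `data`.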