Downloading ERA5 Data (Low Resolution)

Step One - Getting Data on Disk

Not everyone has access to a full copy of ERA5 on disk at an HPC facility. It’s possible to download a reduced-resolution version of ERA5 which will fit in most computing environments. The examples use this low-resolution data, but it is still nearly 500GB. It’s going to be easier to do this on midrange or HPC systems than on a typical workstation, although a high-end workstation will be sufficient for getting the general idea.

This demo will begin by using the familiar and accessible WeatherBench dataset (see pangeo-data/WeatherBench).

Readers are also recommended to start by working through the WeatherBench tutorials in order to have a strong grasp of what is going on with the ML.

Data Download and Layout on Disk

If you don’t already have a copy of the dataset, the zipped data file is 270GB and can be downloaded by following the instructions in the pangeo-data/WeatherBench repository.

The process of unzipping the downloaded file is left as an exercise to the reader. The unzipped data is 471GB. Once downloaded and unpacked, proceed.
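
For readers who would like a starting point, the following is a minimal sketch of checking free disk space and unpacking the archive using only the Python standard library. The function names `has_room` and `unpack_archive`, and the paths in the usage comments, are illustrative assumptions and not part of WeatherBench or PyEarthTools. If the archive turns out to contain further per-variable archives, the same function can be applied to each of them in turn.

```python
import shutil
import zipfile
from pathlib import Path


def has_room(existing_dir, required_bytes):
    """Return True if the filesystem holding existing_dir has enough free space."""
    return shutil.disk_usage(existing_dir).free >= required_bytes


def unpack_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Hypothetical usage: adjust both paths to match your own download location.
# The unzipped data is 471GB, so check for at least that much free space first.
# if has_room(".", 471 * 10**9):
#     unpack_archive("downloaded_archive.zip", "weatherbench/5.625deg")
```

Checking space before extracting avoids a failed extraction partway through, which matters when the zipped and unzipped copies together need roughly 740GB.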

If you are working at NCI, the data is already available on disk as a data collection.

[1]:
import os
wbench_data_dir = '/g/data/wb00/NCI-Weatherbench/5.625deg'
!ls $wbench_data_dir
10m_u_component_of_wind  potential_vorticity           total_cloud_cover
10m_v_component_of_wind  relative_humidity             total_precipitation
2m_temperature           specific_humidity             u_component_of_wind
constants                temperature                   v_component_of_wind
geopotential             toa_incident_solar_radiation  vorticity

Loading Data without PyEarthTools

pangeo-data/WeatherBench shows some examples of loading data without PyEarthTools. In short, you need to write the looping code to walk the filesystem, open things file-by-file, and manually do any reprocessing.
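
To give a sense of what that manual approach involves, here is a rough sketch of walking the data directory and grouping files by variable. It assumes each variable directory holds NetCDF files with a `.nc` extension, which is an assumption about the layout and not something confirmed above. The helper names `find_netcdf_files` and `load_variable` are hypothetical, and `load_variable` additionally assumes that `xarray` (with `dask`) is installed.

```python
from collections import defaultdict
from pathlib import Path


def find_netcdf_files(root):
    """Walk root and group NetCDF file paths by the name of their parent directory."""
    files_by_variable = defaultdict(list)
    for path in sorted(Path(root).rglob("*.nc")):
        files_by_variable[path.parent.name].append(path)
    return dict(files_by_variable)


def load_variable(paths):
    """Open a list of NetCDF files lazily as a single dataset.

    xarray is imported here so that the file-walking helper above
    works even where xarray is not installed.
    """
    import xarray as xr

    return xr.open_mfdataset([str(p) for p in paths], combine="by_coords")


# Hypothetical usage with the directory defined in the cell above:
# files = find_netcdf_files(wbench_data_dir)
# t2m = load_variable(files["2m_temperature"])
```

Any regridding, unit conversion, or variable selection would still need to be written by hand on top of this.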

PyEarthTools provides a standardised interface to data. Code is already written to interface with the full ERA5 archive that’s on disk at NCI (including the 5.625 degree data).

[ ]: