Downloading ERA5 Data (Low Resolution)
Step One - Getting Data on Disk
Not everyone has access to a full copy of ERA5 on disk at an HPC facility. It’s possible to download a reduced-resolution version of ERA5 that will fit in most computing environments. The examples use this low-resolution data, but the dataset is still nearly 500GB. It will be easier to work through them on a midrange or HPC system than on a typical workstation, although a high-end workstation will be sufficient for getting the general idea.
This demo begins with the familiar and accessible WeatherBench dataset (see pangeo-data/WeatherBench).
Readers are also recommended to start by working through the WeatherBench tutorials in order to have a strong grasp of what is going on with the ML.
Data Download and Layout on Disk
If you don’t already have a copy of the dataset, the zipped data file is 270GB and can be downloaded as follows:
wget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg&files=all_5.625deg.zip" -O all_5.625deg.zip
Unzipping the downloaded file is left as an exercise for the reader. The unzipped data is 471GB, so allow for roughly 740GB in total if you keep the zip file alongside the extracted data. Once the data is downloaded and unpacked, proceed.
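For readers who would like a starting point for the unpacking step, here is a minimal sketch using only the Python standard library. The function name `extract_archive` and the `safety_factor` parameter are hypothetical, introduced for illustration. The sketch assumes a standard zip archive and simply checks that enough free space is available before extracting.

```python
import shutil
import zipfile
from pathlib import Path


def extract_archive(archive, dest, safety_factor=1.1):
    """Extract a zip archive into dest after checking free disk space.

    Returns the total uncompressed size in bytes.
    """
    archive = Path(archive)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive) as zf:
        # Sum of the uncompressed member sizes recorded in the archive.
        needed = sum(info.file_size for info in zf.infolist())
        free = shutil.disk_usage(dest).free
        # safety_factor is an illustrative margin, not a measured requirement.
        if free < needed * safety_factor:
            raise OSError(
                f"Need about {needed * safety_factor / 1e9:.1f} GB, "
                f"but only {free / 1e9:.1f} GB is free at {dest}"
            )
        zf.extractall(dest)
    return needed
```

A call such as `extract_archive("all_5.625deg.zip", "5.625deg")` would unpack the download into a directory of your choosing. If the archive turns out to contain further zip files, the same function can be applied to each of them in turn.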
If you are working at NCI, the data is already available on disk as a data collection.
[1]:
import os
wbench_data_dir = '/g/data/wb00/NCI-Weatherbench/5.625deg'
!ls $wbench_data_dir
10m_u_component_of_wind  potential_vorticity           total_cloud_cover
10m_v_component_of_wind  relative_humidity             total_precipitation
2m_temperature           specific_humidity             u_component_of_wind
constants                temperature                   v_component_of_wind
geopotential             toa_incident_solar_radiation  vorticity
Loading Data without PyEarthTools
pangeo-data/WeatherBench shows some examples of loading data without PyEarthTools. In short, you need to write the looping code to walk the filesystem, open the data file by file, and do any reprocessing manually.
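To give a feel for that manual approach, here is a minimal sketch of the filesystem-walking step using only the standard library. The function name `find_variable_files` is hypothetical, and the sketch assumes a layout with one directory per variable (as in the listing above), each containing files with a `.nc` suffix. It is not the WeatherBench or PyEarthTools code.

```python
from collections import defaultdict
from pathlib import Path


def find_variable_files(root, suffix=".nc"):
    """Map each variable directory name to a sorted list of its data files."""
    root = Path(root)
    files = defaultdict(list)
    for path in root.rglob(f"*{suffix}"):
        parts = path.relative_to(root).parts
        # Skip files sitting directly in root; they belong to no variable.
        if len(parts) < 2:
            continue
        # Assumption: the first directory below root names the variable.
        files[parts[0]].append(path)
    return {name: sorted(paths) for name, paths in sorted(files.items())}
```

Each list of paths could then be opened file by file, or passed to something like `xarray.open_mfdataset(paths, combine="by_coords")`, assuming xarray is installed and the files are netCDF. Any regridding, unit conversion or normalisation would still have to be written by hand.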
PyEarthTools provides a standardised interface to data. Code has already been written to interface cleanly with the full ERA5 archive on disk at NCI (including the 5.625 degree data).
[ ]: