Introduction to Pipelines
[1]:
# Here we'll use the WeatherBench 2 ERA5 dataset as the data source
from pyearthtools.data.download.weatherbench import WB2ERA5
[2]:
# Now we import the pipeline module, which is new in this example
import pyearthtools.pipeline
# and the operations module used as steps of the pipeline
import pyearthtools.pipeline.operations as ops
import pyearthtools.data.transforms.coordinates as coords
Data source
For this example, we use the ERA5 dataset from WeatherBench 2 at a low resolution (64x32). The data is small and fetched directly from a public Google Cloud bucket.
We can first have a look at the full dataset to see which variables and levels we want to assemble in the pipeline.
[3]:
WB2ERA5(resolution="64x32").dataset
[3]:
<xarray.Dataset> Size: 175GB
Dimensions: (time: 93544, longitude: 64, latitude: 32, level: 13)
Coordinates:
* latitude (latitude) float64 256B -87.19 -81.56 ... 81.56 87.19
* level (level) int64 104B 50 100 150 200 ... 700 850 925 1000
* longitude (longitude) float64 512B 0.0 5.625 11.25 ... 348.8 354.4
* time (time) datetime64[ns] 748kB 1959-01-01 ... 2023-01-10T18:00:00
Data variables: (12/62)
10m_u_component_of_wind (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
10m_v_component_of_wind (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
10m_wind_speed (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
2m_dewpoint_temperature (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
2m_temperature (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
above_ground (time, level, longitude, latitude) float32 10GB dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>
... ...
volumetric_soil_water_layer_1 (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
volumetric_soil_water_layer_2 (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
volumetric_soil_water_layer_3 (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
volumetric_soil_water_layer_4 (time, longitude, latitude) float32 766MB dask.array<chunksize=(100, 64, 32), meta=np.ndarray>
vorticity (time, level, longitude, latitude) float32 10GB dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>
wind_speed (time, level, longitude, latitude) float32 10GB dask.array<chunksize=(100, 13, 64, 32), meta=np.ndarray>
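The dimensions and sizes reported in the dataset summary can be cross-checked with a little arithmetic. The sketch below uses only the Python standard library and the figures shown in the output above (6-hourly steps from 1959-01-01T00 to 2023-01-10T18, a 64 x 32 grid, 13 levels, float32 values); the variable names are illustrative and not part of pyearthtools.

```python
from datetime import datetime, timedelta

# Time axis: 6-hourly steps between the first and last timestamps (inclusive).
start = datetime(1959, 1, 1, 0)
end = datetime(2023, 1, 10, 18)
step = timedelta(hours=6)
n_time = (end - start) // step + 1

# Grid and level counts taken from the dataset summary.
n_lon, n_lat, n_level = 64, 32, 13
bytes_per_value = 4  # float32

# Uncompressed size of one surface variable and one pressure-level variable.
surface_bytes = n_time * n_lon * n_lat * bytes_per_value
level_bytes = surface_bytes * n_level

print(n_time)                           # 93544
print(round(surface_bytes / 1e6))       # 766 (MB)
print(round(surface_bytes / 2**20, 2))  # 730.81 (MiB)
print(round(level_bytes / 2**30, 2))    # 9.28 (GiB)
```

These match the 766MB reported for each surface variable and the roughly 10GB reported for each variable with a level dimension. Because the arrays are lazy dask arrays, nothing of that size is downloaded until a sample is actually requested.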
Create a data preparation pipeline
We can use pipelines to create a reproducible and explainable process for preparing data for specific tasks. Pipelines can also be shared as templates for similar work.
The pipeline below:
Selects the ‘u’ and ‘v’ wind variables at 500 and 850 hPa, as well as 2 meter temperature.
Selects the ‘geopotential’ and ‘vorticity’ atmospheric variables at 850 hPa.
Merges these into a single dataset.
Sorts the variables into a specified order.
Applies a coordinate transformation to ensure longitude is formatted as 0-360 degrees (not -180 to 180 degrees) using the StandardLongitude class.
Reverses the data along the ‘level’ coordinate using the ReIndex class.
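To make the last two steps concrete, here is a minimal pure-Python sketch of what a 0-360 longitude convention and a reversed level order mean. It is not the pyearthtools implementation: `to_0_360` is a hypothetical helper, and the real `StandardLongitude` and `ReIndex` classes operate on xarray coordinates and may do more (for example, re-sorting the data).

```python
def to_0_360(longitude):
    """Wrap a longitude in degrees into the [0, 360) range."""
    return longitude % 360.0


# Longitudes expressed in the -180 to 180 convention ...
western_style = [-180.0, -90.0, -5.625, 0.0, 90.0]
# ... and the same meridians in the 0-360 convention.
wrapped = [to_0_360(lon) for lon in western_style]
print(wrapped)  # [180.0, 270.0, 354.375, 0.0, 90.0]

# Reversing the level order: ascending pressure levels become descending.
levels = [500, 850]
reversed_levels = levels[::-1]
print(reversed_levels)  # [850, 500]
```

The WeatherBench 2 longitudes used here already run from 0 to 354.375, so in this example the longitude step leaves the values unchanged; it mainly guards against sources that use the other convention.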
[4]:
data_preparation = pyearthtools.pipeline.Pipeline(
(
WB2ERA5(resolution="64x32", variables=["u", "v", "t2m"], level=[500, 850]),
WB2ERA5(resolution="64x32", variables=["geopotential", "vorticity"], level=850),
),
ops.xarray.Merge(),
ops.xarray.Sort(
[
"u_component_of_wind",
"v_component_of_wind",
"2m_temperature",
"geopotential",
"vorticity",
]
),
ops.Transforms(
apply=coords.StandardLongitude(type="0-360") + coords.ReIndex(level="reversed")
),
# These methods will be explained when we create a pipeline for machine learning.
# ops.xarray.reshape.CoordinateFlatten('level'),
# ops.xarray.conversion.ToNumpy(),
# ops.numpy.reshape.Squeeze(1),
)
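Conceptually, a pipeline like the one above is an ordered list of steps: the tuple of sources is queried with the same key, and each later step receives the output of the previous one. The toy sketch below illustrates that idea with plain dictionaries. Every name in it (`run_pipeline`, `wind_source`, `upper_air_source`, `merge`, `sort_variables`) is hypothetical, and it makes no claim about how `pyearthtools.pipeline.Pipeline` is implemented internally.

```python
from functools import reduce


def run_pipeline(sources, steps, key):
    """Query every source with `key`, then apply each step in order."""
    data = [source(key) for source in sources]
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical stand-ins for the two data sources: each returns a dict of variables.
def wind_source(key):
    return {"v": f"v@{key}", "u": f"u@{key}"}


def upper_air_source(key):
    return {"vorticity": f"vorticity@{key}", "geopotential": f"geopotential@{key}"}


def merge(datasets):
    """Combine the outputs of all sources into one dict."""
    merged = {}
    for dataset in datasets:
        merged.update(dataset)
    return merged


def sort_variables(order):
    """Build a step that puts listed variables first, in the given order."""
    def step(dataset):
        listed = [name for name in order if name in dataset]
        unlisted = [name for name in dataset if name not in order]
        return {name: dataset[name] for name in listed + unlisted}
    return step


toy_sample = run_pipeline(
    sources=(wind_source, upper_air_source),
    steps=[merge, sort_variables(["u", "v", "geopotential", "vorticity"])],
    key="20120102T00",
)
print(list(toy_sample))  # ['u', 'v', 'geopotential', 'vorticity']
```

Keeping each step as a small, named unit is what makes the process easy to inspect and to share: the list of steps is itself a description of how the data was prepared.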
[5]:
# Inspect the data_preparation pipeline; its notebook representation also visualises the steps as a graph.
data_preparation
Pipeline
Description `pyearthtools.pipeline` Data Pipeline
Initialisation
exceptions_to_ignore None
iterator None
sampler None
Steps
weatherbench.WB2ERA5 {'WB2ERA5': {'chunks': "'auto'", 'download_dir': 'None', 'level': '[500, 850]', 'url': "'gs://weatherbench2/datasets/era5/1959-2023_01_10-6h-64x32_equiangular_conservative.zarr'", 'variables': "['u', 'v', 't2m']"}}
weatherbench.WB2ERA5[1] {'WB2ERA5': {'chunks': "'auto'", 'download_dir': 'None', 'level': '850', 'url': "'gs://weatherbench2/datasets/era5/1959-2023_01_10-6h-64x32_equiangular_conservative.zarr'", 'variables': "['geopotential', 'vorticity']"}}
join.Merge {'Merge': {'merge_kwargs': 'None'}}
sort.Sort {'Sort': {'order': "['u_component_of_wind', 'v_component_of_wind', '2m_temperature', 'geopotential', 'vorticity']", 'strict': 'False'}}
transforms.Transforms {'Transforms': {'apply': {'TransformCollection': {'StandardLongitude': {'longitude_name': "'longitude'", 'type': "'0-360'"}, 'ReIndex': {'coordinates': 'None', 'level': "'reversed'"}}}, 'transforms': 'None', 'undo': 'None'}}
Graph
[6]:
# Use the pipeline to create a sample for a specific date.
sample = data_preparation["20120102T00"]
sample
[6]:
<xarray.Dataset> Size: 58kB
Dimensions: (latitude: 32, level: 2, longitude: 64, time: 1)
Coordinates:
* latitude (latitude) float64 256B -87.19 -81.56 ... 81.56 87.19
* level (level) int64 16B 850 500
* longitude (longitude) float64 512B 0.0 5.625 ... 348.8 354.4
* time (time) datetime64[ns] 8B 2012-01-02
Data variables:
u_component_of_wind (time, level, longitude, latitude) float32 16kB dask.array<chunksize=(1, 2, 64, 32), meta=np.ndarray>
v_component_of_wind (time, level, longitude, latitude) float32 16kB dask.array<chunksize=(1, 2, 64, 32), meta=np.ndarray>
2m_temperature (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
geopotential (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
vorticity (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
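The `Size: 58kB` reported for the sample can be cross-checked by hand from the shapes and dtypes printed in the output above. The sketch below does that arithmetic in plain Python; the shapes and dtypes are copied from the output, while the dictionary layout and the `nbytes` helper are illustrative only.

```python
from math import prod

# Bytes per element for the dtypes that appear in the sample.
ITEMSIZE = {"float32": 4, "float64": 8, "int64": 8, "datetime64[ns]": 8}

# (shape, dtype) of each data variable, as printed in the sample.
data_vars = {
    "u_component_of_wind": ((1, 2, 64, 32), "float32"),
    "v_component_of_wind": ((1, 2, 64, 32), "float32"),
    "2m_temperature": ((1, 64, 32), "float32"),
    "geopotential": ((1, 64, 32), "float32"),
    "vorticity": ((1, 64, 32), "float32"),
}

# (shape, dtype) of each coordinate.
coords = {
    "latitude": ((32,), "float64"),
    "level": ((2,), "int64"),
    "longitude": ((64,), "float64"),
    "time": ((1,), "datetime64[ns]"),
}


def nbytes(shape, dtype):
    """Number of bytes of a dense array with the given shape and dtype."""
    return prod(shape) * ITEMSIZE[dtype]


data_bytes = sum(nbytes(*spec) for spec in data_vars.values())
coord_bytes = sum(nbytes(*spec) for spec in coords.values())
total_bytes = data_bytes + coord_bytes

print(data_bytes, coord_bytes, total_bytes)  # 57344 792 58136
```

The total of 58136 bytes rounds to the 58kB shown in the output, so a single-time sample of these five variables is small enough to inspect or load into memory freely.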
[7]:
# Inspect the vorticity variable, which is returned as a DataArray.
sample.vorticity
[7]:
<xarray.DataArray 'vorticity' (time: 1, longitude: 64, latitude: 32)> Size: 8kB
dask.array<getitem, shape=(1, 64, 32), dtype=float32, chunksize=(1, 64, 32), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float64 256B -87.19 -81.56 -75.94 ... 81.56 87.19
* longitude (longitude) float64 512B 0.0 5.625 11.25 ... 343.1 348.8 354.4
* time (time) datetime64[ns] 8B 2012-01-02
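The coordinate values shown for this variable follow a regular 5.625-degree grid: longitudes start at 0 and latitudes sit half a step away from each pole. The sketch below rebuilds both axes in plain Python. The formulas are inferred only from the printed coordinate values, not from the WeatherBench 2 documentation, and the stored coordinates may differ from these by floating-point rounding.

```python
N_LAT, N_LON = 32, 64
STEP = 360.0 / N_LON  # 5.625 degrees between neighbouring grid points

# Latitudes are offset by half a grid step, so no point lies on a pole.
latitudes = [-90.0 + STEP * (i + 0.5) for i in range(N_LAT)]

# Longitudes start at 0 and stop one step short of 360.
longitudes = [STEP * j for j in range(N_LON)]

print(latitudes[0], latitudes[-1])     # -87.1875 87.1875
print(longitudes[1], longitudes[-1])   # 5.625 354.375
```

With 64 x 32 float32 values per time step, a single-level field on this grid occupies 8192 bytes, which matches the 8kB reported for `vorticity`.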