Dataset Transforms#

pyearthtools.data provides an interface for applying transformations to any dataset at load time. Commonly used transformations, such as region cutting, masking, filtering and interpolation, can be found in pyearthtools.data.transforms.

It is also possible for users to define and apply their own transforms.

[1]:
import pyearthtools.data

import warnings

with warnings.catch_warnings(action="ignore"):
    import site_archive_nci

Variables#

[2]:
doi = '2022-01-01T00'
var = 't'

Default Transforms#

All inbuilt DataIndexes use default transforms to force data into a standard format and to apply their own individual filtering.

ERA5’s base_transforms are shown below:

[3]:
petvar = pyearthtools.data.archive.ERA5(var)
[4]:
petvar.base_transforms
[4]:
TransformCollection
    Initialisation                 A Collection of Transforms to be applied to Data
             apply_default                  False
             intelligence_level             100
    Transforms
             coordinates.StandardCoordinateNames {'StandardCoordinateNames': {'latitude': "['lat', 'Latitude', 'yt_ocean', 'yt']", 'longitude': "['lon', 'Longitude', 'xt_ocean', 'xt']", 'replacement_dictionary': 'None', 'time': "['Time']"}}
             attributes.Rename              {'Rename': {'names': {'t2m': "'2t'", 'u10': "'10u'", 'v10': "'10v'", 'siconc': "'ci'"}}}

Using prebuilt Transforms#

Region Cutting shows how to use a prebuilt Transform for cutting a dataset to a region of interest.

Defining Your Own Transform#

As different projects or use cases may require different Transforms to be applied to the data, there are a couple of ways to define and use a custom Transform.

Function#

The first and easiest way is to pass a function to the transforms argument, either when initialising the DataIndex or in a retrieval call. The function is automatically wrapped in a FunctionTransform and added to the TransformCollection.

This function must have the signature:

def function(dataset) -> type(dataset)
[5]:
def custom_transform(dataset):
    """This applies a custom Transform to mark the dataset"""
    dataset.attrs['Transform Mark'] = True
    return dataset
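
If a transform needs a parameter, one option is to bind it ahead of time so that the resulting callable still takes only the dataset. The sketch below uses functools.partial from the standard library. The mark_dataset helper and the SimpleNamespace stand-in (used in place of a real xarray.Dataset so the snippet runs without any data) are illustrative assumptions, and whether pyearthtools wraps partial objects exactly like plain functions has not been verified here.

```python
from functools import partial
from types import SimpleNamespace


def mark_dataset(dataset, value):
    """Hypothetical two-argument helper: record `value` in the dataset attributes."""
    dataset.attrs['Transform Mark'] = value
    return dataset


# Bind `value` so the callable matches the required one-argument signature.
mark_v1 = partial(mark_dataset, value='v1')

# Stand-in for an xarray.Dataset: any object with an `attrs` dict works here.
fake_dataset = SimpleNamespace(attrs={})
result = mark_v1(fake_dataset)

print(result.attrs)
```

Note that, like custom_transform above, this helper modifies the object it receives and returns that same object rather than a copy.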

First, let's add it as a base transform, so that it is applied to every data retrieval call made with that DataIndex.

[6]:
pyearthtools.data.archive.ERA5(var, transforms = custom_transform).base_transforms
[6]:
TransformCollection
    Initialisation                 A Collection of Transforms to be applied to Data
             apply_default                  False
             intelligence_level             100
    Transforms
             coordinates.StandardCoordinateNames {'StandardCoordinateNames': {'latitude': "['lat', 'Latitude', 'yt_ocean', 'yt']", 'longitude': "['lon', 'Longitude', 'xt_ocean', 'xt']", 'replacement_dictionary': 'None', 'time': "['Time']"}}
             attributes.Rename              {'Rename': {'names': {'t2m': "'2t'", 'u10': "'10u'", 'v10': "'10v'", 'siconc': "'ci'"}}}
             transform.FunctionTransform    {'FunctionTransform': {'function': '<function custom_transform at 0x14d0a10f7420>'}}

Alternatively, let's pass it as a Transform to a single data retrieval call.

[7]:
petvar = pyearthtools.data.archive.ERA5(var)                          # Create the basic dataset accessor
queriedAndTransformed = petvar(doi, transforms = custom_transform)    # Query a date of interest, and specify an additional transform
print(queriedAndTransformed.attrs['Transform Mark'])                  # Check that the transform mark is present in the data attributes
queriedAndTransformed                                                 # Render the dataset summary, including the whole attributes
True
[7]:
<xarray.Dataset> Size: 307MB
Dimensions:    (longitude: 1440, latitude: 721, level: 37, time: 1)
Coordinates:
  * longitude  (longitude) float32 6kB -180.0 -179.8 -179.5 ... 179.5 179.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * level      (level) int32 148B 1 2 3 5 7 10 20 ... 875 900 925 950 975 1000
  * time       (time) datetime64[ns] 8B 2022-01-01
Data variables:
    t          (time, level, latitude, longitude) float64 307MB dask.array<chunksize=(1, 5, 405, 900), meta=np.ndarray>
Attributes:
    Conventions:     CF-1.6
    history:         2022-05-02 20:13:06 UTC+1000 by era5_replication_tools-1...
    license:         Licence to use Copernicus Products: https://apps.ecmwf.i...
    summary:         ERA5 is the fifth generation ECMWF atmospheric reanalysi...
    title:           ERA5 pressure-levels oper temperature 20220101-20220131
    Transform Mark:  True

Transform Class#

For more complex Transforms, the Transform Class can be implemented.

A user must implement the .apply(dataset) method, which must accept and return a dataset just like the function above:

def apply(self, dataset) -> type(dataset)

Note that these Transforms can also be used independently, just like a function.

[8]:
class CustomTransform(pyearthtools.data.transform.Transform):
    """Custom Transform Class to mark the dataset"""
    def __init__(self, value):
        self.value = value
    def apply(self, dataset):
        dataset.attrs['Transform Mark'] = self.value
        return dataset
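
Such an object can stand in for a function because it is callable. A minimal sketch of that pattern follows, assuming the base class simply forwards calls to apply. SketchTransform is a hypothetical stand-in, not the pyearthtools base class, whose internals may differ, and a SimpleNamespace replaces the real dataset.

```python
from types import SimpleNamespace


class SketchTransform:
    """Hypothetical stand-in base class: calling the object runs `apply`."""

    def __call__(self, dataset):
        return self.apply(dataset)

    def apply(self, dataset):
        raise NotImplementedError("Subclasses must implement apply(dataset)")


class MarkTransform(SketchTransform):
    """Records a configurable value in the dataset attributes."""

    def __init__(self, value):
        self.value = value

    def apply(self, dataset):
        dataset.attrs['Transform Mark'] = self.value
        return dataset


# Stand-in for an xarray.Dataset; only the `attrs` dict is needed here.
marked = MarkTransform('demo')(SimpleNamespace(attrs={}))
print(marked.attrs['Transform Mark'])
```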

Using the CustomTransform#

[9]:
pyearthtools.data.archive.ERA5(var, level = 'single', transforms = CustomTransform('Transformation Description')).base_transforms
[9]:
TransformCollection
    Initialisation                 A Collection of Transforms to be applied to Data
             apply_default                  False
             intelligence_level             100
    Transforms
             coordinates.StandardCoordinateNames {'StandardCoordinateNames': {'latitude': "['lat', 'Latitude', 'yt_ocean', 'yt']", 'longitude': "['lon', 'Longitude', 'xt_ocean', 'xt']", 'replacement_dictionary': 'None', 'time': "['Time']"}}
             attributes.Rename              {'Rename': {'names': {'t2m': "'2t'", 'u10': "'10u'", 'v10': "'10v'", 'siconc': "'ci'"}}}
             __main__.CustomTransform       {'CustomTransform': {}}
[10]:
pyearthtools.data.archive.ERA5(var, level = 'single')(doi, transforms = CustomTransform('Transformation Description')).attrs
[10]:
{'Conventions': 'CF-1.6',
 'history': '2022-05-02 20:13:06 UTC+1000 by era5_replication_tools-1.10.0: mv /g/data/rt52/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.tmp /g/data/rt52/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.nc\n2022-05-02 19:53:00 UTC+1000 by era5_replication_tools-1.10.0: nccopy -ctime/1,level/1,latitude/81,longitude/180 -m2G -h2G -k4 -d2 -s /g/data/id28/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.nc /g/data/rt52/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.tmp\n2022-05-02 17:41:58 UTC+1000 by era5_replication_tools-1.10.0: curl --connect-timeout 20 --show-error --silent --max-time 36000 -o /g/data/id28/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.nc https://download-0007-clone.copernicus-climate.eu/cache-compute-0007/cache/data1/adaptor.mars.internal-1651474330.584116-6641-2-664a6a58-9b32-4a75-b37d-83e6a03fb328.nc\n2022-05-02 07:22:28 GMT by grib_to_netcdf-2.24.3: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data1/adaptor.mars.internal-1651474330.584116-6641-2-664a6a58-9b32-4a75-b37d-83e6a03fb328.nc /cache/tmp/664a6a58-9b32-4a75-b37d-83e6a03fb328-adaptor.mars.internal-1651472072.7816172-6641-3-tmp.grib',
 'license': 'Licence to use Copernicus Products: https://apps.ecmwf.int/datasets/licences/copernicus/',
 'summary': 'ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate. This file is part of the ERA5 replica hosted at NCI Australia. For more information please see http://dx.doi.org/10.25914/5f48874388857',
 'title': 'ERA5 pressure-levels oper temperature 20220101-20220131',
 'Transform Mark': 'Transformation Description'}
[11]:
CustomTransform('Transformation Description')(pyearthtools.data.archive.ERA5(var)(doi)).attrs
[11]:
{'Conventions': 'CF-1.6',
 'history': '2022-05-02 20:13:06 UTC+1000 by era5_replication_tools-1.10.0: mv /g/data/rt52/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.tmp /g/data/rt52/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.nc\n2022-05-02 19:53:00 UTC+1000 by era5_replication_tools-1.10.0: nccopy -ctime/1,level/1,latitude/81,longitude/180 -m2G -h2G -k4 -d2 -s /g/data/id28/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.nc /g/data/rt52/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.tmp\n2022-05-02 17:41:58 UTC+1000 by era5_replication_tools-1.10.0: curl --connect-timeout 20 --show-error --silent --max-time 36000 -o /g/data/id28/admin/incoming/era5/pressure-levels/reanalysis/t/2022/t_era5_oper_pl_20220101-20220131.nc https://download-0007-clone.copernicus-climate.eu/cache-compute-0007/cache/data1/adaptor.mars.internal-1651474330.584116-6641-2-664a6a58-9b32-4a75-b37d-83e6a03fb328.nc\n2022-05-02 07:22:28 GMT by grib_to_netcdf-2.24.3: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data1/adaptor.mars.internal-1651474330.584116-6641-2-664a6a58-9b32-4a75-b37d-83e6a03fb328.nc /cache/tmp/664a6a58-9b32-4a75-b37d-83e6a03fb328-adaptor.mars.internal-1651472072.7816172-6641-3-tmp.grib',
 'license': 'Licence to use Copernicus Products: https://apps.ecmwf.int/datasets/licences/copernicus/',
 'summary': 'ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate. This file is part of the ERA5 replica hosted at NCI Australia. For more information please see http://dx.doi.org/10.25914/5f48874388857',
 'title': 'ERA5 pressure-levels oper temperature 20220101-20220131',
 'Transform Mark': 'Transformation Description'}

Transform Collections#

To apply multiple Transforms, a TransformCollection can be made.

Adding any function or Transform to a Transform will automatically create a new TransformCollection.

When this Collection is called, each Transform is applied in order. The Collection also implements many of the common list methods, such as pop, remove and append.

[12]:
collection = CustomTransform('CT1') + CustomTransform('CT2')
collection
[12]:
TransformCollection
    Initialisation                 A Collection of Transforms to be applied to Data
             apply_default                  False
             intelligence_level             100
    Transforms
             __main__.CustomTransform       {'CustomTransform': {}}
             __main__.CustomTransform[1]    {'CustomTransform': {}}
[13]:
collection.pop()
[13]:
CustomTransform
    Initialisation                 Custom Transform Class to mark the dataset
[14]:
collection
[14]:
TransformCollection
    Initialisation                 A Collection of Transforms to be applied to Data
             apply_default                  False
             intelligence_level             100
    Transforms
             __main__.CustomTransform       {'CustomTransform': {}}
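
The ordering rule can be illustrated without the library. The sketch below is not the TransformCollection implementation. It treats a collection as a plain Python list of callables and folds the dataset through them, with mark and apply_in_order as hypothetical helpers and a SimpleNamespace in place of a real dataset.

```python
from functools import reduce
from types import SimpleNamespace


def mark(value):
    """Return a transform that appends `value` to a list of marks in the attributes."""
    def transform(dataset):
        dataset.attrs.setdefault('marks', []).append(value)
        return dataset
    return transform


def apply_in_order(transforms, dataset):
    """Feed the dataset through each transform, first to last."""
    return reduce(lambda data, transform: transform(data), transforms, dataset)


transforms = [mark('CT1'), mark('CT2')]
both = apply_in_order(transforms, SimpleNamespace(attrs={}))

transforms.pop()  # drop the last transform, as list.pop does
first_only = apply_in_order(transforms, SimpleNamespace(attrs={}))

print(both.attrs['marks'], first_only.attrs['marks'])
```

Because later transforms receive the output of earlier ones, transforms that overwrite the same attribute, like the two CustomTransform instances above, would leave only the last value in place.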