Pipeline Entrypoints

PyEarthTools pipelines provide a generic way to load and prepare a variety of earth system datasets, so a pipeline can be used as a source for anemoi-datasets.

Example

Below is a minimal example of using a PyEarthTools pipeline to load data and prepare it for anemoi. See the anemoi documentation for more information on the datasets config.

Create the Pipeline in PyEarthTools

.. code-block:: python

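    import pyearthtools.data
    import pyearthtools.pipeline

    # Load three ERA5 variables from ARCO-ERA5, then fill any NaN values.
    pipeline = pyearthtools.pipeline.Pipeline(
        pyearthtools.data.download.arcoera5.ARCOERA5(['t2m', 'u10', 'v10']),
        pyearthtools.pipeline.operations.xarray.values.FillNan()
    )
    # Save the pipeline definition so the anemoi-datasets config can point to it.
    pipeline.save('/PATH/TO/PIPELINE.yaml')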

Create the anemoi-datasets config

.. code-block:: yaml

    name: pyearthtools_to_anemoi
    description: PyEarthTools Pipeline converted to Anemoi
    attribution: PyEarthTools

    dates:
        start: '2025-11-10T00:00:00'
        end: '2025-11-12T00:00:00'
        frequency: 1h

    input:
        pyearthtools: # Use the pyearthtools input object
            pipeline: /PATH/TO/PIPELINE.yaml
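
As a rough illustration of what the ``dates`` block describes, the sketch below expands the start, end, and frequency into explicit timestamps using only the Python standard library. ``expand_dates`` is a hypothetical helper written for this example, not part of anemoi-datasets or PyEarthTools, and it assumes that both endpoints are inclusive.

```python
from datetime import datetime, timedelta


def expand_dates(start: str, end: str, step: timedelta) -> list[datetime]:
    """Return every timestamp from start to end at the given step.

    Assumption: both endpoints are treated as inclusive.
    """
    current = datetime.fromisoformat(start)
    last = datetime.fromisoformat(end)
    dates = []
    while current <= last:
        dates.append(current)
        current += step
    return dates


# Values copied from the `dates` block of the config; `1h` becomes a timedelta.
dates = expand_dates(
    "2025-11-10T00:00:00",
    "2025-11-12T00:00:00",
    timedelta(hours=1),
)
print(len(dates))  # 49: 48 hourly steps plus the inclusive end
```

Under these assumptions the example config covers 49 hourly timestamps.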

Run anemoi-datasets

.. code-block:: bash

    # Arguments: the config file, then the path of the dataset to create
    anemoi-datasets create /path/to/anemoi/dataset.yaml /path/to/anemoi/dataset.zarr

Function Contract

The PyEarthTools pipeline is expected to return an xarray object with a single time index.
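
One way to check this contract before handing a pipeline to anemoi-datasets is sketched below. ``assert_single_time_index`` is a hypothetical helper, not part of either library, and the dimension name ``time`` is an assumption. It relies only on the ``sizes`` mapping that xarray's ``Dataset`` and ``DataArray`` both expose.

```python
from typing import Any


def assert_single_time_index(obj: Any, time_dim: str = "time") -> None:
    """Raise ValueError unless `obj` has exactly one entry along `time_dim`.

    `obj` is expected to expose a `sizes` mapping of dimension name to
    length, as xarray's Dataset and DataArray do. The default dimension
    name "time" is an assumption; adjust it to match the data.
    """
    sizes = dict(obj.sizes)
    if time_dim not in sizes:
        raise ValueError(
            f"no {time_dim!r} dimension found; got {list(sizes)}"
        )
    if sizes[time_dim] != 1:
        raise ValueError(
            f"expected a single time index, "
            f"got {sizes[time_dim]} along {time_dim!r}"
        )
```

Given an xarray object ``ds`` returned by the pipeline, ``assert_single_time_index(ds)`` returns silently when the contract holds and raises ``ValueError`` otherwise. Pass ``time_dim`` if the data uses a different name for its time dimension.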

Both tools provide methods for modifying the metadata of the data; use them as needed to prepare the data for downstream use.