Additional Pipeline Syntaxes

This notebook introduces syntax that makes it easier to create and manipulate Pipeline objects:

named pipelines,
combining pipelines with the | operator,
reversing pipelines.

[1]:

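import numpy as np

# Summarise NumPy arrays by their shape and dtype instead of printing their values,
# which keeps the outputs in this notebook short.
def repr_ndarray(arr):
    return f"array(..., shape={arr.shape}, dtype={arr.dtype})"

np.set_printoptions(override_repr=repr_ndarray)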

[2]:

import pyearthtools.data
import pyearthtools.pipeline

To illustrate these features, we reuse the pipeline from the End-to-end CNN Training Example. Here is its original definition.

[3]:

pyearthtools.pipeline.Pipeline(
    pyearthtools.data.download.weatherbench.WB2ERA5(
        variables=["2m_temperature", "u", "v", "geopotential", "vorticity"],
        level=[850],
        license_ok=True,
    ),
    pyearthtools.pipeline.operations.xarray.Sort(
        ["2m_temperature", "u_component_of_wind", "v_component_of_wind", "vorticity", "geopotential"]
    ),
    pyearthtools.data.transforms.coordinates.StandardLongitude(type="0-360"),
    pyearthtools.pipeline.operations.xarray.reshape.CoordinateFlatten(["level"]),
    pyearthtools.pipeline.modifications.TemporalRetrieval(
        concat=True, samples=((0, 1), (6, 1))
    ),
    pyearthtools.pipeline.operations.xarray.conversion.ToNumpy(),
    pyearthtools.pipeline.operations.numpy.reshape.Rearrange("c t h w -> t c h w"),
    pyearthtools.pipeline.operations.numpy.reshape.Squeeze(axis=0),
)

[Figure: graph of the pipeline steps]

Named pipelines

When developing a new pipeline, it can be convenient to split a long pipeline into sub-pipelines, one per main stage, and then assemble them into one big pipeline. However, once the pipeline has been assembled, we lose direct access to the sub-pipelines. To solve this, we can give each sub-pipeline a name. In the final pipeline, we can then recover them through the .named attribute, which is a dictionary of all the named sub-pipelines contained in the pipeline.
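The toy model below is for illustration only. It mimics the idea in plain Python: steps are callables, a pipeline may contain other pipelines, and a named property walks the nesting to collect every sub-pipeline that was given a name. ToyPipeline is a hypothetical class and does not reflect how pyearthtools implements Pipeline or .named. In particular, collecting names from deeper nesting levels is an assumption of this sketch.

```python
from typing import Any, Callable, Dict, Optional


class ToyPipeline:
    """Hypothetical toy pipeline: applies its steps in order.

    Steps are plain callables; a ToyPipeline is itself callable, so
    pipelines can be nested inside other pipelines.
    """

    def __init__(self, *steps: Callable[[Any], Any], name: Optional[str] = None) -> None:
        self.steps = steps
        self.name = name

    def __call__(self, sample: Any) -> Any:
        for step in self.steps:
            sample = step(sample)
        return sample

    @property
    def named(self) -> Dict[str, "ToyPipeline"]:
        """Collect every named sub-pipeline nested inside this one."""
        found: Dict[str, "ToyPipeline"] = {}
        for step in self.steps:
            if isinstance(step, ToyPipeline):
                if step.name is not None:
                    found[step.name] = step
                # Assumption of this toy: also look inside deeper levels.
                found.update(step.named)
        return found


# Two named stages, with an unnamed step between them.
prepare = ToyPipeline(lambda x: x + 1, lambda x: x * 2, name="prepare")
reshape = ToyPipeline(lambda x: [x, x], name="reshape")
pipeline = ToyPipeline(prepare, lambda x: x - 3, reshape)

print(list(pipeline.named))          # ['prepare', 'reshape']
print(pipeline(4))                   # [7, 7]
print(pipeline.named["prepare"](4))  # 10

# The same steps written as one flat pipeline give the same result.
flat = ToyPipeline(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: [x, x])
print(flat(4) == pipeline(4))  # True
```

Nesting only groups the steps. The result is the same as running them flat, and the names are just handles for retrieving each stage later.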

In the following example, we build the same pipeline, split into three stages:

a named pipeline “prepare”, to fetch the data and apply a few transformations to it,
a temporal retrieval step, to generate the tuple of (features, target) samples,
a named pipeline “reshape”, to do the final conversion to NumPy and the reshaping.

[4]:

pipeline = pyearthtools.pipeline.Pipeline(
    pyearthtools.pipeline.Pipeline(
        pyearthtools.data.download.weatherbench.WB2ERA5(
            variables=["2m_temperature", "u", "v", "geopotential", "vorticity"],
            level=[850],
            license_ok=True,
        ),
        pyearthtools.pipeline.operations.xarray.Sort(
            ["2m_temperature", "u_component_of_wind", "v_component_of_wind", "vorticity", "geopotential"]
        ),
        pyearthtools.data.transforms.coordinates.StandardLongitude(type="0-360"),
        pyearthtools.pipeline.operations.xarray.reshape.CoordinateFlatten(["level"]),
        name="prepare"
    ),
    pyearthtools.pipeline.modifications.TemporalRetrieval(concat=True, samples=((0, 1), (6, 1))),
    pyearthtools.pipeline.Pipeline(
        pyearthtools.pipeline.operations.xarray.conversion.ToNumpy(),
        pyearthtools.pipeline.operations.numpy.reshape.Rearrange("c t h w -> t c h w"),
        pyearthtools.pipeline.operations.numpy.reshape.Squeeze(axis=0),
        name="reshape"
    ),
)
pipeline

[Figure: graph of the pipeline with its named sub-pipelines]
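The last two steps of the “reshape” stage only change the array layout. The NumPy sketch below reproduces their effect on the shape of a single array. It assumes that Rearrange follows einops-style semantics, so "c t h w -> t c h w" swaps the first two axes, and that ToNumpy yields a (channel, time, height, width) array. The (5, 1, 64, 32) input shape is hypothetical. It was picked to match the five variables, single time step, and 64 by 32 grid of the “prepare” output shown later in this notebook.

```python
import numpy as np

# Hypothetical ToNumpy output: (channel, time, height, width).
# 5 variables, 1 time step, 64 x 32 grid (shape assumed for illustration).
arr = np.zeros((5, 1, 64, 32), dtype=np.float32)

# "c t h w -> t c h w": swap the first two axes (einops-style rearrange).
rearranged = np.transpose(arr, (1, 0, 2, 3))

# Squeeze(axis=0): drop the length-1 leading (time) axis.
squeezed = np.squeeze(rearranged, axis=0)

print(arr.shape, rearranged.shape, squeezed.shape)
# (5, 1, 64, 32) (1, 5, 64, 32) (5, 64, 32)
```

Under these assumptions, each sample ends up as a (channel, height, width) array, the usual input layout for a 2D CNN.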

We can inspect the .named attribute to see which named pipelines are accessible within a pipeline.

[5]:

pipeline.named.keys()

[5]:

dict_keys(['prepare', 'reshape'])

Then we can access the named pipeline “prepare” as follows.

[6]:

pipeline.named["prepare"]

[Figure: graph of the “prepare” sub-pipeline]

We can even use it on its own, without the rest of the pipeline, because it starts with a data source.

[7]:

pipeline.named["prepare"]["20210101T00"]

[7]:

<xarray.Dataset> Size: 42kB
Dimensions:                 (latitude: 32, longitude: 64, time: 1)
Coordinates:
  * latitude                (latitude) float64 256B -87.19 -81.56 ... 87.19
  * longitude               (longitude) float64 512B 0.0 5.625 ... 348.8 354.4
  * time                    (time) datetime64[ns] 8B 2021-01-01
Data variables:
    2m_temperature          (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
    u_component_of_wind850  (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
    v_component_of_wind850  (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
    vorticity850            (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
    geopotential850         (time, longitude, latitude) float32 8kB dask.array<chunksize=(1, 64, 32), meta=np.ndarray>
Attributes:
    level-dtype:  int64
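Indexing “prepare” works because the first step of that sub-pipeline is the WB2ERA5 data source, which turns a date such as "20210101T00" into data. The toy below is purely illustrative and is not the pyearthtools implementation. It makes that dependency explicit. ToySource and ToySourcedPipeline are hypothetical classes, and raising an error for a stage without a source is this sketch's own choice. How pyearthtools handles indexing a stage like “reshape” on its own is not covered here.

```python
from typing import Any, Dict


class ToySource:
    """Hypothetical data source: returns samples looked up by a date key."""

    def __init__(self, data: Dict[str, float]) -> None:
        self.data = data

    def __getitem__(self, key: str) -> float:
        return self.data[key]


class ToySourcedPipeline:
    """Hypothetical pipeline that can be indexed only if it starts with a source."""

    def __init__(self, *steps: Any) -> None:
        self.steps = steps

    def __getitem__(self, key: str) -> Any:
        first, *rest = self.steps
        if not isinstance(first, ToySource):
            raise TypeError("this pipeline has no data source to index")
        sample = first[key]  # fetch the raw sample from the source
        for step in rest:    # then apply the remaining steps in order
            sample = step(sample)
        return sample


prepare = ToySourcedPipeline(ToySource({"20210101T00": 1.5}), lambda x: x * 2)
reshape = ToySourcedPipeline(lambda x: [x])

print(prepare["20210101T00"])  # 3.0

try:
    reshape["20210101T00"]
except TypeError as err:
    print(err)  # this pipeline has no data source to index
```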
