Pipeline API How-To Guide

Pipeline API How-To Guide#

A pipeline is a Python iterator with the job of supplying input/output pairs to a machine learning framework, or to a PyEarthTools registered model interface.

It is somewhat similar to an IterableDataset in PyTorch, or a DataLoader in PyTorch Lightning. However, it can be used with not only these frameworks, but many others as well. The advantages of using a PyEarthTools pipeline are:

  • More modular and flexible if you want to use the same pipeline with multiple models

  • Capable of supplying data to PyTorch, XGBoost, MLX, Tensorflow and JAX

  • Includes pre-coded processing steps for many common operations

For more information, please see: