New Users Guide

Welcome, new user! This document will continue to be updated based on user feedback. The table below summarises how common tasks are done with and without PyEarthTools.

Step	Without PyEarthTools	With PyEarthTools
Obtaining and loading data	Manual download or open from disk + data cleaning	Use or adapt in-built fetchers and openers (no cleaning)
Process, subset, augment, transform and normalise data	Custom code	Use pre-defined validated Pipelines
Present your data to PyTorch or another framework as a Python iterator	Custom code to iterate over data	Pipelines are iterators
Define a machine learning model	Clone someone’s repo or define your own	Use a ‘bundled model’ or write your own
Denormalise model outputs	Manual code	Pipelines are reversible
Model evaluation	Use a separate framework	Use pre-defined evaluation to generate standard scorecards

This approach also lets you target your research by varying only the part of the end-to-end process that you are investigating. Starting from a baseline implementation gives a strong basis for comparison; by modifying only the relevant step, you can run a more controlled investigation and generate results with confidence along the way.
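To make the table rows "Pipelines are iterators" and "Pipelines are reversible" concrete, here is a minimal sketch in plain Python. It does not use the PyEarthTools API: the names `Normalise`, `ToyPipeline`, `apply` and `undo` are hypothetical and chosen only for illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Normalise:
    """Hypothetical reversible step: shift and scale values."""
    mean: float
    std: float

    def apply(self, values: List[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in values]

    def undo(self, values: List[float]) -> List[float]:
        return [v * self.std + self.mean for v in values]


class ToyPipeline:
    """Hypothetical pipeline: iterating over it yields processed samples."""

    def __init__(self, source: Iterable[List[float]], steps: List[Normalise]):
        self.source = source
        self.steps = steps

    def __iter__(self) -> Iterator[List[float]]:
        for sample in self.source:
            for step in self.steps:
                sample = step.apply(sample)
            yield sample

    def undo(self, sample: List[float]) -> List[float]:
        # Undo the steps in reverse order to get back to physical units.
        for step in reversed(self.steps):
            sample = step.undo(sample)
        return sample


# Two made-up temperature samples in kelvin.
raw = [[280.0, 290.0], [300.0, 285.0]]
pipeline = ToyPipeline(raw, [Normalise(mean=290.0, std=10.0)])

normalised = list(pipeline)  # the pipeline can be consumed like any iterator
restored = [pipeline.undo(sample) for sample in normalised]
```

Because every step knows how to undo itself, the same object that prepares training data can also convert model outputs back to the original units, which is the idea behind the "Denormalise model outputs" row.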

Installation

We strongly recommend using either a Conda or Python virtual environment.

Conda environment

Run the following commands to install PyEarthTools in a Conda environment:

git clone git@github.com:ACCESS-Community-Hub/PyEarthTools.git
cd PyEarthTools
conda create -y -p ./venv python graphviz
conda activate ./venv
pip install -r requirements.txt
cd notebooks
jupyter lab

Python virtual environment

Run the following commands to install PyEarthTools in a Python virtual environment:

git clone git@github.com:ACCESS-Community-Hub/PyEarthTools.git
cd PyEarthTools
python3 -m venv ./venv
source venv/bin/activate
pip install -r requirements.txt
cd notebooks
jupyter lab

Optional dependencies

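Install Graphviz to display pipelines. Graphviz cannot be installed via pip; the Conda commands above already include it, while in a Python virtual environment it must be installed separately (for example, with your system package manager).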

For other installation options, please refer to the installation guide.
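After installing, a quick check such as the following sketch can confirm that the environment looks ready. It uses only the Python standard library. The import name `pyearthtools` is an assumption (adjust it if the package is importable under a different name), and `check_environment` is a hypothetical helper, not part of PyEarthTools.

```python
import importlib.util
import os
import shutil
import sys


def check_environment(module_name: str = "pyearthtools") -> dict:
    """Report a few basic facts about the current Python environment."""
    return {
        # A venv changes sys.prefix; Conda sets CONDA_PREFIX (also in its base env).
        "isolated_env": sys.prefix != sys.base_prefix or "CONDA_PREFIX" in os.environ,
        # `dot` is the Graphviz executable used to render graphs.
        "graphviz_dot": shutil.which("dot") is not None,
        # Assumed import name; find_spec returns None when the module is absent.
        "package_importable": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `graphviz_dot` is reported as missing, everything except pipeline display should be unaffected, since Graphviz is listed as an optional dependency.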

Where to Start

The tutorial “Train and run a simplified global weather model” is the best place to start if you are working in your own environment. It has been tested with a 4 GB GPU, uses less than 3 GB of training data, and each training epoch takes between 10 and 25 minutes depending on your hardware. The tutorial also works at NCI and on other HPC facilities.

If you are working at NCI, then “Blending Data from Multiple Sources” and “Working with Climate Data” are also good places to start. Both tutorials use very large data sets that are archived on disk at NCI, so they are straightforward to run using NCI facilities.

Core Concepts in PyEarthTools

A modelling project in PyEarthTools involves the following steps:

Fetching and loading data
Processing data for machine learning
Training a model
Evaluating the model

The “Train and run a simplified global weather model” tutorial demonstrates the first three of these steps. Guidance for new users on model evaluation will be added at a later date.
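The four steps listed above can be sketched end to end in a few lines of plain Python. This is a toy illustration under stated assumptions, not the PyEarthTools workflow: the data is a synthetic series, the "model" is a single least-squares weight, the evaluation is a plain RMSE rather than a scorecard, and the function names `fetch_data`, `process`, `train` and `evaluate` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Pair = Tuple[float, float]  # (value now, value at the next time step)


def fetch_data(n: int = 200, seed: int = 0) -> List[Pair]:
    """Step 1: stand-in for fetching data; builds a noisy synthetic series."""
    rng = random.Random(seed)
    series = [15.0 + 10.0 * math.sin(i / 10.0) + rng.gauss(0.0, 0.5)
              for i in range(n + 1)]
    return list(zip(series[:-1], series[1:]))


def process(pairs: List[Pair], mean: float, std: float) -> List[Pair]:
    """Step 2: normalise inputs and targets with training statistics."""
    return [((x - mean) / std, (y - mean) / std) for x, y in pairs]


def train(pairs: List[Pair]) -> float:
    """Step 3: fit y = w * x by least squares (a deliberately tiny model)."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


def evaluate(w: float, pairs: List[Pair], mean: float, std: float) -> float:
    """Step 4: RMSE in original units, after denormalising both sides."""
    errors = [(w * x * std + mean) - (y * std + mean) for x, y in pairs]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


data = fetch_data()
train_raw, test_raw = data[:150], data[150:]

# Normalisation statistics come from the training inputs only.
xs = [x for x, _ in train_raw]
mean = sum(xs) / len(xs)
std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

w = train(process(train_raw, mean, std))
rmse = evaluate(w, process(test_raw, mean, std), mean, std)
print(f"weight={w:.3f}, test RMSE={rmse:.3f}")
```

Keeping each step behind its own function mirrors the idea of changing one stage at a time: swapping `train` for a different model leaves fetching, processing and evaluation untouched, so results stay comparable with the baseline.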