Tutorial: ENSO Pipeline#

This notebook shows how the data preprocessing steps performed in the ENSO_Forecast notebook can be replicated using the built-in PyEarthTools pipeline.

The purpose is to use the PyEarthTools pipeline to:

  1. load ERA5 global data,

  2. crop the data within the Nino3.4 spatial domain, and

  3. calculate the spatial mean for each time step.

This pipeline is applied to ERA5 monthly temperature data. The next step is to calculate the Nino3.4 time series.


[4]:
# Necessary import
import pyearthtools.data
import pyearthtools.pipeline as petpipe
import site_archive_nci
import warnings
import xarray as xr
#import plotly.express as px
import matplotlib.pyplot as plt

Load and process ERA5 2-metre air temperature (t2m) data. We want to calculate the Niño3.4 index from the t2m anomalies in the region 5°N–5°S, 170°W–120°W.#

[5]:
# Total time frame for training the model
start = '1970-01-01'
end = '2024-12-31'

# Start by considering a specific date for visualising the process on one sample
doi = '2021-06-09T06' # Note - if you change 'product' to 'reanalysis' you can get the 6-hour timesteps
variables_of_interest = ['2t']
product = 'monthly-averaged'
accessor = pyearthtools.data.archive.ERA5(variables_of_interest, product=product)
accessor[doi][variables_of_interest[0]].plot()
/opt/conda/envs/pet/lib/python3.11/site-packages/pyearthtools/data/indexes/_indexes.py:809: IndexWarning: Data requested at a higher resolution than available. hour > month
  warnings.warn(
[5]:
<matplotlib.collections.QuadMesh at 0x1507c9d2dad0>
../../../_images/notebooks_tutorial_ENSO_Tutorial_ENSO_Pipeline_4_2.png
[6]:
min_lat = -5
max_lat = 5
min_lon = -170
max_lon = -120

# Helper transform to compute the spatial mean
class PointMean(pyearthtools.data.transforms.Transform):

    def apply(self, dataset, **kwargs):
       # dataset['2t'] = dataset['2t'].mean(dim=['latitude', 'longitude'])
        dataset[variables_of_interest[0]] = dataset[variables_of_interest[0]].mean(dim=['latitude', 'longitude'])
        return dataset


# Perform the preprocessing steps in one pipeline: load, crop, and average
pipeline = petpipe.Pipeline(
    accessor,
    pyearthtools.data.transform.region.Bounding(min_lat, max_lat, min_lon, max_lon), # keep only data within the bounding box
    PointMean(), # Calculate spatial mean
    sampler=pyearthtools.pipeline.samplers.Default(),
    iterator=pyearthtools.pipeline.iterators.DateRange(start, end, interval='1 month')     # Retrieve monthly data from 1970 to the end of 2024
)
pipeline



Pipeline
    Description                    `pyearthtools.pipeline` Data Pipeline


    Initialisation
             exceptions_to_ignore           None
             iterator                       {'DateRange': {'allowlist': 'None', 'blocklist': 'None', 'end': "'2024-12-31'", 'interval': "'1 month'", 'start': "'1970-01-01'"}}
             sampler                        {'Default': {}}
    Steps
             ERA5                           {'ERA5': {'level_value': 'None', 'product': "'monthly-averaged'", 'variables': "['2t']"}}
             region.Bounding                {'Bounding': {'max_lat': '5', 'max_lon': '-120', 'min_lat': '-5', 'min_lon': '-170'}}
             __main__.PointMean             {'PointMean': {}}

Graph

../../../_images/notebooks_tutorial_ENSO_Tutorial_ENSO_Pipeline_5_2.svg
[3]:
%%time
# Takes around ten seconds
all_steps = None
with warnings.catch_warnings():
    warnings.simplefilter("ignore") # suppress warnings during pipeline execution
    all_steps = list(pipeline) # execute the pipeline and store all results in a list
CPU times: user 11.6 s, sys: 1.78 s, total: 13.4 s
Wall time: 43 s
[9]:
# extract the '2t' variable (variable of interest) from each pipeline output
all_temps = [s[variables_of_interest[0]] for s in all_steps]

# concatenate all temperature datasets along the time dimension
ds = xr.concat(all_temps, dim='time')
[10]:
# plot the time series
ds.plot.line(color="purple", marker="o")
[10]:
[<matplotlib.lines.Line2D at 0x1507ba420f50>]
../../../_images/notebooks_tutorial_ENSO_Tutorial_ENSO_Pipeline_8_1.png

Next, you can replicate the steps in ENSO_Forecast notebook from 4. Calculate the Niño3.4 index onward.#

[12]:
# Convert to pandas DataFrame with 3 columns 'year' 'month' and 't2m' (i.e. 2t timeseries)
df = ds.to_dataframe(name='t2m').reset_index()

# Extract year and month from 'time'
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month

# Keep required columns
t2_df = df[['year', 'month', 't2m']]

print("\nPrinting first 10 rows of t2_df:")
t2_df.head(10)

Printing first 10 rows of t2_df:
[12]:
year month t2m
0 1970 1 298.799964
1 1970 2 298.884142
2 1970 3 298.863824
3 1970 4 299.425736
4 1970 5 299.650770
5 1970 6 299.316589
6 1970 7 298.237027
7 1970 8 298.046338
8 1970 9 297.680157
9 1970 10 297.906577

Calculate the Niño3.4 index#

[14]:
# Calculate monthly climatology
monthly_clim = t2_df.groupby('month')['t2m'].mean()
print("Printing monthly climatology (mean t2m by month):")
monthly_clim

# Assign the climatology value to each row
t2_df['monthly_clim'] = t2_df['month'].map(monthly_clim)

# Calculate monthly anomalies
t2_df['anom'] = t2_df['t2m'] - t2_df['monthly_clim']

# Minus 5-year moving average to remove trend
t2_df['anom_detrended'] = t2_df['anom'] - t2_df['anom'].rolling(window=60, center=True, min_periods=1).mean()

# Apply 5 month running average
t2_df['anom_smoothed'] = t2_df['anom_detrended'].rolling(window=5, center=True, min_periods=1).mean()

# Normalise by the standard deviation of the time series
std_val = t2_df['anom_smoothed'].std()
t2_df['nino3.4'] = t2_df['anom_smoothed'] / std_val

print("\nPrinting first 10 rows of t2_df:")
t2_df.head(10)
Printing monthly climatology (mean t2m by month):

Printing first 10 rows of t2_df:
[14]:
year month t2m monthly_clim anom anom_detrended anom_smoothed nino3.4
0 1970 1 298.799964 298.664784 0.135180 0.797918 0.568150 0.885935
1 1970 2 298.884142 298.846780 0.037362 0.653045 0.515593 0.803981
2 1970 3 298.863824 299.178853 -0.315029 0.253487 0.496437 0.774111
3 1970 4 299.425736 299.585783 -0.160047 0.357922 0.371018 0.578540
4 1970 5 299.650770 299.701533 -0.050762 0.419813 0.114479 0.178510
5 1970 6 299.316589 299.568230 -0.251641 0.170821 -0.045365 -0.070739
6 1970 7 298.237027 299.242830 -1.005803 -0.629650 -0.278800 -0.434741
7 1970 8 298.046338 298.925642 -0.879304 -0.545729 -0.470833 -0.734185
8 1970 9 297.680157 298.792747 -1.112591 -0.809253 -0.699846 -1.091294
9 1970 10 297.906577 298.731676 -0.825098 -0.540353 -0.793144 -1.236776
[ ]: