Tutorial: ENSO Pipeline


This notebook shows how the data preprocessing steps performed in the ENSO_Forecast notebook can be replicated using the built-in PyEarthTools pipeline.

The purpose is to use the PyEarthTools pipeline to:

load ERA5 global data,
crop the data to the Niño3.4 spatial domain, and
calculate the spatial mean for each time step.
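Outside the pipeline, these steps amount to a crop followed by a mean over latitude and longitude. The sketch below shows the crop and the mean in plain xarray on a small synthetic grid. It is not how PyEarthTools implements `Bounding` or the pipeline, and the helper `crop_and_average`, the grid and the values are hypothetical. It assumes longitudes in the -180 to 180 convention and latitude stored from north to south, as in ERA5.

```python
import numpy as np
import pandas as pd
import xarray as xr


def crop_and_average(da, min_lat, max_lat, min_lon, max_lon):
    """Hypothetical helper: crop to a lat/lon box, then take the spatial mean."""
    # Integer positions of the grid points inside the box. Selecting by position
    # works whether latitude is ascending or descending; a label slice such as
    # sel(latitude=slice(-5, 5)) would return nothing on a north-to-south grid.
    lat_idx = np.flatnonzero(((da.latitude >= min_lat) & (da.latitude <= max_lat)).values)
    lon_idx = np.flatnonzero(((da.longitude >= min_lon) & (da.longitude <= max_lon)).values)
    cropped = da.isel(latitude=lat_idx, longitude=lon_idx)
    # Unweighted mean over the remaining grid points, one value per time step
    return cropped.mean(dim=["latitude", "longitude"])


# Synthetic 2.5-degree grid, latitude running north to south
lat = np.arange(90.0, -90.1, -2.5)
lon = np.arange(-180.0, 180.0, 2.5)
time = pd.date_range("2000-01-01", periods=3, freq="MS")
# Temperature falls off with |latitude| and is constant in longitude and time
values = 300.0 - 0.1 * np.abs(lat)[None, :, None] * np.ones((time.size, lat.size, lon.size))
da = xr.DataArray(values, coords={"time": time, "latitude": lat, "longitude": lon},
                  dims=("time", "latitude", "longitude"), name="2t")

series = crop_and_average(da, -5, 5, -170, -120)
print(series.values)  # one regional mean per time step
```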

The pipeline is applied to ERA5 monthly-averaged 2-metre temperature (2t) data. The resulting regional-mean time series is then used to calculate the Niño3.4 index.

[4]:

# Necessary imports
import pyearthtools.data
import pyearthtools.pipeline as petpipe
import site_archive_nci  # site-specific archive for NCI; provides pyearthtools.data.archive.ERA5
import warnings
import xarray as xr
import matplotlib.pyplot as plt

Load and process ERA5 2-metre air temperature (t2m) data. We want to calculate the Niño3.4 index from the t2m anomalies in the region 5°N–5°S, 170°W–120°W.
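Longitude conventions differ between datasets. The box above is given in degrees west, which are negative in the -180 to 180 convention, while ERA5 GRIB files from ECMWF typically use 0 to 360 degrees east. The notebook passes -170 and -120 to the bounding transform. If you crop data by hand, you may need a conversion like the one sketched below. `to_degrees_east` and `in_nino34` are hypothetical helpers and are not part of PyEarthTools.

```python
def to_degrees_east(lon):
    """Map a longitude in either convention onto 0 to 360 degrees east."""
    # Python's % returns a non-negative result for a positive modulus,
    # so -170 -> 190, while 190 stays 190.
    return lon % 360


def in_nino34(lat, lon):
    """Hypothetical helper: True if (lat, lon) lies in the Niño3.4 box
    (5°S to 5°N, 170°W to 120°W), accepting either longitude convention."""
    lon_east = to_degrees_east(lon)
    return -5 <= lat <= 5 and 190 <= lon_east <= 240


print(to_degrees_east(-170), to_degrees_east(-120))  # 190 240
print(in_nino34(0, -150), in_nino34(0, 210), in_nino34(0, 150))  # True True False
```

On a 0 to 360 grid, the box therefore spans 190°E to 240°E.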

[5]:

# Full time frame used for training the model (1970 to the end of 2024)
start = '1970-01-01'
end = '2024-12-31'

# Start by plotting a single date to visualise the process on one sample
doi = '2021-06-09T06'  # date of interest; hourly precision is finer than the monthly product, hence the IndexWarning below
variables_of_interest = ['2t']
product = 'monthly-averaged'  # change to 'reanalysis' to get 6-hour timesteps
accessor = pyearthtools.data.archive.ERA5(variables_of_interest, product=product)
accessor[doi][variables_of_interest[0]].plot()

/opt/conda/envs/pet/lib/python3.11/site-packages/pyearthtools/data/indexes/_indexes.py:809: IndexWarning: Data requested at a higher resolution than available. hour > month
  warnings.warn(

[5]:

[Figure: ERA5 monthly-averaged 2t field for the selected date]

[6]:

# Niño3.4 bounding box in degrees (longitudes west of Greenwich are negative)
min_lat = -5
max_lat = 5
min_lon = -170
max_lon = -120

# Helper transform that replaces the variable of interest with its spatial mean
class PointMean(pyearthtools.data.transforms.Transform):

    def apply(self, dataset, **kwargs):
        var = variables_of_interest[0]  # '2t'
        # Unweighted mean over all grid points in the box
        dataset[var] = dataset[var].mean(dim=['latitude', 'longitude'])
        return dataset


# Perform the preprocessing steps in one pipeline: load, crop, and average
pipeline = petpipe.Pipeline(
    accessor,
    pyearthtools.data.transform.region.Bounding(min_lat, max_lat, min_lon, max_lon),  # keep only data within the bounding box
    PointMean(),  # calculate the spatial mean
    sampler=petpipe.samplers.Default(),
    iterator=petpipe.iterators.DateRange(start, end, interval='1 month'),  # retrieve monthly data from 1970 to the end of 2024
)
pipeline

Pipeline
    Description                    `pyearthtools.pipeline` Data Pipeline


    Initialisation
             exceptions_to_ignore           None
             iterator                       {'DateRange': {'allowlist': 'None', 'blocklist': 'None', 'end': "'2024-12-31'", 'interval': "'1 month'", 'start': "'1970-01-01'"}}
             sampler                        {'Default': {}}
    Steps
             ERA5                           {'ERA5': {'level_value': 'None', 'product': "'monthly-averaged'", 'variables': "['2t']"}}
             region.Bounding                {'Bounding': {'max_lat': '5', 'max_lon': '-120', 'min_lat': '-5', 'min_lon': '-170'}}
             __main__.PointMean             {'PointMean': {}}

[Figure: pipeline graph, ERA5 → region.Bounding → PointMean]
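`PointMean` takes a plain, unweighted mean over grid points. On a regular latitude/longitude grid, cells shrink towards the poles, so a cos(latitude)-weighted mean is the more general choice. Within ±5° of the equator, the weights differ by less than 0.4% (cos 5° ≈ 0.996), so the two give nearly identical results here. The sketch below uses xarray's `DataArray.weighted`. `area_weighted_mean` is a hypothetical helper in plain xarray, not a PyEarthTools transform. The body of `PointMean.apply` could call it in place of `.mean(...)`.

```python
import numpy as np
import xarray as xr


def area_weighted_mean(da):
    """Hypothetical helper: cos(latitude)-weighted mean over latitude and longitude."""
    # Grid-cell area on a regular lat/lon grid is proportional to cos(latitude)
    weights = np.cos(np.deg2rad(da.latitude))
    return da.weighted(weights).mean(dim=["latitude", "longitude"])


# Three latitudes spanning the Niño3.4 box, two longitudes (values are made up)
da = xr.DataArray(
    np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]),
    coords={"latitude": [-5.0, 0.0, 5.0], "longitude": [190.0, 200.0]},
    dims=("latitude", "longitude"),
)
unweighted = float(da.mean())
weighted = float(area_weighted_mean(da))
print(unweighted, weighted)  # differ by less than 0.001
```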

[3]:

%%time
# Runs the pipeline once per month; took about 43 s (wall time) in the run below
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # suppress warnings (e.g. IndexWarning) during pipeline execution
    all_steps = list(pipeline)  # execute the pipeline and store all results in a list

CPU times: user 11.6 s, sys: 1.78 s, total: 13.4 s
Wall time: 43 s
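`list(pipeline)` runs every pipeline step once for each date that the `DateRange` iterator yields. As a rough check on how much work that is, pandas can list the same monthly steps. This only approximates the iterator; it assumes monthly steps that begin on 1970-01-01 and an inclusive end date, which may not match `DateRange` exactly.

```python
import pandas as pd

start, end = "1970-01-01", "2024-12-31"
# Month-start frequency ("MS") approximates DateRange(start, end, interval='1 month')
steps = pd.date_range(start, end, freq="MS")
print(len(steps), steps[0].date(), steps[-1].date())  # 660 1970-01-01 2024-12-01
```

If the iterator includes December 2024, `len(all_steps)` should also be 660, that is, 55 years of 12 months.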

[9]:

# extract the '2t' variable (variable of interest) from each pipeline output
all_temps = [s[variables_of_interest[0]] for s in all_steps]

# concatenate all temperature datasets along the time dimension
ds = xr.concat(all_temps, dim='time')
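Each pipeline output holds the regional-mean value for one month, and `xr.concat` stacks them along `time` into a single series. The minimal sketch below uses synthetic values. It assumes each sample has a length-one `time` dimension, as a single-month field typically does. `make_sample` is a hypothetical stand-in for one pipeline output.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_sample(month_start, value):
    """Hypothetical stand-in for one pipeline output after PointMean."""
    return xr.Dataset({"2t": ("time", [value])}, coords={"time": [pd.Timestamp(month_start)]})


months = pd.date_range("1970-01-01", periods=4, freq="MS")
samples = [make_sample(t, 298.0 + i) for i, t in enumerate(months)]

# Same pattern as the cell above: pull out the variable, then concatenate along time
series = xr.concat([s["2t"] for s in samples], dim="time")
print(series.values)
```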

[10]:

# plot the time series
ds.plot.line(color="purple", marker="o")

[10]:

[Figure: monthly Niño3.4-region mean 2t time series, 1970 to 2024]

Next, replicate the steps of the ENSO_Forecast notebook from section 4, "Calculate the Niño3.4 index", onward.

[12]:

# Convert to a pandas DataFrame with three columns: 'year', 'month' and 't2m' (the 2t time series)
df = ds.to_dataframe(name='t2m').reset_index()

# Extract year and month from 'time'
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month

# Keep the required columns; .copy() makes t2_df an independent DataFrame,
# so adding columns to it later does not raise SettingWithCopyWarning
t2_df = df[['year', 'month', 't2m']].copy()

print("\nPrinting first 10 rows of t2_df:")
t2_df.head(10)


Printing first 10 rows of t2_df:

[12]:

	year	month	t2m
0	1970	1	298.799964
1	1970	2	298.884142
2	1970	3	298.863824
3	1970	4	299.425736
4	1970	5	299.650770
5	1970	6	299.316589
6	1970	7	298.237027
7	1970	8	298.046338
8	1970	9	297.680157
9	1970	10	297.906577
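`DataArray.to_dataframe` turns the series into a table indexed by `time`, and the `.dt` accessor splits each timestamp into calendar fields. Below is a small synthetic version of the conversion above; the 14 monthly values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1970-01-01", periods=14, freq="MS")
series = xr.DataArray(np.linspace(298.0, 299.3, 14), coords={"time": time}, dims="time")

# An unnamed DataArray needs a name= to become a DataFrame column
df = series.to_dataframe(name="t2m").reset_index()
df["year"] = df["time"].dt.year
df["month"] = df["time"].dt.month
t2_df = df[["year", "month", "t2m"]].copy()
print(t2_df.tail(3))
```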

Calculate the Niño3.4 index
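Written out, the cell below computes the index as follows. Here $T_t$ is the regional-mean 2t in month $t$, $m(t)$ is its calendar month, $N_m$ is the number of years with data for month $m$, and $\overline{x}^{(k)}_t$ is a centred $k$-month running mean. Because of `min_periods=1`, the running-mean window is truncated near the ends of the record, and $\sigma_S$ is the sample standard deviation that pandas computes by default:

```latex
\begin{aligned}
C_m &= \frac{1}{N_m} \sum_{t \,:\, m(t) = m} T_t && \text{monthly climatology} \\
A_t &= T_t - C_{m(t)} && \text{monthly anomaly} \\
D_t &= A_t - \overline{A}^{(60)}_t && \text{remove the 5-year running mean} \\
S_t &= \overline{D}^{(5)}_t && \text{5-month smoothing} \\
I_t &= S_t / \sigma_S && \text{normalised Niño3.4 index}
\end{aligned}
```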

[14]:

# Calculate the monthly climatology (mean t2m for each calendar month)
monthly_clim = t2_df.groupby('month')['t2m'].mean()
print("Printing monthly climatology (mean t2m by month):")
print(monthly_clim)  # a bare expression mid-cell is not displayed, so print it explicitly

# Assign the climatology value to each row
t2_df['monthly_clim'] = t2_df['month'].map(monthly_clim)

# Calculate monthly anomalies
t2_df['anom'] = t2_df['t2m'] - t2_df['monthly_clim']

# Subtract a centred 60-month (5-year) running mean to remove the trend and low-frequency variability
t2_df['anom_detrended'] = t2_df['anom'] - t2_df['anom'].rolling(window=60, center=True, min_periods=1).mean()

# Apply a centred 5-month running mean
t2_df['anom_smoothed'] = t2_df['anom_detrended'].rolling(window=5, center=True, min_periods=1).mean()

# Normalise by the standard deviation of the smoothed series
std_val = t2_df['anom_smoothed'].std()
t2_df['nino3.4'] = t2_df['anom_smoothed'] / std_val

print("\nPrinting first 10 rows of t2_df:")
t2_df.head(10)

Printing monthly climatology (mean t2m by month):

Printing first 10 rows of t2_df:

[14]:

	year	month	t2m	monthly_clim	anom	anom_detrended	anom_smoothed	nino3.4
0	1970	1	298.799964	298.664784	0.135180	0.797918	0.568150	0.885935
1	1970	2	298.884142	298.846780	0.037362	0.653045	0.515593	0.803981
2	1970	3	298.863824	299.178853	-0.315029	0.253487	0.496437	0.774111
3	1970	4	299.425736	299.585783	-0.160047	0.357922	0.371018	0.578540
4	1970	5	299.650770	299.701533	-0.050762	0.419813	0.114479	0.178510
5	1970	6	299.316589	299.568230	-0.251641	0.170821	-0.045365	-0.070739
6	1970	7	298.237027	299.242830	-1.005803	-0.629650	-0.278800	-0.434741
7	1970	8	298.046338	298.925642	-0.879304	-0.545729	-0.470833	-0.734185
8	1970	9	297.680157	298.792747	-1.112591	-0.809253	-0.699846	-1.091294
9	1970	10	297.906577	298.731676	-0.825098	-0.540353	-0.793144	-1.236776
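To test the recipe on data with known structure, you can apply the same steps to a synthetic monthly series. The series below combines a seasonal cycle, a small linear trend and a slow oscillation that stands in for ENSO. This is a sketch for experimentation only: the series and the helper name `nino34_like_index` are hypothetical, and the window lengths match the cell above. The climatology step should remove the seasonal cycle, and the 60-month running mean should remove the trend, so most of the resulting index comes from the oscillation.

```python
import numpy as np
import pandas as pd


def nino34_like_index(t2_df):
    """Hypothetical helper reproducing the steps of the cell above on a copy."""
    out = t2_df.copy()
    clim = out.groupby("month")["t2m"].mean()            # monthly climatology
    out["anom"] = out["t2m"] - out["month"].map(clim)    # remove the seasonal cycle
    running = out["anom"].rolling(window=60, center=True, min_periods=1).mean()
    out["anom_detrended"] = out["anom"] - running        # remove the 5-year running mean
    out["anom_smoothed"] = out["anom_detrended"].rolling(window=5, center=True, min_periods=1).mean()
    out["nino3.4"] = out["anom_smoothed"] / out["anom_smoothed"].std()
    return out


time = pd.date_range("1970-01-01", "2024-12-01", freq="MS")
n = len(time)
months = time.month.to_numpy()
seasonal = 1.5 * np.cos(2 * np.pi * (months - 4) / 12)  # peaks in April
trend = 0.01 * np.arange(n) / 12                         # 0.01 K per year
enso = 0.8 * np.sin(2 * np.pi * np.arange(n) / 45)       # roughly 45-month oscillation
t2_df = pd.DataFrame({"year": time.year, "month": months,
                      "t2m": 299.0 + seasonal + trend + enso})

result = nino34_like_index(t2_df)
print(result[["year", "month", "nino3.4"]].head())
```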
