Exploring Other Neural Net Approaches#

This notebook demonstrates how to exploit the gridded nature of the source data in ENSO prediction using a gridded MLP, and provides a gateway to more complex neural network architectures such as CNNs, ResNets or others of your own devising. The simple MLP example provided has skill, but is less accurate than the simple time-series models from previous steps. This result may challenge preconceptions about model complexity and the effectiveness of black-box approaches to modelling, and it invites users to experiment with alternative ML architectures and to start working with larger ML models.

[1]:

# Necessary imports
import pyearthtools.data
import pyearthtools.pipeline as petpipe
import site_archive_nci
import warnings
import xarray as xr
import numpy as np  # np.cos and np.sin
import matplotlib.pyplot as plt

import pandas as pd

import scipy.stats

import scores
import torch

# Total time frame for training the model
start = '1970-01'
end = '2024-12'

# Start by considering a specific date for visualising the process on one sample
doi = '2021-06' # Note - Requesting by month only
variables_of_interest = ['2t']
product = 'monthly-averaged'
accessor = pyearthtools.data.archive.ERA5(variables_of_interest, product=product)
# accessor[doi]['2t'].plot()

Neural Network Design#

The following example is a fully-connected network with a mix of activation functions. It has not been optimised for the problem at hand. Simple designs are often useful for establishing a benchmark performance for a neural network approach. Because it is fully connected, this network takes up a lot of GPU memory (VRAM). Much of the work in neural network architecture has aimed to increase accuracy while simultaneously reducing the size of the network in GPU memory.

Starting simple like this is a good way to introduce a problem. A good next step would be to replace the model design below with an alternative such as a CNN.

[10]:

class MyRegressor(torch.nn.Module):
    """
    Create a simple fully-connected network that maps a flattened
    grid of length `input_length` to a single output value.
    """

    def __init__(self, input_length):
        super().__init__()

        self.input_length = input_length

        self.model = torch.nn.Sequential(
            torch.nn.Linear(self.input_length, self.input_length),
            torch.nn.Sigmoid(),
            torch.nn.Linear(self.input_length, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 64),
            torch.nn.Sigmoid(),
            torch.nn.Linear(64, 1)
        )

    # Define forward() rather than overriding __call__, so that
    # torch.nn.Module can still run its hooks when the model is called
    def forward(self, x):
        return self.model(x)
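
As a starting point for the suggested move to a CNN, the following is a minimal sketch rather than a tuned design. The class name `SmallCNNRegressor`, the channel counts and the 123 x 201 grid shape are all assumptions made for illustration; the grid shape was picked only because it multiplies out to 24723, and the real latitude/longitude shape should be taken from the data. It also assumes each sample is kept as a 2D grid with a channel dimension instead of being flattened.

```python
import torch


class SmallCNNRegressor(torch.nn.Module):
    """Hypothetical small CNN mapping a (batch, channels, lat, lon) grid to one value."""

    def __init__(self, in_channels=1):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Conv2d(16, 32, kernel_size=3, padding=1),
            torch.nn.ReLU(),
            # Global average pooling makes the head independent of grid size
            torch.nn.AdaptiveAvgPool2d(1),
        )
        self.head = torch.nn.Linear(32, 1)

    def forward(self, x):
        x = self.features(x)
        return self.head(x.flatten(start_dim=1))


# Random stand-in data on an assumed 123 x 201 grid (not the real ERA5 samples)
batch = torch.randn(4, 1, 123, 201)
cnn = SmallCNNRegressor()
out = cnn(batch)
n_params = sum(p.numel() for p in cnn.parameters())
print(out.shape, n_params)
```

Because the convolution kernels are shared across the grid, this sketch has only a few thousand parameters. Whether it matches the skill of the MLP on this problem is something to test, not something the sketch establishes.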

[11]:

length = sample[0].shape[0]
model = MyRegressor(length)

[12]:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(torch.cuda.is_available())
print(device)

True
cuda:0

[13]:

model = model.to(device)

[14]:

# Loss and optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

[15]:

print(X_train.shape)
print(y_train.shape)

torch.Size([491, 24723])
torch.Size([491, 1])
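
To make the memory cost of the fully-connected design concrete, the parameter count can be worked out by hand from the layer widths. The sketch below assumes the input width of 24723 shown in the `X_train` shape above and the layer sizes from `MyRegressor`. It counts weights and biases only, stored as 32-bit floats.

```python
# Layer widths of the fully-connected network; 24723 is the flattened
# input width taken from the X_train shape printed above.
input_length = 24723
layer_shapes = [
    (input_length, input_length),
    (input_length, 128),
    (128, 64),
    (64, 1),
]


def linear_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out


per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_shapes]
total_params = sum(per_layer)

bytes_per_float32 = 4
weights_gib = total_params * bytes_per_float32 / 1024**3

print(f"Parameters per layer: {per_layer}")
print(f"Total parameters: {total_params:,}")
print(f"float32 weights alone: {weights_gib:.2f} GiB")
```

The first layer accounts for almost all of the parameters. During training, the gradients and the two moment buffers kept by Adam each add a further copy of comparable size, so the footprint in VRAM is several times the figure for the weights alone.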

[16]:

# Training loop with loss tracking
n_epochs = 80
train_losses = []
val_losses = []
print_per = 10

# Move the data to the device once, rather than on every epoch
X_train_torch = X_train.to(device)
y_train_torch = y_train.to(device)
X_test_torch = X_test.to(device)
y_test_torch = y_test.to(device)

for epoch in range(1, n_epochs + 1):

    if epoch % print_per == 0:
        print(f"Commencing epoch {epoch}")

    # Train: one full-batch gradient step per epoch
    model.train()
    optimizer.zero_grad()
    output = model(X_train_torch)
    loss = criterion(output, y_train_torch)
    loss.backward()
    optimizer.step()
    train_losses.append(loss.item())

    # Validate on the test set using the weights just updated
    model.eval()
    with torch.no_grad():
        val_output = model(X_test_torch)
        val_loss = criterion(val_output, y_test_torch)
        val_losses.append(val_loss.item())

Commencing epoch 10
Commencing epoch 20
Commencing epoch 30
Commencing epoch 40
Commencing epoch 50
Commencing epoch 60
Commencing epoch 70
Commencing epoch 80

[17]:

plt.figure(figsize=(6, 4))
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Training Step')
plt.ylabel('MSE Loss')
plt.title('Training vs Validation Loss')
plt.legend()
plt.grid(True)
plt.show()

[Figure: Training vs Validation Loss. MSE loss against training step, with train and validation curves.]

Commentary on the training#

Overfitting shows itself clearly somewhere between epochs 40 and 60 (the exact point varies quite a bit between runs), where the validation loss starts to oscillate significantly. Even though the validation loss still appears to trend downwards, the oscillation is a sign that the gains are unlikely to be reliable in practice.
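
One common response to this behaviour is early stopping: keep track of the best validation loss and stop once it has not improved for a set number of epochs. The helper below is a minimal sketch and is not part of the notebook. The function name, the `patience` and `min_delta` parameters, and the example losses are all made up for illustration. In practice it would be applied to the `val_losses` list recorded during training.

```python
def best_epoch_with_patience(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices into val_losses.

    Training is considered stopped at the first epoch where the loss has
    failed to improve on the best value by more than min_delta for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Patience was never exhausted: training ran to the end
    return len(val_losses) - 1, best_epoch


# Made-up validation losses that improve and then start to oscillate
example_losses = [1.0, 0.6, 0.4, 0.35, 0.5, 0.33, 0.6, 0.45, 0.7, 0.5, 0.8]
stop_epoch, best_epoch = best_epoch_with_patience(example_losses, patience=3)
print(f"Stop at epoch {stop_epoch}, best epoch was {best_epoch}")
```

In a real training loop the model weights from the best epoch would usually be saved and restored. Note that selecting the stopping point on the same data used for the final evaluation makes the reported skill optimistic, so a separate validation split is preferable.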

[18]:

y_pred = model(X_test.to(device)).to('cpu')

[19]:

# correlation = np.corrcoef(y_test, y_pred)
plt.scatter(x=y_test.detach().numpy().reshape(167), y=y_pred.detach().numpy().reshape(167))

[19]:

<matplotlib.collections.PathCollection at 0x145c38739150>

[Figure: scatter plot of y_test (x-axis) against y_pred (y-axis).]

[20]:

model.eval()
with torch.no_grad():
    y_pred_tensor = model(X_test.to(device)).to('cpu')
    y_pred = y_pred_tensor.numpy().flatten()

[21]:

a = y_test.reshape(167)
b = y_pred.reshape(167)

[22]:

# Quick evaluation of model performance using correlation and RMSE
correlation = np.corrcoef(a, b)
rmse = scores.continuous.rmse(a, b)

# print(f"Correlation between predicted and actual values: {correlation[0, 1]:.3f}")
# print(f"Root Mean Squared Error (RMSE): {rmse:.3f}")

[23]:

correlation[0, 1]

[23]:

np.float64(0.9301738606015924)

[24]:

rmse

[24]:

tensor(0.2749)
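
The RMSE above is displayed as a tensor while the correlation is a NumPy float, which suggests the two inputs were of mixed types. As a hedged alternative, the sketch below computes both metrics with NumPy alone so that they come back as plain Python floats. The function name `correlation_and_rmse` and the small arrays are illustrative only. A CPU tensor such as `y_test` would first need converting, for example with `.numpy()`.

```python
import numpy as np


def correlation_and_rmse(observed, predicted):
    """Pearson correlation and root mean squared error of two 1D series."""
    observed = np.asarray(observed, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    correlation = float(np.corrcoef(observed, predicted)[0, 1])
    rmse = float(np.sqrt(np.mean((observed - predicted) ** 2)))
    return correlation, rmse


# Illustrative values only: predictions offset from observations by 0.5
obs = np.array([0.0, 1.0, 2.0, 3.0])
pred = np.array([0.5, 1.5, 2.5, 3.5])
corr, err = correlation_and_rmse(obs, pred)
print(f"Correlation between predicted and actual values: {corr:.3f}")
print(f"Root Mean Squared Error (RMSE): {err:.3f}")
```

The example also shows why both metrics are worth reporting: a constant offset leaves the correlation at its maximum while the RMSE still registers the error.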

Results#

With a correlation of about 0.93 and an RMSE of about 0.27, these results are similar to, but not better than, those of the single-variable time-series model that was developed in the previous example.
