Training the LUCIE model

LUCIE is a climate model developed by Haiwen Guan, Troy Arcomano, Ashesh Chattopadhyay and Romit Maulik (2024). See their preprint at https://doi.org/10.48550/arXiv.2405.16297 and the archive of their training data, code and results at https://doi.org/10.5281/zenodo.15164648.

The code in PyEarthTools is based on their code repository at ISCLPennState/LUCIE, which is made available under the MIT license (see the PyEarthTools NOTICE file for full details).

LUCIE is of interest to climate researchers because it remains stable over rollouts spanning many decades. The model is licensed in a compatible fashion, so we are able to provide a bundled, customised version of LUCIE that can be used within the PyEarthTools framework, integrated with its data pipelines and configured flexibly.

We have only just begun this integration, so for now the model does not make extensive use of the PyEarthTools classes. This is expected to change fairly quickly, and this notebook will be updated as it does. However, in the interests of providing the bundled version to the community as soon as possible for those already seeking to work with the model, we present it as a "work in progress".

You need to manually download the original published dataset from Zenodo and update the paths in this notebook to point to it. The initial focus is on reproducing the paper closely, using the same data and only slightly modified code (changes to support more devices and updates for compatibility) that stays true to the original. Subsequently, we will develop the code further so that it can adapt to new data sources.
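
Before running the cells below, it can help to confirm that the downloaded files are where the notebook expects them. The following is a minimal sketch rather than part of PyEarthTools: the helper `find_missing_files` is a hypothetical name introduced here, the two file names are the ones this notebook loads, and the directory is only the example location used in this notebook.

```python
from pathlib import Path

# File names as loaded by this notebook; the directory below is only an example.
EXPECTED_FILES = ("era5_T30_regridded.npz", "era5_T30_preprocessed.npz")


def find_missing_files(data_dir, names=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in names if not (data_dir / name).is_file()]


if __name__ == "__main__":
    data_dir = Path.home() / "dev/data/lucie"  # adjust to your download location
    missing = find_missing_files(data_dir)
    if missing:
        print(f"Missing from {data_dir}: {', '.join(missing)}")
    else:
        print("All expected data files found.")
```

If anything is reported missing, update the paths in the later cells before loading the data.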

The intention is to:

  • [done] Supply the source code to train and run the model in PyEarthTools

  • [done] Validate that the model can train without obvious code-level errors

  • Validate inference and reproduce the training results to ensure the trained model is valid

  • Support library updates and other changes

  • Support multiple ML backends beyond CUDA

  • Support connection to multiple data sources through PET data accessors

  • Move the normalisation into a PET pipeline so it can be easily modified and experimented with

If you would like to know more, or get involved with this work, please let us know on the issue tracker.

[1]:
import lucie
import torch
[2]:
from pathlib import Path
import numpy as np
[3]:
# Prefer CUDA if present, then Apple MPS, otherwise fall back to the CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
device = torch.device("cuda:0" if torch.cuda.is_available() else device)
[4]:
regridded_path = Path.home() / 'dev/data/lucie' / 'era5_T30_regridded.npz'
[5]:
regridded_data = lucie.train.load_data(regridded_path)
[6]:
preprocessed_path = Path.home() / 'dev/data/lucie' / 'era5_T30_preprocessed.npz'
preprocessed_data = np.load(preprocessed_path)
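Since `np.load` on an `.npz` file returns a lazy archive, it can be useful to see which arrays it holds before training. The sketch below is an illustration only: `summarise_npz` is a hypothetical helper, not a LUCIE or PyEarthTools function, and it makes no assumption about the array names inside the published files.

```python
import numpy as np


def summarise_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    summary = {}
    # NpzFile supports the context-manager protocol and lists its members in .files
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]  # arrays are read from disk on access
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

For example, `summarise_npz(preprocessed_path)` would list the contents of the preprocessed archive loaded above.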
[8]:
model = lucie.train.load_data_and_train(device, regridded_data, preprocessed_data, debug_sample_limit=50, n_epochs=2)
Starting Training
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 104.68it/s]
 50%|████████████████████████████████████████████████████████████                                                            | 1/2 [00:19<00:19, 19.24s/it]
2 year rollout bias tensor(nan, device='mps:0')

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 121.63it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:38<00:00, 19.23s/it]
2 year rollout bias tensor(nan, device='mps:0')
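Both epochs of this short debug run report a 2 year rollout bias of nan, and the output alone does not show why. One quick check is whether the trained weights themselves contain non-finite values. The sketch below assumes only that the returned model is a standard `torch.nn.Module`; `non_finite_parameters` is a hypothetical helper, and the `nn.Linear` module is a stand-in used purely for illustration.

```python
import torch
from torch import nn


def non_finite_parameters(model):
    """Return the names of parameters that contain NaN or infinite values."""
    return [
        name
        for name, param in model.named_parameters()
        if not torch.isfinite(param).all()
    ]


if __name__ == "__main__":
    # Stand-in module for illustration only; pass the trained model instead.
    toy = nn.Linear(3, 2)
    print(non_finite_parameters(toy))
    with torch.no_grad():
        toy.bias[0] = float("nan")  # deliberately corrupt one value
    print(non_finite_parameters(toy))
```

An empty list means every weight is finite, in which case the nan would have to arise elsewhere, for example in the rollout or in the bias calculation.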
[9]:
torch.save(model.state_dict(), "model.pth")
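A saved `state_dict` holds only the weights, so loading it back requires a model instance with the same architecture. This notebook does not show how the LUCIE network is constructed, so the sketch below uses a small `nn.Linear` as a stand-in and a temporary file instead of `model.pth`; `save_and_restore` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


def save_and_restore(trained, fresh, path):
    """Save trained's weights to path, then load them into fresh and return it."""
    torch.save(trained.state_dict(), path)
    # map_location="cpu" lets weights saved on CUDA or MPS load on any machine
    state = torch.load(path, map_location="cpu")
    fresh.load_state_dict(state)
    fresh.eval()  # switch to inference mode (affects dropout, batch norm)
    return fresh


if __name__ == "__main__":
    # Stand-in modules for illustration only; use the LUCIE model in practice.
    trained, fresh = nn.Linear(4, 2), nn.Linear(4, 2)
    with tempfile.TemporaryDirectory() as tmp:
        restored = save_and_restore(trained, fresh, Path(tmp) / "weights.pth")
    print(torch.equal(restored.weight, trained.weight))
```

After loading, the restored model can be moved to the target device with `.to(device)`.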