Zoo API Docs#
zoo#
- class pyearthtools.zoo.BaseForecastModel#
Setup `BaseForecastModel`
A child must at least implement the `.load` function to pass back a `pyearthtools.training.wrapper.Predictor`.
## Setup
- Setting `_default_config_path` provides a default config path. This should be given; otherwise it must be set by the user each time.
- Setting `_times` allows a model to specify which time deltas need to be retrieved for predictions. Used for live download.
- Setting `_download_paths` specifies files to download.
- Setting `_name` provides the name of the model. It is best to set this identical to the name under which the model is registered. If not given, it will be the class name. Use ‘/’ to set categories.
### _download_paths
Setting `_download_paths` in the class will allow those assets to be automatically retrieved and stored. They are then accessible underneath a directory retrievable from `self.assets`.
If given as a str, the text after the last ‘/’ will be used as the name; if given as a tuple, the first element is the link and the second the name.
These paths can point to either a file or a zip file, on a server or on the local machine.
If the assets should be downloaded each time, set `_redownload_each_time` to True.
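The naming rule for download entries can be illustrated with a small stand-alone sketch. The helper `split_download_entry` and the example URLs are hypothetical and are not part of pyearthtools; the sketch only mirrors the str/tuple convention described above.

```python
from typing import Tuple, Union

DownloadEntry = Union[str, Tuple[str, str]]


def split_download_entry(entry: DownloadEntry) -> Tuple[str, str]:
    """Return (link, name) for one entry of a `_download_paths`-style list.

    Hypothetical helper for illustration; not part of pyearthtools.
    """
    if isinstance(entry, tuple):
        link, name = entry  # tuple form: (link, name)
        return link, name
    # str form: the text after the last '/' is used as the name
    return entry, entry.rsplit("/", 1)[-1]


# Placeholder locations, assumed for the example only
entries = [
    "https://example.com/weights/model.ckpt",
    ("https://example.com/archive/assets.zip", "assets"),
]
resolved = [split_download_entry(entry) for entry in entries]
```

Under this reading, the first entry would be stored as `model.ckpt` and the second as `assets`.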
## Config Folder
The config folder for a model can use the following conventions to ease setup:
- `Data/` - Location for all data loaders
- `Pipeline/` - Location for all pipelines
It is assumed that most data configs will have a pipeline with an identical name for loading and preparing the data; however, the following exception applies. If a `Data` config has a `()` with a str inside, it represents a different data source that uses the same pipeline. This can be useful for setting up different sources of the same data, such as a link for downloading, archived data, or experiments.
Additionally, any data config with a `-` in its name represents an ancillary source, i.e. forcings, and will not be included in the available data sources. Any text prior to the `-` represents the parent source and any text after it is its purpose.
Getting `ancillary_pipeline` will give back a dictionary of ancillary pipelines associated with the chosen source.
When creating a model subclass, set `_default_config_path` to the default path.
A user can provide a `config_path` during `__init__` to allow access to user-defined configs. This allows experiments to be easily run, and will follow the conventions outlined above, i.e. providing a data config with `()` will use the base pipeline.
### Examples
Consider the following structure.
>>> ├── Data
>>> │   ├── ERA5-Forcings.yaml
>>> │   ├── ERA5(cds)-Forcings.yaml
>>> │   ├── ERA5(cds).yaml
>>> │   └── ERA5.yaml
>>> └── Pipeline
>>>     └── ERA5.yaml
A user can request either `ERA5` or `ERA5(cds)` as the data source; both sources are then loaded and use `Pipeline/ERA5.yaml` as their pipeline.
When getting `ancillary_pipeline`, either `ERA5-Forcings` or `ERA5(cds)-Forcings` will be used, dependent on the data source as detailed above. If a `Pipeline/ERA5-Forcings.yaml` existed, both sources would then use this as their pipeline.
### Configurable Config Path
Using `pyearthtools.config`, the paths in which config files for `pyearthtools.zoo` are searched can be adjusted. This can be done by either setting `configs` in `~/.config/pyearthtools/models.yaml` or setting `pyearthtools_MODELS__CONFIGS` in the environment.
An environment can define a list of paths split by `:` at `pyearthtools_MODELS__CONFIGS`. These will be added to the valid pipelines, with the model class name added to the end.
For most models this should be the full categorical path of the model; see each model for its `_name`. If not set, it will be the class name.
### Config Assignments
Specifying a ‘{}’ after a config selection allows a user to specify replacement keys for the pipeline.
All keys in a pipeline need to be surrounded by ‘__’, so that a key `ID` corresponds in the config to `__ID__`. Say the `ERA5` pipeline contains a key ‘__ID__’; to allow a user to select a certain ID at the time of running, the config can be specified as:
`'ERA5{ID=42}'`
This will replace `__ID__` inside the config with ‘42’ before it is loaded. The replacement value will be a str.
#### Default Assignments
| Key | Value |
| --- | ----- |
| pyearthtools_ASSETS | Asset path to this model |
| pyearthtools_MODELS_DEFAULT_CONFIG | Default config path for a model |
| FILE | Folder containing the loaded config |
| OUTPUT_DIR | Output location as specified by the user |
If a ‘:’ follows the KEY part while still within ‘__*__’, anything after it will be considered the default value.
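The replacement rule, including the ‘:’ default form, can be sketched with a regular expression. The pattern and the function `apply_assignments` below are assumptions made for illustration and are not the expressions pyearthtools itself uses; the sample config text is also invented.

```python
import re
from typing import Mapping

# Matches __KEY__ or __KEY:default__. This pattern is an assumption for the
# sketch, not the one used inside pyearthtools.
KEY_PATTERN = re.compile(r"__([A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)(?::(.*?))?__")


def apply_assignments(text: str, assignments: Mapping[str, str]) -> str:
    """Replace __KEY__ markers in `text` before the config is parsed."""

    def substitute(match: "re.Match[str]") -> str:
        key, default = match.group(1), match.group(2)
        if key in assignments:
            return str(assignments[key])  # replacement values are strings
        if default is not None:
            return default  # fall back to the text after ':'
        return match.group(0)  # leave unknown keys untouched

    return KEY_PATTERN.sub(substitute, text)


# Invented config text for the example
raw = "source: __ID__\nlevel: __LEVEL:500__\nout: __OUTPUT_DIR__"
rendered = apply_assignments(raw, {"ID": "42"})
```

Here `__ID__` takes the assigned value, `__LEVEL:500__` falls back to its default, and `__OUTPUT_DIR__` is left in place because nothing in this sketch supplies it.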
#### Class assignments
Assignments like those shown above can also be provided within `self._default_assignments`, which will be used when loading a `Pipeline`.
`self._default_assignments = {'ID': 42}`
## Assets
Assets will be saved at the location given in the config at `models.assets`. This can be changed by either setting `assets` in `~/.config/pyearthtools/models.yaml` or setting `pyearthtools_MODELS__ASSETS` in the environment.
The model name is appended to this path, so specify only the overall `pyearthtools` asset path.
This asset path is then accessible from `self.assets`.
## Caching Inputs
Setting `data_cache` allows for the inputs of a model to be cached before inference. This may be especially useful for sanity checking, or for preloading downloaded data before switching to a compute node.
`data_cleanup` defines how to manage this cache; by default, it will remove any data over 1 day old and limit the directory size to 10GB. See `pyearthtools.data.indexes.CachingIndex` for more information.
The model `_name` and pipeline name are automatically added to the path to prevent collisions. So, if `data_cache` is `/data/goes/here`, and the model is `Model/Name`, with pipeline `PipelineName`, the full path is `/data/goes/here/Model/Name/PipelineName`.
The pattern of the cache will then take over.
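The path composition in the example above amounts to joining three parts. A minimal sketch, where `resolve_cache_path` and its argument names are hypothetical and `cache_root` stands for the user-supplied cache location:

```python
from pathlib import PurePosixPath


def resolve_cache_path(cache_root: str, model_name: str, pipeline_name: str) -> PurePosixPath:
    """Join cache root, model name and pipeline name (illustrative helper only)."""
    # A '/' inside the model name simply becomes nested directories
    return PurePosixPath(cache_root) / model_name / pipeline_name


full_path = resolve_cache_path("/data/goes/here", "Model/Name", "PipelineName")
print(full_path)  # /data/goes/here/Model/Name/PipelineName
```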
Must be implemented by a child class to set up a model.
A child must at least implement the `.load` function to pass back a `pyearthtools.training.wrapper.Predictor` wrapper.
- Parameters:
pipeline_name (Optional[str]) – Pipeline name to use, must be in `valid_pipeline`
pipeline – Already-loaded pipeline object (alternative to the pipeline name)
output (Optional[os.PathLike]) – Location to save predictions
config_path (Optional[os.PathLike]) – Override for config path to find Data & Pipelines. Defaults to None.
data_cache (Optional[os.PathLike]) – Location to set a data cache for, automatically adds model name & pipeline to path. Defaults to None.
data_cleanup (dict[str, Any] | str | None) – Config for cleanup for data_cache. Defaults to None.
delete_cache (Optional[bool]) – Delete all data in cache. Defaults to False.
download_assets (bool, optional) – Whether to download assets. Will be called anyway upon first call to `.index`. Defaults to False.
**kwargs (Any, optional) – All extra kwargs used when getting the DataIndex.
- Raises:
ValueError – If `pipeline` not in `._valid_pipeline()` and a valid loaded pipeline is not supplied
- property ancillary_pipeline: dict[str, 'pyearthtools.pipeline.Pipeline']#
Ancillary Pipelines
Get all ancillary pipelines associated with the selected one.
Ancillaries are marked with a ‘-’ with the prior representing the core, and the post the name of the ancillary.
- Returns:
Name of ancillary: Loaded Pipeline
- Return type:
(dict[str, pyearthtools.pipeline.Pipeline])
- property assets: Path#
Get assets directory. Set in config by `models.assets`; it can therefore be configured by the user in `~/.config/pyearthtools/models.yaml`, or by setting `pyearthtools_MODELS__ASSETS` in the environment.
- property cache: Path | None#
Get cache directory
- data(basetime)#
Get data from pipeline
Used to download for live runs
- Parameters:
basetime (str) – Time that a prediction would be run at
- Return type:
list[Any]
- download_assets()#
Download all assets in `_download_paths`, and store in `.assets`.
- Return type:
None
- classmethod get_all_config_paths(config_path)#
Get all config paths associated with this model.
- Parameters:
config_path (PathLike | None) – Defined Config path to add.
- Returns:
All config paths
- Return type:
(tuple[Path, …])
- Raises:
ValueError – If no config paths found.
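How the different sources of config paths might be combined can be sketched as follows. This is not the library's implementation: the function `collect_config_paths`, its argument names, and the ordering of the result are assumptions. Only the environment variable name and the rule of appending the model name come from the text above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple

ENV_KEY = "pyearthtools_MODELS__CONFIGS"  # variable name as documented above


def collect_config_paths(
    model_name: str,
    default_path: Optional[str] = None,
    user_path: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Tuple[Path, ...]:
    """Gather candidate config directories (ordering is an assumption)."""
    paths = []
    if user_path is not None:
        paths.append(Path(user_path))
    if default_path is not None:
        paths.append(Path(default_path))
    # Environment roots are split on ':' and get the model name appended
    for root in environ.get(ENV_KEY, "").split(":"):
        if root:
            paths.append(Path(root) / model_name)
    if not paths:
        raise ValueError(f"No config paths found for {model_name!r}")
    return tuple(paths)


found = collect_config_paths(
    "Category/Model",
    default_path="/opt/model/configs",  # hypothetical default location
    environ={ENV_KEY: "/shared/configs:/scratch/configs"},
)
```

Passing `environ` explicitly keeps the sketch easy to test without touching the real process environment.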
- classmethod get_config(key, default=None)#
Get config for `key` from `pyearthtools.config`.
- Parameters:
key (str)
default (Any)
- Return type:
Any
- classmethod get_name()#
Get name of this class.
Can be overridden by setting `_name`; if not given, it will be `cls.__name__`.
- Return type:
str
- property index: pyearthtools.training.MLDataIndex#
Get pipeline as an `MLDataIndex`.
- classmethod is_valid_pipeline(pipeline_name, config_path=None)#
Check if `pipeline_name` is a valid pipeline.
- Parameters:
pipeline_name (str) – Pipeline name to check if valid
config_path (PathLike | None) – Path to search for configuration
- Returns:
Whether `pipeline_name` is valid.
- Return type:
(bool)
- abstractmethod load(*args, **kwargs)#
Load `pyearthtools.training.wrapper.Predictor`, and provide kwargs for `pyearthtools.training.MLDataIndex`.
Must accept user passed kwargs.
- Return type:
tuple[‘pyearthtools.training.wrapper.Predictor’, dict[str, Any]]
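The shape of the `load` contract (a predictor plus a dictionary of index kwargs) can be illustrated without pyearthtools installed. Everything in this sketch is a stand-in: `StandInForecastModel`, `EchoPredictor`, `EchoModel` and the `batch_size` kwarg are hypothetical names, and a real model would subclass `BaseForecastModel` and return an actual `Predictor`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple


class StandInForecastModel(ABC):
    """Hypothetical stand-in for BaseForecastModel; only `load` is mirrored."""

    @abstractmethod
    def load(self, *args: Any, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
        """Return (predictor, kwargs for the data index)."""


class EchoPredictor:
    """Placeholder for a real Predictor; returns its input unchanged."""

    def predict(self, data: Any) -> Any:
        return data


class EchoModel(StandInForecastModel):
    def load(self, *args: Any, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
        # User-passed kwargs must be accepted; here they are forwarded as-is
        return EchoPredictor(), dict(kwargs)


predictor, index_kwargs = EchoModel().load(batch_size=2)  # batch_size is invented
```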
- load_pipeline(pipeline, data=True, ancillary=None, **kwargs)#
Hook to allow modification of how `pipeline` is loaded.
- Parameters:
pipeline (str) – Path to pipeline file to open.
data (bool) – Whether the file is the data source or the pipeline.
ancillary (Optional[str]) – Name of ancillary pipeline if ancillary pipeline.
kwargs (Any) – Assignments to pass to `pyearthtools.pipeline.load`
- Returns:
Loaded pipeline
- Usage:
A child model could override this to assign values within `__KEY__` keys inside the `Pipeline`, or to add a step.
- classmethod log()#
Model specific logger
- Return type:
Logger
- property pipeline: pyearthtools.pipeline.Pipeline#
Get pipeline as configured in the init.
- run(*args, **kwargs)#
Run model
Using the pipeline and the overridden `load` function, create a `DataIndex` for the model and run a prediction.
All args and kwargs are passed through.
- Raises:
RuntimeError – If a DataNotFoundError occurs
- Returns:
Result of running the index
- Return type:
(Any)
- search(*args, **kwargs)#
Run a safe search on the index, skipping override
- Return type:
dict[str, Path]
- timer(title)#
Get timer context local to this object.
- Parameters:
title (str) – Name of timer
- Returns:
Timer context
- Return type:
(Timer)
- classmethod valid_pipelines(ancillary=False, *, config_path=None)#
Get valid pipeline list at `config_path`.
See `_valid_pipeline` for full docs.
- Parameters:
ancillary (bool)
config_path (PathLike | None)
- pyearthtools.zoo.register(name, exists='warn')#
Register a custom model for `pyearthtools.zoo`.
Any registered model is accessible underneath `pyearthtools.zoo.Models.*`
By setting the key with ‘/’ the categories of the model can be set.
Example
>>> register('Category/MODEL')(MODEL)
>>> # Accessible at `pyearthtools.zoo.Models.Category.MODEL`
- Parameters:
name (str) – Name under which the model should be registered. A warning is issued if this name conflicts with a preexisting model.
exists (Literal['warn', 'ignore', 'error'])
- Return type:
Callable[[…], Any]
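The decorator pattern behind a call such as `register('Category/MODEL')(MODEL)` can be sketched with a plain dictionary. This is an illustration, not the pyearthtools implementation: `register_model`, `RegistrationError`, `lookup` and the flat `_REGISTRY` are hypothetical, and the behaviour chosen for each `exists` value (overwrite on 'warn' and 'ignore', raise on 'error') is an assumption.

```python
import warnings
from typing import Any, Callable, Dict

_REGISTRY: Dict[str, Any] = {}  # hypothetical store keyed by 'Category/Name'


class RegistrationError(Exception):
    """Local stand-in for a registration exception."""


def register_model(name: str, exists: str = "warn") -> Callable[[Any], Any]:
    """Return a decorator that stores the decorated object under `name`."""

    def decorator(model: Any) -> Any:
        if name in _REGISTRY:
            if exists == "error":
                raise RegistrationError(f"{name!r} is already registered")
            if exists == "warn":
                warnings.warn(f"{name!r} is already registered; overwriting")
            # exists == 'ignore': overwrite silently (assumed behaviour)
        _REGISTRY[name] = model
        return model

    return decorator


def lookup(path: str) -> Any:
    """Fetch a registered object by its full 'Category/Name' path."""
    return _REGISTRY[path]


@register_model("Category/MODEL")
class MODEL:
    pass
```

Returning the decorated class unchanged means the decorator can be stacked or applied after the fact, as in the `register('Category/MODEL')(MODEL)` form shown in the example.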
zoo.exceptions#
- class pyearthtools.zoo.exceptions.ModelException#
Base model exception
- class pyearthtools.zoo.exceptions.ModelRegistrationException#
Model Registration exception
zoo.model#
- class pyearthtools.zoo.model.Timer#
Record and log the execution time of code within this context manager.
- Parameters:
title (str)
logger (logging.Logger | None)
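A timing context manager of this kind can be sketched in a few lines. `SketchTimer` is a hypothetical stand-in written for illustration; its attribute names and log format are assumptions and may differ from the real `Timer`.

```python
import logging
import time
from typing import Optional


class SketchTimer:
    """Record and log how long the enclosed block takes (illustrative only)."""

    def __init__(self, title: str, logger: Optional[logging.Logger] = None) -> None:
        self.title = title
        self.logger = logger or logging.getLogger(__name__)
        self.elapsed: Optional[float] = None

    def __enter__(self) -> "SketchTimer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = time.perf_counter() - self._start
        self.logger.info("%s took %.3f s", self.title, self.elapsed)
        return False  # never swallow exceptions raised inside the block


with SketchTimer("sum of squares") as timer:
    total = sum(i * i for i in range(1000))
```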
zoo.predict#
- pyearthtools.zoo.predict.data(model, time, pipeline, data_cache=None, config_path=None, **kwargs)#
Get data needed for a model to run.
Can be used to precache data for ‘live’ runs.
- Parameters:
model (str) – Model name to load
time (str) – Isoformat of time to get data for
pipeline (str) – Pipeline config to use
data_cache (Path | str | None) – Where to cache data. Defaults to None
config_path (Path | str | None) – Override for config path. Defaults to None
kwargs (dict[Any, Any]) – Extra keyword arguments to send to the model.
- Raises:
RuntimeError – If an error occurred, catch it with a nice error message.
- Returns:
Loaded data needed for the model.
- Return type:
list[Any]
- pyearthtools.zoo.predict.predict(model, time, pipeline_name, output, data_cache=None, config_path=None, **kwargs)#
Run a prediction for a given model, pipeline, and time.
- Parameters:
model (str) – Model name to load
time (str) – Isoformat of time to run prediction for
pipeline_name (str) – Pipeline config to use
output (Path | str) – Location to save data
data_cache (Path | str | None) – Where to cache data. Defaults to None
config_path (Path | str | None) – Override for config path. Defaults to None
kwargs (dict[Any, Any]) – Extra keyword arguments to send to the model.
- Raises:
RuntimeError – If an error occurred, catch it with a nice error message.
- Returns:
Loaded Predictions.
- Return type:
(Any)
zoo.utils#
- class pyearthtools.zoo.utils.Colour#
Colour helper
- class pyearthtools.zoo.utils.CategorisedObjects(name, categories=None, *, _parse=None, **objects)#
Generic class to allow access into categorised objects.
Categories are formed from nested kwargs and dictionaries, and can be set later with `__setitem__`. Keys must be hashable, just like a dictionary.
Examples
>>> record = CategorisedObjects('Example', category_1 = {'sub_cat': 10})
>>> record.category_1
>>> ─┬ category_1 ──
>>>  └──sub_cat
## Parsing
Overriding `_parse` allows custom classes to be parsed when retrieved. Overriding `_name` allows custom class names to be retrieved when displaying what is available.
Construct a Category, which can itself have sub categories.
If any `object` is a dictionary, create another `CategorisedObjects` at that entry.
- Parameters:
name (str) – Name of this category.
categories (dict[str, Any | CategorisedObjects] | None, optional) – Dictionary to configure categories to allow access to. If an element is a dictionary, it will be configured as a sub category. Defaults to None
_parse (Callable, None, optional) – Init arg to override the `_parse` function, to allow parsing of an object upon retrieval. Must be a callable expecting self and one argument.
**objects (Any | CategorisedObjects | dict[str, Any | CategorisedObjects]) – Kwargs form of `categories`; the kwarg key is the top level category.
- property available: tuple[str, ...]#
Get list of available objects
- items() a generator object providing a view on Category's items#
- keys() a generator object providing a view on Category's keys#
- update(_CategorisedObjects__dict=None, **kwargs)#
Update `CategorisedObjects`.
Can be given as full path separated by ‘/’.
Value can be dictionary, which will be expanded.
- Parameters:
_CategorisedObjects__dict (dict[Any, Any] | None)
kwargs (Any)
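The two update rules (keys given as ‘/’-separated paths, and dictionary values being expanded) can be sketched on plain nested dictionaries. `update_categories` is a hypothetical helper operating on `dict` objects, not on `CategorisedObjects` itself.

```python
from typing import Any, Dict


def update_categories(store: Dict[str, Any], entries: Dict[str, Any]) -> None:
    """Merge `entries` into nested dict `store` (illustrative helper only)."""
    for path, value in entries.items():
        *parents, leaf = path.split("/")
        node = store
        for part in parents:
            # Walk down, creating intermediate categories as needed
            node = node.setdefault(part, {})
        if isinstance(value, dict):
            # Dictionary values are expanded into a sub category
            update_categories(node.setdefault(leaf, {}), value)
        else:
            node[leaf] = value


record: Dict[str, Any] = {}
update_categories(record, {"category_1/sub_cat": 10, "category_2": {"item": 1}})
```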
- values() a generator object providing a view on Category's values#
- class pyearthtools.zoo.utils.AvailableModels#
Get all available models as defined by `entrypoints` underneath `pyearthtools.zoo.register`.
Categorise with these entry points by separating layers with `_`.
Examples
>>> # Entrypoints
>>> # NESM_modelNAME
>>> AvailableModels()
>>> ─┬ Available Models ──
>>>  └─┬ NESM ──
>>>    └──modelNAME
Can retrieve a model by getting attributes one layer at a time, or by `getattr(self, 'NESM/modelNAME')`, or if the last name is unique, by that name alone.
If `NESM/modelNAME` exists within the AvailableModels, it can be retrieved in the following ways:
>>> AvailableModels.NESM.modelNAME
>>> AvailableModels['NESM/modelNAME']
>>> AvailableModels.modelNAME # Only works if `modelNAME` is unique.
Construct object containing all available models
- Raises:
ValueError – If a model will get overwritten by a duplicate key.
- refresh()#
Refresh available models
- class pyearthtools.zoo.utils.TabCompleter#
A tab completer that can either complete from the filesystem or from a list.
- create_list_completer(ll)#
This is a closure that creates a method that autocompletes from the given list.
Since the autocomplete function can’t be given a list to complete from, a closure is used to create the listCompleter function with a list to complete from.
- Parameters:
ll (list | str)
- path_completer(text, state)#
This is the tab completer for systems paths. Only tested on Linux systems
- pyearthtools.zoo.utils.parse_str(item)#
Parse a str to a boolean if it represents a bool
- Parameters:
item (str)
- Return type:
str | int | float | bool
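The return type suggests that numbers may be parsed as well as booleans. One possible reading, as a sketch only: `parse_scalar` is a hypothetical function, and the accepted spellings of true/false are an assumption.

```python
from typing import Union

Scalar = Union[str, int, float, bool]


def parse_scalar(item: str) -> Scalar:
    """Try bool, then int, then float; otherwise return the str unchanged."""
    lowered = item.strip().lower()
    if lowered in ("true", "false"):  # accepted spellings are assumed
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(item)
        except ValueError:
            continue
    return item


parsed = [parse_scalar(text) for text in ("True", "42", "1.5", "era5")]
```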
- pyearthtools.zoo.utils.find_demlim(value, delim_options)#
Find which delimiter is being used out of `delim_options`.
Defaults to ‘-’ if none found.
- Parameters:
value (str)
delim_options (list[str])
- pyearthtools.zoo.utils.delta_conversion(value, unit='hour')#
Attempt to convert a given `value` to an integer of the given `unit`.
If it cannot be converted, will quietly return `value`.
- Parameters:
value (Any) – Value to convert
unit (str, optional) – Unit to convert in to. Defaults to ‘hour’.
- Returns:
Time delta in unit
- Return type:
(int)
- pyearthtools.zoo.utils.create_mapping(list1, list2)#
Creates a dictionary mapping elements from list1 to list2, ignoring text in ().
Allows data to be associated with pipelines designed to be generic. If no element found in the second list, value will be None. – Generated by Bard
- Parameters:
list1 (list[str]) – A list of strings.
list2 (list[str]) – A list of strings.
- Returns:
A dictionary mapping elements from list1 to list2, ignoring text in ().
- Return type:
(dict[str, str | None])
Examples
Given two lists [‘era5’, ‘era5(test)’] and [‘era5’], the mapping would be
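The matching rule can be sketched as follows. `sketch_mapping` is a hypothetical re-implementation written for illustration; whether the real function returns the original `list2` element or a stripped name is an assumption.

```python
import re
from typing import Dict, List, Optional


def strip_parenthesised(name: str) -> str:
    """Drop any '(...)' section, e.g. 'era5(test)' becomes 'era5'."""
    return re.sub(r"\(.*?\)", "", name)


def sketch_mapping(list1: List[str], list2: List[str]) -> Dict[str, Optional[str]]:
    """Map each element of list1 to its match in list2, ignoring text in ()."""
    targets = {strip_parenthesised(item): item for item in list2}
    return {item: targets.get(strip_parenthesised(item)) for item in list1}


mapping = sketch_mapping(["era5", "era5(test)", "gfs"], ["era5"])
```

Under this reading, both `era5` and `era5(test)` map to `era5`, while the added `gfs` entry has no match and maps to None.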
- pyearthtools.zoo.utils.get_annotation(val)#
Get annotation from a signature value
- Parameters:
val (Parameter)
- pyearthtools.zoo.utils.get_arguments(function)#
Get arguments of a function
- Parameters:
function (Callable) – Function to get arguments of
- Returns:
[Required arguments, Type hints], [Defaulted arguments, defaults or type hints]
- Return type:
(tuple[dict[str,Any], dict[str, Any]])
- pyearthtools.zoo.utils.split_name_assignment(config)#
Split `config` into name and assignment components.
Assignment is given enclosed in {}, and multiple assignments can be split by ‘,’.
If no assignment, return it as None
- Parameters:
config (str) – Pipeline config to parse
- Raises:
ValueError – If too many elements discovered
- Returns:
config name, dictionary of assignments if any or None
- Return type:
(tuple[str, dict[str, str | int] | None])
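The splitting behaviour can be sketched with plain string operations. `split_config` is a hypothetical stand-in; it keeps every value as a str, whereas the documented return type indicates the real function may also produce ints, and the exact conditions under which the real function raises are not known here.

```python
from typing import Dict, Optional, Tuple


def split_config(config: str) -> Tuple[str, Optional[Dict[str, str]]]:
    """Split 'NAME{KEY=VALUE,...}' into (name, assignments or None)."""
    if "{" not in config:
        return config, None
    name, _, rest = config.partition("{")
    # Exactly one trailing '{...}' group is accepted in this sketch
    if not rest.endswith("}") or "{" in rest:
        raise ValueError(f"Cannot parse assignments in {config!r}")
    assignments: Dict[str, str] = {}
    for pair in rest[:-1].split(","):
        if not pair.strip():
            continue
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"Expected KEY=VALUE, got {pair!r}")
        assignments[key.strip()] = value.strip()
    return name, assignments


plain = split_config("ERA5")
assigned = split_config("ERA5{ID=42,LEVEL=500}")
```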
zoo.warnings#
- class pyearthtools.zoo.AccessorRegistrationWarning#
Warning for conflicts in accessor registration.