Data API Docs#
data.archive#
- class pyearthtools.data.archive.ZarrIndex(store, variables=None, *, template=False, transforms=None, open_kwargs=None, save_kwargs=None, remote=False, **kwargs)#
Zarr Index
Can be used to access local/remote zarr archives, with the ability to write into them.
Examples:
>>> zarr_archive = Zarr(PATH_TO_ZARR_ARCHIVE)
>>> zarr_archive()
For time-aware indexing, use ZarrTime.
Additionally, this class can be used to create an ‘empty’ archive, with all metadata prepopulated.
This is useful to premake an archive, and then use many distributed processes to write subsets into it.
Template Example:
>>> zarr_archive = Zarr(PATH_TO_ZARR_ARCHIVE, template=True)
>>> zarr_archive.make_template(SINGLE_SAMPLE, time=EXPANDED_TIME)
>>>
>>> for subsample in TOTALSAMPLES:  # Can be done distributedly
...     zarr_archive.save(subsample)
Zarr Archive
Can use ‘sa’ as the mode for saving, which means ‘safe append’. This looks at the existing archive and appends only the data that is missing along append_dim.
If template is True, exists will always be False.
- Parameters:
store (PathLike) – Store or path to directory in local or remote file system.
variables (str | list[str] | None, optional) – Variables within the dataset to subset to. Defaults to None.
template (bool, optional) – Whether this archive is a template, which will cause exists to always return False. Allows a cacher to write to this archive, despite it appearing to exist on disk. Defaults to False.
transforms (Transform | TransformCollection | None, optional) – Base Transforms to be applied to data. Transforms are applied on the retrieval of data, i.e. index[], but not when directly getting the data with index.get(). Defaults to TransformCollection().
open_kwargs (dict[str, Any] | None, optional) – Kwargs to use when opening the zarr archive. See https://docs.xarray.dev/en/stable/generated/xarray.open_zarr.html Defaults to None.
save_kwargs (dict[str, Any] | None, optional) – Kwargs to use when saving the zarr archive. See https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html Defaults to None.
remote (bool) – If this flag is set, then the store variable is an fsspec style URL to a remote Zarr store, so will not be treated like a local path.
- exists(search_dict=None, **kwargs)#
Check if zarr archive exists
If template == True, always return False.
- Parameters:
search_dict (dict[str, Any] | None, optional) – Key / val to check for in data. Defaults to None.
kwargs – Kwargs form of search_dict.
- Returns:
If zarr archive / data in archive exists.
- Return type:
(bool)
- get()#
Get zarr archive
Used within the indexes data access flows
Subsets on variables if given, but applies no other subsetting.
- make_template(dataset, *, chunk=None, encoding=None, overwrite=True, append_dimension=None, expand_coords=None, **kwargs)#
Make a template dataset out of one sample of data, dataset.
A sample should contain all of the variables the full dataset should have. It must also contain all values that can be expected along the coordinates not included in expand_coords, e.g. all latitude values.
A sample does not need to include all values specified in expand_coords; it will be reindexed to include them by this function.
The full dataset is defined as the sample expanded by expand_coords.
- Parameters:
dataset (xr.Dataset) – Single sample of full dataset. All metadata will be taken from this sample.
chunk (Literal['auto'] | None | dict[str, Literal['auto'] | int], optional) – Override for chunks of zarr archive. Any key in expand_coords will be chunked ‘auto’. Defaults to None.
overwrite (bool, optional) – Whether to overwrite an existing zarr archive. Defaults to True.
append_dimension (str | None, optional) – Dimension to append on, if to append. Defaults to None.
expand_coords (dict[str, list[Any]] | None) – Coordinates to reindex. Allows for a single sample to be passed, but full archive created of larger data. Defaults to None.
kwargs – Kwargs form of expand_coords.
encoding (dict[str, dict[str, Any]] | None)
- Raises:
FileExistsError – If the file exists and overwrite == False.
Examples
>>> era5 = pyearthtools.data.archive.ERA5.sample()
>>>
>>> full_time_values = list(map(lambda x: x.datetime64(), pyearthtools.data.TimeRange('1980', '2020', '6 hour')))
>>>
>>> zarr_archive = Zarr(PATH_TO_ZARR, template=True)
>>> zarr_archive.make_template(era5['2000-01-01T00'], time=full_time_values)
... # Will create a zarr archive like `era5` but across all of `full_time_values`
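To picture what "the sample expanded by expand_coords" means, the sketch below reindexes a one-step sample onto a longer time axis and leaves placeholders where no data exists yet. It uses plain dictionaries rather than xarray, and expand_sample is a hypothetical name used only for illustration; it is not part of pyearthtools and says nothing about how the library stores the template.

```python
def expand_sample(sample, full_times, fill_value=None):
    """Reindex a {time: value} sample onto full_times, filling gaps with fill_value."""
    return {time: sample.get(time, fill_value) for time in full_times}


# A single sample holds one time step (hypothetical values).
sample = {"2000-01-01T00": 281.5}
full_times = ["2000-01-01T00", "2000-01-01T06", "2000-01-01T12"]

# The template covers every requested time; missing steps are placeholders
# that later writes could fill in.
template = expand_sample(sample, full_times)
```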
- save(data, save_kwargs=None, **kwargs)#
Save data into the zarr archive.
See https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html
Can use ‘sa’ as the mode for saving, which means ‘safe append’. This looks at the existing archive and appends only the data that is missing along append_dim.
- Parameters:
data (Dataset) – Dataset to save
save_kwargs (dict[str, Any] | None) – Extra kwargs to pass to .to_zarr, in addition to init.save_kwargs. Defaults to None.
**kwargs – Kwargs form of save_kwargs
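The ‘safe append’ idea can be pictured as a set difference along the append dimension: only coordinate values that the archive does not already hold are written. The sketch below is a plain-Python illustration under that assumption; missing_along_dim is a hypothetical helper, not a pyearthtools function, and the library's actual logic may differ.

```python
def missing_along_dim(existing_values, incoming_values):
    """Return the incoming coordinate values not already present, keeping their order."""
    seen = set(existing_values)
    return [value for value in incoming_values if value not in seen]


# Times already in the archive, and times in the dataset about to be saved.
existing_times = ["2000-01-01T00", "2000-01-01T06"]
incoming_times = ["2000-01-01T06", "2000-01-01T12", "2000-01-01T18"]

# Only the last two steps would need to be appended.
to_append = missing_along_dim(existing_times, incoming_times)
```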
- search()#
Get path of zarr archive
- Return type:
str | Path
- class pyearthtools.data.archive.ZarrTimeIndex(store, variables=None, *, template=False, transforms=None, open_kwargs=None, save_kwargs=None, remote=False, **kwargs)#
Time index aware zarr archive
Allows for [] with a time value, and subsetting accordingly.
Zarr Archive
Can use ‘sa’ as the mode for saving, which means ‘safe append’. This looks at the existing archive and appends only the data that is missing along append_dim.
If template is True, exists will always be False.
- Parameters:
store (PathLike) – Store or path to directory in local or remote file system.
variables (str | list[str] | None, optional) – Variables within the dataset to subset to. Defaults to None.
template (bool, optional) – Whether this archive is a template, which will cause exists to always return False. Allows a cacher to write to this archive, despite it appearing to exist on disk. Defaults to False.
transforms (Transform | TransformCollection | None, optional) – Base Transforms to be applied to data. Transforms are applied on the retrieval of data, i.e. index[], but not when directly getting the data with index.get(). Defaults to TransformCollection().
open_kwargs (dict[str, Any] | None, optional) – Kwargs to use when opening the zarr archive. See https://docs.xarray.dev/en/stable/generated/xarray.open_zarr.html Defaults to None.
save_kwargs (dict[str, Any] | None, optional) – Kwargs to use when saving the zarr archive. See https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html Defaults to None.
remote (bool) – If this flag is set, then the store variable is an fsspec style URL to a remote Zarr store, so will not be treated like a local path.
- exists(querytime=None, **kwargs)#
Check for existence.
If querytime is given, check that it is in the zarr archive.
- Parameters:
querytime (str | None)
- retrieve(querytime=None, *args, transforms=None, **kwargs)#
If supplied, retrieve the data subset for the specified time
- Parameters:
querytime (str | Petdt | None)
transforms (Transform | TransformCollection | None)
- Return type:
Any
- pyearthtools.data.archive.extensions.register_archive(name, *, sample_kwargs=None)#
Register a custom archive underneath pyearthtools.data.archive.
- Parameters:
name (str) – Name under which the archive should be registered. A warning is issued if this name conflicts with a preexisting archive.
sample_kwargs (dict[str, Any] | None, optional) – Keyword arguments to initialise a sample index for demonstration. Can be retrieved with .sample
- Return type:
Callable
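Because register_archive takes a name and returns a Callable, it is presumably meant to be used as a class decorator. The sketch below shows that general pattern in plain Python. Everything in it (ARCHIVE_REGISTRY, register_archive_sketch, MyArchive, and the choice to replace an entry on conflict) is a hypothetical stand-in and not the library's implementation.

```python
import warnings

# Hypothetical stand-in for the pyearthtools.data.archive namespace.
ARCHIVE_REGISTRY = {}


def register_archive_sketch(name, *, sample_kwargs=None):
    """Return a class decorator that stores the decorated class under `name`."""

    def decorator(cls):
        if name in ARCHIVE_REGISTRY:
            # Assumption: warn on a name conflict, then replace the entry.
            warnings.warn(f"Archive {name!r} is already registered.")
        ARCHIVE_REGISTRY[name] = cls
        # Keep the sample kwargs so a demonstration index can be built later.
        cls.sample = classmethod(lambda klass: klass(**(sample_kwargs or {})))
        return cls

    return decorator


@register_archive_sketch("MyArchive", sample_kwargs={"variables": "2t"})
class MyArchive:
    def __init__(self, variables=None):
        self.variables = variables
```

With this pattern, MyArchive.sample() builds an instance from the stored sample_kwargs.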
- pyearthtools.data.archive.reset_root()#
Reset all root directories
- pyearthtools.data.archive.set_root(root_dir=None, **kwargs)#
Change root directory for data sources.
A value in the dictionary can be set to None, which resets that root directory to its default value.
- Parameters:
root_dir (dict[str, str | None] | None, optional) – Dictionary with root directory replacements. Defaults to None.
**kwargs (dict[str,str | None]) – Kwargs version of root_dir
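The replace-or-reset behaviour described for set_root can be sketched as follows. The keys, paths, and the names DEFAULT_ROOTS, ROOT_DIRECTORIES and set_root_sketch are all hypothetical; the real function operates on pyearthtools' own configuration, which is not shown here.

```python
# Hypothetical default root directories.
DEFAULT_ROOTS = {"era5": "/default/era5", "obs": "/default/obs"}
ROOT_DIRECTORIES = dict(DEFAULT_ROOTS)


def set_root_sketch(root_dir=None, **kwargs):
    """Apply replacements; a value of None restores the default for that key."""
    replacements = {**(root_dir or {}), **kwargs}
    for key, value in replacements.items():
        ROOT_DIRECTORIES[key] = DEFAULT_ROOTS[key] if value is None else value


# Override one root with the dictionary form, then reset it with the kwargs form.
set_root_sketch({"era5": "/scratch/era5"})
after_override = dict(ROOT_DIRECTORIES)
set_root_sketch(era5=None)
after_reset = dict(ROOT_DIRECTORIES)
```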
- pyearthtools.data.archive.config_root()#
Setup Root Directories
data.derived#
- class pyearthtools.data.derived.DerivedValue(transforms=TransformCollection(), *args, add_default_transforms=True, preprocess_transforms=None, **kwargs)#
Base class for Derived data
Subclassed from DataIndex so transforms can be used.
Child must implement derive.
Introduce [transforms][pyearthtools.data.transforms] to data loading
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- abstractmethod derive(*args, **kwargs)#
Get derived value.
Will only be passed most specific key, so if a function of time, expect a time.
Child class must implement
- Return type:
Dataset
- get(*args, **kwargs)#
Override for get to use derive.
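The relationship between derive and get can be sketched with an abstract base class: the child supplies derive, and get simply forwards to it. DerivedSketch and HourOfDay are hypothetical names for illustration only; the real DerivedValue also wires in transforms via DataIndex, which is omitted here.

```python
from abc import ABC, abstractmethod


class DerivedSketch(ABC):
    """Minimal stand-in: subclasses compute data instead of loading it."""

    @abstractmethod
    def derive(self, *args, **kwargs):
        """Compute the derived value for the most specific key."""

    def get(self, *args, **kwargs):
        # get is overridden to call derive rather than read from storage.
        return self.derive(*args, **kwargs)


class HourOfDay(DerivedSketch):
    def derive(self, time):
        # Expects an ISO-like string such as "2000-01-01T06".
        return int(time[11:13])


hour = HourOfDay().get("2000-01-01T06")
```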
- classmethod like(dataset, **kwargs)#
Setup DerivedValue taking coords from dataset if key in __init__.
If cls takes latitude and longitude, and those coords are in dataset, will take their values and pass them to __init__.
Examples:
>>> era = pyearthtools.data.archive.ERA5.sample()
>>> derived = DerivedValue.like(era['2000-01-01T00'])
- Parameters:
dataset (Dataset | DataArray)
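One way to realise "take coords from dataset if key in __init__" is to inspect the constructor signature and pass along only the coordinates it accepts. The sketch below does this with a plain dictionary standing in for dataset coordinates. GridValue and its like method are hypothetical, and letting explicit kwargs override coordinates is an assumption.

```python
import inspect


class GridValue:
    def __init__(self, latitude, longitude, scale=1.0):
        self.latitude = latitude
        self.longitude = longitude
        self.scale = scale

    @classmethod
    def like(cls, coords, **kwargs):
        # Keep only the coordinates that __init__ actually accepts.
        accepted = inspect.signature(cls.__init__).parameters
        from_coords = {key: value for key, value in coords.items() if key in accepted}
        # Assumption: explicitly passed kwargs take precedence over coords.
        return cls(**{**from_coords, **kwargs})


coords = {
    "latitude": [-10.0, 0.0, 10.0],
    "longitude": [0.0, 120.0, 240.0],
    "time": ["2000-01-01T00"],  # ignored, since __init__ has no `time` argument
}
derived = GridValue.like(coords, scale=2.0)
```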
- class pyearthtools.data.derived.TimeDerivedValue(data_interval=None, **kwargs)#
Temporally derived value Index
Derived value which is a function of time.
Hooks into TimeDataIndex to allow for series retrieval
- Parameters:
data_interval (tuple[int, str] | int | str | TimeDelta | None, optional) – Default interval of data. Defaults to None.
- class pyearthtools.data.derived.AdvancedTimeDerivedValue(data_interval=None, split_time=False, **kwargs)#
Advanced Temporally Derived Index
Allows for time-resolution-based retrieval.
Example:
>>> index = AdvancedTimeDerivedValue('6 hours')
>>> index['2000-01-01']  # Will get four steps 00,06,12,18
- Parameters:
data_interval (tuple[int, str] | int | str | TimeDelta | None) – Interval of derivation, if given allows for [] to get multiple samples based on resolution.
split_time (bool) – Whether to split a series call into each individual time, or pass list of times.
Derived value which is a function of time.
Hooks into TimeDataIndex to allow for series retrieval
- Parameters:
data_interval (tuple[int, str] | int | str | TimeDelta | None, optional) – Default interval of data. Defaults to None.
split_time (bool)
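The resolution-based retrieval in the example above (a day-level key with a 6 hour interval yielding 00, 06, 12 and 18) can be reproduced with standard datetime arithmetic. The helper steps_in_day is a hypothetical illustration and not how the index is implemented.

```python
from datetime import datetime, timedelta


def steps_in_day(day, interval):
    """List the timestamps covered by a day-resolution key at the given interval."""
    start = datetime.strptime(day, "%Y-%m-%d")
    end = start + timedelta(days=1)
    steps = []
    current = start
    while current < end:
        steps.append(current)
        current += interval
    return steps


steps = steps_in_day("2000-01-01", timedelta(hours=6))
hours = [step.hour for step in steps]
```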
- series(start, end, interval=None, **_)#
Index into Provided Data function to create a continuous series of Data
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | datetime.datetime | Petdt) – Timestep to begin series at
end (str | datetime.datetime | Petdt) – Timestep to end series at
interval (tuple[float, str]) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
inclusive (bool, optional) – Whether end time is included in retrieval. Defaults to False.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to False.
transforms (Transform | TransformCollection, optional) – Extra [Transform’s][pyearthtools.data.transforms.Transform] to be applied to data. Defaults to TransformCollection().
verbose (bool, optional) – Print logging messages. Defaults to False.
force_get (bool, optional) – Use series method which loads each dataset using .get. WARNING: Takes significantly longer, as it does not use dask. Defaults to False.
subset_time (bool, optional) – Whether to force subset time dim. Defaults to True.
tolerance (tuple | pd.Timedelta, optional) – Tolerance for time subsetting. Defaults to None.
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
- class pyearthtools.data.derived.Insolation(latitude, longitude, interval=None, *, S=1.0, daily=False, clip_zero=True)#
Calculate the approximate solar insolation for given dates.
Use like to mimic a dataset; it must have latitude and longitude in the coords.
For an example reference, see: https://brian-rose.github.io/ClimateLaboratoryBook/courseware/insolation.html
- Parameters:
latitude (np.ndarray | list) – 1d or 2d array of latitudes
longitude (np.ndarray | list) – 1d or 2d array of longitudes (0-360deg). If 2d, must match the shape of latitude.
interval (tuple[int, str] | int | str | None, optional) – TimeDelta of data, e.g. 6 hour. Used for series retrieval. Can be None to disable interval awareness. Defaults to None.
S (float, optional) – scaling factor (solar constant). Defaults to 1.0.
daily (bool, optional) – if True, return the daily max solar radiation (lat and day of year dependent only). Defaults to False.
clip_zero (bool, optional) – if True, set values below 0 to 0. Defaults to True.
- Raises:
ValueError – If latitude or longitude are invalid.
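For intuition about the quantity involved, the sketch below computes a textbook approximation of instantaneous insolation as S times the cosine of the solar zenith angle, using a simple cosine fit for the solar declination and ignoring the equation of time. This is not necessarily the formula the Insolation class uses; approx_insolation is a hypothetical function, and only its S and clip_zero arguments mirror the parameters documented above.

```python
import math
from datetime import datetime


def approx_insolation(lat_deg, lon_deg, when, S=1.0, clip_zero=True):
    """Approximate S * cos(zenith) for a point, treating `when` as UTC."""
    day_of_year = when.timetuple().tm_yday
    # Common approximation of the solar declination (radians).
    declination = math.radians(-23.44) * math.cos(2 * math.pi * (day_of_year + 10) / 365)
    # Hour angle: zero at local solar noon, 15 degrees per hour, shifted by longitude.
    utc_hours = when.hour + when.minute / 60 + when.second / 3600
    hour_angle = math.radians(15.0 * (utc_hours - 12.0) + lon_deg)
    lat = math.radians(lat_deg)
    cos_zenith = (
        math.sin(lat) * math.sin(declination)
        + math.cos(lat) * math.cos(declination) * math.cos(hour_angle)
    )
    value = S * cos_zenith
    # Below the horizon the cosine is negative; optionally clip it to zero.
    return max(value, 0.0) if clip_zero else value


noon_equator = approx_insolation(0.0, 0.0, datetime(2000, 3, 20, 12))
midnight_equator = approx_insolation(0.0, 0.0, datetime(2000, 3, 20, 0))
```

Near the March equinox the sun is almost overhead at the equator at local noon, so noon_equator is close to S, while the clipped midnight value is zero.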
- derive(time)#
Get derived value.
Will only be passed most specific key, so if a function of time, expect a time.
Child class must implement
- Parameters:
time (Timestamp)
- Return type:
Dataset
data.download#
- class pyearthtools.data.download.arcoera5.ARCOERA5(variables=None, level=None, transforms=None, **kwargs)#
Analysis-Ready, Cloud Optimized ERA5
Carver, Robert W, and Merose, Alex. (2023): ARCO-ERA5: An Analysis-Ready Cloud-Optimized Reanalysis Dataset. 22nd Conf. on AI for Env. Science, Denver, CO, Amer. Meteo. Soc, 4A.1, https://ams.confex.com/ams/103ANNUAL/meetingapp.cgi/Paper/415842
Analysis-Ready, Cloud Optimized ERA5 integrated within pyearthtools.
Allows for access to a cloud ERA5 archive.
- Parameters:
variables (str | list[str] | None, optional) – Variables to retrieve, can be either short_name or long_name. Defaults to None, to retrieve all variables.
level (int | list[int] | None, optional) – Pressure levels to select. Defaults to None, to select all levels.
transforms (Transform | TransformCollection | None, optional) – Transforms to apply to dataset. Defaults to None.
- property dataset: Dataset#
Get full dataset for this obj
- get(time)#
Get timestep from dataset
- Parameters:
time (str)
- classmethod sample()#
Example subset of the dataset
- pyearthtools.data.download.arcoera5.LEVELS = [1, 2, 3, 5, 7, 10, 20, 30, 50, 70, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975, 1000]#
valid ARCO-ERA5 level values
- pyearthtools.data.download.arcoera5.LONG_NAMES#
mapping from long variable names to short variable names
- pyearthtools.data.download.arcoera5.SHORT_NAMES#
mapping from short variable names to long variable names
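Since variables may be given by short or long name, a pair of mappings like LONG_NAMES and SHORT_NAMES allows either form to be normalised. The sketch below shows the idea with a two-entry illustrative subset. The dictionary contents, the choice to normalise to long names, and the helper normalise_variables are assumptions for illustration, not the module's actual data or behaviour.

```python
# Illustrative subset only; the real mappings are defined by the module.
LONG_TO_SHORT = {"2m_temperature": "2t", "geopotential": "z"}
SHORT_TO_LONG = {short: long for long, short in LONG_TO_SHORT.items()}


def normalise_variables(variables):
    """Accept one name or a list of names, short or long, and return long names."""
    if isinstance(variables, str):
        variables = [variables]
    result = []
    for name in variables:
        if name in LONG_TO_SHORT:
            result.append(name)
        elif name in SHORT_TO_LONG:
            result.append(SHORT_TO_LONG[name])
        else:
            raise ValueError(f"Unknown variable: {name!r}")
    return result


selected = normalise_variables(["2t", "geopotential"])
```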
- class pyearthtools.data.download.weatherbench.WeatherBench2(dataset_url, license_url, *, variables=None, level=None, transforms=None, chunks='auto', download_dir=None, license_ok=False, **kwargs)#
WeatherBench2 cloud-optimized ground truth and baseline datasets
Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell and Fei Sha (2024): WeatherBench 2: A benchmark for the next generation of data-driven global weather models Journal of Advances in Modeling Earth Systems, 16, e2023MS004019 https://doi.org/10.1029/2023MS004019
If a download_dir folder is provided, the selected subset (i.e. variables and levels) of the dataset will be first downloaded into the folder, in a subfolder named with the hash of the url. In this subfolder, each variable and level is saved as a separate compressed zarr dataset. Once downloaded, any subsequent access will use the local version.
Later, if you select a different set of variables and levels, make sure to use the same folder, as only the missing variables and levels will then be downloaded.
- Parameters:
- Parameters:
dataset_url (str) – URL of the zarr dataset
license_url (str) – License of the dataset
variables (str | list[str] | None, optional) – Variables to retrieve, can be either short_name or long_name. Defaults to None, to retrieve all variables.
level (int | list[int] | None, optional) – Pressure levels to select. Defaults to None, to select all levels.
transforms (Transform | TransformCollection | None, optional) – Transforms to apply to dataset. Defaults to None.
chunks (int | dict | Literal["auto"], optional) – Chunking used to load data into Dask arrays. Defaults to “auto”.
download_dir (str | Path, optional) – Folder where to save a copy of the dataset. Defaults to None.
license_ok (bool, optional) – License has been read. Defaults to False.
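The download_dir caching scheme described above (a subfolder named after a hash of the URL, with only the missing variable and level combinations fetched) can be sketched as below. The hash algorithm, the example URL, and the helpers cache_subfolder and missing_items are assumptions; the documentation does not state which hash pyearthtools uses.

```python
import hashlib
from pathlib import Path


def cache_subfolder(download_dir, dataset_url):
    """Build a per-dataset cache folder; SHA-256 is an assumed choice of hash."""
    digest = hashlib.sha256(dataset_url.encode("utf-8")).hexdigest()
    return Path(download_dir) / digest


def missing_items(requested, already_downloaded):
    """Return the requested (variable, level) pairs that are not cached yet."""
    cached = set(already_downloaded)
    return [item for item in requested if item not in cached]


# Hypothetical URL and selections.
folder = cache_subfolder("/data/wb2", "gs://example-bucket/era5.zarr")
to_download = missing_items(
    requested=[("temperature", 500), ("temperature", 850), ("geopotential", 500)],
    already_downloaded=[("temperature", 500)],
)
```

Reusing the same download_dir keeps the hash-named folder stable, which is what lets a later call detect what is already present.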
- property dataset: Dataset#
Get full dataset for this obj
- get(time)#
Get timestep from dataset
- Parameters:
time (str)
- license()#
Get the license for this dataset
- Return type:
str
- class pyearthtools.data.download.weatherbench.WB2ERA5(*, resolution='64x32', **kwargs)#
WeatherBench2 cloud-optimized ground truth ERA5 dataset
ERA5 datasets downloaded from the Copernicus Climate Data Store with a time range from 1959 to 2023 (incl.). The data have been downsampled to 6h and 13 levels, except for the “raw” dataset. The raw dataset is hourly with a 0.25 degree spatial resolution and 37 levels.
https://weatherbench2.readthedocs.io/en/latest/data-guide.html#era5
Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell and Fei Sha (2024): WeatherBench 2: A benchmark for the next generation of data-driven global weather models Journal of Advances in Modeling Earth Systems, 16, e2023MS004019 https://doi.org/10.1029/2023MS004019
See pyearthtools.data.download.weatherbench.WeatherBench2 for additional parameters.
- Parameters:
resolution (str, optional) – Dataset resolution, one of “raw”, “1440x721”, “240x121” and “64x32”. The “raw” dataset is not subsampled, i.e. is hourly with 37 levels. Defaults to “64x32”.
- classmethod sample()#
Example subset of the dataset
- class pyearthtools.data.download.weatherbench.WB2ERA5Clim(*, resolution='64x32', period='1990-2017', **kwargs)#
WeatherBench2 cloud-optimized ground truth ERA5 climatology dataset
For WeatherBench 2, the climatology was computed using a running window for smoothing (see paper and script) for each day of year and sixth hour of day. Climatologies have been computed for 1990-2017 and 1990-2019.
https://weatherbench2.readthedocs.io/en/latest/data-guide.html#era5-climatology
Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russel, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell and Fei Sha (2024): WeatherBench 2: A benchmark for the next generation of data-driven global weather models Journal of Advances in Modeling Earth Systems, 16, e2023MS004019 https://doi.org/10.1029/2023MS004019
See pyearthtools.data.download.weatherbench.WeatherBench2 for additional parameters.
- Parameters:
resolution (str, optional) – Dataset resolution, one of “1440x721”, “512x256”, “240x121” and “64x32”. Defaults to “64x32”.
period (str, optional) – Covered time period, either “1990-2017” or “1990-2019”. Defaults to “1990-2017”.
- classmethod sample()#
Example subset of the dataset
data.indexes#
- class pyearthtools.data.indexes.Index(*args, **kwargs)#
Base Level Index to define the structure
To use, subclass and define the .get function; any calls shall be passed through.
- abstractmethod get(*args, **kwargs)#
Base Level .get call, used to retrieve data from args
- class pyearthtools.data.indexes.DataIndex(transforms=TransformCollection(), *args, add_default_transforms=True, preprocess_transforms=None, **kwargs)#
Index to introduce [transforms][pyearthtools.data.transforms] to data loading
Transforms are applied on a retrieve or __call__, but not on get.
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- retrieve(*args, transforms=TransformCollection(), **kwargs)#
Retrieve data for the given time step, applying the supplied transforms
The untransformed data is obtained using get, which must be implemented by the user.
- Parameters:
transforms (Transform | TransformCollection, optional) – Extra transforms to apply. Defaults to TransformCollection().
- Returns:
Loaded data with transforms applied
- Return type:
(Any)
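The split between get (raw data) and retrieve or __call__ (data with transforms applied) can be sketched with a small class. DataIndexSketch and double are hypothetical; transforms are modelled as plain callables, and applying base transforms before the extra ones is an assumption about ordering.

```python
class DataIndexSketch:
    def __init__(self, transforms=()):
        self.transforms = list(transforms)

    def get(self, key):
        # Raw data access: no transforms are applied here.
        return {"key": key, "value": 10.0}

    def retrieve(self, key, transforms=()):
        data = self.get(key)
        # Assumption: base transforms run first, then any extra ones.
        for transform in [*self.transforms, *transforms]:
            data = transform(data)
        return data

    def __call__(self, key):
        return self.retrieve(key)


def double(data):
    return {**data, "value": data["value"] * 2}


index = DataIndexSketch(transforms=[double])
raw = index.get("a")
transformed = index("a")
```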
- class pyearthtools.data.indexes.FileSystemIndex(*args, **kwargs)#
Index addon to load data from a File System
Provides basic loading functions and allows for an index to be ‘searched’.
- exists(*args, **kwargs)#
First, use search(*args, **kwargs) to find all matching files, Paths or identifiers.
Then use check_existence to confirm the found data object.
- Returns:
If data exists
- Return type:
(bool)
- filesystem(*args)#
Find datafiles given args on local filesystem.
Must be implemented by child class to specify data.
Can return a dictionary[str, str], tuple, list or path representing the files to load.
- Return type:
Path | dict[str, str]
- get(*args, **kwargs)#
Get data by loading it from the search
Passes all args to search() and all kwargs to load()
- Raises:
DataNotFoundError – Data could not be found
- Returns:
Loaded Data
- Return type:
(Any)
- load(files, **kwargs)#
Load a given list of files.
Automatically determine method to load files for file extension
- Supported:
netcdf
pandas [csv]
numpy
- Parameters:
files (dict[str, str | Path] | Path | list[str | Path] | tuple[str | Path]) – Files to load
**kwargs (Any, optional) – Kwargs passed to underlying loading function
- Raises:
InvalidDataError – If an error arose when loading file
- Returns:
Loaded data
- Return type:
(Any)
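Choosing a loading method from the file extension can be sketched as a simple lookup table. The suffix-to-loader mapping and pick_loader below are hypothetical; they only mirror the three supported kinds listed above and return a label rather than opening any file.

```python
from pathlib import Path

# Assumed suffixes for the supported kinds listed above.
LOADER_BY_SUFFIX = {".nc": "netcdf", ".csv": "pandas", ".npy": "numpy"}


def pick_loader(path):
    """Return the name of the loader to use for `path`, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No loader known for files ending in {suffix!r}") from None


loader = pick_loader("data/era5_2000.nc")
```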
- search(*args, **kwargs)#
Find file name/path, with the underlying functionality defined by discovered location.
All arguments passed to underlying function.
- Parameters:
*args (Any, optional) – Arguments passed to underlying search function
**kwargs (Any, optional) – Keyword Arguments passed to underlying search function
- Returns:
Path to data defined by arguments
- Return type:
(Path | list[str | Path] | dict[str, str | Path])
- class pyearthtools.data.indexes.TimeIndex(data_interval=None, round=False, **kwargs)#
Introduce general time based Indexing with [Petdt][pyearthtools.data.time.Petdt].
Allow for multiple time retrievals.
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval (tuple[int, str] | str | int | TimeDelta | None) –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h')   # 1 hour
>>> (10, 'D')  # 10 days
>>> 10         # 10 minutes
round (bool) – Default value for round when retrieving data.
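The data_interval forms shown above can be read as follows: a (value, unit) tuple names the unit explicitly, while a bare integer is taken as minutes. The sketch below converts both forms to a standard timedelta. to_timedelta_sketch and its unit table are hypothetical and cover only the units shown in the example, not the full TimeDelta format.

```python
from datetime import timedelta

# Only the units that appear in the example above.
UNIT_TO_KWARG = {"h": "hours", "D": "days"}


def to_timedelta_sketch(interval):
    """Convert a (value, unit) tuple or a bare integer of minutes to a timedelta."""
    if isinstance(interval, int):
        return timedelta(minutes=interval)
    value, unit = interval
    return timedelta(**{UNIT_TO_KWARG[unit]: value})


one_hour = to_timedelta_sketch((1, "h"))
ten_days = to_timedelta_sketch((10, "D"))
ten_minutes = to_timedelta_sketch(10)
```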
- aggregation(start, end, interval, *, aggregation='mean', aggregation_dim='time', save_location=None, skip_invalid=True, num_divisions=1, transforms=TransformCollection(), verbose=False, **kwargs)#
Get aggregation of [TimeIndex][pyearthtools.data.TimeIndex] over given dimension
- !!! Warning:
Any num_divisions that is not a factor of the number of data steps will result in some data being missed.
- Parameters:
DataFunction (TimeIndex) – TimeIndex to retrieve Data
start (str | datetime.datetime | Petdt) – Start Date
end (str | datetime.datetime | Petdt) – End Date
interval (tuple[float, str]) – Interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
aggregation (str, optional) – Aggregation Function to apply. Defaults to “mean”.
aggregation_dim (str, optional) – Dimension to aggregate over. Defaults to “time”.
save_location (str | Path | None, optional) – Location to automatically save the result. Defaults to None.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to True.
num_divisions (int, optional) – Number of times to divide series to alleviate memory issues. Defaults to 1.
transforms (Transform | TransformCollection, optional) – Extra Transforms to be applied. Defaults to TransformCollection().
verbose (bool, optional) – Whether to log progress messages. Defaults to False.
- Returns:
Dataset with aggregation applied
- Return type:
xr.Dataset
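The num_divisions warning can be illustrated with integer arithmetic. Assuming the series is split into equally sized chunks by integer division (an assumption about the mechanism, not something the documentation states), any remainder is left out. covered_steps is a hypothetical helper.

```python
def covered_steps(n_steps, num_divisions):
    """Number of steps covered when splitting into equal integer-sized chunks."""
    chunk_size = n_steps // num_divisions
    return chunk_size * num_divisions


# 12 steps divide evenly into 3 chunks; 10 steps do not, so one step is missed.
even = covered_steps(12, 3)
uneven = covered_steps(10, 3)
missed = 10 - uneven
```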
- range(start, end, interval, *, skip_invalid=True, num_divisions=1, transforms=TransformCollection(), **kwargs)#
Find Minimum and Maximum of a [TimeIndex][pyearthtools.data.TimeIndex] in the given time range
- Parameters:
DataFunction (TimeIndex) – TimeIndex to retrieve Data
start (str | Petdt) – Start Date
end (str | Petdt) – End Date
interval (tuple[float, str]) – Interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to True.
num_divisions (int, optional) – Number of times to divide series to alleviate memory issues. Defaults to 1.
transforms (Transform | TransformCollection, optional) – Extra Transforms to be applied. Defaults to TransformCollection().
- Returns:
Dictionary with max and min populated
- Return type:
dict
- safe_series(start, end, interval, **kwargs)#
Safely index into the provided Data function to create a continuous series of Data.
Called by the series method
Uses [series][pyearthtools.data.operations.index_routines.series], but provides an automatic interpolation.
- !!! Warning
If data is missing or if a resolution higher than the actual data resolution is provided, those missing time steps will be interpolated.
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | Petdt) – Timestep to begin series at
end (str | Petdt) – Timestep to end series at
interval (TimeDelta) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
**kwargs (dict, optional) – Any extra keyword arguments to pass to [series][pyearthtools.data.operations.index_routines.series]
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
- series(start, end, interval, *, inclusive=False, skip_invalid=False, transforms=TransformCollection(), verbose=False, force_get=False, subset_time=True, time_dim=None, tolerance=None, **kwargs)#
Index into Provided Data function to create a continuous series of Data
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | datetime.datetime | Petdt) – Timestep to begin series at
end (str | datetime.datetime | Petdt) – Timestep to end series at
interval (tuple[float, str]) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
inclusive (bool, optional) – Whether end time is included in retrieval. Defaults to False.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to False.
transforms (Transform | TransformCollection, optional) – Extra [Transform’s][pyearthtools.data.transforms.Transform] to be applied to data. Defaults to TransformCollection().
verbose (bool, optional) – Print logging messages. Defaults to False.
force_get (bool, optional) – Use series method which loads each dataset using .get. WARNING: Takes significantly longer, as it does not use dask. Defaults to False.
subset_time (bool, optional) – Whether to force subset time dim. Defaults to True.
tolerance (tuple | pd.Timedelta, optional) – Tolerance for time subsetting. Defaults to None.
time_dim (str | None)
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
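The effect of start, end, interval and inclusive on which timesteps make up a series can be sketched with standard datetime arithmetic. series_times is a hypothetical helper that only lists the timestamps; the real method goes on to load and combine the data at those times.

```python
from datetime import datetime, timedelta


def series_times(start, end, interval, inclusive=False):
    """List timestamps from start up to end, including end only if inclusive."""
    times = []
    current = start
    while current < end or (inclusive and current == end):
        times.append(current)
        current += interval
    return times


start = datetime(2000, 1, 1, 0)
end = datetime(2000, 1, 2, 0)
exclusive_times = series_times(start, end, timedelta(hours=6))
inclusive_times = series_times(start, end, timedelta(hours=6), inclusive=True)
```

With the default inclusive=False the end time itself is left out, so a one-day range at 6 hours gives four steps rather than five.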
- class pyearthtools.data.indexes.SingleTimeIndex(data_interval=None, round=False, **kwargs)#
Introduce single time based Indexing with [Petdt][pyearthtools.data.time.Petdt].
While [Index][pyearthtools.data.indexes.indexes.Index] assumes nothing about the selection arguments, this will attempt to convert them to a [Petdt][pyearthtools.data.time.Petdt], and select that time from the data.
[Petdt][pyearthtools.data.time.Petdt] keeps a record of the resolution of the given date string, which allows for more informative warnings.
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval (tuple[int, str] | str | int | TimeDelta | None) –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h')   # 1 hour
>>> (10, 'D')  # 10 days
>>> 10         # 10 minutes
round (bool) – Default value for round when retrieving data.
- retrieve(querytime, *args, select=False, round=None, **kwargs)#
Retrieve Data at given timestep, uses [Index][pyearthtools.data.indexes.Index] to load data.
While [Index][pyearthtools.data.indexes.Index] assumes nothing, this will attempt to select time.
- Parameters:
querytime (str | datetime.datetime | Petdt) – Timestep to retrieve data at
select (bool, optional) – Select querytime in dataset. Defaults to False.
round (bool, optional) – Select nearest time, when selecting. Can be configured in init. Defaults to False.
- Returns:
Loaded data, with time selected
- Return type:
(Any)
- class pyearthtools.data.indexes.TimeDataIndex(transforms=TransformCollection(), preprocess_transforms=None, data_interval=None, **kwargs)#
Setup TimeDataIndex
For indexing with time and applying transforms
- Parameters:
transforms (Transform | TransformCollection, optional) – Transforms to add when retrieving data. Defaults to TransformCollection().
data_interval (tuple[int, str] | int, optional) – Temporal Interval of Data. Defaults to None.
preprocess_transforms (Transform | TransformCollection, optional) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- aggregation(*args, **kwargs)#
Get aggregation of [TimeIndex][pyearthtools.data.TimeIndex] over given dimension
- !!! Warning:
Any num_divisions that is not a factor of the number of data steps will result in some data being missed.
- Parameters:
DataFunction (TimeIndex) – TimeIndex to retrieve Data
start (str | datetime.datetime | Petdt) – Start Date
end (str | datetime.datetime | Petdt) – End Date
interval (tuple[float, str]) – Interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
aggregation (str, optional) – Aggregation Function to apply. Defaults to “mean”.
aggregation_dim (str, optional) – Dimension to aggregate over. Defaults to “time”.
save_location (str | Path | None, optional) – Location to automatically save the result. Defaults to None.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to True.
num_divisions (int, optional) – Number of times to divide series to alleviate memory issues. Defaults to 1.
transforms (Transform | TransformCollection, optional) – Extra Transforms to be applied. Defaults to TransformCollection().
verbose (bool, optional) – Whether to log progress messages. Defaults to False.
- Returns:
Dataset with aggregation applied
- Return type:
xr.Dataset
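The num_divisions warning above can be illustrated with a small sketch. It assumes, purely for illustration, that the series is cut into num_divisions equal chunks using integer division; the library's actual splitting logic may differ, and split_steps is a name made up for this example.

```python
def split_steps(steps, num_divisions):
    """Cut `steps` into `num_divisions` equal chunks (hypothetical helper).

    Any remainder left over by the integer division is not assigned to a chunk.
    """
    chunk = len(steps) // num_divisions
    return [steps[i * chunk:(i + 1) * chunk] for i in range(num_divisions)]


steps = list(range(10))           # ten data steps
even = split_steps(steps, 5)      # 5 is a factor of 10: nothing is lost
uneven = split_steps(steps, 3)    # 3 is not a factor of 10: one step is dropped

covered_even = sum(len(c) for c in even)
covered_uneven = sum(len(c) for c in uneven)
print(covered_even, covered_uneven)
```

Under that assumption, picking num_divisions as a factor of the number of steps is what keeps every step inside some chunk.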
- range(*args, **kwargs)#
Find Minimum and Maximum of a [TimeIndex][pyearthtools.data.TimeIndex] in the given time range
- Parameters:
DataFunction (TimeIndex) – TimeIndex to retrieve Data
start (str | Petdt) – Start Date
end (str | Petdt) – End Date
interval (tuple[float, str]) – Interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to True.
num_divisions (int, optional) – Number of times to divide series to alleviate memory issues. Defaults to 1.
transforms (Transform | TransformCollection, optional) – Extra Transforms to be applied. Defaults to TransformCollection().
- Returns:
Dictionary with max and min populated
- Return type:
dict
- safe_series(*args, **kwargs)#
Safely index into the provided Data function to create a continuous series of Data.
Called by the series method
Uses [series][pyearthtools.data.operations.index_routines.series], but provides an automatic interpolation.
- !!! Warning
If data is missing or if a resolution higher than the actual data resolution is provided, those missing time steps will be interpolated.
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | Petdt) – Timestep to begin series at
end (str | Petdt) – Timestep to end series at
interval (TimeDelta) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
**kwargs (dict, optional) – Any extra keyword arguments to pass to [series][pyearthtools.data.operations.index_routines.series]
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
- series(*args, **kwargs)#
Index into Provided Data function to create a continuous series of Data
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | datetime.datetime | Petdt) – Timestep to begin series at
end (str | datetime.datetime | Petdt) – Timestep to end series at
interval (tuple[float, str]) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
inclusive (bool, optional) – Whether end time is included in retrieval. Defaults to False.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to False.
transforms (Transform | TransformCollection, optional) – Extra [Transform’s][pyearthtools.data.transforms.Transform] to be applied to data. Defaults to TransformCollection().
verbose (bool, optional) – Print logging messages. Defaults to False.
force_get (bool, optional) – Use series method which loads each dataset using .get. WARNING: Takes significantly longer, as it does not use dask. Defaults to False.
subset_time (bool, optional) – Whether to force subset time dim. Defaults to True.
tolerance (tuple | pd.Timedelta, optional) – Tolerance for time subsetting. Defaults to None.
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
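How start, end, interval and inclusive interact can be sketched with the standard library alone. This is not the library's implementation: series_times and the unit table are hypothetical names, and only a few units are handled.

```python
from datetime import datetime, timedelta

# Map a few unit names onto timedelta keywords (illustrative subset only).
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def series_times(start, end, interval, inclusive=False):
    """Return the timestamps a series request would cover (hypothetical helper)."""
    value, unit = interval
    step = timedelta(**{_UNITS[unit]: value})
    current, stop = datetime.fromisoformat(start), datetime.fromisoformat(end)
    times = []
    # The end time is only kept when `inclusive` is set.
    while current < stop or (inclusive and current == stop):
        times.append(current)
        current += step
    return times


exclusive = series_times("2021-01-01T00:00", "2021-01-01T01:00", (10, "minute"))
inclusive = series_times("2021-01-01T00:00", "2021-01-01T01:00", (10, "minute"), inclusive=True)
print(len(exclusive), len(inclusive))
```

With a ten minute interval over one hour, the exclusive request covers six timestamps and the inclusive one adds the end time as a seventh.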
- class pyearthtools.data.indexes.AdvancedTimeIndex(data_interval=None, round=False, **kwargs)#
Extend Time based indexing for Advanced uses, using the provided data_interval.
Overrides retrieve, to allow a series of data to be retrieved based upon given date resolution.
Tip
“New retrieve Behaviour”
>>> Consider a dataset with 10 minute resolution
>>>
>>> | Date               | Behaviour              |
>>> | ------------------ | ---------------------- |
>>> | `2021-01-01T00:00` | Exact Data             |
>>> | `2021-01-01T00`    | All Data in that hour  |
>>> | `2021-01-01`       | All Data in that day   |
>>> | `2021-01`          | All Data in that month |
>>> | `2021`             | All Data in that year  |
Important
Many features of this class require the data_interval to be specified.
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval (tuple[int, str] | str | int | TimeDelta | None) –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h') = 1 Hour
>>> (10, 'D') = 10 Days.
>>> 10 = 10 minutes.
round (bool) – Default value for round when retrieving data.
- retrieve(querytime, *, aggregation=None, select=True, use_simple=False, **kwargs)#
Retrieve data at timestep, but will use the resolution of the time to infer large scale retrievals.
Tip
“Date Behaviour”
>>> | Date               | Behaviour              |
>>> | ------------------ | ---------------------- |
>>> | '2021-01-01T00:00' | Exact Data             |
>>> | '2021-01-01'       | All Data in that day   |
>>> | '2021-01'          | All Data in that month |
>>> | '2021'             | All Data in that year  |
- Parameters:
querytime (str | datetime | Petdt) – Timestep to retrieve data at, can be exact data or range as described above.
aggregation (str | None) – If data becomes a range, can specify an aggregation method.
select (bool) – Whether to attempt to select the given timestep if date is either fully qualified or data_interval not given.
use_simple (bool) – Whether to simply use DataIndex.retrieve instead.
kwargs – Kwargs passed to downstream retrieval function
- Returns:
Loaded Dataset with transforms applied, and aggregated if aggregation is given.
- Raises:
DataNotFoundError – If Data not found at timestep.
- Return type:
Dataset
Note
Extra transforms can be supplied, using `transforms = `
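The date behaviour table can be read as "the less precise the date string, the wider the period retrieved". A minimal sketch of that idea follows, assuming the resolution is inferred from the length of an ISO date string; covered_range is a hypothetical helper, not part of the library, and the hour-level case is left out for brevity.

```python
from datetime import datetime, timedelta


def covered_range(querytime):
    """Return (start, end) of the half-open period implied by an ISO date string.

    Hypothetical helper: resolution is inferred from the string's length only.
    """
    if len(querytime) == 4:                       # '2021' -> whole year
        start = datetime(int(querytime), 1, 1)
        return start, datetime(start.year + 1, 1, 1)
    if len(querytime) == 7:                       # '2021-01' -> whole month
        year, month = map(int, querytime.split("-"))
        start = datetime(year, month, 1)
        # Roll over into January of the next year after December.
        end = datetime(year + month // 12, month % 12 + 1, 1)
        return start, end
    if len(querytime) == 10:                      # '2021-01-01' -> whole day
        start = datetime.fromisoformat(querytime)
        return start, start + timedelta(days=1)
    # Fully qualified timestamp -> exact data, so a zero-length range.
    exact = datetime.fromisoformat(querytime)
    return exact, exact


print(covered_range("2021-12"))
```

A range like this could then be combined with data_interval to enumerate the individual timesteps to load, which is why the class notes that many features need data_interval to be set.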
- class pyearthtools.data.indexes.AdvancedTimeDataIndex(transforms=TransformCollection(), preprocess_transforms=None, data_interval=None, **kwargs)#
Combine AdvancedTimeIndex and DataIndex.
Allows advanced temporal indexing with transforms applied.
Setup TimeDataIndex
For indexing with time and applying transforms
- Parameters:
transforms (Transform | TransformCollection, optional) – Transforms to add when retrieving data. Defaults to TransformCollection().
data_interval (tuple[int, str] | int, optional) – Temporal Interval of Data. Defaults to None.
preprocess_transforms (Transform | TransformCollection, optional) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- class pyearthtools.data.indexes.BaseTimeIndex(data_interval=None, round=False, **kwargs)#
Indexer to combine transforms, file system searching and basic Time
Combines TimeIndex, DataIndex and FileSystemIndex, to allow transforms and searching on filesystems based on times.
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval (tuple[int, str] | str | int | TimeDelta | None) –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h') = 1 Hour
>>> (10, 'D') = 10 Days.
>>> 10 = 10 minutes.
round (bool) – Default value for round when retrieving data.
- class pyearthtools.data.indexes.DataFileSystemIndex(transforms=TransformCollection(), *args, add_default_transforms=True, preprocess_transforms=None, **kwargs)#
Indexer to combine transforms and file system searching
Combines DataIndex and FileSystemIndex, to allow transforms and searching on filesystems.
Introduce [transforms][pyearthtools.data.transforms] to data loading
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- class pyearthtools.data.indexes.ArchiveIndex(transforms=TransformCollection(), preprocess_transforms=None, data_interval=None, **kwargs)#
Default Archive Indexer, for use by on disk datasets.
Combines DataIndex, FileSystemIndex and AdvancedTimeIndex, to allow transforms, searching, and advanced temporal indexing.
- !!! Help “Initialisation Arguments”
transform
Setup TimeDataIndex
For indexing with time and applying transforms
- Parameters:
transforms (Transform | TransformCollection, optional) – Transforms to add when retrieving data. Defaults to TransformCollection().
data_interval (tuple[int, str] | int, optional) – Temporal Interval of Data. Defaults to None.
preprocess_transforms (Transform | TransformCollection, optional) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- search(*args, **kwargs)#
Find file name/path, with the underlying functionality defined by discovered location.
All arguments passed to underlying function.
- Parameters:
*args (Any, optional) – Arguments passed to underlying search function
**kwargs (Any, optional) – Keyword Arguments passed to underlying search function
- Returns:
Path to data defined by arguments
- Return type:
(Path | list[str | Path] | dict[str, str | Path])
- class pyearthtools.data.indexes.ForecastIndex(data_interval=None, round=False, **kwargs)#
Index into Forecast data, where Temporal indexing and selection is invalid.
Combines DataIndex, FileSystemIndex and TimeIndex.
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval (tuple[int, str] | str | int | TimeDelta | None) –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h') = 1 Hour
>>> (10, 'D') = 10 Days.
>>> 10 = 10 minutes.
round (bool) – Default value for round when retrieving data.
- aggregation(querytime, aggregation, *, preserve_dims=None, reduce_dims=None, transforms=TransformCollection(), **kwargs)#
API Function of [aggregation][pyearthtools.data.index_operations.index_operations.aggregation] for each ForecastIndex
- Parameters:
querytime (str | Petdt) – Time to get data at
aggregation (str | Callable) – Aggregation method to apply.
transforms (TransformCollection | Transform, optional) – Extra Transforms to apply. Defaults to TransformCollection().
preserve_dims (list | None)
reduce_dims (list | None)
- Returns:
Aggregation of data
- Return type:
(xr.Dataset)
- retrieve(basetime, *args, querytime=None, **kwargs)#
Retrieve data from a forecast product, allowing separate specification of basetime and querytime
- search(*args, **kwargs)#
Find file name/path, with the underlying functionality defined by discovered location.
All arguments passed to underlying function.
- Parameters:
*args (Any, optional) – Arguments passed to underlying search function
**kwargs (Any, optional) – Keyword Arguments passed to underlying search function
- Returns:
Path to data defined by arguments
- Return type:
(Path | list[str | Path] | dict[str, str | Path])
- series(start, end, interval, *, lead_time=None, inclusive=False, skip_invalid=False, transforms=TransformCollection(), verbose=False)#
Index into Provided Data function to create a continuous series of Data
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | datetime.datetime | Petdt) – Timestep to begin series at
end (str | datetime.datetime | Petdt) – Timestep to end series at
interval (tuple[float, str]) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
inclusive (bool, optional) – Whether end time is included in retrieval. Defaults to False.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to False.
transforms (Transform | TransformCollection, optional) – Extra [Transform’s][pyearthtools.data.transforms.Transform] to be applied to data. Defaults to TransformCollection().
verbose (bool, optional) – Print logging messages. Defaults to False.
force_get (bool, optional) – Use series method which loads each dataset using .get. WARNING: Takes significantly longer, as it does not use dask. Defaults to False.
subset_time (bool, optional) – Whether to force subset time dim. Defaults to True.
tolerance (tuple | pd.Timedelta, optional) – Tolerance for time subsetting. Defaults to None.
lead_time (tuple[float, str] | TimeDelta)
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
- class pyearthtools.data.indexes.StaticDataIndex(transforms=TransformCollection(), *args, add_default_transforms=True, preprocess_transforms=None, **kwargs)#
Index into Static Data
Introduce [transforms][pyearthtools.data.transforms] to data loading
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- class pyearthtools.data.indexes.CachingIndex(cache, pattern=None, pattern_kwargs={}, *, transforms=TransformCollection(), cleanup=None, override=None, verbose=False, save_kwargs=None, **kwargs)#
Standard CachingIndex which behaves like a standard archive but with cached data
Base FileSystemCacheIndex Object to Cache data on the fly
If only cache is given, ExpandedDate, or TemporalExpandedDate will be used by default. If cache and pattern not given, will not save data, and the point of this class is lost.
cache can also be ‘temp’ to set to a TemporaryDirectory created on __init__, or include any environment variables, with $NOTATION.
Warning
Existing Cache
If the cache is set to an existing cache location, and the pattern is the same being made and exists, pattern_kwargs will be set by default to the existing cache’s kwargs, and then updated by any given.
- Parameters:
cache (str | Path | None) – Location to save data to.
pattern (str | type | PatternIndex | None) – String of pattern to use or defined pattern. Defaults to ExpandedDate, or TemporalExpandedDate.
pattern_kwargs (dict[str, Any] | str) – Kwargs to pass to initialisation of new pattern if pattern is str.
transforms (Transform | TransformCollection) – Base Transforms to apply.
cleanup (dict[str, Any] | float | int | str | None) –
Cache cleanup settings.
If a number type, assumed to represent age of file in days.
If dictionary type, the following keys can be used:
| Key | Purpose | Type |
| --- | --- | --- |
| delta | Time delta to delete files past | int, float, tuple, TimeDelta |
| dir_size | Maximum allowed directory size. Deletes oldest according to key | int, float, str, ByteSize (if str, use ‘100 GB’ format) |
| key | Key to use to find time of file for other time based delete steps. Default ‘modified’. | Literal[‘modified’, ‘created’] |
| data_time | Maximum difference in time the data is of and current time | int, float, tuple, TimeDelta |
| verbose | Print files being deleted | bool |
Cleanup is run on each initialisation and deletion of the CacheIndex, and can be triggered manually with .cleanup()
Defaults to None.
override (bool, optional) – Override cached data. Defaults to False.
save_kwargs (dict[str, Any], optional) – Kwargs to pass to save function. Defaults to None.
verbose (bool)
- Raises:
ValueError – If cache and pattern not given.
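The cleanup settings describe two kinds of rule: delete files past a certain age, and delete the oldest files until the directory fits a size limit. The sketch below shows one plausible way to combine them using only the standard library. cleanup_cache, max_age_days and max_dir_bytes are hypothetical names chosen for this example, and modification time is used throughout, mirroring key='modified'.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_cache(directory, max_age_days=None, max_dir_bytes=None):
    """Delete files by age, then oldest-first until under a size limit.

    Hypothetical helper; uses modification time, like key='modified'.
    """
    files = sorted(
        (p for p in Path(directory).rglob("*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    now = time.time()
    if max_age_days is not None:
        for path in list(files):
            if now - path.stat().st_mtime > max_age_days * 86400:
                path.unlink()
                files.remove(path)
    if max_dir_bytes is not None:
        total = sum(p.stat().st_size for p in files)
        for path in files:                      # oldest first
            if total <= max_dir_bytes:
                break
            total -= path.stat().st_size
            path.unlink()


with tempfile.TemporaryDirectory() as cache:
    for age_days, name in [(10, "old.nc"), (2, "mid.nc"), (0, "new.nc")]:
        path = Path(cache, name)
        path.write_bytes(b"x" * 100)
        stamp = time.time() - age_days * 86400
        os.utime(path, (stamp, stamp))          # backdate the file
    cleanup_cache(cache, max_age_days=5, max_dir_bytes=100)
    remaining = sorted(p.name for p in Path(cache).iterdir())
print(remaining)
```

Here the ten day old file goes under the age rule, and the two day old file goes under the size rule, leaving only the newest file.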
- pyearthtools.data.indexes.CachingForecastIndex#
alias of TimeCachingIndex
- class pyearthtools.data.indexes.IntakeIndex(catalog_file, transforms=TransformCollection(), *, add_default_transforms=True, filter_dict=None, **kwargs)#
Index designed to operate on Intake ESM Catalogs
Will not cache the data anywhere.
Example:
>>> import pyearthtools.data
>>> import intake_esm
>>>
>>> cat_url = intake_esm.tutorial.get_url("google_cmip6")
>>>
>>> intakeIndex = pyearthtools.data.IntakeIndex(cat_url)
>>> intakeIndex(experiment_id=["historical", "ssp585"], table_id="Oyr", variable_id="o2", grid_label="gn")
Intake ESM Catalog Index
- Parameters:
catalog_file (str | Path) – Intake ESM Catalog location
transforms (Transform | TransformCollection) – Transforms to add to data.
add_default_transforms (bool) – Add default transforms.
filter_dict (dict[str, Any] | None) – Filter dictionary for Intake search.
- Raises:
ImportError – if intake cannot be imported.
- property filter: dict[str, Any]#
Get filters applied to data retrieval
- Returns:
Intake ESM search kwargs
- Return type:
(dict)
- get(**kwargs)#
Get data directly from intake.
See ._get_from_intake for docs.
- pop_filter(pop=[], *args)#
Pop filter elements from intake searching
- Parameters:
pop (list[str], optional) – Items to pop from filter
*args (str, optional) – Args form of pop.
- Return type:
None
- search(filter={}, **kwargs)#
Override for Index search.
As this is primarily an Intake Index, search Intake Catalog.
Uses filter set through init and update_filter, as well as those given here.
- Parameters:
filter (dict[str, Any], optional) – Intake search filter, updates filters given in init. Defaults to {}.
kwargs (Any) – Extra kwargs for filter.
- Returns:
Intake catalog after search
- Return type:
(intake_esm.core.esm_datastore)
- search_intake(filter_dict={}, **kwargs)#
Search Intake Catalog
Uses filter set through init and update_filter.
- Parameters:
filter_dict (dict, optional) – Updates to filters. Defaults to {}.
kwargs (Any)
- Returns:
Intake catalog after search
- Return type:
(intake_esm.core.esm_datastore)
- update_filter(filter_dict=None, **kwargs)#
Update filter for intake searching
- Parameters:
filter_dict (dict[str, Any], optional) – Filter update. Defaults to {}.
- Return type:
None
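The filter methods above layer three sources of search terms: those given at init, those added with update_filter, and those passed at search time. A minimal sketch of that layering with plain dictionaries follows. FilterState and search_kwargs are hypothetical names, and whether per-call filters are stored or only applied once is an assumption here (this sketch does not store them).

```python
class FilterState:
    """Holds search filters in the layered way described above (illustrative)."""

    def __init__(self, filter_dict=None):
        self._filter = dict(filter_dict or {})

    def update_filter(self, filter_dict=None, **kwargs):
        # Later updates overwrite earlier values for the same key.
        self._filter.update(filter_dict or {}, **kwargs)

    def pop_filter(self, pop=(), *args):
        for key in (*pop, *args):
            self._filter.pop(key, None)

    def search_kwargs(self, filter=None, **kwargs):
        # Per-call filters are layered on top without mutating the stored ones.
        return {**self._filter, **(filter or {}), **kwargs}


state = FilterState({"table_id": "Oyr"})
state.update_filter(variable_id="o2")
merged = state.search_kwargs(grid_label="gn")
state.pop_filter(["table_id"])
after_pop = state.search_kwargs()
print(merged, after_pop)
```

The sketch uses None and an empty tuple as defaults rather than mutable {} or [] so that state is never shared between calls.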
- class pyearthtools.data.indexes.IntakeIndexCache(catalog_file, cache=None, pattern_kwargs=None, *, transforms=TransformCollection(), filter_dict=None, **kwargs)#
Intake ESM Index which caches to a local location.
Uses ArgumentExpansion in the same order as the catalog itself.
Effectively builds a local copy of the intake catalog.
- !!! note “Multiple Keys”:
As the data is saved according to the given filters, lists or tuples in the filters will be split during the filesystem search, and handled one after the other. As a result, the underlying pattern is not exactly usable as a cache on its own; its elements must be atomic.
Caching Intake ESM Index
- Parameters:
catalog_file (str | Path) – Intake ESM Catalog to load.
cache (str | Path | None, optional) – Cache Location. If set to None, does not cache. Defaults to None.
filter_dict (dict, optional) – Default filters for searching the Intake ESM Catalog. Defaults to {}.
**kwargs (Any, optional) – Additional filters.
pattern_kwargs (dict[str, Any] | None)
transforms (Transform | TransformCollection)
See pyearthtools.data.indexes.BaseCacheIndex for remaining arguments docs.
- filesystem(**kwargs)#
Search for generated data if cache is given. If data does not exist yet, generate it, save it, and return the path to it
Data is generated here if cache is given so that .series operations can work on filesystem, and thus any dask things work well.
- Parameters:
args (Any) – Args to search for / generate data for
self (IntakeIndexCache)
kwargs (Any)
- Returns:
Filepath to discovered / generated data
- Return type:
(Path | list[str | Path] | dict[str, str | Path])
- Raises:
NotImplementedError – If cache is not set, cannot cache data.
- generate(**kwargs)#
Using the child class's implemented _generate, generate data, and save it using the pattern.
Return the saved data as managed by the pattern.
Only args is passed to save pattern to find the path to save at.
- Returns:
Saved and reloaded data
- Return type:
(Any)
- Parameters:
self (IntakeIndexCache)
kwargs (Any)
- get(**kwargs)#
Retrieve Data given filter kwargs
If cache is given, automatically check to see if the file is generated, else, generate it and return the data
If cache is not given, just generate and return the data
- Parameters:
**kwargs (Any) – Kwargs to generate with
- Returns:
Loaded data
- Return type:
(xr.Dataset | dict[str, xr.Dataset])
- search(*args, **kwargs)#
Find file name/path, with the underlying functionality defined by discovered location.
All arguments passed to underlying function.
- Parameters:
*args (Any, optional) – Arguments passed to underlying search function
**kwargs (Any, optional) – Keyword Arguments passed to underlying search function
- Returns:
Path to data defined by arguments
- Return type:
(Path | list[str | Path] | dict[str, str | Path])
- class pyearthtools.data.indexes.cacheIndex.BaseCacheIndex(transforms=TransformCollection(), *args, add_default_transforms=True, preprocess_transforms=None, **kwargs)#
Base CacheIndex
Cannot be used directly, see MemCache or FileSystemCacheIndex.
Introduce [transforms][pyearthtools.data.transforms] to data loading
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
- property global_override#
Get a context manager within which data will be overridden in all caches.
- property override#
Get a context manager within which data will be overridden in the cache.
- class pyearthtools.data.indexes.cacheIndex.MemCache(pattern=None, pattern_kwargs=None, *, max_size=None, compute=False, transforms=TransformCollection(), add_default_transforms=True, **kwargs)#
Memory Cache
Examples
>>> import pyearthtools.data
...
>>> mem_cache = pyearthtools.data.indexes.FunctionalMemCacheIndex(function = pyearthtools.data.archive.ERA5.sample())
>>> mem_cache('2000-01-01T00')
... # Cached into memory
Cache into memory
Uses either hash of args and kwargs or pattern to create key.
- Parameters:
pattern (str | type | PatternIndex | None) – Pattern to use to create path to act as key. Defaults to None.
pattern_kwargs (dict[str, Any] | None) – Kwargs for pattern if given. Defaults to None.
max_size (str | ByteSize | None) – Max size of cache, set to None for no limit. Defaults to None.
compute (bool) – Compute xarray / dask objects when given. Defaults to False.
transforms (Transform | TransformCollection) – Transforms to add upon data retrieval. Defaults to TransformCollection().
add_default_transforms (bool)
- cleanup(complete=False)#
Cleanup cache, limiting size to max_size if given.
- Parameters:
complete (bool, optional) – Completely remove cache. Defaults to False.
- get(*args, **kwargs)#
Get data from Memory Cache
- get_hash(*args)#
Get hash of args for unique key of data
If pattern is set, use it to create a path.
- Return type:
str
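Keying a memory cache on a hash of the call arguments can be sketched as below. This is an illustration of the general technique rather than MemCache itself: TinyMemCache is a hypothetical class, the choice of SHA-256 over repr() is an assumption, and the pattern-based key is not modelled.

```python
import hashlib


class TinyMemCache:
    """Minimal in-memory cache keyed by a hash of args and kwargs (illustrative)."""

    def __init__(self, function):
        self._function = function
        self._store = {}
        self.calls = 0  # how many times the wrapped function actually ran

    @staticmethod
    def get_hash(*args, **kwargs):
        # repr() is enough for simple arguments; kwargs are sorted for stability.
        raw = repr((args, sorted(kwargs.items())))
        return hashlib.sha256(raw.encode()).hexdigest()

    def __call__(self, *args, **kwargs):
        key = self.get_hash(*args, **kwargs)
        if key not in self._store:
            self.calls += 1
            self._store[key] = self._function(*args, **kwargs)
        return self._store[key]


cache = TinyMemCache(lambda time: {"time": time})
first = cache("2000-01-01T00")
second = cache("2000-01-01T00")   # served from memory, function not re-run
print(cache.calls)
```

A real cache would also need a size bound, which is the role max_size and cleanup play in the class documented above.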
- property pattern: PatternIndex | None#
Get Pattern from __init__ args
- property size#
Size of current cache.
Will fully count size of xarray objects even if delayed.
- class pyearthtools.data.indexes.cacheIndex.FileSystemCacheIndex(cache, pattern=None, pattern_kwargs={}, *, transforms=TransformCollection(), cleanup=None, override=None, verbose=False, save_kwargs=None, **kwargs)#
DataIndex Object that has no data on disk initially, but is being generated from other sources and saved in given cache.
Data Flowchart
graph LR
A[Data Request '.get'] --> B{Cache Given?};
B --> | Yes | C{Data Exists...};
C --> | No | G;
C --> | Yes | D[Get Data from Cache];
B --> | No | G[Generate Data];
Base FileSystemCacheIndex Object to Cache data on the fly
If only cache is given, ExpandedDate, or TemporalExpandedDate will be used by default. If cache and pattern not given, will not save data, and the point of this class is lost.
cache can also be ‘temp’ to set to a TemporaryDirectory created on __init__, or include any environment variables, with $NOTATION.
Warning
Existing Cache
If the cache is set to an existing cache location, and the pattern is the same being made and exists, pattern_kwargs will be set by default to the existing cache’s kwargs, and then updated by any given.
- Parameters:
cache (str | Path | None) – Location to save data to.
pattern (str | type | PatternIndex | None) – String of pattern to use or defined pattern. Defaults to ExpandedDate, or TemporalExpandedDate.
pattern_kwargs (dict[str, Any] | str) – Kwargs to pass to initialisation of new pattern if pattern is str.
transforms (Transform | TransformCollection) – Base Transforms to apply.
cleanup (dict[str, Any] | float | int | str | None) –
Cache cleanup settings.
If a number type, assumed to represent age of file in days.
If dictionary type, the following keys can be used:
| Key | Purpose | Type |
| --- | --- | --- |
| delta | Time delta to delete files past | int, float, tuple, TimeDelta |
| dir_size | Maximum allowed directory size. Deletes oldest according to key | int, float, str, ByteSize (if str, use ‘100 GB’ format) |
| key | Key to use to find time of file for other time based delete steps. Default ‘modified’. | Literal[‘modified’, ‘created’] |
| data_time | Maximum difference in time the data is of and current time | int, float, tuple, TimeDelta |
| verbose | Print files being deleted | bool |
Cleanup is run on each initialisation and deletion of the CacheIndex, and can be triggered manually with .cleanup()
Defaults to None.
override (bool, optional) – Override cached data. Defaults to False.
save_kwargs (dict[str, Any], optional) – Kwargs to pass to save function. Defaults to None.
verbose (bool)
- Raises:
ValueError – If cache and pattern not given.
- cleanup(complete=False)#
Cleanup cache directory using cleanup as provided in __init__.
- Parameters:
complete (bool, optional) – Complete directory cleanup. If set to True, this will delete all data in the cache. Defaults to False.
- filesystem(*args)#
Search for generated data if cache is given. If data does not exist yet, generate it, save it, and return the path to it
Data is generated here if cache is given so that .series operations can work on filesystem, and thus any dask things work well.
- Parameters:
args (Any) – Args to search for / generate data for
- Returns:
Filepath to discovered / generated data
- Return type:
(Path | list[str | Path] | dict[str, str | Path])
- Raises:
NotImplementedError – If cache is not set, cannot cache data.
- generate(*args, **kwargs)#
Using the child class's implemented _generate, generate data, and save it using the pattern.
Return the saved data as managed by the pattern.
Only args is passed to save pattern to find the path to save at.
- Returns:
Saved and reloaded data
- Return type:
(Any)
- get(*args, **kwargs)#
Retrieve Data given a key
If cache is given, automatically check to see if the file is generated, else, generate it and return the data
If cache is not given, just generate and return the data
- Parameters:
*args (Any) – Arguments to generate data for
**kwargs (Any) – Kwargs to generate with
- Returns:
Loaded data
- Return type:
xr.Dataset
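The get behaviour, and the data flowchart in the class description, boil down to a cache-or-generate decision. The sketch below follows that flow using JSON files in place of real datasets. The function name, the file naming and the generate callback are all assumptions made for illustration, not the library's API.

```python
import json
import tempfile
from pathlib import Path


def get(key, cache=None, generate=lambda key: {"key": key}):
    """Follow the flowchart: use the cache when possible, else generate."""
    if cache is None:                       # Cache Given? -> No
        return generate(key)
    path = Path(cache) / f"{key}.json"
    if not path.exists():                   # Data Exists? -> No
        path.write_text(json.dumps(generate(key)))
    return json.loads(path.read_text())     # Get Data from Cache


generated = []


def make(key):
    generated.append(key)                   # record each real generation
    return {"key": key}


with tempfile.TemporaryDirectory() as tmp:
    a = get("2000-01-01", cache=tmp, generate=make)
    b = get("2000-01-01", cache=tmp, generate=make)
uncached = get("2000-01-02", generate=make)
print(generated)
```

Note that newly generated data is written and then read back, which matches the "Saved and reloaded data" return description of generate.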
- property pattern: PatternIndex#
Get Pattern from __init__ args
- save_record()#
Save record of the cache and pattern within the cache directory.
- class pyearthtools.data.indexes.cacheIndex.CacheFactory(basecache, index, *, name=None, doc=None)#
Create Cache Subclasses
- Parameters:
basecache (type)
index (type[Index])
name (str | None)
doc (str | None)
- Return type:
type
- class pyearthtools.data.indexes.cacheIndex.FunctionalCache(*args, function, **kwargs)#
Introduce [transforms][pyearthtools.data.transforms] to data loading
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
function (Callable[[Any], Any])
- class pyearthtools.data.indexes.combine.InterpolationIndex(*ind, indexes=None, transforms=TransformCollection(), data_interval=None, **kwargs)#
Setup TimeDataIndex
For indexing with time and applying transforms
- Parameters:
transforms (Transform | TransformCollection, optional) – Transforms to add when retrieving data. Defaults to TransformCollection().
data_interval (tuple[int, str] | int, optional) – Temporal Interval of Data. Defaults to None.
preprocess_transforms (Transform | TransformCollection, optional) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
indexes (Index | dict)
- get(*args, **kwargs)#
Base Level .get call, used to retrieve data from args
- retrieve(querytime, *, aggregation=None, select=True, use_simple=False, **kwargs)#
Retrieve data at timestep, but will use the resolution of the time to infer large scale retrievals.
Tip
“Date Behaviour”
>>> | Date               | Behaviour              |
>>> | ------------------ | ---------------------- |
>>> | '2021-01-01T00:00' | Exact Data             |
>>> | '2021-01-01'       | All Data in that day   |
>>> | '2021-01'          | All Data in that month |
>>> | '2021'             | All Data in that year  |
- Parameters:
querytime (str | Petdt) – Timestep to retrieve data at, can be exact data or range as described above.
aggregation (str) – If data becomes a range, can specify an aggregation method.
select (bool) – Whether to attempt to select the given timestep if date is either fully qualified or data_interval not given.
use_simple (bool) – Whether to simply use DataIndex.retrieve instead.
kwargs – Kwargs passed to downstream retrieval function
- Returns:
Loaded Dataset with transforms applied, and aggregated if aggregation is given.
- Raises:
DataNotFoundError – If Data not found at timestep.
- Return type:
Dataset
Note
Extra transforms can be supplied, using `transforms = `
- class pyearthtools.data.indexes.fake.FakeIndex(variable='data', *, interval=(1, 'hour'), max_value=1.0, data_size=(128, 128), random=True, **kwargs)#
Get fake random seed data at a given interval.
Appears to be a latitude longitude dataset.
As this implements the AdvancedTimeDataIndex, selecting lower resolutions behaves correctly.
Setup fake data indexer
- Parameters:
variable (list[str] | str, optional) – Name/Names of variables. Defaults to “data”.
interval (tuple, optional) – Interval of data. Defaults to (1, “hour”).
max_value (float, optional) – Maximum value in random data. Defaults to 1.0.
data_size (tuple[int, int], optional) – Lat, Lon size. Defaults to (128, 128).
random (bool) – Whether to make random data, if not, will make data with max_value as all values.
- pyearthtools.data.indexes.extensions.register_accessor(name, object=<class 'pyearthtools.data.indexes._indexes.Index'>)#
Register a custom accessor on pyearthtools.data indexes.
Any decorated class will receive the pyearthtools.data.Index as its first and only argument.
- Parameters:
name (str) – Name under which the accessor should be registered. A warning is issued if this name conflicts with a preexisting attribute.
object (str | type | ModuleType, optional) – pyearthtools.data.indexes object to register accessor to. By default this will add to the base level index, so is available from all. Defaults to Index.
- Return type:
Callable
Examples
In your library code:
>>> @pyearthtools.data.register_accessor("geo", 'DataIndex')
... class GeoAccessor:
...     def __init__(self, pyearthtools_obj):
...         self._obj = pyearthtools_obj
...
...     # Using the pyearthtools.data.Index, retrieve data and do something.
...     def plot(self):
...         # Run plotting
...         pass
...
Back in an interactive IPython session:
>>> era5 = pyearthtools.data.archive.ERA5(
...     variables = '2t', level = 'single'
... )
>>> era5.geo.plot()  # plots index on a map
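The registration mechanism itself is not described here. As a rough illustration only, the sketch below shows one common way such a decorator can be built: a descriptor attached to the target class that constructs the accessor on first use and caches it on the instance. The names `register_accessor_sketch`, `_CachedAccessor` and `ToyIndex` are hypothetical and are not part of pyearthtools.

```python
import warnings


class _CachedAccessor:
    """Descriptor that builds the accessor on first access and caches it per instance."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself: hand back the accessor class.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Cache on the instance so later lookups bypass this descriptor.
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor_sketch(name, target):
    """Hypothetical stand-in for register_accessor; not the library's code."""

    def decorator(accessor_cls):
        if hasattr(target, name):
            warnings.warn(f"{name!r} overrides an existing attribute on {target.__name__}")
        setattr(target, name, _CachedAccessor(name, accessor_cls))
        return accessor_cls

    return decorator


class ToyIndex:
    """Toy index used only for this illustration."""

    def __init__(self, variables):
        self.variables = variables


@register_accessor_sketch("geo", ToyIndex)
class GeoAccessor:
    def __init__(self, index_obj):
        # The decorated class receives the index as its only argument.
        self._obj = index_obj

    def describe(self):
        return f"geo accessor for {self._obj.variables}"


index = ToyIndex("2t")
print(index.geo.describe())
```

Caching on the instance means the accessor is constructed once per index object, so any state it keeps survives between calls.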
data.modifications#
- class pyearthtools.data.modifications.Modification(variable, index_class, index_kwargs, variable_keyword)#
Modifications to variables for Data Indexes
These are to be used when modifying variables at a core level, such that more information is needed than what is returned upon a simple index into the data.
- For example:
- Creating an accumulation:
When getting data at a particular time step, an accumulation cannot be found as it requires prior information; a modification can then go and get this to create the accumulation.
This is how it differs from a transform: a transform changes the data after it has been retrieved, whereas a modification creates and modifies the variable as it is being retrieved.
- Implementing:
To implement a Modification, single & series must be provided.
single takes a single timestep and expects a dataset to be returned with the variable as modified.
series takes a start, end and interval, as can be parsed by pyearthtools.data.TimeRange, and expects a dataset to be returned with the variable as modified but all timesteps as defined by the range.
variable contains the variable being modified.
data contains the TimeDataIndex to retrieve the data from.
attribute_update can be overridden to specify a dictionary to update the attributes with.
Setup Modification
- Parameters:
variable (str) – Variable being modified
index_class (TimeDataIndex) – Class where data is being sourced from
index_kwargs (dict[str, Any]) – Kwargs used to init index_class
variable_keyword (str) – Keyword for variable when initing index_class
- property attribute_update#
Attributes to update on variable
- property data#
Get the TimeDataIndex as specified by the user in which to find the modification.
- abstractmethod series(start, end, interval)#
Get the modification for a series of timesteps
- Return type:
Dataset
- abstractmethod single(time)#
Get the modification for a single timestep
- Return type:
Dataset
- property variable#
Variable being modified as given by the user.
- pyearthtools.data.modifications.register_modification(name)#
Register a modification for use with @pyearthtools.data.indexes.decorators.variable_modifications.
- Parameters:
name (str) – Name under which the modification should be registered. A warning is issued if this name conflicts with a preexisting modification.
- Return type:
Callable
- pyearthtools.data.modifications.variable_modifications(variable_keyword='variable', *, remove_variables=False, skip_if_invalid_class=False)#
Allow modifications of variables dynamically.
- Parameters:
variable_keyword (str, optional) – Parameter name of variables to parse. Defaults to “variable”.
remove_variables (bool, optional) – Whether to remove variables from the initialisation of the underlying class. Defaults to False.
skip_if_invalid_class (bool, optional) – Whether to skip if discovered class is invalid. Is invalid if class is not a subclass of TimeIndex and DataIndex
- Raises:
KeyError – If variable_keyword cannot be found in the init args.
TypeError – If class is not a subclass of TimeIndex and DataIndex and skip_if_invalid_class is False.
- Return type:
Callable[[C], C]
- Syntax:
Within the specification of the variables, a user can set the modifications with either a string or a dictionary.
A str of the form '!accumulate[period: "6 hourly"]:tcwv>accum_tcwv', where:
!accumulate references the function to apply,
the [init kwargs] specify the required kwargs, supplied in json form,
the string after : is the normal variable specification, with anything after > being the new name.
Or a dictionary with the following keys:
source_var (REQUIRED) Variable to modify
modification (REQUIRED) Modification to apply
target_var Rename of variable
** Any other keys for modification
This will be transparent to the user, and only act upon retrieval of data.
Available modifications include:
!accumulate
!mean
!aggregate
Examples
>>> class Archive(ArchiveIndex):
...     @variable_modifications(variable_keyword = 'variable')
...     def __init__(self, variable):
...         ...
...
>>> # Then usage of that Archive
>>> Archive('!accumulate[period: "6 hourly"]:tcwv')
Notes
If using this decorator with check_arguments, put this one above it; with alias_arguments, put it below.
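To make the string form of the syntax concrete, here is a minimal sketch of how such a specification could be split into the same keys as the dictionary form (source_var, modification, target_var, plus init kwargs). `SPEC_PATTERN` and `parse_spec` are hypothetical names, and this is not the library's parser. It also assumes the kwargs are comma-separated `key: value` pairs whose values are valid JSON, with no commas inside a value.

```python
import json
import re

# Hypothetical pattern for '!name[kwargs]:source>target'.
SPEC_PATTERN = re.compile(
    r"^!(?P<modification>\w+)"    # modification name after the leading "!"
    r"(?:\[(?P<kwargs>.*)\])?"    # optional [init kwargs]
    r":(?P<source_var>[^>]+)"     # variable specification after ":"
    r"(?:>(?P<target_var>.+))?$"  # optional new name after ">"
)


def parse_spec(spec):
    """Split a modification string into a dictionary-form specification."""
    match = SPEC_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"Not a modification specification: {spec!r}")

    kwargs = {}
    if match["kwargs"]:
        # Naive split: assumes no commas inside a value.
        for item in match["kwargs"].split(","):
            key, _, value = item.partition(":")
            kwargs[key.strip()] = json.loads(value.strip())

    result = {
        "source_var": match["source_var"],
        "modification": match["modification"],
        **kwargs,
    }
    if match["target_var"]:
        result["target_var"] = match["target_var"]
    return result


parsed = parse_spec('!accumulate[period: "6 hourly"]:tcwv>accum_tcwv')
print(parsed)
```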
- class pyearthtools.data.modifications.aggregations.Aggregation(period, align='past', **kwargs)#
Root class for the creation of an aggregated variable
Cannot be directly used.
The time dimension will be renamed aggregate_dim and is expected to be aggregated over for single.
Setup aggregator
- Parameters:
period (str) – Period to aggregate over. Used here to extend time.
inclusive (bool, optional) – Include end time. Defaults to False.
align (Literal['past'])
- class pyearthtools.data.modifications.aggregations.AggregationGeneral(method, period, align='past', **kwargs)#
Create a general aggregation over time variable.
Aggregates as a rolling window of size period
Usage: - !aggregation[method: 'max', period: "6 hours"]
General aggregation
- Parameters:
method (str) – Method name to use
period (str) – Period to apply method over
inclusive (bool, optional) – Include end time. Defaults to False.
align (Literal['past'])
- property attribute_update#
Attributes to update on variable
- series(start, end, interval)#
Get the modification for a series of timesteps
- Return type:
Dataset
- single(time)#
Get the modification for a single timestep
- Return type:
Dataset
- class pyearthtools.data.modifications.aggregations.Mean(period, align='past', **kwargs)#
Create a mean over time variable
Averages as a rolling window of size period
Usage: - !mean[period: "6 hours"]
Setup aggregator
- Parameters:
period (str) – Period to aggregate over. Used here to extend time.
inclusive (bool, optional) – Include end time. Defaults to False.
align (Literal['past'])
- property attribute_update#
Attributes to update on variable
- series(start, end, interval)#
Get the modification for a series of timesteps
- Return type:
Dataset
- single(time)#
Get the modification for a single timestep
- Return type:
Dataset
- class pyearthtools.data.modifications.aggregations.Accumulate(period, align='past', **kwargs)#
Create an accumulated over time variable
Accumulates as a rolling window of size period
Usage: - !accumulate[period: "6 hours"]
Setup aggregator
- Parameters:
period (str) – Period to aggregate over. Used here to extend time.
inclusive (bool, optional) – Include end time. Defaults to False.
align (Literal['past'])
- property attribute_update#
Attributes to update on variable
- series(start, end, interval)#
Get the modification for a series of timesteps
- Return type:
Dataset
- single(time)#
Get the modification for a single timestep
- Return type:
Dataset
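The rolling-window behaviour of the aggregation classes can be pictured with plain Python. The sketch below assumes `align='past'` means the window reaches back one `period` from the requested time, and it assumes the window is half-open, `(time - period, time]`; the text does not state either boundary rule. `accumulate_past` and `accumulate_series` are hypothetical helpers working on a dictionary of samples, not on the library's datasets.

```python
from datetime import datetime, timedelta


def accumulate_past(values, time, period):
    """Sum every sample in the assumed window (time - period, time]."""
    window_start = time - period
    return sum(v for t, v in values.items() if window_start < t <= time)


def accumulate_series(values, start, end, interval, period):
    """Apply accumulate_past at each step in [start, end)."""
    result = {}
    current = start
    while current < end:
        result[current] = accumulate_past(values, current, period)
        current += interval
    return result


base = datetime(2020, 1, 1)
# One sample of 1.0 per hour for a day.
hourly = {base + timedelta(hours=h): 1.0 for h in range(24)}

single = accumulate_past(hourly, base + timedelta(hours=12), timedelta(hours=6))
series = accumulate_series(
    hourly,
    start=base + timedelta(hours=6),
    end=base + timedelta(hours=9),
    interval=timedelta(hours=1),
    period=timedelta(hours=6),
)
print(single, series)
```

The `single` call needs samples from before the requested time, which is why a modification, unlike a transform, has to retrieve extra data itself.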
- class pyearthtools.data.modifications.constants.Constant(query=None, memory=True, **kwargs)#
Force a variable to remain constant no matter the time requested.
Uses query if given, otherwise sets it from the first time requested. Use memory to control if precomputed.
Usage: - !constant[query: '2000-01-01T00', memory: True]:variable
General aggregation
- Parameters:
query (Optional[str]) – Query to use. If None, will use first time retrieved. Defaults to None.
memory (bool) – Whether to hold the data in memory. Defaults to True.
- property attribute_update#
Attributes to update on variable
- series(start, end, interval)#
Get the modification for a series of timesteps
- Return type:
Dataset
- single(time)#
Get the modification for a single timestep
- Return type:
Dataset
- class pyearthtools.data.modifications.decorator.VariableModification(specification)#
- Parameters:
specification (str | dict[str, Any])
- class pyearthtools.data.modifications.decorator.Modifier(index, modifications, index_kwargs, variable_keyword)#
Transform to apply the modification to variables
Setup Modifier
- Parameters:
index (TimeDataIndex) – Base Index in which data is being modified
modifications (dict[str, tuple[Type['Modification'], dict[str, Any]]]) –
- Dictionary of modifications:
variable: (Modification Class, modification init kwargs)
index_kwargs (dict[str, Any]) – Kwargs used to initialise index, used to recreate the indexes.
variable_keyword (str) – Keyword name for variable
- apply(dataset)#
Apply modifications to dataset
Will replace each variable being modified if in dataset.
- Parameters:
dataset (Dataset)
- Return type:
Dataset
- property modifiers: dict[str, Modification]#
Get initialised dictionary of modifiers
variable: Modifier
- to_repr_dict()#
Convert to dictionary ready for repr
- class pyearthtools.data.modifications.reductions.Reduction(variable, index_class, index_kwargs, variable_keyword)#
Setup Modification
- Parameters:
variable (str) – Variable being modified
index_class (TimeDataIndex) – Class where data is being sourced from
index_kwargs (dict[str, Any]) – Kwargs used to init index_class
variable_keyword (str) – Keyword for variable when initing index_class
- class pyearthtools.data.modifications.reductions.Groupby(time_component, method, **kwargs)#
Setup Modification
- Parameters:
variable (str) – Variable being modified
index_class (TimeDataIndex) – Class where data is being sourced from
index_kwargs (dict[str, Any]) – Kwargs used to init index_class
variable_keyword (str) – Keyword for variable when initing index_class
time_component (str)
method (str)
- series(start, end, interval)#
Get the modification for a series of timesteps
- Return type:
Dataset
- single(time)#
Get the modification for a single timestep
- Return type:
Dataset
- pyearthtools.data.modifications.reductions.Hourly(method='mean', **kwargs)#
- Parameters:
method (str)
- pyearthtools.data.modifications.reductions.Daily(method='mean', **kwargs)#
- Parameters:
method (str)
- pyearthtools.data.modifications.reductions.Monthly(method='mean', **kwargs)#
- Parameters:
method (str)
- pyearthtools.data.modifications.register.register_modification(name)#
Register a modification for use with @pyearthtools.data.indexes.decorators.variable_modifications.
- Parameters:
name (str) – Name under which the modification should be registered. A warning is issued if this name conflicts with a preexisting modification.
- Return type:
Callable
data.operations#
- pyearthtools.data.operations.percentile(dataset, percentiles)#
Find Percentiles of given data
- Parameters:
dataset (xr.DataArray | xr.Dataset) – Dataset to find percentiles of
percentiles (float | list[float]) – Percentiles to find either float or list[float]
- Returns:
Dataset with percentiles
- Return type:
(xr.Dataset)
Examples
>>> percentile(dataset, [1, 99]) # Dataset containing 1st and 99th percentiles
- pyearthtools.data.operations.aggregation(dataset, aggregation, reduce_dims=None, *, preserve_dims=None)#
Run an aggregation method over a given dataset
- !!! Warning
Either reduce_dims or preserve_dims must be given, but not both.
- Parameters:
dataset (xr.Dataset) – Dataset to run aggregation over
aggregation (str | Callable) – Aggregation method, can be defined function or xarray function
reduce_dims (list | str, optional) – Dimensions to reduce over. Defaults to None.
preserve_dims (list | str, optional) – Dimensions to keep. Defaults to None.
- Raises:
ValueError – If invalid reduce_dims or preserve_dims are given
- Returns:
Dataset with aggregation method applied
- Return type:
(xr.Dataset)
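The "one or the other" rule for reduce_dims and preserve_dims can be sketched without xarray. The helper below, `resolve_reduce_dims`, is hypothetical and only shows how the two options might be turned into a single list of dimensions to reduce over. Raising ValueError for unknown dimension names is an assumption.

```python
def resolve_reduce_dims(all_dims, reduce_dims=None, preserve_dims=None):
    """Return the dimensions to reduce over, given exactly one of the two options."""
    if (reduce_dims is None) == (preserve_dims is None):
        raise ValueError("Give exactly one of reduce_dims or preserve_dims.")

    def as_list(dims):
        # Accept a single name or any iterable of names.
        return [dims] if isinstance(dims, str) else list(dims)

    chosen = as_list(reduce_dims if reduce_dims is not None else preserve_dims)
    unknown = [dim for dim in chosen if dim not in all_dims]
    if unknown:
        raise ValueError(f"Dimensions not in dataset: {unknown}")

    if reduce_dims is not None:
        return chosen
    # preserve_dims given: reduce over everything else.
    return [dim for dim in all_dims if dim not in chosen]


dims = ("time", "latitude", "longitude")
print(resolve_reduce_dims(dims, reduce_dims="time"))
print(resolve_reduce_dims(dims, preserve_dims=["time"]))
```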
- pyearthtools.data.operations.binning(data, setup, *, dimension='time', expand=True, offset=None)#
Bin data based on a binning setup.
If expand is True use DELTA to create new bins until all included.
## Implemented:
| name | Description |
| ---- | ----------- |
| seasonal | Daily up till first week, then weekly |
| daily | Daily grouping |
| weekly | Weekly grouping |
- Parameters:
data (xr.Dataset | xr.DataArray) – Data to bin
setup (str) – Binning config to use.
dimension (str, optional) – Dimension to bin across. Defaults to ‘time’.
expand (bool, optional) – Whether to expand bins to encompass all the data. Defaults to True.
offset (int | TimeDelta | None, optional) – Offset to add to starting time. Will be the minimum value upon time axis. Defaults to None.
- Raises:
ValueError – If setup is not available, or not in DELTA while expand is True.
AttributeError – If dimension is not in data.
- Returns:
Data binned according to config.
- Return type:
(xr.DatasetGroupBy | xr.DataArrayGroupBy)
- class pyearthtools.data.operations.SpatialInterpolation(*datasets, reference_dataset=None, merge=True, method='linear', include_reference=True, **kwargs)#
Spatially Interpolate Datasets together. Uses [pyearthtools.data.transforms.interpolation][pyearthtools.data.transforms.interpolation.InterpolateTransform], thus all kwargs are passed there.
- Parameters:
*datasets (Dataset) – All datasets to be spatially and temporally interpolated
reference_dataset (Dataset | None) – Reference Dataset to use as base, if not given use first dataset.
merge (bool) – Whether to merge datasets together.
method (str) – Spatial interpolation method. Uses [xarray interpolation][xarray.interpolation], which itself uses [scipy.interpolate][scipy.interpolate.interpn].
include_reference (bool) – Whether to include reference datasets.
**kwargs – Extra kwargs passed to [pyearthtools.data.transforms.interpolation][pyearthtools.data.transforms.interpolation.InterpolateTransform.like]
- Returns:
List of datasets if merge == false, else one merged dataset
- Return type:
list[Dataset] | Dataset
- class pyearthtools.data.operations.TemporalInterpolation(*datasets, reference_dataset=None, aggregation_function='mean', merge=True, include_reference=True, **kwargs)#
Temporally Interpolate Datasets together. Uses [pyearthtools.data.transforms.Aggregation][pyearthtools.data.transforms.aggregation.AggregateTransform.over], thus all kwargs are passed there.
- !!! Behaviour
All timesteps will be aggregated to match the time dim of the reference dataset. Will only grab time before the given timestep.
- Parameters:
*datasets (xr.Dataset) – All datasets to be spatially and temporally interpolated
reference_dataset (xr.Dataset, optional) – Reference Dataset to use as base, if not given use first dataset. Defaults to None.
aggregation_function (Callable | str, optional) – Aggregation function to use. Uses [pyearthtools.data.transforms.Aggregation][pyearthtools.data.transforms.aggregation.AggregateTransform.over]. Defaults to “mean”.
merge (bool, optional) – Whether to merge datasets together. Defaults to True.
include_reference (bool, optional) – Whether to include reference datasets. Defaults to True.
- Raises:
ValueError – If time dim not present in reference_dataset
- Returns:
List of datasets if merge == false, else one merged dataset
- Return type:
(list[xr.Dataset] | xr.Dataset)
- class pyearthtools.data.operations.FullInterpolation(*datasets, reference_dataset=None, temporal_reference_dataset=None, spatial_method='linear', aggregation_function='mean', merge=True, include_reference=True)#
Interpolate Datasets both spatially and temporally
- Parameters:
*datasets (xr.Dataset) – All datasets to be spatially and temporally interpolated
reference_dataset (xr.Dataset, optional) – Reference Dataset to use as base, if not given use first dataset. Defaults to None.
temporal_reference_dataset (xr.Dataset, optional) – Temporal Reference Dataset to use as base, if not given use reference_dataset. Defaults to None.
spatial_method (str, optional) – Spatial interpolation method. Defaults to “linear”.
aggregation_function (Callable | str, optional) – Aggregation function to use. Uses [pyearthtools.data.transforms.Aggregation][pyearthtools.data.transforms.aggregation.AggregateTransform.over]. Defaults to “mean”.
merge (bool, optional) – Whether to merge datasets together. Defaults to True.
include_reference (bool, optional) – Whether to include reference datasets. Defaults to True.
- Returns:
List of datasets if merge == false, else one merged dataset
- Return type:
(list[xr.Dataset] | xr.Dataset)
- pyearthtools.data.operations.index_routines.series(DataFunction, start, end, interval, *, inclusive=False, skip_invalid=False, transforms=TransformCollection(), verbose=False, force_get=False, subset_time=True, time_dim=None, tolerance=None, **kwargs)#
Index into Provided Data function to create a continuous series of Data
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | datetime.datetime | Petdt) – Timestep to begin series at
end (str | datetime.datetime | Petdt) – Timestep to end series at
interval (tuple[float, str]) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
inclusive (bool, optional) – Whether end time is included in retrieval. Defaults to False.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to False.
transforms (Transform | TransformCollection, optional) – Extra [Transform’s][pyearthtools.data.transforms.Transform] to be applied to data. Defaults to TransformCollection().
verbose (bool, optional) – Print logging messages. Defaults to False.
force_get (bool, optional) – Use series method which loads each dataset using .get. WARNING: Takes significantly longer, as it does not use dask. Defaults to False.
subset_time (bool, optional) – Whether to force subset time dim. Defaults to True.
tolerance (tuple | pd.Timedelta, optional) – Tolerance for time subsetting. Defaults to None.
time_dim (str | None)
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
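As a small illustration of how start, end, interval and inclusive interact, the sketch below generates the timesteps a series request would cover. It uses `datetime.timedelta` in place of the library's own interval type, and `timesteps` is a hypothetical helper, not part of pyearthtools.

```python
from datetime import datetime, timedelta


def timesteps(start, end, interval, inclusive=False):
    """List the timesteps from start towards end, optionally including end itself."""
    steps = []
    current = start
    while current < end or (inclusive and current == end):
        steps.append(current)
        current += interval
    return steps


start = datetime(2020, 1, 1, 0, 0)
end = datetime(2020, 1, 1, 1, 0)
ten_minutes = timedelta(minutes=10)  # stands in for the (10, 'minute') notation

exclusive_steps = timesteps(start, end, ten_minutes)
inclusive_steps = timesteps(start, end, ten_minutes, inclusive=True)
print(len(exclusive_steps), len(inclusive_steps))
```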
- pyearthtools.data.operations.index_routines.safe_series(DataFunction, start, end, interval, **kwargs)#
Safely index into the provided Data function to create a continuous series of Data.
Called by the series method
Uses [series][pyearthtools.data.operations.index_routines.series], but provides an automatic interpolation.
- !!! Warning
If data is missing or if a resolution higher than the actual data resolution is provided, those missing time steps will be interpolated.
- Parameters:
DataFunction (AdvancedTimeIndex) – Data function, must be AdvancedTimeIndex or child
start (str | Petdt) – Timestep to begin series at
end (str | Petdt) – Timestep to end series at
interval (TimeDelta) – Time interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
**kwargs (dict, optional) – Any extra keyword arguments to pass to [series][pyearthtools.data.operations.index_routines.series]
- Returns:
Loaded xarray dataset
- Return type:
(xr.Dataset)
- pyearthtools.data.operations.index_operations.split_ds(dataset, divisions=1, dim='time')#
Split an xarray Dataset into a set number of datasets
- Parameters:
dataset (xr.Dataset) – Dataset to split
divisions (int, optional) – Number of divisions to make. Defaults to 1.
dim (str, optional) – Which dim to split on. Defaults to “time”.
- Returns:
List of Datasets
- Return type:
list[xr.Dataset]
- pyearthtools.data.operations.index_operations.split_ds_gen(dataset, divisions=1, dim='time')#
Generator version of split_ds
- Parameters:
dataset (xr.Dataset) – Dataset to split
divisions (int, optional) – Number of divisions to make. Defaults to 1.
dim (str, optional) – Which dim to split on. Defaults to “time”.
- Yields:
list[xr.Dataset] – List of Datasets
- Return type:
list[Dataset]
- pyearthtools.data.operations.index_operations.aggregation(DataFunction, start, end, interval, *, aggregation='mean', aggregation_dim='time', save_location=None, skip_invalid=True, num_divisions=1, transforms=TransformCollection(), verbose=False, **kwargs)#
Get aggregation of [TimeIndex][pyearthtools.data.TimeIndex] over given dimension
- !!! Warning:
Any num_divisions not a factor of the number of data steps will result in some data being missed.
- Parameters:
DataFunction (TimeIndex) – TimeIndex to retrieve Data
start (str | datetime.datetime | Petdt) – Start Date
end (str | datetime.datetime | Petdt) – End Date
interval (tuple[float, str]) – Interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
aggregation (str, optional) – Aggregation Function to apply. Defaults to “mean”.
aggregation_dim (str, optional) – Dimension to aggregate over. Defaults to “time”.
save_location (str | Path | None, optional) – Location to automatically save the result. Defaults to None.
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to True.
num_divisions (int, optional) – Number of times to divide series to alleviate memory issues. Defaults to 1.
transforms (Transform | TransformCollection, optional) – Extra Transforms to be applied. Defaults to TransformCollection().
verbose (bool, optional) – Whether to log progress messages. Defaults to False.
- Returns:
Dataset with aggregation applied
- Return type:
xr.Dataset
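The num_divisions warning is easiest to see with a toy example. The sketch below assumes the series is cut into equally sized divisions using integer division, which is one plausible way for trailing steps to be missed; the library's actual splitting logic is not shown here. `equal_divisions` is a hypothetical helper.

```python
def equal_divisions(steps, num_divisions):
    """Cut steps into num_divisions equally sized chunks and report what is left over."""
    size = len(steps) // num_divisions
    chunks = [steps[i * size:(i + 1) * size] for i in range(num_divisions)]
    # Anything past the last full chunk is never visited.
    missed = steps[num_divisions * size:]
    return chunks, missed


steps = list(range(10))  # ten data steps

chunks_3, missed_3 = equal_divisions(steps, 3)  # 3 is not a factor of 10
chunks_5, missed_5 = equal_divisions(steps, 5)  # 5 is a factor of 10
print(missed_3, missed_5)
```

Choosing num_divisions as a factor of the number of steps leaves nothing over under this assumption.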
- pyearthtools.data.operations.index_operations.find_range(DataFunction, start, end, interval, *, skip_invalid=True, num_divisions=1, transforms=TransformCollection(), **kwargs)#
Find Minimum and Maximum of a [TimeIndex][pyearthtools.data.TimeIndex] in the given time range
- Parameters:
DataFunction (TimeIndex) – TimeIndex to retrieve Data
start (str | Petdt) – Start Date
end (str | Petdt) – End Date
interval (tuple[float, str]) – Interval between samples. Use pandas.to_timedelta notation, (10, ‘minute’)
skip_invalid (bool, optional) – Whether to skip invalid data. Defaults to True.
num_divisions (int, optional) – Number of times to divide series to alleviate memory issues. Defaults to 1.
transforms (Transform | TransformCollection, optional) – Extra Transforms to be applied. Defaults to TransformCollection().
- Returns:
Dictionary with max and min populated
- Return type:
dict
- pyearthtools.data.operations.utils.identify_time_dimension(data)#
Attempt to identify time dimension in dataset.
If cannot be identified, return ‘time’
- Parameters:
data (DataArray | Dataset)
- Return type:
str
- pyearthtools.data.operations.forecast_op.forecast_series(DataFunction, start, end, interval, *, lead_time=None, inclusive=False, skip_invalid=False, transforms=TransformCollection(), verbose=False)#
- pyearthtools.data.operations.forecast_op.forecast_as_basetime(DataFunction, start, end, interval, *, inclusive=False, skip_invalid=False, transforms=TransformCollection(), verbose=False)#
Forecast series concatenating by basetime
- pyearthtools.data.operations.forecast_op.forecast_select_time(DataFunction, start, end, interval, lead_time, *, inclusive=False, skip_invalid=False, transforms=TransformCollection(), verbose=False)#
Forecast Series operation selecting a particular lead time
data.patterns#
- class pyearthtools.data.patterns.PatternIndex(*args, root_dir, **kwargs)#
Introduce [transforms][pyearthtools.data.transforms] to data loading
- Parameters:
transforms (Transform | TransformCollection, optional) – Base Transforms to be applied to data. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Add Default Transformations. Defaults to True
preprocess_transforms (Transform | TransformCollection | Callable | None) – Transforms to apply in preprocessing for datasets. Does not work on other file formats. Defaults to None.
root_dir (str | Path)
- cleanup(safe=False)#
Clean up temp_dir if it exists.
If not safe and temp_dir is not set, raise AttributeError
- Parameters:
safe (bool)
- static from_pattern(pattern_function, *args, **kwargs)#
Create Pattern Index from given pattern name
- Parameters:
*args (Any) – Passed to discovered pattern
pattern_function (Callable | str) – Either the function to use, or the pattern name within pyearthtools.data.patterns
**kwargs (Any) – Passed to discovered pattern
- Raises:
KeyError – If pattern not found
TypeError – If not callable
- Returns:
Loaded Pattern Index
- Return type:
- get_root_dir()#
Get root dir if set.
- Raises:
RuntimeError – If root_dir not set
- Returns:
Set root_dir
- Return type:
(str | Path)
- save(data, *args, **kwargs)#
Save data using this pattern to find where to save
- Parameters:
data (Any) – Data to save
*args (Any, optional) – Arguments to pass to search to find filepath
**kwargs (Any, optional) – Keyword arguments to pass to search to find filepath
- class pyearthtools.data.patterns.PatternTimeIndex(*args, **kwargs)#
Temporal Pattern Index
Used when a pattern can use advanced time indexing, like [series][pyearthtools.data.AdvancedTimeIndex.series]
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h') = 1 Hour
>>> (10, 'D') = 10 Days.
>>> 10 = 10 minutes.
round – Default value for round when retrieving data.
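The interval formats listed for data_interval can be read as follows. This sketch maps them onto the standard library's `datetime.timedelta` purely for illustration. `to_timedelta_sketch` and `UNIT_TO_KEYWORD` are hypothetical, and the unit table is a small assumed subset, not the set accepted by pyearthtools.data.time.TimeDelta.

```python
from datetime import timedelta

# Assumed subset of unit spellings, for illustration only.
UNIT_TO_KEYWORD = {
    "h": "hours",
    "hour": "hours",
    "D": "days",
    "minute": "minutes",
}


def to_timedelta_sketch(interval):
    """Convert (value, unit) or a bare number of minutes into a timedelta."""
    if isinstance(interval, (int, float)):
        # A bare number is read as minutes, as in "10 = 10 minutes".
        return timedelta(minutes=interval)
    value, unit = interval
    if unit not in UNIT_TO_KEYWORD:
        raise ValueError(f"Unknown unit in this sketch: {unit!r}")
    return timedelta(**{UNIT_TO_KEYWORD[unit]: value})


print(to_timedelta_sketch((1, "h")))
print(to_timedelta_sketch((10, "D")))
print(to_timedelta_sketch(10))
```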
- class pyearthtools.data.patterns.PatternForecastIndex(*args, **kwargs)#
Setup TimeIndex.
Will warn a user if date is of incorrect resolution
- Parameters:
data_interval –
Interval of data. Must follow format for [TimeDelta][pyearthtools.data.time.TimeDelta].
E.g.
>>> (1, 'h') = 1 Hour
>>> (10, 'D') = 10 Days.
>>> 10 = 10 minutes.
round – Default value for round when retrieving data.
- class pyearthtools.data.patterns.PatternVariableAware(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
Base Pattern class for patterns which are variable aware.
That means, any dataset passed to be saved will be saved in individual variables, and files can be loaded from different variables.
A child class must implement root_pattern; this informs this class which pattern to use when constructing a new PatternIndex for each variable. Using variable_parse allows a user to specify which arguments the variable is added to.
A child class pattern can set a default variable_parse by setting the default_variable_parse property.
Examples
Say a pattern is initialised as,
ExpandedDateVariable(root_dir = 'test', prefix = 'prefix_1')
If variable_parse was set to root_dir, any variable being requested will be added to the end of root_dir. This new root_dir = test/VARIABLE will be used to create a new pattern solely used for that variable,
ExpandedDateVariable(root_dir = 'test/VARIABLE', prefix = 'prefix_1')
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
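The type-dependent behaviour in the table can be sketched as a single dispatch function. `add_variable` below is hypothetical and not the library's implementation. For strings it always treats the value as a path and adds a directory layer, which matches the root_dir = 'test' example; the "or just append" fallback is left out because the rule for choosing between the two is not stated.

```python
from pathlib import PurePath, PurePosixPath


def add_variable(value, variable):
    """Add a variable name to an init argument, depending on the argument's type."""
    if value is None:
        # None: replaced by the variable.
        return variable
    if isinstance(value, list):
        # list: variable appended.
        return [*value, variable]
    if isinstance(value, PurePath):
        # Path: variable added as a directory layer.
        return value / variable
    if isinstance(value, str):
        # str: assumed here to always parse as a path.
        return str(PurePosixPath(value) / variable)
    raise TypeError(f"Cannot add variable to argument of type {type(value).__name__}")


print(add_variable("test", "VARIABLE"))
print(add_variable(None, "VARIABLE"))
print(add_variable(["a"], "VARIABLE"))
print(add_variable(PurePosixPath("/data"), "VARIABLE"))
```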
- filesystem(*args, variables=None, **kwargs)#
Find paths on disk for all variables given the arguments
- Parameters:
*args (Any, optional) – Arguments to pass to underlying pattern filesystem
variables (list[str] | str, optional) – Extra variables to add to find. Defaults to None
**kwargs (Any, optional) – Keyword arguments to pass to underlying pattern filesystem
- Returns:
Dictionary of paths to each variable {variable: PathToVariable}
- Return type:
(dict)
- abstract property root_pattern: PatternIndex#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- save(data, *save_args, **save_kwargs)#
Save a [dataset][xarray.Dataset] splitting it by variable.
Extra arguments are used in a search call to find path to save data at.
- Parameters:
data (xr.Dataset) – Data to save
*save_args (Any, optional) – Arguments to pass to underlying pattern save
**save_kwargs (Any, optional) – Keyword arguments to pass to underlying pattern save
- Raises:
TypeError – If data is not a [dataset][xarray.Dataset]
- variable_pattern(variable)#
Using the given variable and the root_pattern, parse variable_parse so that the variable is added correctly to init arguments to construct a new pattern specific to that variable.
- Parameters:
variable (str) – Variable to make pattern for
- Raises:
TypeError – If cannot add variable to init argument
KeyError – If variable parse not in init_kwargs
- Returns:
Initialised pattern to use for the parsed variable
- Return type:
- class pyearthtools.data.patterns.Argument(root_dir, *, prefix='', extension='pyearthtools', valid_arguments=None, filename_as_arguments=False, filename_delimiter='_', expand_tuples=False, **kwargs)#
Generate FilePath Structure based upon a single argument
The argument specifies the filename, and the path is built out from __init__ params
Examples
>>> pattern = pyearthtools.data.patterns.Argument('/dir/', extension = '.nc')
>>> str(pattern.search('test'))
'/dir/test.nc'
Argument Expansion based DataIndexer.
- Parameters:
root_dir (str | Path) – Root Path to use
prefix (str) – prefix to add.
extension (str) – File Extension to use. Used to determine saving and loading function.
valid_arguments (list[Any] | None) – Valid arguments to limit usability to.
filename_as_arguments (bool) –
Whether the filename should be constructed from all arguments.
E.g.
>>> ArgumentExpansion('name', 'dir1')
... # root_dir/name/dir1/name_dir1.extension
If False, filename is first argument given.
filename_delimiter (str) – delimiter for filename if filename_as_arguments is True.
expand_tuples (bool | int) – Whether to expand tuples when given in search. If True, levels = 1. If int, represents how many levels to descend in the Iterable.
- filesystem(filename)#
Get filepath from arguments.
If filename_as_arguments is True, filename will be made from all args. Otherwise, filename will be first arg, with remaining making up the directory.
- Parameters:
filename (str)
- Return type:
Path
- class pyearthtools.data.patterns.ArgumentExpansion(root_dir, *, prefix='', extension='pyearthtools', valid_arguments=None, filename_as_arguments=False, filename_delimiter='_', expand_tuples=False, **kwargs)#
Generate FilePath Structure based upon expansion of arguments
- If filename_as_arguments is False:
First argument specifies the FileID, and subsequent arguments are used to create folder path.
- Otherwise:
Filename is made from all args, and directory is all args too.
Examples
>>> pattern = pyearthtools.data.patterns.ArgumentExpansion('/dir/')
>>> str(pattern.search('test','arg'))
'/dir/arg/test.nc'
>>> str(pattern.search('test','arg', 'another_arg'))
'/dir/arg/another_arg/test.nc'
>>> pattern = pyearthtools.data.patterns.ArgumentExpansion('/dir/', filename_as_arguments = True)
>>> str(pattern.search('test','arg'))
'/dir/test/arg/test_arg.nc'
>>> pattern = pyearthtools.data.patterns.ArgumentExpansion('/dir/', expand_tuples = True)
>>> [str(x) for x in pattern.search('test',('arg1', 'arg2'))]
['/dir/arg1/test.nc', '/dir/arg2/test.nc']
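The path layout shown in these examples can be reproduced with `pathlib` alone. The function below, `expand_arguments`, is a hypothetical sketch of the two layouts (first argument as filename, or filename built from all arguments). It is not the library's code, it takes the extension as an explicit assumption (`"nc"`), and it leaves out tuple expansion.

```python
from pathlib import PurePosixPath


def expand_arguments(root_dir, *args, extension="nc",
                     filename_as_arguments=False, filename_delimiter="_"):
    """Build a file path from arguments, following the two documented layouts."""
    if not args:
        raise ValueError("At least one argument is needed to name the file.")

    if filename_as_arguments:
        # Every argument becomes a directory layer and part of the filename.
        directory = PurePosixPath(root_dir, *args)
        filename = filename_delimiter.join(args)
    else:
        # First argument names the file; the rest form the directory path.
        filename, *folders = args
        directory = PurePosixPath(root_dir, *folders)

    return directory / f"{filename}.{extension.lstrip('.')}"


print(expand_arguments("/dir/", "test", "arg"))
print(expand_arguments("/dir/", "test", "arg", "another_arg"))
print(expand_arguments("/dir/", "test", "arg", filename_as_arguments=True))
```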
Argument Expansion based DataIndexer.
- Parameters:
root_dir (str | Path) – Root Path to use
prefix (str) – prefix to add.
extension (str) – File Extension to use. Used to determine saving and loading function.
valid_arguments (list[Any] | None) – Valid arguments to limit usability to.
filename_as_arguments (bool) –
Whether the filename should be constructed from all arguments.
E.g.
>>> ArgumentExpansion('name', 'dir1') ... # root_dir/name/dir1/name_dir1.extension
If False, filename is first argument given.
filename_delimiter (str) – Delimiter for filename if filename_as_arguments is True.
expand_tuples (bool | int) – Whether to expand tuples when given in search. If True, levels = 1. If an int, represents how many levels to descend in the Iterable.
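The path layout described above can be illustrated with a small standalone sketch. It does not call pyearthtools; `sketch_argument_path` is a hypothetical helper that only mirrors the documented examples, and the `nc` extension is assumed to match them.

```python
from pathlib import PurePosixPath


def sketch_argument_path(root_dir, *args, extension="nc",
                         filename_as_arguments=False, filename_delimiter="_"):
    """Hypothetical re-creation of the documented layout, not the library code."""
    if not args:
        raise ValueError("At least one argument is required")
    parts = [str(arg) for arg in args]
    if filename_as_arguments:
        # Every argument becomes a directory layer and part of the filename.
        directory = PurePosixPath(root_dir, *parts)
        filename = filename_delimiter.join(parts)
    else:
        # First argument is the file id, the rest build the directory.
        directory = PurePosixPath(root_dir, *parts[1:])
        filename = parts[0]
    return directory / f"{filename}.{extension.lstrip('.')}"


print(sketch_argument_path("/dir/", "test", "arg"))
print(sketch_argument_path("/dir/", "test", "arg", filename_as_arguments=True))
```

The two calls reproduce the first and third documented examples; tuple expansion is left out of the sketch.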
- factory(*, single_argument=False, variable=False, **kwargs)#
Create an ArgumentExpansion pattern based on the requirements
- Parameters:
single_argument (bool, optional) – Single Argument pattern. Defaults to False.
variable (bool, optional) – Variable aware, splits variables when loading and saving. Defaults to False.
args (Any)
- Returns:
Created ArgumentExpansion pattern.
- Return type:
- class pyearthtools.data.patterns.ArgumentExpansionVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
ArgumentExpansion pattern which is variable aware
Will split each variable into a separate file, using the variable as another layer in the root_dir
Examples
>>> pattern = ArgumentExpansionVariable(root_dir = '/test/', variables = 'variable', extension = 'nc')
>>> str(pattern.search('filename', 'arg2'))
{'variable' : '/test/arg2/variable/filename.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- property root_pattern: type[ArgumentExpansion]#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- class pyearthtools.data.patterns.ArgumentExpansionFactory(*args, single_argument=False, variable=False, **kwargs)#
Create an ArgumentExpansion pattern based on the requirements
- Parameters:
single_argument (bool, optional) – Single Argument pattern. Defaults to False.
variable (bool, optional) – Variable aware, splits variables when loading and saving. Defaults to False.
args (Any)
- Returns:
Created ArgumentExpansion pattern.
- Return type:
- class pyearthtools.data.patterns.Direct(root_dir, *, extension='.pyearthtools', prefix=None, file_resolution='minute', delimiter='', **kwargs)#
Generate Filepath structure based on time at given root directory
Examples
>>> pattern = pyearthtools.data.patterns.Direct('/dir/', extension = '.nc')
>>> str(pattern.search('2020-01-02T0030'))
'/dir/20200102T0030.nc'
>>> pattern = pyearthtools.data.patterns.Direct('/dir/', extension = '.nc', delimiter = ('@', '%'))
>>> str(pattern.search('2020-01-02T0030'))
'/dir/2020@01@02T00%30.nc'
Direct time based DataIndexer.
- Parameters:
root_dir (str | Path) – Root Path to use
extension (str, optional) – File extension to load. Defaults to ‘data.patterns.default_extension’
prefix (None | str, optional) – File prefix to add. Defaults to None.
file_resolution (str | TimeResolution, optional) – Temporal resolution of the file name. Defaults to ‘minute’.
delimiter (str | tuple[str | None] | list[str | None], optional) – str/s to separate time values with. If iterable, the first element replaces ‘-’ in the date, and the second replaces ‘:’ in the time. Either element can be set to None to skip that replacement. Defaults to “”
kwargs (Any, optional) – Kwargs passed to PatternIndex
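How the delimiter shapes the file name can be sketched with the standard library alone. This is not the pyearthtools implementation: `sketch_direct_path` is a hypothetical helper, it takes a `datetime` instead of a parsed time string, and it assumes minute resolution, as in the examples above.

```python
from datetime import datetime
from pathlib import PurePosixPath


def sketch_direct_path(root_dir, time, *, extension=".nc", delimiter=""):
    """Hypothetical sketch: build a Direct-style path from a timestamp."""
    stamp = time.strftime("%Y-%m-%dT%H:%M")
    # A single string is assumed to replace both '-' and ':'.
    if isinstance(delimiter, str):
        date_delim, time_delim = delimiter, delimiter
    else:
        date_delim, time_delim = delimiter
    if date_delim is not None:
        stamp = stamp.replace("-", date_delim)
    if time_delim is not None:
        stamp = stamp.replace(":", time_delim)
    return PurePosixPath(root_dir) / f"{stamp}{extension}"


when = datetime(2020, 1, 2, 0, 30)
print(sketch_direct_path("/dir/", when))
print(sketch_direct_path("/dir/", when, delimiter=("@", "%")))
```

Passing `None` for either element leaves the corresponding separator untouched, which matches the parameter description.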
- factory(*, temporal=False, variable=False, forecast=False, **kwargs)#
Create a Direct pattern based on the requirements
- Parameters:
temporal (bool, optional) – Temporally aware, exclusive with forecast, allows for .series operations. Defaults to False.
variable (bool, optional) – Variable aware, splits variables when loading and saving. Defaults to False.
forecast (bool, optional) – Forecast product, exclusive with temporal, provides .series but with forecasts. Defaults to False.
args (Any)
- Raises:
ValueError – If both temporal and forecast are set. Cannot be both.
- Returns:
Created _Direct pattern.
- Return type:
(_Direct)
- to_temporal(data_interval)#
Get pattern as TemporalDirect
- Parameters:
data_interval (tuple[int, str] | int)
- Return type:
- class pyearthtools.data.patterns.TemporalDirect(root_dir, *, extension='.pyearthtools', prefix=None, file_resolution='minute', delimiter='', **kwargs)#
Direct PatternIndex which is also an AdvancedTimeIndex
Examples
>>> pattern = pyearthtools.data.patterns.TemporalDirect('/dir/', extension = '.nc', data_interval = (1, 'month'))
>>> str(pattern.search('2020-01-02'))
'/dir/202001.nc'
Direct time based DataIndexer.
- Parameters:
root_dir (str | Path) – Root Path to use
extension (str, optional) – File extension to load. Defaults to ‘data.patterns.default_extension’
prefix (None | str, optional) – File prefix to add. Defaults to None.
file_resolution (str | TimeResolution, optional) – Temporal resolution of the file name. Defaults to ‘minute’.
delimiter (str | tuple[str | None] | list[str | None], optional) – str/s to separate time values with. If iterable, the first element replaces ‘-’ in the date, and the second replaces ‘:’ in the time. Either element can be set to None to skip that replacement. Defaults to “”
kwargs (Any, optional) – Kwargs passed to PatternIndex
- class pyearthtools.data.patterns.ForecastDirect(root_dir, *, extension='.pyearthtools', prefix=None, file_resolution='minute', delimiter='', **kwargs)#
Direct PatternIndex which is also a ForecastIndex
Direct time based DataIndexer.
- Parameters:
root_dir (str | Path) – Root Path to use
extension (str, optional) – File extension to load. Defaults to ‘data.patterns.default_extension’
prefix (None | str, optional) – File prefix to add. Defaults to None.
file_resolution (str | TimeResolution, optional) – Temporal resolution of the file name. Defaults to ‘minute’.
delimiter (str | tuple[str | None] | list[str | None], optional) – str/s to separate time values with. If iterable, the first element replaces ‘-’ in the date, and the second replaces ‘:’ in the time. Either element can be set to None to skip that replacement. Defaults to “”
kwargs (Any, optional) – Kwargs passed to PatternIndex
- class pyearthtools.data.patterns.DirectVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
Direct pattern which is variable aware
Will split each variable into a separate file, using the variable as the prefix
Examples
>>> direct_var = DirectVariable(root_dir = '/test/', variables = 'variable', extension = 'nc')
>>> str(direct_var.search('2021-01'))
{'variable' : '/test/variable_202101.nc'}
>>> direct_var = DirectVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', variable_parse = 'root_dir')
>>> str(direct_var.search('2021-01-01'))
{'variable' : '/test/variable/202101.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- class pyearthtools.data.patterns.ForecastDirectVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
Direct pattern which is variable aware and retrieves Forecasts
Will split each variable into a separate file, using the variable as the prefix
Examples
>>> direct_var = ForecastDirectVariable(root_dir = '/test/', variables = 'variable', extension = 'nc')
>>> str(direct_var.search('2021-01'))
{'variable' : '/test/variable_202101.nc'}
>>> direct_var = ForecastDirectVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', variable_parse = 'root_dir')
>>> str(direct_var.search('2021-01-01'))
{'variable' : '/test/variable/202101.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- property root_pattern: type[ForecastDirect]#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- class pyearthtools.data.patterns.TemporalDirectVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
TemporalDirect pattern which is variable aware
Will split each variable into a separate file, using the variable as the prefix.
Examples
>>> direct_var = TemporalDirectVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', data_interval = (1,'year'))
>>> direct_var.search('2021-01-01')
{'variable' : '/test/variable/2021.nc'}
>>> direct_var = TemporalDirectVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', data_interval = (1,'year'), variable_parse = 'root_dir')
>>> direct_var.search('2021-01-01')
{'variable' : '/test/variable/2021.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- property root_pattern: type[TemporalDirect]#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- class pyearthtools.data.patterns.DirectFactory(*args, temporal=False, variable=False, forecast=False, **kwargs)#
Create a Direct pattern based on the requirements
- Parameters:
temporal (bool, optional) – Temporally aware, exclusive with forecast, allows for .series operations. Defaults to False.
variable (bool, optional) – Variable aware, splits variables when loading and saving. Defaults to False.
forecast (bool, optional) – Forecast product, exclusive with temporal, provides .series but with forecasts. Defaults to False.
args (Any)
- Raises:
ValueError – If both temporal and forecast are set. Cannot be both.
- Returns:
Created _Direct pattern.
- Return type:
(_Direct)
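The flag combinations accepted by the factory can be summarised in a short sketch. It only maps flags to the class names listed on this page and is an assumption about the selection logic, not the library's code; `sketch_select_direct` is a hypothetical name.

```python
def sketch_select_direct(*, temporal=False, variable=False, forecast=False):
    """Hypothetical sketch: return the name of the Direct class the flags imply."""
    if temporal and forecast:
        # Mirrors the documented ValueError for the exclusive flags.
        raise ValueError("'temporal' and 'forecast' cannot both be set")
    name = "Direct"
    if temporal:
        name = "Temporal" + name
    elif forecast:
        name = "Forecast" + name
    if variable:
        name += "Variable"
    return name


print(sketch_select_direct(temporal=True, variable=True))
```

Every name the sketch can return corresponds to a class documented above, from `Direct` through `ForecastDirectVariable`.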
- class pyearthtools.data.patterns.ExpandedDate(root_dir, *, extension='.pyearthtools', prefix=None, delimiter='', file_resolution='minute', directory_resolution='day', **kwargs)#
Generate FilePath Structure based upon expanded date pattern
Examples
>>> pattern = pyearthtools.data.patterns.ExpandedDate('/dir/', extension = '.nc')
>>> str(pattern.search('2020-01-02T0030'))
'/dir/2020/01/02/20200102T0030.nc'
>>> pattern = pyearthtools.data.patterns.ExpandedDate('/dir/', extension = '.nc', delimiter = ('#', None))
>>> str(pattern.search('2020-01-02T0030'))
'/dir/2020/01/02/2020#01#02T00:30.nc'
Expanded Date based DataIndex
- Parameters:
root_dir (str | Path) – Root Path to use
extension (str, optional) – File extension to load. Defaults to ‘data.patterns.default_extension’
prefix (None | str, optional) – File prefix to add. Defaults to None.
delimiter (str | list[str | None] | tuple[str | None], optional) – str/s to separate time values with. If iterable, the first element replaces ‘-’ in the date, and the second replaces ‘:’ in the time. Either element can be set to None to skip that replacement. Defaults to “”
file_resolution (str | TimeResolution, optional) – Resolution of the files. Defaults to ‘minute’.
directory_resolution (str | TimeResolution, optional) – Resolution of directories. Defaults to ‘day’.
kwargs (Any, optional) – Kwargs passed to PatternIndex
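The directory layout produced by `directory_resolution` can be sketched without the library. `sketch_expanded_path` is a hypothetical helper; it assumes one directory level per time unit down to the chosen resolution, which matches the `'day'` example above but is otherwise an assumption.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Time units from coarsest to finest, with the strftime code for each level.
LEVELS = [("year", "%Y"), ("month", "%m"), ("day", "%d"),
          ("hour", "%H"), ("minute", "%M")]


def sketch_expanded_path(root_dir, time, *, extension=".nc",
                         directory_resolution="day"):
    """Hypothetical sketch: one directory layer per unit, then a minute-level file."""
    names = [name for name, _ in LEVELS]
    depth = names.index(directory_resolution) + 1
    layers = [time.strftime(code) for _, code in LEVELS[:depth]]
    filename = time.strftime("%Y%m%dT%H%M") + extension
    return PurePosixPath(root_dir, *layers) / filename


print(sketch_expanded_path("/dir/", datetime(2020, 1, 2, 0, 30)))
```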
- factory(*, temporal=False, variable=False, forecast=False, **kwargs)#
Create an ExpandedDate pattern based on the requirements
- Parameters:
temporal (bool, optional) – Temporally aware, exclusive with forecast, allows for .series operations. Defaults to False.
variable (bool, optional) – Variable aware, splits variables when loading and saving. Defaults to False.
forecast (bool, optional) – Forecast product, exclusive with temporal, provides .series but with forecasts. Defaults to False.
args (Any)
- Raises:
ValueError – If both temporal and forecast are set. Cannot be both.
- Returns:
Created _ExpandedDate pattern.
- Return type:
(_ExpandedDate)
- to_temporal(data_interval)#
Get pattern as TemporalExpandedDate
- Parameters:
data_interval (tuple[int, str] | int | str)
- Return type:
- class pyearthtools.data.patterns.TemporalExpandedDate(*args, **kwargs)#
ExpandedDate PatternIndex which is also an AdvancedTimeIndex
Will create its path using the data_interval if set.
If using this with data saved using ExpandedDate, set data_interval to (1, ‘min’), and the paths will match.
Examples
>>> pattern = pyearthtools.data.patterns.TemporalExpandedDate('/dir/', extension = '.nc', data_interval = (1, 'month'))
>>> str(pattern.search('2020-01-02'))
'/dir/2020/01/202001.nc'
>>> str(pattern.search('2020-01'))
'/dir/2020/01/202001.nc'
Expanded Date based DataIndex
- Parameters:
root_dir (str | Path) – Root Path to use
extension (str, optional) – File extension to load. Defaults to ‘data.patterns.default_extension’
prefix (None | str, optional) – File prefix to add. Defaults to None.
delimiter (str | list[str | None] | tuple[str | None], optional) – str/s to separate time values with. If iterable, the first element replaces ‘-’ in the date, and the second replaces ‘:’ in the time. Either element can be set to None to skip that replacement. Defaults to “”
file_resolution (str | TimeResolution, optional) – Resolution of the files. Defaults to ‘minute’.
directory_resolution (str | TimeResolution, optional) – Resolution of directories. Defaults to ‘day’.
kwargs (Any, optional) – Kwargs passed to PatternIndex
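One way to read the `data_interval` behaviour is that it caps both the file and directory resolution at the interval's unit. The sketch below encodes that reading. It is an assumption inferred from the examples on this page, not the library's logic, and the helper name and the spelled-out unit names (`'minute'` rather than `'min'`) are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath

ORDER = ["year", "month", "day", "hour", "minute"]
DIR_CODE = {"year": "%Y", "month": "%m", "day": "%d", "hour": "%H", "minute": "%M"}
FILE_CODE = {"year": "%Y", "month": "%Y%m", "day": "%Y%m%d",
             "hour": "%Y%m%dT%H", "minute": "%Y%m%dT%H%M"}


def coarser(first, second):
    """Return whichever of two time units is the coarser one."""
    return first if ORDER.index(first) <= ORDER.index(second) else second


def sketch_temporal_expanded_path(root_dir, time, data_interval, *, extension=".nc",
                                  file_resolution="minute", directory_resolution="day"):
    """Hypothetical sketch: cap both resolutions at the data interval's unit."""
    _, unit = data_interval
    file_res = coarser(file_resolution, unit)
    dir_res = coarser(directory_resolution, unit)
    layers = [time.strftime(DIR_CODE[level]) for level in ORDER[:ORDER.index(dir_res) + 1]]
    return PurePosixPath(root_dir, *layers) / (time.strftime(FILE_CODE[file_res]) + extension)


print(sketch_temporal_expanded_path("/dir/", datetime(2020, 1, 2), (1, "month")))
```

Under this reading, a one-minute interval leaves the default resolutions unchanged, which is consistent with the note about matching paths written by `ExpandedDate`.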
- class pyearthtools.data.patterns.ForecastExpandedDate(root_dir, *, extension='.pyearthtools', prefix=None, delimiter='', file_resolution='minute', directory_resolution='day', **kwargs)#
ExpandedDate PatternIndex which is also a ForecastIndex
Expanded Date based DataIndex
- Parameters:
root_dir (str | Path) – Root Path to use
extension (str, optional) – File extension to load. Defaults to ‘data.patterns.default_extension’
prefix (None | str, optional) – File prefix to add. Defaults to None.
delimiter (str | list[str | None] | tuple[str | None], optional) – str/s to separate time values with. If iterable, the first element replaces ‘-’ in the date, and the second replaces ‘:’ in the time. Either element can be set to None to skip that replacement. Defaults to “”
file_resolution (str | TimeResolution, optional) – Resolution of the files. Defaults to ‘minute’.
directory_resolution (str | TimeResolution, optional) – Resolution of directories. Defaults to ‘day’.
kwargs (Any, optional) – Kwargs passed to PatternIndex
- class pyearthtools.data.patterns.ExpandedDateVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
ExpandedDate pattern which is variable aware
Will split each variable into a separate file, using the variable as another layer in the root_dir
Examples
>>> expanded_var = ExpandedDateVariable(root_dir = '/test/', variables = 'variable', extension = 'nc')
>>> str(expanded_var.search('2020-01-02'))
{'variable' : '/test/variable/2020/01/02/20200102T0000.nc'}
>>> expanded_var = ExpandedDateVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', data_interval = (1,'year'), variable_parse = 'prefix')
>>> str(expanded_var.search('2020-01'))
{'variable' : '/test/2020/01/02/variable_20200102T0000.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- property root_pattern: type[ExpandedDate]#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- class pyearthtools.data.patterns.ForecastExpandedDateVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
ForecastExpandedDate pattern which is variable aware and retrieves Forecasts
Will split each variable into a separate file, using the variable as another layer in the root_dir
Examples
>>> expanded_var = ForecastExpandedDateVariable(root_dir = '/test/', variables = 'variable', extension = 'nc')
>>> str(expanded_var.search('2020-01-02'))
{'variable' : '/test/variable/2020/01/02/20200102T0000.nc'}
>>> expanded_var = ForecastExpandedDateVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', data_interval = (1,'year'), variable_parse = 'prefix')
>>> str(expanded_var.search('2020-01'))
{'variable' : '/test/2020/01/02/variable_20200102T0000.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- property root_pattern: type[ForecastExpandedDate]#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- class pyearthtools.data.patterns.TemporalExpandedDateVariable(variables=None, *args, variable_parse=None, verbose=True, **kwargs)#
TemporalExpandedDate pattern which is variable aware
Will split each variable into a separate file, using the variable as another layer in the root_dir
Examples
>>> expanded_var = TemporalExpandedDateVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', data_interval = (1,'year'))
>>> str(expanded_var.search('2020-01'))
{'variable' : '/test/variable/2020/2020.nc'}
>>> expanded_var = TemporalExpandedDateVariable(root_dir = '/test/', variables = 'variable', extension = 'nc', data_interval = (1,'year'), variable_parse = 'prefix')
>>> str(expanded_var.search('2020-01'))
{'variable' : '/test/2020/variable_2020.nc'}
Construct a variable aware pattern.
- Parameters:
variables (str | list[str], optional) – Variables to find by default. When saving, variables will be appended to this list. If not given, can’t really be used to load data, but is useful for saving data. Defaults to []
variable_parse (str | list[str], optional) – Initialisation argument/s to add variable to when constructing the new pattern. Defaults to ‘prefix’ if default_variable_parse not set.
verbose (bool, optional) – Whether to warn the user if a variable is saved and not already given. Defaults to True.
- !!! Note
variable_parse can be used to reference many different types, and the following table details the behaviour.
| Type | Behaviour |
| ---- | --------- |
| Path | Added to the end as directory layer |
| str | Attempt to parse to Path, or just append |
| None | Replaced |
| list | Appended |
- property root_pattern: type[TemporalExpandedDate]#
Get pattern for finding/saving a specific variable
- !!! Note
Must be implemented by the child class
- Returns:
Uninitialised pattern to use to find location of variable
- Return type:
- class pyearthtools.data.patterns.ExpandedDateFactory(*args, temporal=False, variable=False, forecast=False, **kwargs)#
Create an ExpandedDate pattern based on the requirements
- Parameters:
temporal (bool, optional) – Temporally aware, exclusive with forecast, allows for .series operations. Defaults to False.
variable (bool, optional) – Variable aware, splits variables when loading and saving. Defaults to False.
forecast (bool, optional) – Forecast product, exclusive with temporal, provides .series but with forecasts. Defaults to False.
args (Any)
- Raises:
ValueError – If both temporal and forecast are set. Cannot be both.
- Returns:
Created _ExpandedDate pattern.
- Return type:
(_ExpandedDate)
- class pyearthtools.data.patterns.Static(file, variables=None, *, enforce_existence=True, capture_arguments=False, transforms=TransformCollection(), **load_kwargs)#
Retrieve Static File for any date retrieval
Static File based data index
- Parameters:
file (str | Path) – File to load
variables (str | list[str], optional) – Variables to trim loaded data to. Defaults to None.
enforce_existence (bool, optional) – Enforce that file exists. Defaults to True.
capture_arguments (bool, optional) – Capture arguments given to retrieval without throwing an error. Defaults to False.
transforms (Transform | TransformCollection, optional) – Base Transforms to apply. Defaults to TransformCollection().
load_kwargs (dict)
- Raises:
FileNotFoundError – If File not found
Examples
>>> pattern = pyearthtools.data.patterns.Static('/dir/file.nc', enforce_existence = False)
>>> str(pattern.search())
'/dir/file.nc'
- filesystem(*args, **kwargs)#
Find datafiles given args on local filesystem.
Must be implemented by child class to specify data.
Can return a dictionary[str, str], tuple, list or path representing the files to load.
- load(*args, **kwargs)#
Load a given list of files.
Automatically determine method to load files for file extension
- Supported:
netcdf
pandas [csv]
numpy
- Parameters:
files (dict[str, str | Path] | Path | list[str | Path] | tuple[str | Path]) – Files to load
**kwargs (Any, optional) – Kwargs passed to underlying loading function
- Raises:
InvalidDataError – If an error arose when loading file
- Returns:
Loaded data
- Return type:
(Any)
- class pyearthtools.data.patterns.ParsingPattern(root_dir, parse_str, *, transforms=TransformCollection(), add_default_transforms=True, preprocess_transforms=None, **kwargs)#
PatternIndex to parse and format paths from str formats.
Values for the formatting are expected in kwargs / if data is saved will be added.
Will split datasets based on what is specified in the parse_str. If a kwarg is given as a list, will look for all perturbations.
Create pattern from a formatting string
If being used to retrieve data without saving it first, set values in parse_str through kwargs or when using search.
- Parameters:
root_dir (str) – Root directory to begin the path, can be ‘temp’ for temp directory.
parse_str (str) – str to parse to find paths. Use ‘variable’ for data vars E.g. ‘{level}/{variable}/{time:%Y%M}’.
transforms (Transform | TransformCollection, optional) – Transforms to add on retrieval. Defaults to TransformCollection().
add_default_transforms (bool, optional) – Whether to add default transforms. Defaults to True.
preprocess_transforms (Transform | TransformCollection | Callable | None, optional) – Transforms to always add. Defaults to None.
kwargs (Any, optional) – Any values to fill parse_str with. If given as a list, will look for all perturbations.
Examples
>>> pattern = ParsingPattern('temp', '{level:04d}.nc', level = 10)
>>> pattern.search()
[PosixPath('/temp/0010.nc')]
>>> pattern = ParsingPattern('temp', '{level:04d}.nc', level = [10,20])
>>> pattern.search()
[PosixPath('/temp/0010.nc'), PosixPath('/temp/0020.nc')]
>>> pattern = ParsingPattern('temp', '{time:%Y}.nc')
>>> pattern.save(data)
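The list handling shown in the examples can be pictured as expanding the format string over every combination of the supplied values. The following sketch uses only the standard library; `sketch_expand` is a hypothetical helper and makes no claim about how ParsingPattern does this internally.

```python
from itertools import product


def sketch_expand(parse_str, **values):
    """Hypothetical sketch: format parse_str once per combination of values."""
    keys = list(values)
    # Wrap scalars so every key contributes a list of options.
    options = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [parse_str.format(**dict(zip(keys, combo))) for combo in product(*options)]


print(sketch_expand("{level:04d}.nc", level=[10, 20]))
print(sketch_expand("{level:04d}/{variable}.nc", level=[10, 20], variable="t"))
```

With two list-valued keys the result grows multiplicatively, since `itertools.product` yields every pairing.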
- filesystem(options=None, **kwargs)#
Get all paths from this Index
- Parameters:
**kwargs (Any) – Extra options to provide to the parser
options (dict[str, Any] | None)
- Return type:
list[Path]
- get(*args, load_kwargs=None, **kwargs)#
Get data by loading it from the search.
All args & kwargs are passed through to search to allow extra supply of format values
- Parameters:
load_kwargs (dict[str, Any] | None, optional) – kwargs to pass to the .load function. Defaults to None.
- Raises:
DataNotFoundError – Data could not be found
- Returns:
Loaded Data
- Return type:
(Any)
- save(data, *_)#
Save data with this pattern.
Will split the dataset according to what is given in parse_str.
- E.g.
If data contains a level coord, and level is in parse_str, the data will be split accordingly.
- Parameters:
data (xr.Dataset | xr.DataArray) – Dataset to save
- Raises:
KeyError – If variable is being split on, and data is not an xr.Dataset.
data.save#
- pyearthtools.data.save.save(data, callback, *args, save_kwargs={}, **kwargs)#
Save data at location specified by an Index
Automatically infers how to save data based on its type
Uses args and kwargs in callback.search to find path
- Parameters:
data (Any) – Data to be saved
callback (FileSystemIndex) – FileSystemIndex to use to discover where to save data
*args (Any, optional) – Arguments to be passed to callback.search to find file path
save_kwargs (dict, optional) – Kwargs to pass to underlying save function
**kwargs (Any, optional) – Keyword arguments to be passed to callback.search to find file path
- Raises:
TypeError – If type that is not known is passed
- Returns:
Location where data was saved
- Return type:
(Path)
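Type-based dispatch of this kind can be sketched with `functools.singledispatch`. This is an illustration of the general technique only: `sketch_save` and its two registered handlers are hypothetical, take a plain path rather than a FileSystemIndex, and do not reflect how pyearthtools routes to its `array`, `dataset`, `jsonsave` or `plot` savers.

```python
import json
from functools import singledispatch
from pathlib import Path


@singledispatch
def sketch_save(data, path):
    """Fallback for unknown types, mirroring the documented TypeError."""
    raise TypeError(f"Unable to save data of type {type(data).__name__}")


@sketch_save.register
def _(data: dict, path):
    # Dictionaries are written as JSON.
    path = Path(path)
    path.write_text(json.dumps(data))
    return path


@sketch_save.register
def _(data: str, path):
    # Plain strings are written as text.
    path = Path(path)
    path.write_text(data)
    return path
```

Each handler returns the location written to, in the same spirit as the documented return value.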
- class pyearthtools.data.save.ManageFiles(files, timeout=5, *, lock=True, uuid=False, prefix='.tmp')#
Automatically manage the saving of files.
Using this, representative temporary files are provided to save to, and then automatically renamed.
If lock == True, prevent multiple processes from writing to the same temp files by creating lock files, and checking for their existence.
If a lock file is encountered, and after its removal the real file exists, the user is informed, as this data may have been saved by another process running concurrently, and may not need to be saved again.
Example:
>>> with ManageFiles('important_file.txt') as (filename, _):
...     print(filename) # '.tmp_important_file.txt'
...     with open(filename, 'w') as fd:
...         fd.write('42')
>>> print(os.path.exists('important_file.txt'))
True
>>> print(open('important_file.txt').read())
42
Manage the saving of files. Save to temp file first, and lock that file.
- Parameters:
files (VALID_PATH_TYPES) – Files for this to manage. Will return temporary files representing each file, in the same type.
timeout (float | int) – Maximum time to wait for lock release, in seconds. If timeout < 0, will not time out and simply block until release.
lock (bool) – Attempt to lock temp files when saving. Mutually exclusive with uuid. This enables the logic that checks whether the temp file was locked and the real file now exists, potentially indicating it has been made by a concurrent thread. If lock is False, this behaves exactly like ManageTemp, and always returns exist = False.
uuid (bool) – Add unique identifier to temp files. Mutually exclusive with lock.
prefix (str) – Prefix to add to indicate temp file.
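A lock file with a timeout can be sketched using exclusive file creation. The code below is a generic illustration under stated assumptions, not ManageFiles itself: `sketch_lock`, the `.lock` suffix and the polling interval are all hypothetical choices.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sketch_lock(path, timeout=5.0, poll=0.05):
    """Hypothetical sketch: hold '<path>.lock' for the duration of the block."""
    lock = Path(f"{path}.lock")
    # A negative timeout means wait indefinitely, as described above.
    deadline = None if timeout < 0 else time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process already holds the lock.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if deadline is not None and time.monotonic() > deadline:
                raise TimeoutError(f"Timed out waiting for {lock}")
            time.sleep(poll)
    try:
        yield lock
    finally:
        os.close(fd)
        lock.unlink()
```

After acquiring the lock, a caller could check whether the real file now exists and skip the save, which is the situation the class description warns about.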
- check_if_locked()#
Check if data is locked
- Return type:
bool
- class pyearthtools.data.save.ManageTemp(files, uuid=False, prefix='.tmp')#
Manage saving by providing temporary files which, when used as a context manager, are automatically renamed to the real files.
Can also be used without a context manager, with calls to .temp_files and .rename.
Example:
>>> with ManageTemp('important_file.txt') as (filename, _):
...     print(filename) # '.tmp_important_file.txt'
...     with open(filename, 'w') as fd:
...         fd.write('42')
>>> print(os.path.exists('important_file.txt'))
True
>>> print(open('important_file.txt').read())
42
This differs from ManageFiles as it does not provide locking functionality, and is therefore not thread safe.
Temp files may not exist when they should, as another thread may have already renamed them.
Create a temporary file manager
Automatically creates temp file names next to the real ones for saving. Upon exit, or .rename call, these are renamed to the real files.
- Parameters:
files (VALID_PATH_TYPES) – Real files to manage and make temp files for
uuid (bool, optional) – Add a unique identifier to each temp file name. Defaults to False.
prefix (str, optional) – Prefix to add to indicate temp file. Defaults to ‘.tmp’.
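The write-to-temp-then-rename idea can be sketched with the standard library. `sketch_manage_temp` is a hypothetical single-file helper, not ManageTemp; it assumes the temp name is the prefix, an underscore and the real file name, as the example output above suggests.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def sketch_manage_temp(real_file, prefix=".tmp"):
    """Hypothetical sketch: yield a temp path, rename it to the real one on success."""
    real = Path(real_file)
    temp = real.with_name(f"{prefix}_{real.name}")
    try:
        yield temp
    except BaseException:
        # Leave no partial file behind if the block fails.
        temp.unlink(missing_ok=True)
        raise
    else:
        os.replace(temp, real)


with tempfile.TemporaryDirectory() as directory:
    target = Path(directory) / "important_file.txt"
    with sketch_manage_temp(target) as temp_path:
        temp_path.write_text("42")
    print(target.read_text())
```

Readers never see a half-written file at the real path, because the rename only happens after the block completes.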
- exists()#
Check if temporary files exist
- Return type:
bool
- property real_files#
Real files being used
- remove()#
Remove temporary files if they exist
- rename()#
Rename temporary files to real ones
- Raises:
FileNotFoundError – If temporary files do not exist
TypeError – If paths cannot be renamed.
- property temp_files#
Temporary files being managed by this object.
Has exactly the same type and form as the input files.
- pyearthtools.data.save.array.save(dataarray, callback, *args, save_kwargs={}, try_thread_safe=True, **kwargs)#
- Parameters:
dataarray (ndarray)
callback (FileSystemIndex)
save_kwargs (dict[str, Any])
try_thread_safe (bool)
- pyearthtools.data.save.dask.save(dataarray, *args, **kwargs)#
- Parameters:
dataarray (Array)
- pyearthtools.data.save.dataset.save(dataset, callback, *args, zarr=None, save_kwargs, **kwargs)#
- Parameters:
zarr (bool | None)
save_kwargs (dict[str, Any])
- pyearthtools.data.save.dataset.to_netcdf(dataset, callback, *args, save_kwargs=None, try_thread_safe=True, **kwargs)#
Saves a dataset based on a callback to an index.
- Parameters:
dataset (tuple[Dataset] | DataArray | Dataset) – The xarray object to convert to netcdf
callback (FileSystemIndex) – Uses callback.search() to fetch a Path, str, or dictionary of either. If a dictionary is returned, will only save dataset, and will only save specified keys.
save_kwargs (dict[str, Any] | None)
try_thread_safe (bool)
- pyearthtools.data.save.dataset.to_zarr(dataset, callback, *args, save_kwargs=None, **kwargs)#
- Parameters:
dataset (DataArray | Dataset)
callback (FileSystemIndex)
save_kwargs (dict[str, Any] | None)
- pyearthtools.data.save.jsonsave.save(data, callback, *args, save_kwargs={}, **kwargs)#
Save json files
- Parameters:
data (dict)
callback (FileSystemIndex)
save_kwargs (dict[str, Any])
- pyearthtools.data.save.plot.save(plot, callback, *args, save_kwargs={}, **kwargs)#
Save plot objects
- Parameters:
callback (FileSystemIndex)
save_kwargs (dict)
- pyearthtools.data.save.save_utils.check_if_exists(path)#
Check if path/s exist
- Parameters:
path (VALID_PATH_TYPES) – Path/s to check existence of
- Returns:
Path/s existence
- Return type:
(bool)
- pyearthtools.data.save.save_utils.make_new_filename(path, *, add_uuid=False, prefix='.tmp', remove_suffix=False)#
Create temporary files using given path/s
Adds prefix to all paths, and if add_uuid is set, adds a unique identifier. Can also strip the suffix.
- Parameters:
path (VALID_PATH_TYPES) – Path/s to create tmp files of
add_uuid (bool, optional) – Whether to add a unique identifier. Defaults to False.
prefix (str, optional) – Prefix to indicate temporary file to add. Defaults to ‘.tmp’
remove_suffix (bool, optional) – Whether to remove the suffix when creating new file name.
- Returns:
path with temporary flags added to it. Is the exact same type as input.
- Return type:
(VALID_PATH_TYPES)
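As a rough illustration of the naming behaviour described above, the following standard-library sketch builds a temporary name for a single path. It is not the library's code: `sketch_temp_name`, the underscore separator, and the placement of the unique identifier are all assumptions, and unlike the documented function it does not handle collections of paths.

```python
import uuid
from pathlib import Path


def sketch_temp_name(path, *, add_uuid=False, prefix=".tmp", remove_suffix=False):
    """Hypothetical stand-in: build a temp name next to `path`, preserving its type."""
    original_type = type(path)  # str in, str out; Path in, Path out
    p = Path(path)
    name = p.stem if remove_suffix else p.name
    if add_uuid:
        # Assumed placement: unique identifier before the original name.
        name = f"{uuid.uuid4().hex}_{name}"
    return original_type(p.with_name(f"{prefix}_{name}"))


as_str = sketch_temp_name("data/out.nc")
as_path = sketch_temp_name(Path("out.nc"), remove_suffix=True)
```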
- class pyearthtools.data.save.save_utils.keep_clear(path, enter=True, exit=True)#
Keep a given path clear
Delete paths upon entrance and/or exit. The path must be a fully-qualified path/filename. Useful for temporary files with known names that can be deleted if they already exist.
- Parameters:
path (VALID_PATH_TYPES) – Path/s to delete if they exist upon entrance or exit
enter (bool, optional) – Delete on entrance. Defaults to True.
exit (bool, optional) – Delete on exit. Defaults to True.
- delete()#
Delete given files if they exist
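The delete-on-entrance and delete-on-exit behaviour can be sketched as a small context manager. This is a standard-library illustration under the assumption of a single file path; `clear_path` is a hypothetical name, not part of pyearthtools.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def clear_path(path, enter=True, exit=True):
    """Hypothetical stand-in: delete `path` on entrance and/or exit if it exists."""
    path = Path(path)
    if enter:
        path.unlink(missing_ok=True)
    try:
        yield path
    finally:
        if exit:
            path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as workdir:
    scratch = Path(workdir) / "scratch.txt"
    scratch.write_text("stale")
    with clear_path(scratch) as p:
        existed_on_entry = p.exists()  # the stale file was removed on entrance
        p.write_text("fresh")
    exists_after_exit = scratch.exists()  # removed again on exit
```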
data.transforms#
- class pyearthtools.data.transforms.Transform(docstring=None)#
Base Transform class to abstract a transform process.
A child class must implement .apply(self, dataset: xr.Dataset), and .info.
When using this transform, simply call it like a function. Another transform can also be added to it.
Initialise root Transform class
Cannot be used as is, a child must implement the .apply function.
- Parameters:
docstring (str, optional) – Docstring to set this Transform to. Defaults to None.
- Raises:
TypeError – If docstring cannot be parsed
- abstractmethod apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
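The subclassing contract described for Transform (a child implements apply, and the object is then called like a function) can be sketched in plain Python. `SketchTransform` and `Scale` are hypothetical stand-ins, and a plain dict is used in place of an xarray dataset so the example has no dependencies.

```python
from abc import ABC, abstractmethod


class SketchTransform(ABC):
    """Hypothetical stand-in for a base transform: subclasses supply `apply`."""

    @abstractmethod
    def apply(self, dataset):
        raise NotImplementedError

    def __call__(self, dataset):
        # Calling the transform like a function delegates to `apply`.
        return self.apply(dataset)


class Scale(SketchTransform):
    """Example child: multiply every value by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def apply(self, dataset):
        return {name: value * self.factor for name, value in dataset.items()}


scaled = Scale(2)({"t2m": 10.0})
```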
- class pyearthtools.data.transforms.TransformCollection(*transforms, apply_default=False, intelligence_level=100)#
A Collection of Transforms to be applied to Data
Can be added to or appended to & called to apply all transforms in order.
Setup new TransformCollection
- Parameters:
*transforms (Transform | TransformCollection | Callable | None | list) – Transforms to include
apply_default (bool, optional) – Apply default transforms. Defaults to False.
intelligence_level (int, optional) – Intelligence level of default transforms. Defaults to 100.
- append(transform)#
Append a transform/s to the collection
- Parameters:
transform (list | FunctionType | Transform | TransformCollection) – Transform/s to add
- Raises:
TypeError – If transform cannot be understood
- apply(dataset)#
Apply Transforms to a Dataset
- Parameters:
dataset (xr.Dataset) – Dataset to apply transforms to
- Returns:
Same as input type with transforms applied
- Return type:
(Any)
- pop(index=-1)#
Remove and return item at index (default last). Raises IndexError if list is empty or index is out of range.
- Parameters:
index (int, optional) – Index to pop from list at. Defaults to -1.
- Returns:
Transform popped out
- Return type:
- remove(key)#
Remove first occurrence of value.
- Parameters:
key (type | str | Transform) – Key to search for
- Raises:
ValueError – If the value is not present.
- to_repr_dict()#
Convert to dictionary ready for repr
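Applying transforms in order, with list-like append and pop, can be sketched as follows. `SketchCollection` is a hypothetical stand-in that works on plain callables and numbers; it is meant only to show the ordering behaviour, not the library's implementation.

```python
class SketchCollection:
    """Hypothetical stand-in: hold callables and apply them in insertion order."""

    def __init__(self, *transforms):
        self._transforms = list(transforms)

    def append(self, transform):
        if not callable(transform):
            raise TypeError(f"Cannot understand transform: {transform!r}")
        self._transforms.append(transform)

    def pop(self, index=-1):
        return self._transforms.pop(index)

    def apply(self, data):
        for transform in self._transforms:
            data = transform(data)
        return data


def add_one(x):
    return x + 1


def double(x):
    return x * 2


pipeline = SketchCollection(add_one)
pipeline.append(double)
in_order = pipeline.apply(3)   # (3 + 1) * 2
popped = pipeline.pop()        # removes `double`, the last transform
after_pop = pipeline.apply(3)  # 3 + 1
```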
- class pyearthtools.data.transforms.FunctionTransform(function)#
Transform Function which applies a given function
Transform Function to apply a user given function
- Parameters:
function (Callable) – User given function to apply
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.Derive(equation=None, drop=False, **equations)#
Derive new variables
Derive new variables from a dataset using provided equations.
Allows other variables to be used by indicating only their name, which are then evaluated accordingly.
Each numerical or reference component in an equation must be separated by a space.
If using function based symbols like ‘sqrt’ or ‘sin’, the next item will be evaluated using said function. These functions can be given with brackets next to them.
Without brackets given, the equation will be evaluated left -> right.
- E.g.
equation = {'new_variable' : 'old_variable_1 * old_variable_2'}
equation = {'new_variable' : 'sqrt(old_variable_1 * old_variable_2)'}
- Warning
This will evaluate an equation left -> right, but respects brackets.
‘var_1 + 9.8 * var_2’ != ‘var_1 + (9.8 * var_2)’
- Warning
Components of an equation should be split by ‘ ‘, a whitespace.
a*b # Bad
a * b # Good
- Parameters:
equation (dict[str, str | tuple[str, dict[str, str]]] | None, optional) – Equation configuration. If str, equation is evaluated. If tuple, first element is assumed to be equation, and the second a dictionary to update the new vars attributes with. Defaults to None.
drop (bool, optional) – Drop variables used in the calculation. Can be overridden per equation by setting drop in the attributes dictionary. Defaults to False.
**equations (dict[str, str | tuple[str, dict[str, str]]], optional) – Keyword arg form of equation.
- Raises:
EquationException – If equation cannot be parsed
- Returns:
Transform to apply derivation.
- Return type:
Examples
>>> derive(new_variable = 'old_variable_1 * old_variable_2', drop = True) # Create a `new_variable` as the product of the old two.
>>> derive(new_variable = 'old_variable_1 * 9.8', drop = True) # Scale `old_variable_1` by 9.8
>>> derive(new_variable = ('old_variable_1 * 9.8', {'long_name': 'Scaled old_variable_1'}), drop = False) # Scale `old_variable_1` by 9.8, and update the `long_name` to be 'Scaled old_variable_1', leaving the old var there
>>> derive(new_variable = 'old_variable_1 - old_variable_2 * 9.8', drop = True) # Set `new_variable` as the difference scaled by 9.8. In effect acts as (old_variable_1 - old_variable_2) * 9.8
>>> derive(new_variable = 'old_variable_1 - (9.8 * old_variable_2)', drop = True) # Multiply `old_variable_2` by 9.8 and then find the difference with `old_variable_1`
>>> derive(new_var = 'sqrt(old_var)') # Square root of the old variable
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
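The left-to-right rule is easy to misread, so here is a minimal sketch of what strict left-to-right evaluation of space-separated tokens means. It is a toy evaluator written for this illustration (no brackets, no functions such as sqrt, scalar values only) and is not the parser pyearthtools uses; `evaluate_left_to_right` is a hypothetical name.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_left_to_right(equation, variables):
    """Evaluate space-separated tokens strictly left to right (no precedence)."""
    tokens = equation.split(" ")

    def value_of(token):
        # A token is either a known variable name or a numeric literal.
        return variables[token] if token in variables else float(token)

    result = value_of(tokens[0])
    # The remaining tokens come in (operator, operand) pairs.
    for symbol, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[symbol](result, value_of(operand))
    return result


values = {"var_1": 1.0, "var_2": 2.0}
left_to_right = evaluate_left_to_right("var_1 + 9.8 * var_2", values)  # (1.0 + 9.8) * 2.0
conventional = values["var_1"] + (9.8 * values["var_2"])               # 1.0 + 19.6
```

The two results differ because the sketch applies the addition before the multiplication, which is why brackets are needed to get conventional precedence.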
- pyearthtools.data.transforms.aggregation.over(*, dimension, method)#
Get Aggregation Transform to run aggregation method over given dimensions
- Parameters:
method (Callable | str | dict) – Method to use, can be known method or user defined
dimension (str | list[str]) – Dimensions to run aggregation over
- Returns:
Transform to apply aggregation
- Return type:
- pyearthtools.data.transforms.aggregation.leaving(method, dimension)#
Get Aggregation Transform to run aggregation method leaving only given dimensions
- Parameters:
method (Callable | str | dict) – Method to use, can be known method or user defined
dimension (str | list[str]) – Dimensions to leave after aggregation
- Returns:
Transform to apply aggregation
- Return type:
- class pyearthtools.data.transforms.aggregation.Aggregate(method, reduce_dims=None, keep_dims=None)#
Aggregation Transforms
Initialise root Transform class
Cannot be used as is, a child must implement the .apply function.
- Parameters:
docstring (str, optional) – Docstring to set this Transform to. Defaults to None.
method (Callable | str | dict[str, Callable | str])
reduce_dims (Optional[list[str] | str])
keep_dims (Optional[list[str] | str])
- Raises:
TypeError – If docstring cannot be parsed
- apply(dataset, **kwargs)#
Apply Aggregation to Dataset
- Parameters:
dataset (xr.Dataset) – Dataset to apply aggregation to
method (Callable | str) – Method of aggregation, either func or string
dimension (str | list[str]) – Dimension to apply aggregation on
- Returns:
Aggregated Dataset
- Return type:
(xr.Dataset)
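The difference between aggregating over dimensions and aggregating while leaving dimensions comes down to which dimensions are collapsed. The sketch below resolves that set from plain dimension names. `dims_to_reduce` is a hypothetical helper, and the rule that exactly one of `reduce_dims` or `keep_dims` is given is an assumption for the example.

```python
def _as_list(dims):
    """Accept a single dimension name or a sequence of names."""
    return [dims] if isinstance(dims, str) else list(dims)


def dims_to_reduce(all_dims, *, reduce_dims=None, keep_dims=None):
    """Hypothetical helper: resolve which dimensions an aggregation collapses."""
    if (reduce_dims is None) == (keep_dims is None):
        raise ValueError("Give exactly one of reduce_dims or keep_dims")
    if reduce_dims is not None:
        return _as_list(reduce_dims)
    kept = set(_as_list(keep_dims))
    # Everything that is not kept gets reduced, in the original order.
    return [dim for dim in all_dims if dim not in kept]


dims = ("time", "latitude", "longitude")
over_time = dims_to_reduce(dims, reduce_dims="time")
leaving_time = dims_to_reduce(dims, keep_dims="time")
```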
- class pyearthtools.data.transforms.attributes.SetAttributes(attrs=None, reference=None, apply_on='dataset', **attributes)#
Set Attributes
Modify Attributes to a dataset
- Parameters:
attrs (dict[str, Any] | None) – Attributes to set, key: value pairs. Set apply_on to choose where attributes are applied. Defaults to None.
| Key | Description |
| --- | ----------- |
| dataset | Attributes updated on dataset |
| dataarray | If applied on a dataset, update each dataarray inside the dataset |
| both | Do both above |
| per_variable | Treat attrs as a dictionary of dictionaries, applying on dataarray if in dataset. |
apply_on (Literal['dataset', 'dataarray', 'both'], optional) – On what type to update attributes. Defaults to ‘dataset’.
**attributes (dict) – Keyword argument form of attrs.
reference (xr.DataArray | xr.Dataset | None)
- Returns:
Transform to set attributes
- Return type:
- apply(data_obj)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.attributes.SetEncoding(encoding=None, reference=None, limit=None, **variables)#
Set Encoding
Set encoding of a dataset.
Can get encoding from a reference dataset. That dataset is then not used, as the encoding has already been retrieved.
- Parameters:
encoding (dict[str, dict[str, Any]] | None) – Variable value pairs assigning encoding to the given variable. Can set key to ‘all’ to apply to all variables. Defaults to None.
reference (xr.DataArray | xr.Dataset | None, optional) – Reference object to retrieve and update encoding from. Defaults to None.
limit (list[str] | None, optional) – When getting encoding from the reference object, limit the retrieved encoding. If not given will get ['units', 'dtype', 'calendar', '_FillValue', 'scale_factor', 'add_offset', 'missing_value']. Defaults to None.
**variables (dict) – Keyword argument form of encoding
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.attributes.SetType(dtype=None, **variables)#
Set type of variables
Set type of variables/coordinates.
At least dtype or variables must be set.
Applies “same_kind” casting
- Parameters:
dtype (str | dict[str, str] | None) – Datatype to set to. If only dtype is given, this will set all coordinates of the dataset to this dtype. Defaults to None.
**variables (Any, optional) – Variable dtype configuration.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.attributes.Rename(names=None, **extra_names)#
Rename Components inside Dataset
Rename Dataset components
- Parameters:
names (dict[str, Any] | None) – Dictionary assigning name replacements [old: new] Defaults to None.
**extra_names (Any, optional) – Keyword args form of
names.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- pyearthtools.data.transforms.coordinates.get_longitude(data, transform=True)#
From a given data source, attempt to identify the orientation of the longitude coordinate.
Either ‘0-360’ or ‘-180-180’
- Parameters:
data (xr.Dataset | xr.DataArray) – Data to check
transform (bool, optional) – Whether to return a
Transformto set to the same orientation. Defaults to True.
- Raises:
ValueError – If unable to identify the longitude coordinate orientation
- Returns:
Either str of orientation or Transform to set longitude of a data source to the same as data. Depends on the transform bool state.
- Return type:
(str | Transform)
- class pyearthtools.data.transforms.coordinates.StandardLongitude(type='-180-180', longitude_name='longitude')#
Standardise format of longitude.
Shifts the longitude coordinate to the specified format. Must be one of [“-180-180”, “0-360”]
- Parameters:
type (VALID_COORDINATE_DEFINITIONS) – Longitude Specification. Defaults to “-180-180”.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
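Converting between the two longitude conventions is a modular-arithmetic wrap. The functions below show the usual formulas on plain floats; they are an illustration with hypothetical names, not the code StandardLongitude runs, and they say nothing about how the library reorders data after shifting.

```python
def to_minus180_180(lon):
    """Wrap a longitude in degrees into the half-open range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def to_0_360(lon):
    """Wrap a longitude in degrees into the half-open range [0, 360)."""
    return lon % 360.0


west = to_minus180_180(350.0)  # 350 degrees east is 10 degrees west
east = to_0_360(-10.0)         # and back again
```

With these formulas a longitude of exactly 180 maps to -180, so the upper end of the ‘-180-180’ range is exclusive in this sketch.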
- class pyearthtools.data.transforms.coordinates.ReIndex(coordinates=None, **coords)#
Reindex Coordinates
Reindex coordinates
Coordinates can be sorted, reversed, or reindexed to a given list.
- Parameters:
coordinates (dict[str, Literal['reversed','sorted'] | Iterable | xr.Coordinates] | None, optional) – Coordinate to reindex, and Iterable to reindex at. If ‘reversed’ or ‘sorted’, take current coord and sort. If
xr.Coordinates, use any coordinates with len > 1. Defaults to None.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.coordinates.StandardCoordinateNames(replacement_dictionary=None, **repl_kwargs)#
Convert xr.Dataset Coordinate Names into Standard Naming Scheme
- Parameters:
replacement_dictionary (dict | None, optional) – Dictionary assigning name replacements [old: new]. One of replacement_dictionary or repl_kwargs must be provided. Defaults to None.
**repl_kwargs (dict, optional) – Kwarg version of replacement_dictionary
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.coordinates.Select(indexers=None, *, ignore_missing=False, tolerance=None, isel=False, **indexers_kwargs)#
Select on Coordinates
Select values on coordinates
- Parameters:
indexers (dict[str, Any] | None, optional) – A dict with keys matching dimensions and values giving the selection for each. One of indexers or indexers_kwargs must be provided. Defaults to None.
**indexers_kwargs (dict) – Index keyword arguments
ignore_missing (bool, optional) – Ignore coordinates not in dataset. Defaults to False
tolerance (float | None, optional) – Tolerance for selection. Defaults to None.
isel (bool, optional) – Whether to use isel. Defaults to False.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.coordinates.Drop(coordinates=None, *extra_coords, ignore_missing=False)#
Drop items from Dataset
Drop Items from xr.Dataset
- Parameters:
coordinates (list[Hashable] | tuple[Hashable] | Hashable | None) – Coordinates to drop. Defaults to None.
ignore_missing (bool, optional) – Ignore coordinates not in dataset. Defaults to False
extra_coords (Hashable)
- Returns:
Transform to apply drop
- Return type:
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.coordinates.Assign(coordinates=None, as_dataarray=False, **coordinate_kwargs)#
Assign coordinates to object
Assign coordinates to Xarray Object.
Uses .assign_coords
- Parameters:
coordinates (dict[str, Any] | None, optional) – Coordinates to assign. Defaults to None.
as_dataarray (bool, optional) – Assign coordinates separately to each variable. Defaults to False.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.coordinates.Pad(coordinates=None, **kwargs)#
Pad data
This will automatically pad the coordinate values with an odd reflection to allow periodicity.
- Parameters:
coordinates (dict[str, Any] | None) – Coordinate pad_width. From xarray docs: Mapping with the form of {dim: (pad_before, pad_after)} describing the number of values padded along each dimension. {dim: pad} is a shortcut for pad_before = pad_after = pad
**kwargs – Any kwargs to pass to .pad
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
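One reading of "odd reflection" of coordinate values is that each padded value mirrors an interior value about the edge, so evenly spaced coordinates continue with the same spacing. The pure-Python sketch below shows that arithmetic; `pad_odd_reflect` is a hypothetical helper and the library's exact padding behaviour may differ.

```python
def pad_odd_reflect(coords, pad_before, pad_after):
    """Extend coordinate values by odd reflection about each end point."""
    if max(pad_before, pad_after) >= len(coords):
        raise ValueError("pad width must be smaller than the number of coordinates")
    first, last = coords[0], coords[-1]
    # Odd reflection mirrors about the edge value: 2 * edge - interior value.
    before = [2 * first - coords[i] for i in range(pad_before, 0, -1)]
    after = [2 * last - coords[-1 - i] for i in range(1, pad_after + 1)]
    return before + list(coords) + after


padded_index = pad_odd_reflect([0, 1, 2, 3], 2, 2)
padded_lon = pad_odd_reflect([0, 90, 180, 270], 1, 1)
```

For the longitude case the padded values -90 and 360 correspond to 270 and 0 on the globe, which is what makes the padding useful for periodic coordinates.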
- pyearthtools.data.transforms.default.get_default_transforms(intelligence_level=2)#
Get Default Transforms to be applied to all datasets
- Parameters:
intelligence_level (int, optional) – Level of Intelligence in operation. Defaults to 2.
- Returns:
Collection of default transforms
- Return type:
- pyearthtools.data.transforms.derive.evaluate(eq, *, dataset=None)#
Evaluate a given equation
Use dataset to set variables.
Each numerical or reference component in an equation must be separated by a space.
If using function based symbols like ‘sqrt’ or ‘sin’, the next item will be evaluated using said function. These functions can be given with brackets next to them.
Without brackets given, the equation will be evaluated left -> right.
- Parameters:
eq (str) – Equation to solve
dataset (xr.Dataset | None, optional) – Dataset to get variables from. Defaults to None
- Returns:
Result of equation
- Return type:
(xr.DataArray | float)
- pyearthtools.data.transforms.derive.derive_equations(dataset, equation=None, *, drop=False, **equations)#
Derive new variables from specified equation/s, and set variables in the dataset accordingly
- Parameters:
dataset (xr.Dataset) – Dataset to get variables from, and to set new ones on
equation (dict[str, str | tuple[str, dict[str, Any]]] | None, optional) – Dictionary of equations, key represents new variable name. Can be tuple to set equation, and attribute update dictionary. Defaults to None.
drop (bool, optional) – Drop variables used in calculations. Defaults to False.
- Returns:
Dataset with equations applied to it
- Return type:
(xr.Dataset)
- class pyearthtools.data.transforms.dimensions.StandardDimensionNames(replacement_dictionary=None, **kwargs)#
Standardise dimension names
Convert Dataset Dimension Names into Standard Naming Scheme
- Parameters:
replacement_dictionary (dict[Hashable, Hashable]) – Dictionary assigning dimension name replacements [old: new]
kwargs (str)
- Returns:
Transform to replace dimension names
- Return type:
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.dimensions.Expand(dim=None, axis=None, as_dataarray=True, missing='error', exists='error', **kwargs)#
Expand Dimensions
Expand Dimensions.
Uses xarray.expand_dims.
- Parameters:
dim (list[str] | dict | str | None, optional) – Dimensions to include on the new variable. If provided as str or sequence of str, then dimensions are inserted with length 1. If provided as a dict, then the keys are the new dimensions and the values are either integers (giving the length of the new dimensions) or sequence/ndarray (giving the coordinates of the new dimensions).
axis (int | list[int] | None, optional) – Axis position(s) where new axis is to be inserted (position(s) on the result array). If a sequence of integers is passed, multiple axes are inserted. In this case, dim arguments should be same length list. If axis=None is passed, all the axes will be inserted to the start of the result array.
as_dataarray (bool, optional) – Expand each variable independently. Defaults to True.
missing (Literal['skip','error'], optional) – What to do when a missing dim is given. Defaults to ‘error’.
kwargs (int) – Keywords form of dim.
exists (Literal['skip', 'error'])
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.interpolation.Interpolate(method='linear', keep_encoding=False, skip_missing=False, pad=False, **kwargs)#
Interpolation Transform
Interpolation Transform passing kwargs
- Parameters:
**kwargs – Kwargs to pass to xr.interp. Should be variables with new coordinates to interpolate to. e.g. latitude = [-90,-80,...,80,90]
method (InterpOptions) – Method to use for interpolation. Defaults to “linear”. Must be one of xarray.interp methods “linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”, “polynomial”, “barycentric”, “krogh”, “pchip”, “spline”, “akima”
keep_encoding (bool) – Whether to keep the encoding of the incoming dataset.
skip_missing (bool) – Skip missing dimensions as given in kwargs but not in dataset.
pad (bool | int) – Whether to pad all coords by 1. If int, size to pad by.
- Returns:
Transform to interpolate datasets
- Return type:
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
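To make the default “linear” method concrete, here is a one-dimensional linear interpolation onto new coordinate values in pure Python. It is a sketch of the arithmetic only; `interp_linear` is a hypothetical name, and returning NaN outside the source range is a choice made for this example.

```python
from bisect import bisect_right


def interp_linear(new_x, x, y):
    """Linearly interpolate y(x) at each point in new_x; x must be strictly increasing."""
    out = []
    for xi in new_x:
        if not x[0] <= xi <= x[-1]:
            out.append(float("nan"))  # outside the source range
            continue
        # Index of the right-hand neighbour, clamped so xi == x[-1] works.
        hi = min(bisect_right(x, xi), len(x) - 1)
        lo = hi - 1
        weight = (xi - x[lo]) / (x[hi] - x[lo])
        out.append(y[lo] + weight * (y[hi] - y[lo]))
    return out


latitude = [-90.0, 0.0, 90.0]
field = [0.0, 10.0, 40.0]
regridded = interp_linear([-45.0, 0.0, 45.0], latitude, field)
```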
- class pyearthtools.data.transforms.interpolation.XESMF(reference_dataset=None, method='bilinear', **coords)#
Interpolate using xesmf
Create Transform using xesmf
Either reference_dataset or coords must be given
- Parameters:
reference_dataset (xr.Dataset | None) – Reference Dataset.
**coords – Coordinates to create reference_dataset from. Can be fully created or tuple to use to fill np.arange. Either lat = (["lat"], np.arange(16, 75, 1.0)) or lat = (16, 75, 1.0)
method (str) – Interpolation method to use.
- Raises:
ImportError – xesmf could not be imported
KeyError – No arguments given
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.interpolation.InterpolateNan(dim, method='linear', keep_encoding=False, fill_value='extrapolate', **kwargs)#
Interpolate NaNs
Interpolate NaN Transform.
Uses xarray's Dataset.interpolate_na; see its documentation for all kwargs.
Automatically reindexes to be monotonic, and reverts before returning.
- Parameters:
**kwargs (Any) – Kwargs to pass to xr.interpolate_na
method (InterpOptions, optional) – Method to use for interpolation. Defaults to “linear”. Must be one of xarray.interp methods “linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”, “polynomial”, “barycentric”, “krogh”, “pchip”, “spline”, “akima”
keep_encoding (bool, optional) – Whether to keep the encoding of the incoming dataset. Defaults to False.
fill_value (str | None, optional) – See scipy.interpolate.interp1d.
dim (str)
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
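Filling NaNs by linear interpolation along one dimension can be sketched on a plain list. This toy version assumes evenly spaced positions and only fills gaps that have a valid value on both sides, so unlike the documented default of fill_value='extrapolate' it leaves leading and trailing NaNs alone. `fill_nan_linear` is a hypothetical name.

```python
import math


def fill_nan_linear(values):
    """Fill interior NaN runs by linear interpolation between valid neighbours."""
    filled = list(values)
    valid = [i for i, v in enumerate(filled) if not math.isnan(v)]
    # Walk over consecutive pairs of valid positions and fill the gap between them.
    for left, right in zip(valid, valid[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


gap_filled = fill_nan_linear([1.0, math.nan, math.nan, 4.0, math.nan])
```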
- pyearthtools.data.transforms.interpolation.like(reference_dataset, method='linear', drop_coords=None, pad=False, **kwargs)#
From reference dataset setup interpolation transform
- Parameters:
reference_dataset (xr.Dataset | str) – Dataset to use to set coords. Can be path to dataset to open
method (InterpOptions, optional) – Method to use in interpolation. Defaults to “linear”.
drop_coords (str | list[str], optional) – Coords to drop from reference dataset. Defaults to None.
pad (bool | int, optional) – Whether to pad all coords by 1. If int, size to pad by. Defaults to False.
- Returns:
Transform to interpolate dataset like reference_dataset
- Return type:
- class pyearthtools.data.transforms.mask.UnderlyingMaskTransform(docstring=None)#
Initialise root Transform class
Cannot be used as is, a child must implement the .apply function.
- Parameters:
docstring (str, optional) – Docstring to set this Transform to. Defaults to None.
- Raises:
TypeError – If docstring cannot be parsed
- filter(data, value, *, replacement_value=nan, operation='==', **kwargs)#
Run filtering, but if any of the given kwargs are dictionaries, retrieve the correct element
Will raise an error if a key is missing from a dictionary when it was present in another
- Parameters:
data (Dataset | DataArray)
value (dict | float | str | Path)
replacement_value (Dataset | ndarray | float | Path | str | dict[str, Any])
operation (Literal['==', '!=', '>', '<', '>=', '<='] | dict[str, Literal['==', '!=', '>', '<', '>=', '<=']])
- class pyearthtools.data.transforms.mask.Dataset(value, reference_dataset, operation='==', replacement_value=nan, squeeze='None')#
Mask data using a reference dataset
Will replace data on incoming dataset where condition is met on reference_dataset
- Parameters:
reference_dataset (xr.Dataset | str | dict) – Reference dataset to calculate mask from. Can be dataset, str as Path, or a dictionary referencing incoming data variables containing the prior types.
value (Any, optional) – Value to mask at. Can be array, dataset, string or dictionary. Defaults to np.NaN.
operation (Literal['==', '!=', '>', '<', '>=','<='] | dict, optional) – Criteria to search by. Can be dictionary for dataset keys. Defaults to “==”.
replacement_value (float | str | xr.Dataset | dict, optional) – Value to replace with. Can be str pointing to dataset or dataset itself, or a dictionary. Defaults to np.nan
squeeze (str | list, optional) – Dims to squeeze on reference dataset. Defaults to ‘None’
- Returns:
Transform to apply mask to data
- Return type:
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.mask.Replace(value, operation='==', replacement_value=nan)#
Replace Values in dataset with replacement_value when matching criteria
- Parameters:
value (dict | float | str) – Value to mask at. Can be array, dataset, string or dictionary. Dictionary refers to variables and values.
operation (Literal['==', '!=', '>', '<', '>=','<='] | dict, optional) – Criteria to search by. Can be dictionary for dataset keys. Defaults to “==”.
replacement_value (float | str | xr.Dataset | dict, optional) – Value to replace with. Can be str pointing to dataset or dataset itself, or a dictionary. Defaults to np.nan
- Raises:
KeyError – If invalid operation is provided
- apply(data)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
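The value, operation and replacement_value parameters describe a compare-and-replace step. A minimal sketch on a plain list, using the standard `operator` module to look up the comparison, might look like this. `replace_where` and the `OPERATIONS` table are hypothetical; the KeyError on an unknown operation mirrors the documented behaviour but the message is invented for the example.

```python
import math
import operator

OPERATIONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def replace_where(values, value, operation="==", replacement_value=math.nan):
    """Hypothetical helper: swap in `replacement_value` wherever the comparison holds."""
    if operation not in OPERATIONS:
        raise KeyError(f"Invalid operation {operation!r}")
    compare = OPERATIONS[operation]
    return [replacement_value if compare(v, value) else v for v in values]


masked = replace_where([-999.0, 1.5, 2.5], -999.0, replacement_value=0.0)
clipped = replace_where([0.2, 0.9, 1.4], 1.0, operation=">", replacement_value=1.0)
```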
- class pyearthtools.data.transforms.optimisation.Rechunk(method)#
Rechunk data
- Parameters:
method (int | dict[str, Any] | Literal['auto', 'encoding']) – Rechunk either by encoding, auto or by variable config.
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- pyearthtools.data.transforms.region.check_shape(data)#
Calculate multiplied shape of xarray data container
- Parameters:
data (xr.Dataset | xr.DataArray) – Data to find shape for
- Returns:
Multiplied shape of data
- Return type:
int
- pyearthtools.data.transforms.region.order(*args)#
Order arguments with sort & return as tuple
- pyearthtools.data.transforms.region.like(dataset)#
Use Reference Dataset to inform spatial extent & transform geospatial extent accordingly
- Parameters:
dataset (xr.Dataset | str) – Reference Dataset to use. Can be path to dataset to load
- Returns:
Transform to cut region to extent of given reference dataset
- Return type:
- class pyearthtools.data.transforms.region.Bounding(min_lat, max_lat, min_lon, max_lon)#
Cut with Bounding box
Use Bounding Coordinates to transform geospatial extent
- Parameters:
min_lat (float) – Minimum Latitude to slice with
max_lat (float) – Maximum Latitude to slice with
min_lon (float) – Minimum Longitude to slice with
max_lon (float) – Maximum Longitude to slice with
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.region.Select#
Select on a dataset with sel_kwargs
- Parameters:
sel_kwargs (dict[str, Any] | None)
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- class pyearthtools.data.transforms.region.ISelect#
Index select on a dataset with sel_kwargs
- Parameters:
sel_kwargs (dict[str, Any] | None)
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- pyearthtools.data.transforms.region.PointBox(point, size)#
Create a region bounding box of size around point
- Parameters:
point (tuple[float]) – Latitude and Longitude point
size (float) – Size in degrees to expand the box. Total box width / length = size * 2
- Returns:
Transform to cut region to bounding box around point
- Return type:
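The relationship between point, size and the resulting box can be sketched in plain Python. This is an illustrative helper, not part of pyearthtools: the name point_box_bounds is hypothetical, and it assumes the box is centred on the point and ignores wrapping at the poles or the date line. The output order follows the Bounding(min_lat, max_lat, min_lon, max_lon) signature.

```python
from typing import Tuple


def point_box_bounds(
    point: Tuple[float, float], size: float
) -> Tuple[float, float, float, float]:
    """Return (min_lat, max_lat, min_lon, max_lon) for a box around `point`.

    Hypothetical helper: extends `size` degrees in every direction, so the
    total width and length of the box are both `size * 2`.
    """
    lat, lon = point
    return (lat - size, lat + size, lon - size, lon + size)


# A 5 x 5 degree box centred on (-35.0, 149.0)
bounds = point_box_bounds((-35.0, 149.0), 2.5)
```

The resulting tuple could then be unpacked into a bounding-box style transform.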
- pyearthtools.data.transforms.region.Lookup(key, regionfile=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/pyearthtools/checkouts/latest/packages/data/src/pyearthtools/data/transforms/RegionLookup.yaml'))#
Use string to retrieve preset lat and lon extent to transform geospatial extent
- Parameters:
key (str) – Lookup key within the preset file
regionfile (str | Path) – Yaml File to look for keys in. Defaults to RegionLookupFILE
- Raises:
KeyError – If key not in preset file
- Returns:
Transform to cut region to define bounding box
- Return type:
- pyearthtools.data.transforms.region.Geosearch(key, column=None, value=None, crs=None, **kwargs)#
Use static.geographic (pyearthtools.data.static.geographic) to retrieve a Shapefile. Allows selection of geopandas file, column and value to filter by.
If neither column nor value is provided, use all geometry in the geopandas file.
- Parameters:
key (str) – A Geographic (pyearthtools.data.static.geographic) search key
column (str | None, optional) – Column in geopandas to search in. Defaults to None.
value (list[str] | str, optional) – Values to search for, can be list. Defaults to None.
crs (str | None, optional) –
Coordinate Reference System (CRS) to apply to data. Will check if shapefile has crs information and attempt to use it if not provided. Otherwise an error will be raised.
Can be any code accepted by geopandas. See https://geopandas.org/en/stable/docs/user_guide/projections.html#coordinate-reference-systems
- class pyearthtools.data.transforms.region.ShapeFile(shapefile, crs=None)#
Use Shapefile to create region bounding.
- Parameters:
shapefile (Any | str) – Shapefile to use
crs (str | None, optional) –
Coordinate Reference System (CRS) to apply to data. Will check if shapefile has crs information and attempt to use it if not provided. Otherwise an error will be raised.
Can be any code accepted by geopandas. See https://geopandas.org/en/stable/docs/user_guide/projections.html#coordinate-reference-systems
Defaults to None.
- Raises:
ImportError – If geopandas cannot be imported
- apply(dataset)#
Apply transformation to Dataset
- Parameters:
dataset (XR_TYPES) – Dataset to apply transform to
- Raises:
NotImplementedError – Base Transform does not implement this function
- Returns:
Transformed Dataset
- Return type:
XR_TYPES
- pyearthtools.data.transforms.utils.parse_dataset(value)#
Attempt to load the dataset if value is a str or Path. Return the original value if not.
- Parameters:
value (str | Path | Any)
- Return type:
Any
data.catalog#
- class pyearthtools.data.catalog.Catalog(*, catalog_name=None, entries=None)#
Keep a Catalog of Data Sources
Used to track known kwargs for functions
Can be used for any class which specifies the function to_init_dict, which returns a dictionary with the key being the fully qualified class name, and the value a dictionary with the init kwargs. A name kwarg specifies the CatalogEntry name.
Initialise a new Catalog of Data Sources
- Parameters:
name (str, optional) – Name for this catalog. Defaults to None.
named_entries – {name : (Path, ‘Catalog’, CatalogEntry | pyearthtools.data.Index)} Named entries to add to catalog Names may be None
catalog_name (Optional[str])
entries (Optional[dict])
Examples
>>> test_catalog = Catalog()
>>> def return_function(x, **kwargs):
...     return '-'.join([x, *list(kwargs.keys())])
>>> test_catalog.append(CatalogEntry(return_function, name = 'Test', wow = 1))
>>> test_catalog.Test('entry')
'entry-wow'
- append(other, *, name=None)#
Append Elements to Catalog.
- Parameters:
other (str, Path, Catalog, CatalogEntry | pyearthtools.data.DataIndex | dict) – Items to add to Catalog
name (str, optional) – Override for name of entry. Defaults to None.
- Raises:
KeyError – If pyearthtools.data.Index has no attr catalog
TypeError – If other not recognised
- static load(catalog_to_load, direct_load=False, **kwargs)#
Load saved catalog file into new catalog object
- Tip:
If pointed at a folder, will search the folder for a .cat catalog file. If found, that catalog will be loaded instead. Used to create folders loadable from pyearthtools.
- Parameters:
catalog_to_load (str | Path) – Filepath to catalog file. All function pointers are converted from str to function pointer.
direct_load (bool, optional) – If Catalog contains one entry, this flag can be used to return that index instead. Defaults to False
- Raises:
FileNotFoundError – If file does not exist
- Returns:
Loaded Catalog
- Return type:
- pop(key)#
Pop element from Catalog
- Parameters:
key (str) – Key to pop
- Raises:
KeyError – If key not in catalog
- Returns:
Popped entry
- Return type:
- remove(key)#
Remove element from catalog
- Parameters:
key (str) – Key to remove
- Raises:
KeyError – If key not in catalog
- save(output_file=None, direct_load=False)#
Save Catalog to specified file
Auto converts any function pointers to fully qualified path
- Parameters:
output_file (str | Path | None, optional) – Save file path. Defaults to None.
direct_load (bool, optional) – If Catalog contains one entry, this flag can be used so that when the catalog is loaded, the index is returned instead. Defaults to False
- Return type:
None | dict
- to_dict()#
Get catalog as dictionary
- class pyearthtools.data.catalog.CatalogEntry(item_class, args=[], *extra_args, name=None, class_path=None, kwargs={}, **extra_kwargs)#
Catalog Entry
Setup Catalog Entry.
Can be used to catalog any class, and the args and kwargs to initialise it.
- Parameters:
item_class (Callable | None) – Class for which to setup a catalog entry
args (list[Any]) – args to be passed to item_class
*extra_args – also passed to item_class
name (str | None) – Name of this entry
class_path (str | None) – Override for class path. If not given will be auto found.
**kwargs (dict) – kwargs to be passed to item_class
- property call_underlying_function: Any#
Get underlying Class of the catalog entry
Returns: Underlying Class
- del_kwargs(key)#
Remove kwargs
- Parameters:
key (str) – Key to remove
- Raises:
KeyError – If key not found
- static from_dict(init_dict, **kwargs)#
Create CatalogEntry from dictionary
This dictionary can be of two forms, one that is the result of CatalogEntry.to_dict(), and the other a more general form.
Form of the init_dict
>>> {
>>>     CLASS:
>>>         { # All are optional
>>>             args: #Arguments to initialise with
>>>             kwargs: #Keyword arguments to initialise with
>>>             name: #Name of entry
>>>         }
>>>
>>> }
- Parameters:
init_dict (dict) – Initialisation Dictionary.
**kwargs – Kwargs to replace init_dict[‘kwargs’] with.
- Return type:
Returns: Loaded CatalogEntry
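The general {CLASS: {args, kwargs, name}} form can be illustrated with a small standalone round trip. This is a sketch using only the standard library, not the pyearthtools implementation: the helper names class_path_of, entry_to_dict and build_from_dict are hypothetical, and the sketch assumes CLASS is a dotted import path to a top-level class. As described for **kwargs above, any keyword overrides replace the stored kwargs.

```python
import importlib
from datetime import timedelta
from typing import Any, Callable, Dict


def class_path_of(obj: Callable) -> str:
    """Dotted import path of a top-level class or function."""
    return f"{obj.__module__}.{obj.__qualname__}"


def entry_to_dict(item_class, args=(), kwargs=None, name=None) -> Dict[str, Dict[str, Any]]:
    """Describe how to initialise `item_class` as a plain dictionary."""
    spec = {"args": list(args), "kwargs": dict(kwargs or {}), "name": name}
    return {class_path_of(item_class): spec}


def build_from_dict(init_dict: Dict[str, Any], **overrides: Any) -> Any:
    """Import the class named by the single key and initialise it."""
    class_path, spec = next(iter(init_dict.items()))
    spec = spec or {}  # every field is optional
    module_name, _, attr = class_path.rpartition(".")
    item_class = getattr(importlib.import_module(module_name), attr)
    # Overrides replace the stored kwargs rather than merging with them
    kwargs = overrides or spec.get("kwargs", {})
    return item_class(*spec.get("args", []), **kwargs)


saved = entry_to_dict(timedelta, kwargs={"minutes": 10}, name="ten_minutes")
rebuilt = build_from_dict(saved)
```

Storing the class as a string path keeps the dictionary serialisable, at the cost of needing the class to be importable when it is loaded.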
- save(output_file=None, direct_load=True)#
Save this CatalogEntry as a catalog at Path
- Parameters:
output_file (str | Path | None, optional) – Path to savefile. Defaults to None.
direct_load (bool, optional) – When loading this catalog entry, should the index be directly returned. Defaults to True.
- Return type:
None | dict
- set_kwargs(**kwargs)#
Add extra kwargs
- Parameters:
**kwargs (Any) – Extra kwargs
- to_dict()#
Convert CatalogEntry into dict
- Returns:
Dictionary containing all info needed to reconstruct the object.
Structure:
item_class: Function class path
name: Catalog Entry name
args: Args used to init
kwargs: Kwargs used to init
- Return type:
dict
- pyearthtools.data.catalog.get_name(obj)#
Get name of object
- Parameters:
obj (Any)
- Return type:
str
data.collection#
- class pyearthtools.data.collection.Collection(*args, **kwds)#
A modified tuple type object which allows attributes and methods to be accessed.
Attributes and methods will be returned as a Collection, thus allowing their attributes and methods to be accessed.
Any item in a Collection can be accessed by using the [] syntax, and can be iterated over.
Examples
>>> collec = pyearthtools.data.Collection({'item_1':10}, {'item_2':42})
>>> collec
Collection Containing:
{'item_1': 10}
{'item_2': 42}
>>> collec.keys()
Collection Containing:
dict_keys(['item_1'])
dict_keys(['item_2'])
>>> collec[0]
{'item_1': 10}
- Parameters:
args (Any)
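The forwarding behaviour described above can be sketched with a small tuple subclass. This is an illustration of the idea only, not the pyearthtools implementation: CollectionSketch is a hypothetical name, and names that tuple already defines (such as count and index) are not forwarded in this sketch.

```python
class CollectionSketch(tuple):
    """Tuple whose unknown attributes and calls are forwarded to each item."""

    def __new__(cls, *items):
        return super().__new__(cls, items)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so tuple's own
        # attributes keep working as usual.
        return CollectionSketch(*(getattr(item, name) for item in self))

    def __call__(self, *args, **kwargs):
        # Calling a collection of bound methods calls each one in turn.
        return CollectionSketch(*(item(*args, **kwargs) for item in self))


collec = CollectionSketch({"item_1": 10}, {"item_2": 42})
keys = collec.keys()          # a CollectionSketch of dict_keys objects
values = collec.get("item_1")  # a CollectionSketch of each dict's answer
```

Because every attribute access returns another collection, calls can be chained across all items at once.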
- class pyearthtools.data.collection.LabelledCollection(*args, **kwds)#
A modified immutable dict-like object which allows attributes and methods of the underlying objects to be accessed, while retaining the original names. This allows for a name to be given to a root object, and any operations or attributes from said object will remain linked to that name.
Attributes and methods will be returned as a LabelledCollection, thus allowing their attributes and methods to be accessed.
Any item in a LabelledCollection can be accessed by its given name, and can be iterated over.
- Parameters:
kwargs (Any)
data.exceptions#
- class pyearthtools.data.exceptions.InvalidIndexError(message, *args)#
If an invalid index was provided
- class pyearthtools.data.exceptions.InvalidDataError(message, *args)#
If data cannot be loaded
- class pyearthtools.data.exceptions.DataNotFoundError(message, *args)#
If Data was not found
data.load#
- pyearthtools.data.load.load(stream, **kwargs)#
Load a saved pyearthtools.data.Index
- Parameters:
stream (Union[str, Path]) – Stream to load, can be either path to config or yaml str
- Returns:
Loaded Index
- Return type:
(pyearthtools.data.Index)
data.time#
- pyearthtools.data.time.multisplit(element, splits)#
Split a str by multiple characters.
- Parameters:
element (str)
splits (tuple[str | int, ...])
- Return type:
list[str]
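Splitting on several separators at once can be sketched with the standard re module. This is a minimal illustration, not the library's implementation: multisplit_sketch is a hypothetical name, and it assumes every entry in splits is a string separator (the int entries permitted by the signature above are not handled here).

```python
import re
from typing import List, Tuple


def multisplit_sketch(element: str, splits: Tuple[str, ...]) -> List[str]:
    """Split `element` wherever any of the separators in `splits` occurs."""
    if not splits:
        return [element]
    # Escape each separator so characters like '.' are matched literally
    pattern = "|".join(re.escape(separator) for separator in splits)
    return re.split(pattern, element)


parts = multisplit_sketch("2021-02-03T00:00", ("-", "T", ":"))
```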
- pyearthtools.data.time.find_components(time)#
Find specified time components in a given time str (e.g. indicate which of year, month, day, hour etc. is set in the time string)
- Parameters:
time (str) – String of time, usually in isoformat e.g. ‘2021-02-03T0000’
- Returns:
resolution_component -> flag
- Return type:
dict[str, bool]
Examples
>>> pyearthtools.data.time.find_components('2020-01')
{'year': True, 'month': True, 'day': False, 'minute': False, 'second': False}
- pyearthtools.data.time.strip_to_common_resolution(component)#
Remove common suffix for time resolution vernacular
- Parameters:
component (str)
- Return type:
str
- pyearthtools.data.time.time_delta(time_amount)#
Create a pandas timedelta
- Parameters:
time (Any) – time of delta, can be:
int: automatic unit of ‘minutes’ applied
tuple: (int, str) with str being unit
time_amount (Any)
- Returns:
Discovered pandas timedelta
- Return type:
pd.Timedelta
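The int versus tuple handling can be sketched as follows. To stay dependency-free, this illustration swaps in the standard library's datetime.timedelta for pandas.Timedelta; time_delta_sketch is a hypothetical name, and the unit string is assumed to be one of the plural keyword names that datetime.timedelta accepts (for example 'minutes', 'hours', 'days').

```python
from datetime import timedelta
from typing import Tuple, Union


def time_delta_sketch(time_amount: Union[int, Tuple[int, str]]) -> timedelta:
    """Build a timedelta from an int (minutes assumed) or an (amount, unit) tuple."""
    if isinstance(time_amount, tuple):
        amount, unit = time_amount
    else:
        # A bare integer falls back to minutes
        amount, unit = time_amount, "minutes"
    return timedelta(**{unit: amount})


ten_minutes = time_delta_sketch(10)
ten_days = time_delta_sketch((10, "days"))
```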
- pyearthtools.data.time.time_delta_resolution(timedelta)#
Find resolution of timedelta
- Parameters:
timedelta (pd.Timedelta) – Given timedelta
- Returns:
Resolution of timedelta
- Return type:
TimeResolution
- pyearthtools.data.time.range_samples(start, end, step, inclusive=False)#
Cache generation of time samples
- class pyearthtools.data.time.Petdt(time, *, resolution=None)#
PyEarthTools Datetime object which has additional functionality relating to temporal resolution and resolution conversion compared to other libraries, and also supports alternative calendars to some degree.
Examples
>>> str(Petdt('2021-01'))
"2021-01"
>>> str(Petdt('2021-01-12'))
"2021-01-12"
- Parameters:
time (Any) – Time to get resolution of. Can use ‘today’ to get today
resolution (str | TimeResolution | None) – Override for resolution specification. Defaults to None.
Notes
time must be a str or Petdt for resolution awareness to take effect. If str, it must be in isoformat.
- Valid time resolutions are:
“year”, “month”, “day”, “hour”, “minute”, “second”, “nanosecond”
Time when supplied as a string may be underspecified (e.g. just the year).
The resolution of a supplied time string will be inferred from the time components which are present in the string.
If a resolution is specified lower than the specified time string, the datetime will be down-sampled to match the specified resolution.
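The notes above can be illustrated with a standalone sketch of resolution inference and down-sampling built on the standard library. It is not how Petdt is implemented: the names infer_resolution and truncate_to are hypothetical, only a handful of hyphen and colon separated isoformat layouts are recognised, and nanosecond resolution is left out.

```python
from datetime import datetime
from typing import Tuple

# Candidate layouts, from least to most specific, with the resolution each implies
_FORMATS = [
    ("%Y", "year"),
    ("%Y-%m", "month"),
    ("%Y-%m-%d", "day"),
    ("%Y-%m-%dT%H", "hour"),
    ("%Y-%m-%dT%H:%M", "minute"),
    ("%Y-%m-%dT%H:%M:%S", "second"),
]
_ORDER = [resolution for _, resolution in _FORMATS]


def infer_resolution(time_str: str) -> Tuple[datetime, str]:
    """Parse an underspecified time string and report which resolution it carries."""
    for fmt, resolution in _FORMATS:
        try:
            return datetime.strptime(time_str, fmt), resolution
        except ValueError:
            continue
    raise ValueError(f"Unrecognised time string: {time_str!r}")


def truncate_to(dt: datetime, resolution: str) -> datetime:
    """Down-sample `dt` by resetting every component finer than `resolution`."""
    keep = _ORDER.index(resolution) + 1
    parts = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)
    defaults = (1, 1, 1, 0, 0, 0)
    return datetime(*parts[:keep], *defaults[keep:])


parsed, resolution = infer_resolution("2021-01")
monthly = truncate_to(datetime(2021, 3, 15, 10, 30), "month")
```

Here the string '2021-01' carries only year and month, so its inferred resolution is 'month', and truncating a full timestamp to 'month' discards the day and time of day.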
- at_resolution(resolution)#
Get Petdt at specified resolution
- property datetime: datetime#
Get datetime.datetime object
- datetime64(time_unit='ns')#
Get Petdt as a np.datetime64 in given unit
- Parameters:
time_unit (str, optional) – Time unit to get datetime64 in. Defaults to “ns”.
- Returns:
Defined time as a np.datetime64
- Return type:
np.datetime64
- static is_time(time_to_parse)#
Check if object can be parsed to a Petdt
Attempts to make Petdt but catches all exceptions.
- Parameters:
time_to_parse (Any) – Object to check if can be Petdt
- Returns:
Boolean value of if can be Petdt
- Return type:
(bool)
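The "attempt to construct, catch everything" pattern can be sketched with the standard library. This stand-in parses with datetime.fromisoformat rather than Petdt, so the set of accepted inputs differs from the real method; is_time_sketch is a hypothetical name.

```python
from datetime import datetime
from typing import Any


def is_time_sketch(time_to_parse: Any) -> bool:
    """Return True if the object parses as an ISO time, False on any failure."""
    try:
        datetime.fromisoformat(time_to_parse)
    except Exception:
        # Wrong type, malformed string, or anything else: treat as "not a time"
        return False
    return True


valid = is_time_sketch("2021-01-12")
invalid = is_time_sketch("not a time")
```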
- to_cftime(calendar='noleap')#
This method will throw an exception if cftime is not installed.
- class pyearthtools.data.time.TimeDelta(timedelta=None, *args)#
Create a TimeDelta Object
Effectively a wrapper around the pandas.Timedelta.
If no units are supplied, minutes is automatically assumed.
- Parameters:
timedelta (Any) – Timedelta arguments, can be int or tuple
*args (Any) – Extra Timedelta arguments. If timedelta is int, set unit.
Examples
>>> TimeDelta(10, 'days')
10 days 00:00:00
>>> TimeDelta((10, 'days'))
10 days 00:00:00
>>> TimeDelta(10)
0 days 00:10:00
- property np_timedelta: timedelta64#
Numpy timedelta64 of TimeDelta
- property pd_timedelta: Timedelta#
Pandas Timedelta
- property resolution: TimeResolution#
Resolution of the TimeDelta
- class pyearthtools.data.time.TimeRange(start, end, step, *, inclusive=False, use_tqdm=False, desc='', **kwargs)#
Get all timesteps between two points at an interval
Generate all timesteps between start & end at step interval.
- Parameters:
start (Petdt | str) – Starting time
end (Petdt | str) – Ending Time
step (TimeDelta | int | tuple) – Step Interval
inclusive (bool, optional) – Include end time. Defaults to False.
use_tqdm (bool, optional) – Format iterator with tqdm for interactive use. Defaults to False.
desc (str, optional) – Description if use_tqdm == True. Defaults to ''.
**kwargs (Any, optional) – If using tqdm, all kwargs passed through
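The effect of start, end, step and inclusive can be sketched with a plain generator. This illustration uses the standard library's datetime and timedelta in place of Petdt and TimeDelta, leaves out the tqdm integration, and time_range_sketch is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Iterator


def time_range_sketch(
    start: datetime, end: datetime, step: timedelta, inclusive: bool = False
) -> Iterator[datetime]:
    """Yield every timestep from `start` up to `end` at `step` intervals."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = start
    # The end point itself is only yielded when `inclusive` is set
    while current < end or (inclusive and current == end):
        yield current
        current += step


six_hourly = list(
    time_range_sketch(datetime(2021, 1, 1), datetime(2021, 1, 2), timedelta(hours=6))
)
six_hourly_inclusive = list(
    time_range_sketch(
        datetime(2021, 1, 1), datetime(2021, 1, 2), timedelta(hours=6), inclusive=True
    )
)
```

With a six-hour step over one day, the default range stops before the end time, while inclusive=True adds the end time as a final sample.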
data.warnings#
- class pyearthtools.data.warnings.pyearthtoolsDataWarning#
General warning for pyearthtools.data processes.
- class pyearthtools.data.warnings.IndexWarning#
Data Index Warning.
- class pyearthtools.data.warnings.AccessorRegistrationWarning#
Warning for conflicts in object registration.