Dataset Operations#

[1]:
import pyearthtools.data
import warnings

with warnings.catch_warnings(action="ignore"):
    import site_archive_nci

Variables#

[2]:
var = 'tcwv'

Builtin Operations#

pyearthtools.data provides a couple of useful operations that can be applied to any DataIndex.

At the time of writing, the following operations are accessible directly from a DataIndex:

  • Range

  • Aggregation

Other operations are also implemented and are accessible at pyearthtools.data.operations.

[3]:
era5 = pyearthtools.data.archive.ERA5(var)

Range of Data#

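Let's find the range (minimum and maximum) of the data for March 2021, with data retrieved at hourly intervals. The result is a dictionary keyed by variable name, holding the 'max' and 'min' as lazy dask arrays.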

[4]:
era5.range('2021-03-01', '2021-04-01', interval = (1, 'hour'))
[4]:
{'tcwv': {'max': dask.array<_nanmax_skip-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>,
  'min': dask.array<_nanmin_skip-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>}}
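
To make the arguments concrete, here is a minimal pure-Python sketch of what a range query conceptually does: step through the period at the given interval, then take the NaN-ignoring minimum and maximum of the samples. This is not the pyearthtools implementation. The helpers `sample_times` and `nan_range`, the exclusive treatment of the end date, and the toy values are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta


def sample_times(start, end, interval):
    """Yield datetimes from start up to (but not including) end.

    `interval` mirrors the (count, unit) tuple used above, e.g. (1, 'hour').
    Assumption: the end date is treated as exclusive.
    """
    count, unit = interval
    step = timedelta(**{unit + "s": count})  # e.g. timedelta(hours=1)
    current = datetime.fromisoformat(start)
    stop = datetime.fromisoformat(end)
    while current < stop:
        yield current
        current += step


def nan_range(values):
    """Return {'max': ..., 'min': ...}, skipping NaN entries."""
    valid = [v for v in values if not math.isnan(v)]
    return {"max": max(valid), "min": min(valid)}


times = list(sample_times("2021-03-01", "2021-04-01", (1, "hour")))
print(len(times))  # 744 hourly steps in March (31 days * 24)

# Hypothetical sample values standing in for data loaded at each time step.
toy_samples = [12.5, float("nan"), 3.0, 48.25]
print({"tcwv": nan_range(toy_samples)})  # {'tcwv': {'max': 48.25, 'min': 3.0}}
```

In the real output above the values are lazy dask arrays, so nothing has been computed yet. Calling `.compute()` on each dask array evaluates it and returns the number.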

Aggregation#

Let's find the temporal mean of the data for March 2021, with data retrieved at 6-hourly intervals.

[5]:
era5.aggregation('2021-03-01', '2021-04-01', interval=(6, 'hour'), aggregation='mean')
[5]:
<xarray.Dataset> Size: 8MB
Dimensions:    (longitude: 1440, latitude: 721)
Coordinates:
  * longitude  (longitude) float32 6kB -180.0 -179.8 -179.5 ... 179.5 179.8
  * latitude   (latitude) float32 3kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
Data variables:
    tcwv       (latitude, longitude) float64 8MB dask.array<chunksize=(182, 360), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.6
    history:      2021-07-04 07:02:45 UTC+1000 by era5_replication_tools-1.9....
    license:      Licence to use Copernicus Products: https://apps.ecmwf.int/...
    summary:      ERA5 is the fifth generation ECMWF atmospheric reanalysis o...
    title:        ERA5 single-levels reanalysis total_column_water_vapour 202...
    Aggregation:  Aggregation mean from 2021-03-01T00 to 2021-04-01 at 0 days...
[6]:
era5.aggregation('2021-03-01', '2021-04-01', interval=(6, 'hour'), aggregation='min').tcwv.plot()
[6]:
<matplotlib.collections.QuadMesh at 0x1545b19beb10>
[Figure: plot of the minimum tcwv over March 2021, sampled at 6-hourly intervals]
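
Conceptually, a temporal aggregation collapses the time dimension and leaves one value per grid point, which is why the aggregated Dataset above has only latitude and longitude dimensions. A minimal pure-Python sketch on a toy 2x2 grid follows. The helper `aggregate_over_time` and the sample fields are hypothetical and only illustrate the idea, not how pyearthtools or xarray implement it.

```python
from statistics import fmean


def aggregate_over_time(fields, reducer):
    """Reduce a list of 2D grids (one per time step) to a single 2D grid.

    fields  -- list of grids, each a list of rows with identical shape
    reducer -- function applied to the time series at each grid point,
               e.g. fmean, min or max
    """
    n_rows = len(fields[0])
    n_cols = len(fields[0][0])
    return [
        [reducer([field[i][j] for field in fields]) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Three toy time steps on a 2x2 (latitude, longitude) grid.
fields = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [5.0, 0.0]],
    [[2.0, 2.0], [4.0, 8.0]],
]

print(aggregate_over_time(fields, fmean))  # [[2.0, 2.0], [4.0, 4.0]]
print(aggregate_over_time(fields, min))    # [[1.0, 2.0], [3.0, 0.0]]
```

Swapping the reducer corresponds to changing the `aggregation` argument in the calls above from 'mean' to 'min'.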

Percentiles#

As there is no obvious way to compute percentiles directly on an xarray.Dataset, a custom function is provided. Given a Dataset, it returns any number of requested percentiles for each variable, indexed by a new Percentile dimension.

[7]:
pyearthtools.data.operations.percentile(era5('2021-01-01'), [1,5,95,99])
[7]:
<xarray.Dataset> Size: 64B
Dimensions:     (Percentile: 4)
Coordinates:
  * Percentile  (Percentile) int64 32B 1 5 95 99
Data variables:
    tcwv        (Percentile) float64 32B 0.4732 0.7271 52.75 60.32
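
The output has a single Percentile dimension, which suggests that each percentile is computed over all points of a variable. The sketch below shows one common definition: linear interpolation between sorted values, which is also the default method of numpy.percentile. Whether pyearthtools uses the same interpolation is an assumption, and `linear_percentile` is a hypothetical stand-in, not the library function.

```python
import math


def linear_percentile(values, percentiles):
    """Return the requested percentiles using linear interpolation."""
    ordered = sorted(values)
    results = []
    for p in percentiles:
        # Fractional position of the p-th percentile in the sorted data.
        rank = p * (len(ordered) - 1) / 100
        lower = math.floor(rank)
        upper = math.ceil(rank)
        weight = rank - lower
        # Interpolate between the two neighbouring sorted values.
        results.append(ordered[lower] + weight * (ordered[upper] - ordered[lower]))
    return results


# Toy data: the integers 0..100, so the p-th percentile is simply p.
data = list(range(101))
print(linear_percentile(data, [1, 5, 95, 99]))  # [1.0, 5.0, 95.0, 99.0]
```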