Introduction to cf_xarray
¶
This notebook is a brief introduction to cf_xarray
’s current capabilities.
import numpy as np
import xarray as xr
import cf_xarray as cfxr
# For this notebooks, it's nicer if we don't show the array values by default
xr.set_options(display_expand_data=False)
<xarray.core.options.set_options at 0x7f9f78cd0440>
cf_xarray
works best when xarray
keeps attributes by default.
xr.set_options(keep_attrs=True)
<xarray.core.options.set_options at 0x7f9f786fdf90>
Lets read two datasets.
ds = xr.tutorial.load_dataset("air_temperature")
ds.air.attrs["standard_name"] = "air_temperature"
ds
<xarray.Dataset> Size: 31MB Dimensions: (lat: 25, time: 2920, lon: 53) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
This one is inspired by POP model output and illustrates how the coordinates
attribute is interpreted. It also illustrates one way of tagging curvilinear
grids for convenient use of cf_xarray
from cf_xarray.datasets import popds as pop
pop
<xarray.Dataset> Size: 29kB Dimensions: (nlat: 20, nlon: 30) Coordinates: TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 TLAT (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 ULONG (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 ULAT (nlat, nlon) float64 5kB 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5 * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Data variables: UVEL (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 TEMP (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
This synthetic dataset has multiple X
and Y
coords. An example would be
model output on a staggered grid.
from cf_xarray.datasets import multiple
multiple
<xarray.Dataset> Size: 6kB Dimensions: (x1: 30, y1: 20, x2: 10, y2: 5) Coordinates: * x1 (x1) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29 * y1 (y1) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 * x2 (x2) int64 80B 0 1 2 3 4 5 6 7 8 9 * y2 (y2) int64 40B 0 1 2 3 4 Data variables: v1 (x1, y1) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 v2 (x2, y2) float64 400B 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0
This dataset has ancillary variables
from cf_xarray.datasets import anc
anc
<xarray.Dataset> Size: 3kB Dimensions: (x: 10, y: 20) Dimensions without coordinates: x, y Data variables: q (x, y) float64 2kB -1.688 -0.1002 ... 0.7781 -0.5075 q_error_limit (x, y) float64 2kB 1.656 -0.7808 -0.473 ... -0.5146 -1.11 q_detection_limit float64 8B 0.001
What attributes have been discovered?¶
The criteria for identifying variables using CF attributes are listed here.
ds.lon
<xarray.DataArray 'lon' (lon: 53)> Size: 212B 200.0 202.5 205.0 207.5 210.0 212.5 ... 317.5 320.0 322.5 325.0 327.5 330.0 Coordinates: * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 Attributes: standard_name: longitude long_name: Longitude units: degrees_east axis: X
ds.lon
has attributes axis: X
. This means that cf_xarray
can identify the
'X'
axis as being represented by the lon
variable.
It can also use the standard_name
and units
attributes to infer that lon
is “Longitude”. To see variable names that cf_xarray
can infer, use ds.cf
ds.cf
Coordinates:
CF Axes: * X: ['lon']
* Y: ['lat']
* T: ['time']
Z: n/a
CF Coordinates: * longitude: ['lon']
* latitude: ['lat']
* time: ['time']
vertical: n/a
Cell Measures: area, volume: n/a
Standard Names: * latitude: ['lat']
* longitude: ['lon']
* time: ['time']
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: air_temperature: ['air']
Bounds: n/a
Grid Mappings: n/a
For pop
, only latitude
and longitude
are detected, not X
or Y
. Please
comment here: https://github.com/xarray-contrib/cf-xarray/issues/23 if you have
opinions about this behaviour.
pop.cf
Coordinates:
CF Axes: * X: ['nlon']
* Y: ['nlat']
Z, T: n/a
CF Coordinates: longitude: ['TLONG', 'ULONG']
latitude: ['TLAT', 'ULAT']
vertical, time: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: sea_water_potential_temperature: ['TEMP']
sea_water_x_velocity: ['UVEL']
Bounds: n/a
Grid Mappings: n/a
For multiple
, multiple X
and Y
coordinates are detected
multiple.cf
Coordinates:
CF Axes: * X: ['x1', 'x2']
* Y: ['y1', 'y2']
Z, T: n/a
CF Coordinates: longitude, latitude, vertical, time: n/a
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
Feature: Accessing coordinate variables¶
.cf
implements __getitem__
to allow easy access to coordinate and axis
variables.
ds.cf["X"]
<xarray.DataArray 'lon' (lon: 53)> Size: 212B 200.0 202.5 205.0 207.5 210.0 212.5 ... 317.5 320.0 322.5 325.0 327.5 330.0 Coordinates: * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 Attributes: standard_name: longitude long_name: Longitude units: degrees_east axis: X
Indexing with a scalar key raises an error if the key maps to multiple variables names
multiple.cf["X"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[12], line 1
----> 1 multiple.cf["X"]
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:2349, in CFDatasetAccessor.__getitem__(self, key)
2317 def __getitem__(self, key: Hashable | Iterable[Hashable]) -> DataArray | Dataset:
2318 """
2319 Index into a Dataset making use of CF attributes.
2320
(...)
2347 Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
2348 """
-> 2349 return _getitem(self, key)
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:838, in _getitem(accessor, key, skip)
836 names = _get_all(obj, k)
837 names = drop_bounds(names)
--> 838 check_results(names, k)
839 successful[k] = bool(names)
840 coords.extend(names)
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:808, in _getitem.<locals>.check_results(names, key)
806 def check_results(names, key):
807 if scalar_key and len(names) > 1:
--> 808 raise KeyError(
809 f"Receive multiple variables for key {key!r}: {names}. "
810 f"Expected only one. Please pass a list [{key!r}] "
811 f"instead to get all variables matching {key!r}."
812 )
KeyError: "Receive multiple variables for key 'X': {'x1', 'x2'}. Expected only one. Please pass a list ['X'] instead to get all variables matching 'X'."
pop.cf["longitude"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[13], line 1
----> 1 pop.cf["longitude"]
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:2349, in CFDatasetAccessor.__getitem__(self, key)
2317 def __getitem__(self, key: Hashable | Iterable[Hashable]) -> DataArray | Dataset:
2318 """
2319 Index into a Dataset making use of CF attributes.
2320
(...)
2347 Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
2348 """
-> 2349 return _getitem(self, key)
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:838, in _getitem(accessor, key, skip)
836 names = _get_all(obj, k)
837 names = drop_bounds(names)
--> 838 check_results(names, k)
839 successful[k] = bool(names)
840 coords.extend(names)
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:808, in _getitem.<locals>.check_results(names, key)
806 def check_results(names, key):
807 if scalar_key and len(names) > 1:
--> 808 raise KeyError(
809 f"Receive multiple variables for key {key!r}: {names}. "
810 f"Expected only one. Please pass a list [{key!r}] "
811 f"instead to get all variables matching {key!r}."
812 )
KeyError: "Receive multiple variables for key 'longitude': {'ULONG', 'TLONG'}. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."
To get back all variables associated with that key, pass a single element list instead.
multiple.cf[["X"]]
<xarray.Dataset> Size: 320B Dimensions: (x1: 30, x2: 10) Coordinates: * x1 (x1) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29 * x2 (x2) int64 80B 0 1 2 3 4 5 6 7 8 9 Data variables: *empty*
pop.cf[["longitude"]]
<xarray.Dataset> Size: 10kB Dimensions: (nlat: 20, nlon: 30) Coordinates: ULONG (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Data variables: *empty*
DataArrays return DataArrays
pop.UVEL.cf["longitude"]
<xarray.DataArray 'ULONG' (nlat: 20, nlon: 30)> Size: 5kB 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 Coordinates: * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Attributes: units: degrees_east
Dataset.cf[...]
returns a single DataArray
, parsing the coordinates
attribute if present, so we correctly get the TLONG
variable and not the
ULONG
variable
pop.cf["TEMP"]
<xarray.DataArray 'TEMP' (nlat: 20, nlon: 30)> Size: 5kB 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 15.0 15.0 15.0 Coordinates: * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 TLAT (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 Attributes: coordinates: TLONG TLAT standard_name: sea_water_potential_temperature
Dataset.cf[...]
also interprets the ancillary_variables
attribute. The
ancillary variables are returned as coordinates of a DataArray
anc.cf["q"]
<xarray.DataArray 'q' (x: 10, y: 20)> Size: 2kB -1.688 -0.1002 -0.2317 0.06959 -1.971 ... 0.3968 -0.7902 -0.8904 0.7781 -0.5075 Coordinates: q_error_limit (x, y) float64 2kB 1.656 -0.7808 -0.473 ... -0.5146 -1.11 q_detection_limit float64 8B 0.001 Dimensions without coordinates: x, y Attributes: standard_name: specific_humidity units: g/g ancillary_variables: q_error_limit q_detection_limit
Feature: Accessing variables by standard names¶
pop.cf[["sea_water_potential_temperature", "UVEL"]]
<xarray.Dataset> Size: 29kB Dimensions: (nlat: 20, nlon: 30) Coordinates: TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 TLAT (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 ULONG (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 ULAT (nlat, nlon) float64 5kB 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5 * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Data variables: TEMP (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 UVEL (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
Note that ancillary variables are included as coordinate variables
anc.cf["specific_humidity"]
<xarray.DataArray 'q' (x: 10, y: 20)> Size: 2kB -1.688 -0.1002 -0.2317 0.06959 -1.971 ... 0.3968 -0.7902 -0.8904 0.7781 -0.5075 Coordinates: q_error_limit (x, y) float64 2kB 1.656 -0.7808 -0.473 ... -0.5146 -1.11 q_detection_limit float64 8B 0.001 Dimensions without coordinates: x, y Attributes: standard_name: specific_humidity units: g/g ancillary_variables: q_error_limit q_detection_limit
Feature: Utility functions¶
There are some utility functions to allow use by downstream libraries
pop.cf.keys()
{'X',
'Y',
'latitude',
'longitude',
'sea_water_potential_temperature',
'sea_water_x_velocity'}
You can test for presence of these keys
"sea_water_x_velocity" in pop.cf
True
You can also get out the available Axis names
pop.cf.axes
{'X': ['nlon'], 'Y': ['nlat']}
or available Coordinate names. Same for cell measures (.cf.cell_measures
) and
standard names (.cf.standard_names
).
pop.cf.coordinates
{'longitude': ['TLONG', 'ULONG'], 'latitude': ['TLAT', 'ULAT']}
Note: Although it is possible to assign additional coordinates,
.cf.coordinates
only returns a subset of
("longitude", "latitude", "vertical", "time")
.
Feature: Rewriting property dictionaries¶
cf_xarray
will rewrite the .sizes
and .chunks
dictionaries so that one can
index by a special CF axis or coordinate name
ds.cf.sizes
{'latitude': 25, 'Y': 25, 'time': 2920, 'T': 2920, 'longitude': 53, 'X': 53}
Note the duplicate entries above:
One for
X
,Y
,T
and one for
longitude
,latitude
andtime
.
An error is raised if there are multiple 'X'
variables (for example)
multiple.cf.sizes
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 multiple.cf.sizes
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:1550, in CFAccessor.__getattr__(self, attr)
1549 def __getattr__(self, attr):
-> 1550 return _getattr(
1551 obj=self._obj,
1552 attr=attr,
1553 accessor=self,
1554 key_mappers=_DEFAULT_KEY_MAPPERS,
1555 wrap_classes=True,
1556 )
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:704, in _getattr(obj, attr, accessor, key_mappers, wrap_classes, extra_decorator)
702 for name in inverted[key]:
703 if name in newmap:
--> 704 raise AttributeError(
705 f"cf_xarray can't wrap attribute {attr!r} because there are multiple values for {name!r}. "
706 f"There is no unique mapping from {name!r} to a value in {attr!r}."
707 )
708 newmap.update(dict.fromkeys(inverted[key], value))
709 newmap.update({key: attribute[key] for key in unused_keys})
AttributeError: cf_xarray can't wrap attribute 'sizes' because there are multiple values for 'X'. There is no unique mapping from 'X' to a value in 'sizes'.
multiple.v1.cf.sizes
{'X': 30, 'Y': 20}
Feature: Renaming variables¶
cf_xarray
lets you rewrite variables in one dataset to like variables in
another dataset.
In this example, a one-to-one mapping is not possible and the coordinate variables are not renamed.
da = pop.cf["TEMP"]
da.cf.rename_like(ds)
/tmp/ipykernel_3095/2327264871.py:2: UserWarning: Conflicting variables skipped:
['TLAT']: ['lat'] (latitude)
['TLONG']: ['lon'] (longitude)
['nlat']: ['lat'] (Y)
['nlon']: ['lon'] (X)
da.cf.rename_like(ds)
<xarray.DataArray 'TEMP' (nlat: 20, nlon: 30)> Size: 5kB 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 15.0 15.0 15.0 Coordinates: * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 TLAT (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 Attributes: coordinates: TLONG TLAT standard_name: sea_water_potential_temperature
If we exclude all axes (variables with axis
attribute), a one-to-one mapping
is possible. In this example, TLONG
and TLAT
are renamed to lon
and lat
i.e. their counterparts in ds
. Note the the coordinates
attribute is
appropriately changed.
da.cf.rename_like(ds, skip="axes")
<xarray.DataArray 'TEMP' (nlat: 20, nlon: 30)> Size: 5kB 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 15.0 15.0 15.0 Coordinates: * nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29 * nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 lon (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 lat (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 Attributes: coordinates: lon lat standard_name: sea_water_potential_temperature
Feature: Rewriting arguments¶
cf_xarray
can rewrite arguments for a large number of xarray functions. By
this I mean that instead of specifying say dim="lon"
, you can pass dim="X"
or dim="longitude"
and cf_xarray
will rewrite that to dim="lon"
based on
the attributes present in the dataset.
Here are a few examples
Slicing¶
ds.air.cf.isel(T=1)
<xarray.DataArray 'air' (lat: 25, lon: 53)> Size: 11kB 242.1 242.7 243.1 243.4 243.6 243.8 ... 297.5 297.1 296.9 296.4 296.4 296.6 Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 time datetime64[ns] 8B 2013-01-01T06:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ] standard_name: air_temperature
Slicing works will expand a single key like X
to multiple dimensions if those
dimensions are tagged with axis: X
multiple.cf.isel(X=1, Y=1)
<xarray.Dataset> Size: 48B Dimensions: () Coordinates: x1 int64 8B 1 y1 int64 8B 1 x2 int64 8B 1 y2 int64 8B 1 Data variables: v1 float64 8B 15.0 v2 float64 8B 15.0
Reductions¶
ds.air.cf.mean("X")
<xarray.DataArray 'air' (time: 2920, lat: 25)> Size: 584kB 242.0 242.0 243.7 251.2 257.2 260.8 ... 292.3 294.4 295.8 297.0 297.9 298.8 Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ] standard_name: air_temperature
Expanding to multiple dimensions is also supported
# takes the mean along ["x1", "x2"]
multiple.cf.mean("X")
<xarray.Dataset> Size: 400B Dimensions: (y1: 20, y2: 5) Coordinates: * y1 (y1) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 * y2 (y2) int64 40B 0 1 2 3 4 Data variables: v1 (y1) float64 160B 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 v2 (y2) float64 40B 15.0 15.0 15.0 15.0 15.0
Plotting¶
ds.air.cf.isel(time=1).cf.plot(x="X", y="Y")
<matplotlib.collections.QuadMesh at 0x7f9f737ed400>
ds.air.cf.isel(T=1, Y=[0, 1, 2]).cf.plot(x="longitude", hue="latitude")
[<matplotlib.lines.Line2D at 0x7f9f6b7074d0>,
<matplotlib.lines.Line2D at 0x7f9f6b707610>,
<matplotlib.lines.Line2D at 0x7f9f6b707750>]
cf_xarray
can facet
seasonal = (
ds.air.groupby("time.season").mean().reindex(season=["DJF", "MAM", "JJA", "SON"])
)
seasonal.cf.plot(x="longitude", y="latitude", col="season")
<xarray.plot.facetgrid.FacetGrid at 0x7f9f737ef620>
Resample & groupby¶
ds.cf.resample(T="D").mean()
<xarray.Dataset> Size: 8MB Dimensions: (time: 730, lat: 25, lon: 53) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 6kB 2013-01-01 2013-01-02 ... 2014-12-31 Data variables: air (time, lat, lon) float64 8MB 241.9 242.3 242.7 ... 295.9 295.5 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
cf_xarray
also understands the “datetime accessor” syntax for groupby
ds.cf.groupby("T.month").mean("longitude")
<xarray.Dataset> Size: 607kB Dimensions: (time: 2920, lat: 25) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time, lat) float64 584kB 242.0 242.0 243.7 ... 297.0 297.9 298.8 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
Rolling & coarsen¶
ds.cf.rolling(X=5).mean()
<xarray.Dataset> Size: 31MB Dimensions: (lat: 25, lon: 53, time: 2920) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time, lat, lon) float64 31MB nan nan nan nan ... 297.6 297.0 296.6 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
coarsen
works but everything later will break because of xarray bug
https://github.com/pydata/xarray/issues/4120
ds.isel(lon=slice(50)).cf.coarsen(Y=5, X=10).mean()
Feature: mix “special names” and variable names¶
ds.cf.groupby("T.month").mean(["lat", "X"])
<xarray.Dataset> Size: 47kB Dimensions: (time: 2920) Coordinates: * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time) float64 23kB 274.2 273.5 273.2 273.6 ... 273.0 273.0 273.4 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
Feature: Weight by Cell Measures¶
cf_xarray
can weight by cell measure variables if the appropriate attribute is
set
# Lets make some weights (not sure if this is right)
ds.coords["cell_area"] = (
np.cos(ds.air.cf["latitude"] * np.pi / 180)
* xr.ones_like(ds.air.cf["longitude"])
* 105e3
* 110e3
)
# and set proper attributes
ds["cell_area"].attrs = dict(standard_name="cell_area", units="m2")
ds.air.attrs["cell_measures"] = "area: cell_area"
ds.air.cf.weighted("area").mean(["latitude", "time"]).cf.plot(x="longitude")
ds.air.mean(["lat", "time"]).cf.plot(x="longitude")
[<matplotlib.lines.Line2D at 0x7f9f69baca50>]
Feature: Cell boundaries and vertices¶
cf_xarray
can infer cell boundaries (for rectilinear grids) and convert
CF-standard bounds variables to vertices.
ds_bnds = ds.cf.add_bounds(["lat", "lon"])
ds_bnds
<xarray.Dataset> Size: 31MB Dimensions: (lat: 25, time: 2920, lon: 53, bounds: 2) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 cell_area (lat, lon) float32 5kB 2.989e+09 2.989e+09 ... 1.116e+10 lon_bounds (lon, bounds) float32 424B 198.8 201.2 201.2 ... 328.8 331.2 lat_bounds (lat, bounds) float32 200B 76.25 73.75 73.75 ... 16.25 13.75 Dimensions without coordinates: bounds Data variables: air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
We can also convert each bounds variable independently with the top-level functions
lat_bounds = ds_bnds.cf.get_bounds("latitude")
lat_vertices = cfxr.bounds_to_vertices(lat_bounds, bounds_dim="bounds")
lat_vertices
<xarray.DataArray 'lat_bounds' (lat_vertices: 26)> Size: 104B 76.25 73.75 71.25 68.75 66.25 63.75 ... 26.25 23.75 21.25 18.75 16.25 13.75 Dimensions without coordinates: lat_vertices Attributes: standard_name: latitude long_name: Latitude units: degrees_north axis: Y
# Or we can convert _all_ bounds variables on a dataset
ds_crns = ds_bnds.cf.bounds_to_vertices()
ds_crns
<xarray.Dataset> Size: 31MB Dimensions: (lat: 25, time: 2920, lon: 53, bounds: 2, lat_vertices: 26, lon_vertices: 54) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 cell_area (lat, lon) float32 5kB 2.989e+09 2.989e+09 ... 1.116e+10 lon_bounds (lon, bounds) float32 424B 198.8 201.2 201.2 ... 328.8 331.2 lat_bounds (lat, bounds) float32 200B 76.25 73.75 73.75 ... 16.25 13.75 * lat_vertices (lat_vertices) float32 104B 76.25 73.75 71.25 ... 16.25 13.75 * lon_vertices (lon_vertices) float32 216B 198.8 201.2 203.8 ... 328.8 331.2 Dimensions without coordinates: bounds Data variables: air (time, lat, lon) float64 31MB 241.2 242.5 ... 296.2 295.7 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
Feature: Add canonical CF attributes¶
cf_xarray
can add missing canonical CF attributes consistent with the official
CF standard name table.
ds_canonical = ds.cf.add_canonical_attributes(verbose=True)
ds_canonical
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
Cell In[46], line 1
----> 1 ds_canonical = ds.cf.add_canonical_attributes(verbose=True)
2 ds_canonical
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:2252, in CFAccessor.add_canonical_attributes(self, override, skip, verbose, source)
2249 skip_ = [skip] if isinstance(skip, Hashable) else (skip or [])
2251 # Parse table
-> 2252 info, table, aliases = parse_cf_standard_name_table(source)
2254 # Loop over standard names
2255 ds = self._maybe_to_dataset().copy(deep=False)
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/utils.py:111, in parse_cf_standard_name_table(source)
108 if not source:
109 import pooch
--> 111 source = pooch.retrieve(
112 "https://raw.githubusercontent.com/cf-convention/cf-convention.github.io/"
113 "master/Data/cf-standard-names/current/src/cf-standard-name-table.xml",
114 known_hash=None,
115 )
116 root = ElementTree.parse(source).getroot()
118 # Build dictionaries
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/pooch/core.py:239, in retrieve(url, known_hash, fname, path, processor, downloader, progressbar)
236 if downloader is None:
237 downloader = choose_downloader(url, progressbar=progressbar)
--> 239 stream_download(url, full_path, known_hash, downloader, pooch=None)
241 if known_hash is None:
242 get_logger().info(
243 "SHA256 hash of downloaded file: %s\n"
244 "Use this value as the 'known_hash' argument of 'pooch.retrieve'"
(...)
247 file_hash(str(full_path)),
248 )
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/pooch/core.py:807, in stream_download(url, fname, known_hash, downloader, pooch, retry_if_failed)
803 try:
804 # Stream the file to a temporary so that we can safely check its
805 # hash before overwriting the original.
806 with temporary_file(path=str(fname.parent)) as tmp:
--> 807 downloader(url, tmp, pooch)
808 hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
809 shutil.move(tmp, str(fname))
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/pooch/downloaders.py:221, in HTTPDownloader.__call__(self, url, output_file, pooch, check_only)
219 try:
220 response = requests.get(url, timeout=timeout, **kwargs)
--> 221 response.raise_for_status()
222 content = response.iter_content(chunk_size=self.chunk_size)
223 total = int(response.headers.get("content-length", 0))
File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
1019 http_error_msg = (
1020 f"{self.status_code} Server Error: {reason} for url: {self.url}"
1021 )
1023 if http_error_msg:
-> 1024 raise HTTPError(http_error_msg, response=self)
HTTPError: 403 Client Error: Forbidden for url: https://raw.githubusercontent.com/cf-convention/cf-convention.github.io/master/Data/cf-standard-names/current/src/cf-standard-name-table.xml