Introduction to cf_xarray

This notebook is a brief introduction to cf_xarray’s current capabilities.

import numpy as np
import xarray as xr

import cf_xarray as cfxr

# For this notebooks, it's nicer if we don't show the array values by default
xr.set_options(display_expand_data=False)
<xarray.core.options.set_options at 0x7f9f78cd0440>

cf_xarray works best when xarray keeps attributes by default.

xr.set_options(keep_attrs=True)
<xarray.core.options.set_options at 0x7f9f786fdf90>

Lets read two datasets.

ds = xr.tutorial.load_dataset("air_temperature")
ds.air.attrs["standard_name"] = "air_temperature"
ds
<xarray.Dataset> Size: 31MB
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

This one is inspired by POP model output and illustrates how the coordinates attribute is interpreted. It also illustrates one way of tagging curvilinear grids for convenient use of cf_xarray

from cf_xarray.datasets import popds as pop

pop
<xarray.Dataset> Size: 29kB
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TLAT     (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
    ULONG    (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
    ULAT     (nlat, nlon) float64 5kB 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    UVEL     (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
    TEMP     (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0

This synthetic dataset has multiple X and Y coords. An example would be model output on a staggered grid.

from cf_xarray.datasets import multiple

multiple
<xarray.Dataset> Size: 6kB
Dimensions:  (x1: 30, y1: 20, x2: 10, y2: 5)
Coordinates:
  * x1       (x1) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
  * y1       (y1) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  * x2       (x2) int64 80B 0 1 2 3 4 5 6 7 8 9
  * y2       (y2) int64 40B 0 1 2 3 4
Data variables:
    v1       (x1, y1) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0
    v2       (x2, y2) float64 400B 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0

This dataset has ancillary variables

from cf_xarray.datasets import anc

anc
<xarray.Dataset> Size: 3kB
Dimensions:            (x: 10, y: 20)
Dimensions without coordinates: x, y
Data variables:
    q                  (x, y) float64 2kB -1.688 -0.1002 ... 0.7781 -0.5075
    q_error_limit      (x, y) float64 2kB 1.656 -0.7808 -0.473 ... -0.5146 -1.11
    q_detection_limit  float64 8B 0.001

What attributes have been discovered?

The criteria for identifying variables using CF attributes are listed here.

ds.lon
<xarray.DataArray 'lon' (lon: 53)> Size: 212B
200.0 202.5 205.0 207.5 210.0 212.5 ... 317.5 320.0 322.5 325.0 327.5 330.0
Coordinates:
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

ds.lon has attributes axis: X. This means that cf_xarray can identify the 'X' axis as being represented by the lon variable.

It can also use the standard_name and units attributes to infer that lon is “Longitude”. To see variable names that cf_xarray can infer, use ds.cf

ds.cf
Coordinates:
             CF Axes: * X: ['lon']
                      * Y: ['lat']
                      * T: ['time']
                        Z: n/a

      CF Coordinates: * longitude: ['lon']
                      * latitude: ['lat']
                      * time: ['time']
                        vertical: n/a

       Cell Measures:   area, volume: n/a

      Standard Names: * latitude: ['lat']
                      * longitude: ['lon']
                      * time: ['time']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   air_temperature: ['air']

              Bounds:   n/a

       Grid Mappings:   n/a

For pop, only latitude and longitude are detected, not X or Y. Please comment here: https://github.com/xarray-contrib/cf-xarray/issues/23 if you have opinions about this behaviour.

pop.cf
Coordinates:
             CF Axes: * X: ['nlon']
                      * Y: ['nlat']
                        Z, T: n/a

      CF Coordinates:   longitude: ['TLONG', 'ULONG']
                        latitude: ['TLAT', 'ULAT']
                        vertical, time: n/a

       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   sea_water_potential_temperature: ['TEMP']
                        sea_water_x_velocity: ['UVEL']

              Bounds:   n/a

       Grid Mappings:   n/a

For multiple, multiple X and Y coordinates are detected

multiple.cf
Coordinates:
             CF Axes: * X: ['x1', 'x2']
                      * Y: ['y1', 'y2']
                        Z, T: n/a

      CF Coordinates:   longitude, latitude, vertical, time: n/a

       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   n/a

       Grid Mappings:   n/a

Feature: Accessing coordinate variables

.cf implements __getitem__ to allow easy access to coordinate and axis variables.

ds.cf["X"]
<xarray.DataArray 'lon' (lon: 53)> Size: 212B
200.0 202.5 205.0 207.5 210.0 212.5 ... 317.5 320.0 322.5 325.0 327.5 330.0
Coordinates:
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

Indexing with a scalar key raises an error if the key maps to multiple variables names

multiple.cf["X"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[12], line 1
----> 1 multiple.cf["X"]

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:2349, in CFDatasetAccessor.__getitem__(self, key)
   2317 def __getitem__(self, key: Hashable | Iterable[Hashable]) -> DataArray | Dataset:
   2318     """
   2319     Index into a Dataset making use of CF attributes.
   2320 
   (...)
   2347     Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
   2348     """
-> 2349     return _getitem(self, key)

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:838, in _getitem(accessor, key, skip)
    836 names = _get_all(obj, k)
    837 names = drop_bounds(names)
--> 838 check_results(names, k)
    839 successful[k] = bool(names)
    840 coords.extend(names)

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:808, in _getitem.<locals>.check_results(names, key)
    806 def check_results(names, key):
    807     if scalar_key and len(names) > 1:
--> 808         raise KeyError(
    809             f"Receive multiple variables for key {key!r}: {names}. "
    810             f"Expected only one. Please pass a list [{key!r}] "
    811             f"instead to get all variables matching {key!r}."
    812         )

KeyError: "Receive multiple variables for key 'X': {'x1', 'x2'}. Expected only one. Please pass a list ['X'] instead to get all variables matching 'X'."
pop.cf["longitude"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[13], line 1
----> 1 pop.cf["longitude"]

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:2349, in CFDatasetAccessor.__getitem__(self, key)
   2317 def __getitem__(self, key: Hashable | Iterable[Hashable]) -> DataArray | Dataset:
   2318     """
   2319     Index into a Dataset making use of CF attributes.
   2320 
   (...)
   2347     Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
   2348     """
-> 2349     return _getitem(self, key)

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:838, in _getitem(accessor, key, skip)
    836 names = _get_all(obj, k)
    837 names = drop_bounds(names)
--> 838 check_results(names, k)
    839 successful[k] = bool(names)
    840 coords.extend(names)

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:808, in _getitem.<locals>.check_results(names, key)
    806 def check_results(names, key):
    807     if scalar_key and len(names) > 1:
--> 808         raise KeyError(
    809             f"Receive multiple variables for key {key!r}: {names}. "
    810             f"Expected only one. Please pass a list [{key!r}] "
    811             f"instead to get all variables matching {key!r}."
    812         )

KeyError: "Receive multiple variables for key 'longitude': {'ULONG', 'TLONG'}. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."

To get back all variables associated with that key, pass a single element list instead.

multiple.cf[["X"]]
<xarray.Dataset> Size: 320B
Dimensions:  (x1: 30, x2: 10)
Coordinates:
  * x1       (x1) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
  * x2       (x2) int64 80B 0 1 2 3 4 5 6 7 8 9
Data variables:
    *empty*
pop.cf[["longitude"]]
<xarray.Dataset> Size: 10kB
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    ULONG    (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    *empty*

DataArrays return DataArrays

pop.UVEL.cf["longitude"]
<xarray.DataArray 'ULONG' (nlat: 20, nlon: 30)> Size: 5kB
0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
Coordinates:
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Attributes:
    units:    degrees_east

Dataset.cf[...] returns a single DataArray, parsing the coordinates attribute if present, so we correctly get the TLONG variable and not the ULONG variable

pop.cf["TEMP"]
<xarray.DataArray 'TEMP' (nlat: 20, nlon: 30)> Size: 5kB
15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 15.0 15.0 15.0
Coordinates:
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TLAT     (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
Attributes:
    coordinates:    TLONG TLAT
    standard_name:  sea_water_potential_temperature

Dataset.cf[...] also interprets the ancillary_variables attribute. The ancillary variables are returned as coordinates of a DataArray

anc.cf["q"]
<xarray.DataArray 'q' (x: 10, y: 20)> Size: 2kB
-1.688 -0.1002 -0.2317 0.06959 -1.971 ... 0.3968 -0.7902 -0.8904 0.7781 -0.5075
Coordinates:
    q_error_limit      (x, y) float64 2kB 1.656 -0.7808 -0.473 ... -0.5146 -1.11
    q_detection_limit  float64 8B 0.001
Dimensions without coordinates: x, y
Attributes:
    standard_name:        specific_humidity
    units:                g/g
    ancillary_variables:  q_error_limit q_detection_limit

Feature: Accessing variables by standard names

pop.cf[["sea_water_potential_temperature", "UVEL"]]
<xarray.Dataset> Size: 29kB
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TLAT     (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
    ULONG    (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
    ULAT     (nlat, nlon) float64 5kB 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    TEMP     (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
    UVEL     (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0

Note that ancillary variables are included as coordinate variables

anc.cf["specific_humidity"]
<xarray.DataArray 'q' (x: 10, y: 20)> Size: 2kB
-1.688 -0.1002 -0.2317 0.06959 -1.971 ... 0.3968 -0.7902 -0.8904 0.7781 -0.5075
Coordinates:
    q_error_limit      (x, y) float64 2kB 1.656 -0.7808 -0.473 ... -0.5146 -1.11
    q_detection_limit  float64 8B 0.001
Dimensions without coordinates: x, y
Attributes:
    standard_name:        specific_humidity
    units:                g/g
    ancillary_variables:  q_error_limit q_detection_limit

Feature: Utility functions

There are some utility functions to allow use by downstream libraries

pop.cf.keys()
{'X',
 'Y',
 'latitude',
 'longitude',
 'sea_water_potential_temperature',
 'sea_water_x_velocity'}

You can test for presence of these keys

"sea_water_x_velocity" in pop.cf
True

You can also get out the available Axis names

pop.cf.axes
{'X': ['nlon'], 'Y': ['nlat']}

or available Coordinate names. Same for cell measures (.cf.cell_measures) and standard names (.cf.standard_names).

pop.cf.coordinates
{'longitude': ['TLONG', 'ULONG'], 'latitude': ['TLAT', 'ULAT']}

Note: Although it is possible to assign additional coordinates, .cf.coordinates only returns a subset of ("longitude", "latitude", "vertical", "time").

Feature: Rewriting property dictionaries

cf_xarray will rewrite the .sizes and .chunks dictionaries so that one can index by a special CF axis or coordinate name

ds.cf.sizes
{'latitude': 25, 'Y': 25, 'time': 2920, 'T': 2920, 'longitude': 53, 'X': 53}

Note the duplicate entries above:

  1. One for X, Y, T

  2. and one for longitude, latitude and time.

An error is raised if there are multiple 'X' variables (for example)

multiple.cf.sizes
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[26], line 1
----> 1 multiple.cf.sizes

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:1550, in CFAccessor.__getattr__(self, attr)
   1549 def __getattr__(self, attr):
-> 1550     return _getattr(
   1551         obj=self._obj,
   1552         attr=attr,
   1553         accessor=self,
   1554         key_mappers=_DEFAULT_KEY_MAPPERS,
   1555         wrap_classes=True,
   1556     )

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:704, in _getattr(obj, attr, accessor, key_mappers, wrap_classes, extra_decorator)
    702     for name in inverted[key]:
    703         if name in newmap:
--> 704             raise AttributeError(
    705                 f"cf_xarray can't wrap attribute {attr!r} because there are multiple values for {name!r}. "
    706                 f"There is no unique mapping from {name!r} to a value in {attr!r}."
    707             )
    708     newmap.update(dict.fromkeys(inverted[key], value))
    709 newmap.update({key: attribute[key] for key in unused_keys})

AttributeError: cf_xarray can't wrap attribute 'sizes' because there are multiple values for 'X'. There is no unique mapping from 'X' to a value in 'sizes'.
multiple.v1.cf.sizes
{'X': 30, 'Y': 20}

Feature: Renaming variables

cf_xarray lets you rewrite variables in one dataset to like variables in another dataset.

In this example, a one-to-one mapping is not possible and the coordinate variables are not renamed.

da = pop.cf["TEMP"]
da.cf.rename_like(ds)
/tmp/ipykernel_3095/2327264871.py:2: UserWarning: Conflicting variables skipped:
['TLAT']: ['lat'] (latitude)
['TLONG']: ['lon'] (longitude)
['nlat']: ['lat'] (Y)
['nlon']: ['lon'] (X)
  da.cf.rename_like(ds)
<xarray.DataArray 'TEMP' (nlat: 20, nlon: 30)> Size: 5kB
15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 15.0 15.0 15.0
Coordinates:
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TLAT     (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
Attributes:
    coordinates:    TLONG TLAT
    standard_name:  sea_water_potential_temperature

If we exclude all axes (variables with axis attribute), a one-to-one mapping is possible. In this example, TLONG and TLAT are renamed to lon and lat i.e. their counterparts in ds. Note the the coordinates attribute is appropriately changed.

da.cf.rename_like(ds, skip="axes")
<xarray.DataArray 'TEMP' (nlat: 20, nlon: 30)> Size: 5kB
15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0 15.0 15.0 15.0
Coordinates:
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
    lon      (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    lat      (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
Attributes:
    coordinates:    lon lat
    standard_name:  sea_water_potential_temperature

Feature: Rewriting arguments

cf_xarray can rewrite arguments for a large number of xarray functions. By this I mean that instead of specifying say dim="lon", you can pass dim="X" or dim="longitude" and cf_xarray will rewrite that to dim="lon" based on the attributes present in the dataset.

Here are a few examples

Slicing

ds.air.cf.isel(T=1)
<xarray.DataArray 'air' (lat: 25, lon: 53)> Size: 11kB
242.1 242.7 243.1 243.4 243.6 243.8 ... 297.5 297.1 296.9 296.4 296.4 296.6
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
    time     datetime64[ns] 8B 2013-01-01T06:00:00
Attributes:
    long_name:      4xDaily Air temperature at sigma level 995
    units:          degK
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    standard_name:  air_temperature

Slicing works will expand a single key like X to multiple dimensions if those dimensions are tagged with axis: X

multiple.cf.isel(X=1, Y=1)
<xarray.Dataset> Size: 48B
Dimensions:  ()
Coordinates:
    x1       int64 8B 1
    y1       int64 8B 1
    x2       int64 8B 1
    y2       int64 8B 1
Data variables:
    v1       float64 8B 15.0
    v2       float64 8B 15.0

Reductions

ds.air.cf.mean("X")
<xarray.DataArray 'air' (time: 2920, lat: 25)> Size: 584kB
242.0 242.0 243.7 251.2 257.2 260.8 ... 292.3 294.4 295.8 297.0 297.9 298.8
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:      4xDaily Air temperature at sigma level 995
    units:          degK
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    standard_name:  air_temperature

Expanding to multiple dimensions is also supported

# takes the mean along ["x1", "x2"]
multiple.cf.mean("X")
<xarray.Dataset> Size: 400B
Dimensions:  (y1: 20, y2: 5)
Coordinates:
  * y1       (y1) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  * y2       (y2) int64 40B 0 1 2 3 4
Data variables:
    v1       (y1) float64 160B 15.0 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0
    v2       (y2) float64 40B 15.0 15.0 15.0 15.0 15.0

Plotting

ds.air.cf.isel(time=1).cf.plot(x="X", y="Y")
<matplotlib.collections.QuadMesh at 0x7f9f737ed400>
../_images/841b753d704db228cc8a1e18c485b16de0ee8a2a449701f1ad0e30f3109667c1.png
ds.air.cf.isel(T=1, Y=[0, 1, 2]).cf.plot(x="longitude", hue="latitude")
[<matplotlib.lines.Line2D at 0x7f9f6b7074d0>,
 <matplotlib.lines.Line2D at 0x7f9f6b707610>,
 <matplotlib.lines.Line2D at 0x7f9f6b707750>]
../_images/910243ff3fc8a321e63553dd4246d0d0f8bb55d54de54358c6e1d5a039fdc582.png

cf_xarray can facet

seasonal = (
    ds.air.groupby("time.season").mean().reindex(season=["DJF", "MAM", "JJA", "SON"])
)
seasonal.cf.plot(x="longitude", y="latitude", col="season")
<xarray.plot.facetgrid.FacetGrid at 0x7f9f737ef620>
../_images/eba9660c516b45668d659d4b2940f4bf22dbc06a6f1e358c9f25b67b3b0788e3.png

Resample & groupby

ds.cf.resample(T="D").mean()
<xarray.Dataset> Size: 8MB
Dimensions:  (time: 730, lat: 25, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 6kB 2013-01-01 2013-01-02 ... 2014-12-31
Data variables:
    air      (time, lat, lon) float64 8MB 241.9 242.3 242.7 ... 295.9 295.5
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

cf_xarray also understands the “datetime accessor” syntax for groupby

ds.cf.groupby("T.month").mean("longitude")
<xarray.Dataset> Size: 607kB
Dimensions:  (time: 2920, lat: 25)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat) float64 584kB 242.0 242.0 243.7 ... 297.0 297.9 298.8
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Rolling & coarsen

ds.cf.rolling(X=5).mean()
<xarray.Dataset> Size: 31MB
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float64 31MB nan nan nan nan ... 297.6 297.0 296.6
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

coarsen works but everything later will break because of xarray bug https://github.com/pydata/xarray/issues/4120

ds.isel(lon=slice(50)).cf.coarsen(Y=5, X=10).mean()

Feature: mix “special names” and variable names

ds.cf.groupby("T.month").mean(["lat", "X"])
<xarray.Dataset> Size: 47kB
Dimensions:  (time: 2920)
Coordinates:
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time) float64 23kB 274.2 273.5 273.2 273.6 ... 273.0 273.0 273.4
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Feature: Weight by Cell Measures

cf_xarray can weight by cell measure variables if the appropriate attribute is set

# Lets make some weights (not sure if this is right)
ds.coords["cell_area"] = (
    np.cos(ds.air.cf["latitude"] * np.pi / 180)
    * xr.ones_like(ds.air.cf["longitude"])
    * 105e3
    * 110e3
)
# and set proper attributes
ds["cell_area"].attrs = dict(standard_name="cell_area", units="m2")
ds.air.attrs["cell_measures"] = "area: cell_area"
ds.air.cf.weighted("area").mean(["latitude", "time"]).cf.plot(x="longitude")
ds.air.mean(["lat", "time"]).cf.plot(x="longitude")
[<matplotlib.lines.Line2D at 0x7f9f69baca50>]
../_images/31fbf29041736d3b86d1c4d080202859b38e64a6f46997358bfe0277cff479a9.png

Feature: Cell boundaries and vertices

cf_xarray can infer cell boundaries (for rectilinear grids) and convert CF-standard bounds variables to vertices.

ds_bnds = ds.cf.add_bounds(["lat", "lon"])
ds_bnds
<xarray.Dataset> Size: 31MB
Dimensions:     (lat: 25, time: 2920, lon: 53, bounds: 2)
Coordinates:
  * lat         (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
  * lon         (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time        (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
    cell_area   (lat, lon) float32 5kB 2.989e+09 2.989e+09 ... 1.116e+10
    lon_bounds  (lon, bounds) float32 424B 198.8 201.2 201.2 ... 328.8 331.2
    lat_bounds  (lat, bounds) float32 200B 76.25 73.75 73.75 ... 16.25 13.75
Dimensions without coordinates: bounds
Data variables:
    air         (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

We can also convert each bounds variable independently with the top-level functions

lat_bounds = ds_bnds.cf.get_bounds("latitude")

lat_vertices = cfxr.bounds_to_vertices(lat_bounds, bounds_dim="bounds")
lat_vertices
<xarray.DataArray 'lat_bounds' (lat_vertices: 26)> Size: 104B
76.25 73.75 71.25 68.75 66.25 63.75 ... 26.25 23.75 21.25 18.75 16.25 13.75
Dimensions without coordinates: lat_vertices
Attributes:
    standard_name:  latitude
    long_name:      Latitude
    units:          degrees_north
    axis:           Y
# Or we can convert _all_ bounds variables on a dataset
ds_crns = ds_bnds.cf.bounds_to_vertices()
ds_crns
<xarray.Dataset> Size: 31MB
Dimensions:       (lat: 25, time: 2920, lon: 53, bounds: 2, lat_vertices: 26,
                   lon_vertices: 54)
Coordinates:
  * lat           (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
  * lon           (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0
  * time          (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
    cell_area     (lat, lon) float32 5kB 2.989e+09 2.989e+09 ... 1.116e+10
    lon_bounds    (lon, bounds) float32 424B 198.8 201.2 201.2 ... 328.8 331.2
    lat_bounds    (lat, bounds) float32 200B 76.25 73.75 73.75 ... 16.25 13.75
  * lat_vertices  (lat_vertices) float32 104B 76.25 73.75 71.25 ... 16.25 13.75
  * lon_vertices  (lon_vertices) float32 216B 198.8 201.2 203.8 ... 328.8 331.2
Dimensions without coordinates: bounds
Data variables:
    air           (time, lat, lon) float64 31MB 241.2 242.5 ... 296.2 295.7
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Feature: Add canonical CF attributes

cf_xarray can add missing canonical CF attributes consistent with the official CF standard name table.

ds_canonical = ds.cf.add_canonical_attributes(verbose=True)
ds_canonical
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[46], line 1
----> 1 ds_canonical = ds.cf.add_canonical_attributes(verbose=True)
      2 ds_canonical

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/accessor.py:2252, in CFAccessor.add_canonical_attributes(self, override, skip, verbose, source)
   2249 skip_ = [skip] if isinstance(skip, Hashable) else (skip or [])
   2251 # Parse table
-> 2252 info, table, aliases = parse_cf_standard_name_table(source)
   2254 # Loop over standard names
   2255 ds = self._maybe_to_dataset().copy(deep=False)

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/checkouts/latest/cf_xarray/utils.py:111, in parse_cf_standard_name_table(source)
    108 if not source:
    109     import pooch
--> 111     source = pooch.retrieve(
    112         "https://raw.githubusercontent.com/cf-convention/cf-convention.github.io/"
    113         "master/Data/cf-standard-names/current/src/cf-standard-name-table.xml",
    114         known_hash=None,
    115     )
    116 root = ElementTree.parse(source).getroot()
    118 # Build dictionaries

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/pooch/core.py:239, in retrieve(url, known_hash, fname, path, processor, downloader, progressbar)
    236 if downloader is None:
    237     downloader = choose_downloader(url, progressbar=progressbar)
--> 239 stream_download(url, full_path, known_hash, downloader, pooch=None)
    241 if known_hash is None:
    242     get_logger().info(
    243         "SHA256 hash of downloaded file: %s\n"
    244         "Use this value as the 'known_hash' argument of 'pooch.retrieve'"
   (...)
    247         file_hash(str(full_path)),
    248     )

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/pooch/core.py:807, in stream_download(url, fname, known_hash, downloader, pooch, retry_if_failed)
    803 try:
    804     # Stream the file to a temporary so that we can safely check its
    805     # hash before overwriting the original.
    806     with temporary_file(path=str(fname.parent)) as tmp:
--> 807         downloader(url, tmp, pooch)
    808         hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
    809         shutil.move(tmp, str(fname))

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/pooch/downloaders.py:221, in HTTPDownloader.__call__(self, url, output_file, pooch, check_only)
    219 try:
    220     response = requests.get(url, timeout=timeout, **kwargs)
--> 221     response.raise_for_status()
    222     content = response.iter_content(chunk_size=self.chunk_size)
    223     total = int(response.headers.get("content-length", 0))

File ~/checkouts/readthedocs.org/user_builds/cf-xarray/conda/latest/lib/python3.13/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
   1019     http_error_msg = (
   1020         f"{self.status_code} Server Error: {reason} for url: {self.url}"
   1021     )
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 403 Client Error: Forbidden for url: https://raw.githubusercontent.com/cf-convention/cf-convention.github.io/master/Data/cf-standard-names/current/src/cf-standard-name-table.xml