Quickstart#

cf_xarray allows you to write code that works on many datasets by interpreting CF-compliant attributes (.attrs) present on xarray DataArray or Dataset objects. First, let’s load a dataset.

import cf_xarray as cfxr
import xarray as xr

xr.set_options(keep_attrs=True)

ds = xr.tutorial.open_dataset("air_temperature")
ds
<xarray.Dataset> Size: 15MB
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 15MB ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Finding CF information#

cf_xarray registers an “accessor” named cf on import. For a quick overview of attributes that cf_xarray can interpret use .cf This will display the “repr” or a representation of all detected CF information.

ds.cf
Coordinates:
             CF Axes: * X: ['lon']
                      * Y: ['lat']
                      * T: ['time']
                        Z: n/a

      CF Coordinates: * longitude: ['lon']
                      * latitude: ['lat']
                      * time: ['time']
                        vertical: n/a

       Cell Measures:   area, volume: n/a

      Standard Names: * latitude: ['lat']
                      * longitude: ['lon']
                      * time: ['time']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   n/a

       Grid Mappings:   n/a

The plain text repr can be a little hard to read. In a Jupyter environment simply install rich and use the Jupyter extension with %load_ext rich. Then ds.cf will automatically use the rich representation. See the rich docs for more.

%load_ext rich

ds.cf

rich repr

Using attributes#

Now instead of the usual xarray names on the right, you can use the “CF names” on the left.

ds.cf.mean("latitude")  # identical to ds.mean("lat")
<xarray.Dataset> Size: 643kB
Dimensions:  (time: 2920, lon: 53)
Coordinates:
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lon) float32 619kB 279.4 279.7 279.7 ... 279.4 280.0 280.5
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

This works because the attributes standard_name: "latitude" and units: "degrees_north" are present on ds.latitude

ds.lat.attrs
{'standard_name': 'latitude',
 'long_name': 'Latitude',
 'units': 'degrees_north',
 'axis': 'Y'}

Tip

For a list of criteria used to identify the “latitude” variable (for e.g.) see Coordinate Criteria.

Similarly we could use ds.cf.mean("Y") because the attribute axis: "Y" is present.

Tip

For best results, we recommend you tell xarray to preserve attributes as much as possible using xr.set_options(keep_attrs=True) but be warned, this can preserve out-of-date metadata.

Tip

Sometimes datasets don’t have all the necessary attributes. Use guess_coord_axis() and add_canonical_attributes() to automatically add attributes to variables that match some heuristics.

Indexing#

We can use these “CF names” to index into the dataset

ds.cf["latitude"]
<xarray.DataArray 'lat' (lat: 25)> Size: 100B
75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 ... 30.0 27.5 25.0 22.5 20.0 17.5 15.0
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
Attributes:
    standard_name:  latitude
    long_name:      Latitude
    units:          degrees_north
    axis:           Y

This is particularly useful if a standard_name attribute is present. For demonstration purposes lets add one:

ds.air.attrs["standard_name"] = "air_temperature"
ds.cf["air_temperature"]
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)> Size: 15MB
array([[[241.2    , 242.5    , ..., 235.5    , 238.59999],
        [243.79999, 244.5    , ..., 235.29999, 239.29999],
        ...,
        [295.9    , 296.19998, ..., 295.9    , 295.19998],
        [296.29   , 296.79   , ..., 296.79   , 296.6    ]],

       [[242.09999, 242.7    , ..., 233.59999, 235.79999],
        [243.59999, 244.09999, ..., 232.5    , 235.7    ],
        ...,
        [296.19998, 296.69998, ..., 295.5    , 295.1    ],
        [296.29   , 297.19998, ..., 296.4    , 296.6    ]],

       ...,

       [[245.79   , 244.79   , ..., 243.98999, 244.79   ],
        [249.89   , 249.29   , ..., 242.48999, 244.29   ],
        ...,
        [296.29   , 297.19   , ..., 295.09   , 294.38998],
        [297.79   , 298.38998, ..., 295.49   , 295.19   ]],

       [[245.09   , 244.29   , ..., 241.48999, 241.79   ],
        [249.89   , 249.29   , ..., 240.29   , 241.68999],
        ...,
        [296.09   , 296.88998, ..., 295.69   , 295.19   ],
        [297.69   , 298.09   , ..., 296.19   , 295.69   ]]], dtype=float32)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:      4xDaily Air temperature at sigma level 995
    units:          degK
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    standard_name:  air_temperature

Finding variable names#

Sometimes it is more useful to extract the actual variable names associated with a given “CF name”. cf_xarray exposes these variable names under a few properties:

These properties all return dictionaries mapping a standard key name to a list of matching variable names in the Dataset or DataArray.

ds.cf.axes
{'X': ['lon'], 'Y': ['lat'], 'T': ['time']}
ds.cf.coordinates
{'longitude': ['lon'], 'latitude': ['lat'], 'time': ['time']}
ds.cf.standard_names
{'latitude': ['lat'],
 'air_temperature': ['air'],
 'longitude': ['lon'],
 'time': ['time']}