Quickstart

cf_xarray allows you to write code that works on many datasets by interpreting CF-compliant attributes (.attrs) present on xarray DataArray or Dataset objects. First, let’s load a dataset.

import cf_xarray as cfxr
import xarray as xr

xr.set_options(keep_attrs=True)

ds = xr.tutorial.open_dataset("air_temperature")
ds

<xarray.Dataset> Size: 31MB
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float64 31MB ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Finding CF information

cf_xarray registers an “accessor” named cf on import. For a quick overview of the attributes that cf_xarray can interpret, use .cf. This displays the “repr”, a representation of all detected CF information.

ds.cf

Coordinates:
             CF Axes: * X: ['lon']
                      * Y: ['lat']
                      * T: ['time']
                        Z: n/a

      CF Coordinates: * longitude: ['lon']
                      * latitude: ['lat']
                      * time: ['time']
                        vertical: n/a

       Cell Measures:   area, volume: n/a

      Standard Names: * latitude: ['lat']
                      * longitude: ['lon']
                      * time: ['time']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   n/a

       Grid Mappings:   n/a

The plain text repr can be a little hard to read. In a Jupyter environment simply install rich and use the Jupyter extension with %load_ext rich. Then ds.cf will automatically use the rich representation. See the rich docs for more.

%load_ext rich

ds.cf

[Image: rich repr of ds.cf]

Using attributes

Now, instead of the usual xarray names shown on the right of the repr above, you can use the “CF names” shown on the left.

ds.cf.mean("latitude")  # identical to ds.mean("lat")

<xarray.Dataset> Size: 1MB
Dimensions:  (time: 2920, lon: 53)
Coordinates:
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lon) float64 1MB 279.4 279.7 279.7 ... 279.4 280.0 280.5
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

This works because the attributes standard_name: "latitude" and units: "degrees_north" are present on ds.lat:

ds.lat.attrs

{'standard_name': 'latitude',
 'long_name': 'Latitude',
 'units': 'degrees_north',
 'axis': 'Y'}

Tip

For a list of the criteria used to identify, for example, the “latitude” variable, see Coordinate Criteria.

Similarly we could use ds.cf.mean("Y") because the attribute axis: "Y" is present.

Tip

For best results, we recommend telling xarray to preserve attributes as much as possible with xr.set_options(keep_attrs=True). Be warned that this can also preserve out-of-date metadata.

Tip

Sometimes datasets don’t have all the necessary attributes. Use guess_coord_axis() and add_canonical_attributes() to automatically add attributes to variables that match some heuristics.

Indexing

We can use these “CF names” to index into the dataset:

ds.cf["latitude"]

<xarray.DataArray 'lat' (lat: 25)> Size: 100B
75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 ... 30.0 27.5 25.0 22.5 20.0 17.5 15.0
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
Attributes:
    standard_name:  latitude
    long_name:      Latitude
    units:          degrees_north
    axis:           Y

This is particularly useful if a standard_name attribute is present. For demonstration purposes, let’s add one:

ds.air.attrs["standard_name"] = "air_temperature"
ds.cf["air_temperature"]

<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)> Size: 31MB
array([[[241.2 , 242.5 , ..., 235.5 , 238.6 ],
        [243.8 , 244.5 , ..., 235.3 , 239.3 ],
        ...,
        [295.9 , 296.2 , ..., 295.9 , 295.2 ],
        [296.29, 296.79, ..., 296.79, 296.6 ]],

       [[242.1 , 242.7 , ..., 233.6 , 235.8 ],
        [243.6 , 244.1 , ..., 232.5 , 235.7 ],
        ...,
        [296.2 , 296.7 , ..., 295.5 , 295.1 ],
        [296.29, 297.2 , ..., 296.4 , 296.6 ]],

       ...,

       [[245.79, 244.79, ..., 243.99, 244.79],
        [249.89, 249.29, ..., 242.49, 244.29],
        ...,
        [296.29, 297.19, ..., 295.09, 294.39],
        [297.79, 298.39, ..., 295.49, 295.19]],

       [[245.09, 244.29, ..., 241.49, 241.79],
        [249.89, 249.29, ..., 240.29, 241.69],
        ...,
        [296.09, 296.89, ..., 295.69, 295.19],
        [297.69, 298.09, ..., 296.19, 295.69]]])
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:      4xDaily Air temperature at sigma level 995
    units:          degK
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    dataset:        NMC Reanalysis
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    standard_name:  air_temperature

Finding variable names

Sometimes it is more useful to extract the actual variable names associated with a given “CF name”. cf_xarray exposes these variable names under a few properties:

Dataset.cf.axes,
Dataset.cf.bounds,
Dataset.cf.cell_measures,
Dataset.cf.cf_roles,
Dataset.cf.coordinates,
Dataset.cf.formula_terms,
Dataset.cf.grid_mapping_names, and
Dataset.cf.standard_names.

These properties all return dictionaries mapping a standard key name to a list of matching variable names in the Dataset or DataArray.

ds.cf.axes

{'X': ['lon'], 'Y': ['lat'], 'T': ['time']}

ds.cf.coordinates

{'longitude': ['lon'], 'latitude': ['lat'], 'time': ['time']}

ds.cf.standard_names

{'latitude': ['lat'],
 'air_temperature': ['air'],
 'longitude': ['lon'],
 'time': ['time']}