Quickstart¶
cf_xarray
allows you to write code that works on many datasets by interpreting CF-compliant attributes (.attrs
) present on xarray DataArray
or Dataset
objects. First, let’s load a dataset.
import cf_xarray as cfxr
import xarray as xr
xr.set_options(keep_attrs=True)
ds = xr.tutorial.open_dataset("air_temperature")
ds
<xarray.Dataset> Size: 31MB Dimensions: (lat: 25, time: 2920, lon: 53) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time, lat, lon) float64 31MB ... Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
Finding CF information¶
cf_xarray
registers an “accessor” named cf
on import. For a quick overview of attributes that cf_xarray
can interpret use .cf
This will display the “repr” or a representation of all detected CF information.
ds.cf
Coordinates:
CF Axes: * X: ['lon']
* Y: ['lat']
* T: ['time']
Z: n/a
CF Coordinates: * longitude: ['lon']
* latitude: ['lat']
* time: ['time']
vertical: n/a
Cell Measures: area, volume: n/a
Standard Names: * latitude: ['lat']
* longitude: ['lon']
* time: ['time']
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: n/a
Bounds: n/a
Grid Mappings: n/a
The plain text repr can be a little hard to read. In a Jupyter environment simply install rich
and
use the Jupyter extension with %load_ext rich
. Then ds.cf
will automatically use the rich
representation.
See the rich docs for more.
%load_ext rich
ds.cf
Using attributes¶
Now instead of the usual xarray names on the right, you can use the “CF names” on the left.
ds.cf.mean("latitude") # identical to ds.mean("lat")
<xarray.Dataset> Size: 1MB Dimensions: (time: 2920, lon: 53) Coordinates: * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Data variables: air (time, lon) float64 1MB 279.4 279.7 279.7 ... 279.4 280.0 280.5 Attributes: Conventions: COARDS title: 4x daily NMC reanalysis (1948) description: Data is from NMC initialized reanalysis\n(4x/day). These a... platform: Model references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
This works because the attributes standard_name: "latitude"
and units: "degrees_north"
are present on ds.latitude
ds.lat.attrs
{'standard_name': 'latitude',
'long_name': 'Latitude',
'units': 'degrees_north',
'axis': 'Y'}
Tip
For a list of criteria used to identify the “latitude” variable (for e.g.) see Coordinate Criteria.
Similarly we could use ds.cf.mean("Y")
because the attribute axis: "Y"
is present.
Tip
For best results, we recommend you tell xarray to preserve attributes as much as possible using xr.set_options(keep_attrs=True)
but be warned, this can preserve out-of-date metadata.
Tip
Sometimes datasets don’t have all the necessary attributes. Use guess_coord_axis()
and add_canonical_attributes()
to automatically add attributes to variables that match some heuristics.
Indexing¶
We can use these “CF names” to index into the dataset
ds.cf["latitude"]
<xarray.DataArray 'lat' (lat: 25)> Size: 100B 75.0 72.5 70.0 67.5 65.0 62.5 60.0 57.5 ... 30.0 27.5 25.0 22.5 20.0 17.5 15.0 Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 Attributes: standard_name: latitude long_name: Latitude units: degrees_north axis: Y
This is particularly useful if a standard_name
attribute is present. For demonstration purposes lets add one:
ds.air.attrs["standard_name"] = "air_temperature"
ds.cf["air_temperature"]
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)> Size: 31MB array([[[241.2 , 242.5 , ..., 235.5 , 238.6 ], [243.8 , 244.5 , ..., 235.3 , 239.3 ], ..., [295.9 , 296.2 , ..., 295.9 , 295.2 ], [296.29, 296.79, ..., 296.79, 296.6 ]], [[242.1 , 242.7 , ..., 233.6 , 235.8 ], [243.6 , 244.1 , ..., 232.5 , 235.7 ], ..., [296.2 , 296.7 , ..., 295.5 , 295.1 ], [296.29, 297.2 , ..., 296.4 , 296.6 ]], ..., [[245.79, 244.79, ..., 243.99, 244.79], [249.89, 249.29, ..., 242.49, 244.29], ..., [296.29, 297.19, ..., 295.09, 294.39], [297.79, 298.39, ..., 295.49, 295.19]], [[245.09, 244.29, ..., 241.49, 241.79], [249.89, 249.29, ..., 240.29, 241.69], ..., [296.09, 296.89, ..., 295.69, 295.19], [297.69, 298.09, ..., 296.19, 295.69]]]) Coordinates: * lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0 * lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0 * time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00 Attributes: long_name: 4xDaily Air temperature at sigma level 995 units: degK precision: 2 GRIB_id: 11 GRIB_name: TMP var_desc: Air temperature dataset: NMC Reanalysis level_desc: Surface statistic: Individual Obs parent_stat: Other actual_range: [185.16 322.1 ] standard_name: air_temperature
Finding variable names¶
Sometimes it is more useful to extract the actual variable names associated with a given “CF name”. cf_xarray
exposes these variable names under a few properties:
These properties all return dictionaries mapping a standard key name to a list of matching variable names in the Dataset or DataArray.
ds.cf.axes
{'X': ['lon'], 'Y': ['lat'], 'T': ['time']}
ds.cf.coordinates
{'longitude': ['lon'], 'latitude': ['lat'], 'time': ['time']}
ds.cf.standard_names
{'latitude': ['lat'],
'air_temperature': ['air'],
'longitude': ['lon'],
'time': ['time']}