Custom Criteria

Fundamentally, cf_xarray uses rules or “criteria” to interpret user input using the attributes of an Xarray object (.attrs). These criteria are simple dictionaries. For example, here are the criteria used for identifying a “latitude” variable:

coordinate_criteria = {
    "latitude": {
        "standard_name": ("latitude",),
        "units": (
            "degree_north",
            "degree_N",
            "degreeN",
            "degrees_north",
            "degrees_N",
            "degreesN",
        ),
        "_CoordinateAxisType": ("Lat",),
    },
}

This dictionary maps the user input ("latitude") to another dictionary, which in turn maps each attribute name to a tuple of acceptable values for that attribute. So any variable with standard_name: latitude, or _CoordinateAxisType: Lat, or any of the units listed above will match the user input "latitude".
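The lookup described above can be sketched in plain Python. This is a simplified illustration and not cf_xarray's actual implementation; the helper name `matches_key` is hypothetical, and the sketch assumes a variable matches as soon as any one listed attribute has an acceptable value.

```python
# Simplified illustration only; matches_key is a hypothetical helper,
# not part of cf_xarray.
coordinate_criteria = {
    "latitude": {
        "standard_name": ("latitude",),
        "units": (
            "degree_north",
            "degree_N",
            "degreeN",
            "degrees_north",
            "degrees_N",
            "degreesN",
        ),
        "_CoordinateAxisType": ("Lat",),
    },
}


def matches_key(attrs, key, criteria=coordinate_criteria):
    """Return True if any attribute in attrs has an acceptable value for key."""
    rules = criteria.get(key, {})
    # A missing attribute yields None, which is never in a tuple of strings.
    return any(attrs.get(name) in accepted for name, accepted in rules.items())


print(matches_key({"units": "degrees_north"}, "latitude"))
print(matches_key({"standard_name": "latitude", "units": "m"}, "latitude"))
print(matches_key({"units": "degrees_east"}, "latitude"))
```

The first two calls return True because one attribute is enough to match; the last returns False because no listed attribute has an acceptable value.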

cf_xarray lets you provide your own custom criteria in addition to the built-in ones. Here’s an example:

import cf_xarray as cfxr
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "salt1": ("x", np.arange(10), {"standard_name": "sea_water_salinity"}),
    "salt2": ("x", np.arange(10), {"standard_name": "sea_water_practical_salinity"}),
})

# first define our criteria
salt_criteria = {
    "sea_water_salinity": {
        "standard_name": "sea_water_salinity|sea_water_practical_salinity"
        }
}

Now we apply our custom criteria temporarily using set_options() as a context manager. The following sets "sea_water_salinity" as an alias for variables whose standard_name is either "sea_water_salinity" or "sea_water_practical_salinity" (note that the value is a regular expression). Here’s how that works in practice:

with cfxr.set_options(custom_criteria=salt_criteria):
    salty = ds.cf[["sea_water_salinity"]]
salty
<xarray.Dataset> Size: 160B
Dimensions:  (x: 10)
Dimensions without coordinates: x
Data variables:
    salt1    (x) int64 80B 0 1 2 3 4 5 6 7 8 9
    salt2    (x) int64 80B 0 1 2 3 4 5 6 7 8 9

Note that salty contains both salt1 and salt2. Without these criteria, we would get only salt1 by default:

ds.cf[["sea_water_salinity"]]
<xarray.Dataset> Size: 80B
Dimensions:  (x: 10)
Dimensions without coordinates: x
Data variables:
    salt1    (x) int64 80B 0 1 2 3 4 5 6 7 8 9
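The regular expression used as the value in salt_criteria can be checked on its own with Python's standard re module. This is a minimal sketch that assumes the pattern is applied with match semantics (anchored at the start of the attribute value); the list of standard names is made up for illustration.

```python
import re

# Same pattern as the "standard_name" value in salt_criteria.
pattern = "sea_water_salinity|sea_water_practical_salinity"

# Hypothetical attribute values to test against the pattern.
standard_names = [
    "sea_water_salinity",
    "sea_water_practical_salinity",
    "sea_water_temperature",
]

# Keep only the names for which the pattern matches at the start.
hits = [name for name in standard_names if re.match(pattern, name)]
print(hits)
```

The "|" alternation is what lets a single alias cover both salinity names while leaving the temperature name out.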

We can also use set_options() to set the criteria globally.

cfxr.set_options(custom_criteria=salt_criteria)
ds.cf[["sea_water_salinity"]]
<xarray.Dataset> Size: 160B
Dimensions:  (x: 10)
Dimensions without coordinates: x
Data variables:
    salt1    (x) int64 80B 0 1 2 3 4 5 6 7 8 9
    salt2    (x) int64 80B 0 1 2 3 4 5 6 7 8 9

Again we get back both salt1 and salt2. To limit side effects of setting criteria globally, we recommend that you use set_options as a context manager.

Tip

To reset your custom criteria use cfxr.set_options(custom_criteria=())

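You can also match on the variable name, though be careful! The value is a regular expression, not a shell-style wildcard: as a regular expression, "salt*" means "sal" followed by zero or more "t", so it is looser than it looks.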

salt_criteria = {
    "salinity": {"name": "salt*"}
}
cfxr.set_options(custom_criteria=salt_criteria)

ds.cf[["salinity"]]
<xarray.Dataset> Size: 160B
Dimensions:  (x: 10)
Dimensions without coordinates: x
Data variables:
    salt1    (x) int64 80B 0 1 2 3 4 5 6 7 8 9
    salt2    (x) int64 80B 0 1 2 3 4 5 6 7 8 9
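To see why name matching deserves care, the pattern can be tried against a few names with the standard re module. This is a sketch under the assumption that patterns are applied with match semantics (anchored at the start of the name); the candidate names and the stricter pattern are illustrative choices, not something cf_xarray prescribes.

```python
import re

# Hypothetical variable names, including some that should not be picked up.
names = ["salt1", "salt2", "salad", "salinity", "sea_salt"]


def matched(pattern, candidates):
    """Return the candidates for which the pattern matches at the start."""
    return [name for name in candidates if re.match(pattern, name)]


# "t*" means zero or more "t", so anything starting with "sal" matches.
loose = matched("salt*", names)

# "salt" followed only by digits up to the end of the name.
strict = matched(r"salt\d+$", names)

print(loose)
print(strict)
```

With these names, the loose pattern also picks up "salad" and "salinity", while the stricter one keeps only "salt1" and "salt2".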

More complex matches with regex

Here is an example of more complicated custom criteria. It requires the regex package to be installed, because it relies on a behavior (allowing global flags such as “(?i)”, for case-insensitive matching, at positions other than the start of the pattern) that was recently deprecated in the re package. The criteria, called “vocab”, map the alias “sea_ice_u”, case-insensitively, to any variable whose name includes “sea”, “ice”, and “u” but not “qc” or “status”, or includes “sea”, “ice”, “x”, and “vel” but not “qc” or “status”.

import cf_xarray as cfxr
import xarray as xr

# Case-insensitive pattern with two alternatives; both exclude "qc" and "status".
vocab = {
    "sea_ice_u": {
        "name": (
            "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)"
            "|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)"
        )
    }
}
ds = xr.Dataset()
ds["sea_ice_velocity_x"] = [0, 1, 2]

with cfxr.set_options(custom_criteria=vocab):
    seaiceu = ds.cf["sea_ice_u"]
seaiceu
<xarray.DataArray 'sea_ice_velocity_x' (sea_ice_velocity_x: 3)> Size: 24B
0 1 2
Coordinates:
  * sea_ice_velocity_x  (sea_ice_velocity_x) int64 24B 0 1 2
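The lookahead logic in the pattern above can be explored without cf_xarray. The sketch below uses the standard re module and swaps the inline “(?i)” flags for the re.IGNORECASE argument, since recent Python versions reject inline global flags that are not at the start of a pattern; it is an equivalent rewrite for illustration, not the pattern cf_xarray receives. The candidate names are made up.

```python
import re

# Two alternatives, each built from zero-width lookaheads at the start:
# a negative lookahead excludes "qc"/"status", positive ones require substrings.
pattern = re.compile(
    r"^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)"
    r"|^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)",
    re.IGNORECASE,
)

# Hypothetical variable names to classify.
candidates = [
    "sea_ice_velocity_x",   # sea + ice + x + vel
    "Sea_Ice_U",            # sea + ice + u, different case
    "sea_ice_u_qc",         # excluded by "qc"
    "sea_ice_x_vel_STATUS", # excluded by "status"
    "sea_ice_velocity_y",   # neither "u" nor "x"
]

results = {name: bool(pattern.match(name)) for name in candidates}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Because every condition is a lookahead, the order in which “sea”, “ice”, and the other substrings appear in the name does not matter; only their presence or absence does.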