Selecting DataArrays

A second powerful feature of cf_xarray is the ability to select DataArrays using special “CF names”: the “latitude” or “longitude” coordinate names, the “X” or “Y” axis names, or even the value of the standard_name attribute, if present.

To demonstrate this, let’s load a few datasets:

from cf_xarray.datasets import airds, anc, multiple, popds as pop

By axis and coordinate name

Let’s select the "X" axis on airds.

# identical to airds["lon"]
airds.cf["X"]
<xarray.DataArray 'lon' (lon: 50)> Size: 200B
200.0 202.5 205.0 207.5 210.0 212.5 ... 310.0 312.5 315.0 317.5 320.0 322.5
Coordinates:
  * lon      (lon) float32 200B 200.0 202.5 205.0 207.5 ... 317.5 320.0 322.5
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

This works because airds.lon.attrs contains axis: "X". Printing airds.cf summarizes all the CF names that cf_xarray has detected in the Dataset:

airds.cf
Coordinates:
             CF Axes: * X: ['lon']
                      * Y: ['lat']
                      * T: ['time']
                        Z: n/a

      CF Coordinates: * longitude: ['lon']
                      * latitude: ['lat']
                      * time: ['time']
                        vertical: n/a

       Cell Measures:   area: ['cell_area']
                        volume: n/a

      Standard Names: * latitude: ['lat']
                      * longitude: ['lon']
                      * time: ['time']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   air_temperature: ['air']

              Bounds:   n/a

       Grid Mappings:   n/a
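In this summary, lon is listed under CF Axes as X because of its axis attribute. As a rough mental model, selecting by an axis name amounts to scanning each variable's attributes for a matching value. The sketch below is a hypothetical toy in plain Python, not cf_xarray's implementation: variables are reduced to attribute dicts, only the axis attribute is checked, and the helper find_by_attr is invented for illustration.

```python
# Toy model: each variable is reduced to its attrs dict. The values for
# "lon" and "air" are copied from the outputs shown on this page; the helper
# find_by_attr is hypothetical and is not part of cf_xarray.
variables = {
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    "air": {"standard_name": "air_temperature", "units": "degK"},
}


def find_by_attr(variables, attr, value):
    """Return the names of all variables whose attrs[attr] equals value."""
    return [name for name, attrs in variables.items() if attrs.get(attr) == value]


print(find_by_attr(variables, "axis", "X"))  # ['lon']
print(find_by_attr(variables, "axis", "Z"))  # [] (no variable declares axis Z)
```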

By standard name

The variable airds.air has standard_name: "air_temperature", so we can use that to pull it out:

airds.cf["air_temperature"]
<xarray.DataArray 'air' (time: 4, lat: 25, lon: 50)> Size: 20kB
[5000 values with dtype=float32]
Coordinates:
  * lat        (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
  * lon        (lon) float32 200B 200.0 202.5 205.0 207.5 ... 317.5 320.0 322.5
  * time       (time) datetime64[ns] 32B 2013-01-01 ... 2013-01-01T18:00:00
    cell_area  (lat, lon) float32 5kB 2.989e+09 2.989e+09 ... 1.116e+10
Attributes: (12/13)
    long_name:      4xDaily Air temperature at sigma level 995
    units:          degK
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    ...             ...
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    cell_measures:  area: cell_area
    standard_name:  air_temperature
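The "Standard Names" section of the airds.cf summary shown earlier reads like an inverted index from standard_name values to variable names, which is what makes a lookup such as "air_temperature" → air possible. A hypothetical plain-Python sketch of building such an index follows; the attribute dicts are reduced to what that summary reports, and cell_area is assumed to carry no standard_name because it is not listed there.

```python
from collections import defaultdict

# Hypothetical attrs, reduced to what the airds.cf summary reports:
# only variables carrying a standard_name are listed under "Standard Names".
variables = {
    "lat": {"standard_name": "latitude"},
    "lon": {"standard_name": "longitude", "axis": "X"},
    "time": {"standard_name": "time"},
    "air": {"standard_name": "air_temperature", "units": "degK"},
    "cell_area": {},  # assumed to have no standard_name, so it is not indexed
}


def standard_name_index(variables):
    """Invert the attrs: map each standard_name value to the variables carrying it."""
    index = defaultdict(list)
    for name, attrs in variables.items():
        if "standard_name" in attrs:
            index[attrs["standard_name"]].append(name)
    return dict(index)


index = standard_name_index(variables)
print(index["air_temperature"])  # ['air']
```

A list per key is used because nothing prevents two variables from sharing a standard_name, a case discussed under selecting multiple variables.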

By cf_role

cf_xarray supports identifying variables by the cf_role attribute.

import numpy as np
import xarray as xr
import cf_xarray  # registers the .cf accessor

ds = xr.Dataset(
    {"temp": ("x", np.arange(10))},
    coords={"cast": ("x", np.arange(10), {"cf_role": "profile_id"})},
)
ds.cf["profile_id"]
<xarray.DataArray 'cast' (x: 10)> Size: 80B
0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Attributes:
    cf_role:  profile_id

Associated variables

.cf[key] will return a DataArray or Dataset containing all variables associated with the key, including ancillary variables and bounds variables (if possible).

In the following example, note that the “ancillary variables” q_error_limit and q_detection_limit are also returned, attached as coordinates of the result,

anc.cf["specific_humidity"]
<xarray.DataArray 'q' (x: 10, y: 20)> Size: 2kB
0.8571 0.8271 0.09654 1.279 0.2147 ... -2.251 0.9507 0.8362 -0.6667 -0.478
Coordinates:
    q_error_limit      (x, y) float64 2kB 0.8947 0.5252 1.505 ... 1.178 0.9302
    q_detection_limit  float64 8B 0.001
Dimensions without coordinates: x, y
Attributes:
    standard_name:        specific_humidity
    units:                g/g
    ancillary_variables:  q_error_limit q_detection_limit

even though they are “data variables” and not “coordinate variables” in the original Dataset.

anc
<xarray.Dataset> Size: 3kB
Dimensions:            (x: 10, y: 20)
Dimensions without coordinates: x, y
Data variables:
    q                  (x, y) float64 2kB 0.8571 0.8271 ... -0.6667 -0.478
    q_error_limit      (x, y) float64 2kB 0.8947 0.5252 1.505 ... 1.178 0.9302
    q_detection_limit  float64 8B 0.001
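The link between q and its ancillary variables lives entirely in the ancillary_variables attribute, which the CF conventions define as a blank-separated list of variable names. The plain-Python sketch below resolves that attribute against a toy copy of anc; the helper with_ancillaries is hypothetical, and skipping names that are absent from the dataset is this sketch's own choice.

```python
# Toy model of the "anc" dataset: variable name -> attrs.
# Only the attributes relevant here are reproduced; the helper is hypothetical.
variables = {
    "q": {
        "standard_name": "specific_humidity",
        "units": "g/g",
        "ancillary_variables": "q_error_limit q_detection_limit",
    },
    "q_error_limit": {},
    "q_detection_limit": {},
}


def with_ancillaries(variables, name):
    """Return name plus the variables listed in its ancillary_variables attribute.

    The attribute is a blank-separated list of names; names that are not
    present in `variables` are skipped in this sketch.
    """
    listed = variables[name].get("ancillary_variables", "").split()
    return [name] + [anc for anc in listed if anc in variables]


print(with_ancillaries(variables, "q"))  # ['q', 'q_error_limit', 'q_detection_limit']
```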

Selecting multiple variables

Sometimes a Dataset may contain multiple X or multiple longitude variables. In that case, a simple .cf["X"] will raise an error. Instead, follow the xarray convention and pass a list, .cf[["X"]], to receive a Dataset with all available "X" variables:

multiple.cf[["X"]]
<xarray.Dataset> Size: 320B
Dimensions:  (x2: 10, x1: 30)
Coordinates:
  * x2       (x2) int64 80B 0 1 2 3 4 5 6 7 8 9
  * x1       (x1) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
Data variables:
    *empty*

pop.cf[["longitude"]]
<xarray.Dataset> Size: 10kB
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    ULONG    (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    *empty*
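The difference between a single key and a list key can be modeled with the same kind of toy lookup. This is a hypothetical plain-Python sketch: x1 and x2 are given an axis attribute purely for illustration, and raising KeyError on an ambiguous single key is this sketch's choice, not a statement about the exact exception cf_xarray raises.

```python
# Toy model of the "multiple" dataset: two variables that both resolve to "X".
# The axis attrs and the helper names are hypothetical.
variables = {
    "x1": {"axis": "X"},
    "x2": {"axis": "X"},
}


def matches(variables, attr, value):
    """All variable names whose attrs[attr] equals value."""
    return [name for name, attrs in variables.items() if attrs.get(attr) == value]


def select(variables, attr, value):
    """Single-key selection: a missing or ambiguous key raises KeyError."""
    found = matches(variables, attr, value)
    if len(found) != 1:
        raise KeyError(f"{value!r} matched {len(found)} variables: {found}")
    return found[0]


def select_many(variables, attr, values):
    """List-key selection: return every match for every requested value."""
    return [name for value in values for name in matches(variables, attr, value)]


try:
    select(variables, "axis", "X")
except KeyError as err:
    print("error:", err)

print(select_many(variables, "axis", ["X"]))  # ['x1', 'x2']
```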

Mixing names

cf_xarray aims to be as friendly as possible, so “CF names” and normal variable names can be mixed in the same list. Here we select UVEL by its variable name and TEMP by its standard_name (sea_water_potential_temperature):

pop.cf[["sea_water_potential_temperature", "UVEL"]]
<xarray.Dataset> Size: 29kB
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    TLONG    (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TLAT     (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
    ULONG    (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
    ULAT     (nlat, nlon) float64 5kB 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5
  * nlon     (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    TEMP     (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
    UVEL     (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
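One way to picture a mixed lookup is to resolve each key independently: first against CF attribute values, then as a plain variable name. The plain-Python sketch below does exactly that. The search order, the attribute list CF_ATTRS, and the helper names are assumptions made for illustration, not cf_xarray's documented resolution rules; only TEMP's standard_name comes from the text above.

```python
# Toy model of part of the "pop" dataset. Only TEMP's standard_name is taken
# from the text; everything else here is hypothetical.
variables = {
    "TEMP": {"standard_name": "sea_water_potential_temperature"},
    "UVEL": {},
}

CF_ATTRS = ("axis", "standard_name", "cf_role")  # attributes this sketch searches


def resolve(variables, key):
    """Resolve one key: try CF attribute values first, then the variable name."""
    found = [
        name
        for name, attrs in variables.items()
        if any(attrs.get(attr) == key for attr in CF_ATTRS)
    ]
    if found:
        return found
    if key in variables:
        return [key]
    raise KeyError(key)


def resolve_all(variables, keys):
    """Resolve a list of mixed keys, dropping duplicates while keeping order."""
    out = []
    for key in keys:
        for name in resolve(variables, key):
            if name not in out:
                out.append(name)
    return out


print(resolve_all(variables, ["sea_water_potential_temperature", "UVEL"]))  # ['TEMP', 'UVEL']
```

Deduplicating keeps the result well defined when a CF name and a variable name point at the same variable, for example ["TEMP", "sea_water_potential_temperature"].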