Selecting DataArrays#

A second powerful feature of cf_xarray is the ability to select DataArrays using special “CF names” such as the “latitude” or “longitude” coordinate names, the “X” or “Y” axis names, or even the standard_name attribute if present.

To demonstrate this, let’s load a few datasets:

from cf_xarray.datasets import airds, anc, multiple, popds as pop

By axis and coordinate name#

Let’s select the "X" axis on airds.

# identical to airds["lon"]
airds.cf["X"]
<xarray.DataArray 'lon' (lon: 50)>
200.0 202.5 205.0 207.5 210.0 212.5 ... 310.0 312.5 315.0 317.5 320.0 322.5
Coordinates:
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 315.0 317.5 320.0 322.5
Attributes:
    standard_name:  longitude
    long_name:      Longitude
    units:          degrees_east
    axis:           X

This works because airds.lon.attrs contains axis: "X". Displaying airds.cf lists every CF name that cf_xarray has identified in the Dataset:

airds.cf
Coordinates:
             CF Axes: * X: ['lon']
                      * Y: ['lat']
                      * T: ['time']
                        Z: n/a

      CF Coordinates: * longitude: ['lon']
                      * latitude: ['lat']
                      * time: ['time']
                        vertical: n/a

       Cell Measures:   area: ['cell_area']
                        volume: n/a

      Standard Names: * latitude: ['lat']
                      * longitude: ['lon']
                      * time: ['time']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   air_temperature: ['air']

              Bounds:   n/a

       Grid Mappings:   n/a

By standard name#

The variable airds.air has standard_name: "air_temperature", so we can use that to pull it out:

airds.cf["air_temperature"]
<xarray.DataArray 'air' (time: 4, lat: 25, lon: 50)>
[5000 values with dtype=float32]
Coordinates:
  * lat        (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon        (lon) float32 200.0 202.5 205.0 207.5 ... 315.0 317.5 320.0 322.5
  * time       (time) datetime64[ns] 2013-01-01 ... 2013-01-01T18:00:00
    cell_area  (lat, lon) float32 2.989e+09 2.989e+09 ... 1.116e+10 1.116e+10
Attributes: (12/13)
    long_name:      4xDaily Air temperature at sigma level 995
    units:          degK
    precision:      2
    GRIB_id:        11
    GRIB_name:      TMP
    var_desc:       Air temperature
    ...             ...
    level_desc:     Surface
    statistic:      Individual Obs
    parent_stat:    Other
    actual_range:   [185.16 322.1 ]
    cell_measures:  area: cell_area
    standard_name:  air_temperature

By cf_role#

cf_xarray supports identifying variables by the cf_role attribute.

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temp": ("x", np.arange(10))},
    coords={"cast": ("x", np.arange(10), {"cf_role": "profile_id"})},
)
ds.cf["profile_id"]
<xarray.DataArray 'cast' (x: 10)>
0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Attributes:
    cf_role:  profile_id

Associated variables#

.cf[key] returns a DataArray or Dataset containing all variables associated with the key, including ancillary variables and bounds variables (if possible).

In the following, note that the “ancillary variables” q_error_limit and q_detection_limit are also returned:

anc.cf["specific_humidity"]
<xarray.DataArray 'q' (x: 10, y: 20)>
-1.129 -0.7901 -0.9964 0.3029 -0.2888 ... -0.4955 0.6418 -0.3906 0.07033 0.3357
Coordinates:
    q_error_limit      (x, y) float64 0.6056 -1.794 -0.468 ... -0.9284 -1.263
    q_detection_limit  float64 0.001
Dimensions without coordinates: x, y
Attributes:
    standard_name:        specific_humidity
    units:                g/g
    ancillary_variables:  q_error_limit q_detection_limit

They are attached as coordinates even though they are “data variables”, not “coordinate variables”, in the original Dataset:

anc
<xarray.Dataset>
Dimensions:            (x: 10, y: 20)
Dimensions without coordinates: x, y
Data variables:
    q                  (x, y) float64 -1.129 -0.7901 -0.9964 ... 0.07033 0.3357
    q_error_limit      (x, y) float64 0.6056 -1.794 -0.468 ... -0.9284 -1.263
    q_detection_limit  float64 0.001

Selecting multiple variables#

Sometimes a Dataset may contain multiple X or multiple longitude variables. In that case, a simple .cf["X"] will raise an error. Instead, follow Xarray convention and pass a list, .cf[["X"]], to receive a Dataset with all available "X" variables:

multiple.cf[["X"]]
<xarray.Dataset>
Dimensions:  (x2: 10, x1: 30)
Coordinates:
  * x2       (x2) int64 0 1 2 3 4 5 6 7 8 9
  * x1       (x1) int64 0 1 2 3 4 5 6 7 8 9 10 ... 20 21 22 23 24 25 26 27 28 29
Data variables:
    *empty*

pop.cf[["longitude"]]
<xarray.Dataset>
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    TLONG    (nlat, nlon) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    ULONG    (nlat, nlon) float64 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 0.5
  * nlon     (nlon) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    *empty*

Mixing names#

cf_xarray aims to be as friendly as possible, so you can mix “CF names” and normal variable names. Here we select UVEL and TEMP, using the standard_name of TEMP (which is sea_water_potential_temperature):

pop.cf[["sea_water_potential_temperature", "UVEL"]]
<xarray.Dataset>
Dimensions:  (nlat: 20, nlon: 30)
Coordinates:
    TLONG    (nlat, nlon) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    TLAT     (nlat, nlon) float64 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0 2.0
    ULONG    (nlat, nlon) float64 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5 0.5
    ULAT     (nlat, nlon) float64 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5 2.5
  * nlon     (nlon) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
  * nlat     (nlat) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    TEMP     (nlat, nlon) float64 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0
    UVEL     (nlat, nlon) float64 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0 15.0