Selecting DataArrays¶
See also
CF conventions on
A powerful feature of cf_xarray is the ability to select DataArrays using special “CF names” such as the “latitude” or “longitude” coordinate names, the “X” or “Y” axis names, or even the standard_name attribute if present.
To demonstrate this, let’s load a few datasets
from cf_xarray.datasets import airds, anc, multiple, popds as pop
By axis and coordinate name¶
Let’s select the "X" axis on airds:
# identical to airds["lon"]
airds.cf["X"]
<xarray.DataArray 'lon' (lon: 50)> Size: 200B
200.0 202.5 205.0 207.5 210.0 212.5 ... 310.0 312.5 315.0 317.5 320.0 322.5
Coordinates:
* lon (lon) float32 200B 200.0 202.5 205.0 207.5 ... 317.5 320.0 322.5
Attributes:
standard_name: longitude
long_name: Longitude
units: degrees_east
axis: X
This works because airds.lon.attrs contains axis: "X".
airds.cf
Coordinates:
CF Axes: * X: ['lon']
* Y: ['lat']
* T: ['time']
Z: n/a
CF Coordinates: * longitude: ['lon']
* latitude: ['lat']
* time: ['time']
vertical: n/a
Cell Measures: area: ['cell_area']
volume: n/a
Standard Names: * latitude: ['lat']
* longitude: ['lon']
* time: ['time']
Bounds: n/a
Grid Mappings: n/a
Data Variables:
Cell Measures: area, volume: n/a
Standard Names: air_temperature: ['air']
Bounds: n/a
Grid Mappings: n/a
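As a rough mental model of the mapping shown above, the lookup can be pictured as a search over variable attributes. The sketch below is a simplification in plain Python, not cf_xarray's implementation: the `variables` dict and the `resolve_cf_key` helper are hypothetical, the attributes for lat and time are assumed, and the real library applies richer criteria (for example units) than the two attributes checked here.

```python
# Hypothetical stand-in for a Dataset: variable name -> attribute dict.
# The lon attributes match the output shown earlier; the lat and time
# attributes are assumed for illustration.
variables = {
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    "lat": {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
    "time": {"standard_name": "time", "axis": "T"},
    "air": {"standard_name": "air_temperature", "units": "degK"},
}

AXES = ("X", "Y", "Z", "T")


def resolve_cf_key(variables, key):
    """Return the names of variables whose attributes match a CF key.

    Simplified rule (an assumption, not cf_xarray's full criteria):
    axis keys are compared with the "axis" attribute, anything else
    with the "standard_name" attribute.
    """
    attr = "axis" if key in AXES else "standard_name"
    return [name for name, attrs in variables.items() if attrs.get(attr) == key]


print(resolve_cf_key(variables, "X"))                # ['lon']
print(resolve_cf_key(variables, "air_temperature"))  # ['air']
print(resolve_cf_key(variables, "Z"))                # []
```

An empty result corresponds to the "n/a" entries in the repr: no variable carries a matching attribute.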
By standard name¶
The variable airds.air has standard_name: "air_temperature", so we can use that to pull it out:
airds.cf["air_temperature"]
<xarray.DataArray 'air' (time: 4, lat: 25, lon: 50)> Size: 40kB
[5000 values with dtype=float64]
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 200B 200.0 202.5 205.0 207.5 ... 317.5 320.0 322.5
* time (time) datetime64[ns] 32B 2013-01-01 ... 2013-01-01T18:00:00
cell_area (lat, lon) float32 5kB 2.989e+09 2.989e+09 ... 1.116e+10
Attributes: (12/13)
long_name: 4xDaily Air temperature at sigma level 995
units: degK
precision: 2
GRIB_id: 11
GRIB_name: TMP
var_desc: Air temperature
... ...
level_desc: Surface
statistic: Individual Obs
parent_stat: Other
actual_range: [185.16 322.1 ]
cell_measures: area: cell_area
standard_name: air_temperature
By cf_role¶
cf_xarray supports identifying variables by the cf_role attribute.
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temp": ("x", np.arange(10))},
    coords={"cast": ("x", np.arange(10), {"cf_role": "profile_id"})},
)
ds.cf["profile_id"]
<xarray.DataArray 'cast' (x: 10)> Size: 80B
0 1 2 3 4 5 6 7 8 9
Dimensions without coordinates: x
Attributes:
cf_role: profile_id
Associated variables¶
.cf[key] will return a DataArray or Dataset containing all variables associated with the key including ancillary variables and bounds variables (if possible).
In the following, note that the “ancillary variables” q_error_limit and q_detection_limit were also returned:
anc.cf["specific_humidity"]
<xarray.DataArray 'q' (x: 10, y: 20)> Size: 2kB
-1.269 -0.9759 -1.934 0.6486 -1.598 ... -1.503 0.2928 0.3125 -0.7524 0.1692
Coordinates:
q_error_limit (x, y) float64 2kB 0.04018 0.6908 1.153 ... 0.1423 0.9069
q_detection_limit float64 8B 0.001
Dimensions without coordinates: x, y
Attributes:
standard_name: specific_humidity
units: g/g
ancillary_variables: q_error_limit q_detection_limit
They are returned even though they are “data variables”, not “coordinate variables”, in the original Dataset:
anc
<xarray.Dataset> Size: 3kB
Dimensions: (x: 10, y: 20)
Dimensions without coordinates: x, y
Data variables:
q (x, y) float64 2kB -1.269 -0.9759 ... -0.7524 0.1692
q_error_limit (x, y) float64 2kB 0.04018 0.6908 1.153 ... 0.1423 0.9069
q_detection_limit float64 8B 0.001
Selecting multiple variables¶
Sometimes a Dataset may contain multiple X or multiple longitude variables. In that case, a simple .cf["X"] will raise an error. Instead, follow Xarray convention and pass a list, .cf[["X"]], to receive a Dataset with all available "X" variables.
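The difference between a string key and a list key can be sketched in plain Python. This is only an illustrative model under stated assumptions, not cf_xarray's code: the `variables` dict, the `matches` and `select` helpers, and the y1 variable are hypothetical, and KeyError is used here as a plausible exception type for the ambiguous case.

```python
# Hypothetical stand-in: two variables both tagged as the "X" axis,
# mirroring x1 and x2 in the `multiple` dataset. y1 is made up.
variables = {
    "x1": {"axis": "X"},
    "x2": {"axis": "X"},
    "y1": {"axis": "Y"},
}


def matches(variables, key):
    """Names of all variables whose "axis" attribute equals key."""
    return [name for name, attrs in variables.items() if attrs.get("axis") == key]


def select(variables, key):
    """String key -> exactly one name; list of keys -> all matching names."""
    if isinstance(key, list):
        names = []
        for k in key:
            names.extend(matches(variables, k))
        return names
    found = matches(variables, key)
    if len(found) != 1:
        # A single key must resolve unambiguously.
        raise KeyError(f"expected one variable for {key!r}, found {found}")
    return found[0]


print(select(variables, ["X"]))  # ['x1', 'x2']
print(select(variables, "Y"))    # y1
try:
    select(variables, "X")
except KeyError as err:
    print("ambiguous:", err)
```

The list form never has to guess which match was intended, which is why it is the safe choice when several variables share a CF name.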
multiple.cf[["X"]]
<xarray.Dataset> Size: 320B
Dimensions: (x2: 10, x1: 30)
Coordinates:
* x2 (x2) int64 80B 0 1 2 3 4 5 6 7 8 9
* x1 (x1) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
Data variables:
*empty*
pop.cf[["longitude"]]
<xarray.Dataset> Size: 10kB
Dimensions: (nlat: 20, nlon: 30)
Coordinates:
ULONG (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
* nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
* nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
*empty*
Mixing names¶
cf_xarray aims to be as friendly as possible, so you can mix “CF names” and normal variable names. Here we select UVEL and TEMP by using the standard_name of TEMP (which is sea_water_potential_temperature):
pop.cf[["sea_water_potential_temperature", "UVEL"]]
<xarray.Dataset> Size: 29kB
Dimensions: (nlat: 20, nlon: 30)
Coordinates:
TLONG (nlat, nlon) float64 5kB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
TLAT (nlat, nlon) float64 5kB 2.0 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
ULONG (nlat, nlon) float64 5kB 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5
ULAT (nlat, nlon) float64 5kB 2.5 2.5 2.5 2.5 2.5 ... 2.5 2.5 2.5 2.5
* nlon (nlon) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
* nlat (nlat) int64 160B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
TEMP (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
UVEL (nlat, nlon) float64 5kB 15.0 15.0 15.0 15.0 ... 15.0 15.0 15.0
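One way to picture how mixed keys could be resolved is to try each key as a CF name first and fall back to a literal variable name. The sketch below is a hypothetical plain-Python model, not cf_xarray's implementation: `expand_keys` and the `variables` dict are made up, and UVEL is given no attributes here purely for simplicity.

```python
# Hypothetical stand-in for the pop dataset. UVEL's attributes are left
# empty here for simplicity; that is an assumption of this sketch.
variables = {
    "TEMP": {"standard_name": "sea_water_potential_temperature"},
    "UVEL": {},
}


def expand_keys(variables, keys):
    """Map each key to variable names, preferring standard_name matches."""
    selected = []
    for key in keys:
        by_standard_name = [
            name
            for name, attrs in variables.items()
            if attrs.get("standard_name") == key
        ]
        if by_standard_name:
            candidates = by_standard_name
        elif key in variables:
            # Not a CF name: treat the key as a plain variable name.
            candidates = [key]
        else:
            raise KeyError(key)
        for name in candidates:
            if name not in selected:  # keep order, skip duplicates
                selected.append(name)
    return selected


print(expand_keys(variables, ["sea_water_potential_temperature", "UVEL"]))
# ['TEMP', 'UVEL']
```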