{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to `cf_xarray`\n", "\n", "This notebook is a brief introduction to `cf_xarray`'s current capabilities." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.264499Z", "start_time": "2020-07-28T03:18:01.706628Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import xarray as xr\n", "\n", "import cf_xarray as cfxr\n", "\n", "# For this notebooks, it's nicer if we don't show the array values by default\n", "xr.set_options(display_expand_data=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`cf_xarray` works best when `xarray` keeps attributes by default." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xr.set_options(keep_attrs=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets read two datasets." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.355688Z", "start_time": "2020-07-28T03:18:03.266006Z" } }, "outputs": [], "source": [ "ds = xr.tutorial.load_dataset(\"air_temperature\")\n", "ds.air.attrs[\"standard_name\"] = \"air_temperature\"\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This one is inspired by POP model output and illustrates how the coordinates\n", "attribute is interpreted. It also illustrates one way of tagging curvilinear\n", "grids for convenient use of `cf_xarray`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.418465Z", "start_time": "2020-07-28T03:18:03.357132Z" } }, "outputs": [], "source": [ "from cf_xarray.datasets import popds as pop\n", "\n", "pop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This synthetic dataset has multiple `X` and `Y` coords. An example would be\n", "model output on a staggered grid." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.468916Z", "start_time": "2020-07-28T03:18:03.420053Z" } }, "outputs": [], "source": [ "from cf_xarray.datasets import multiple\n", "\n", "multiple" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset has ancillary variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.496952Z", "start_time": "2020-07-28T03:18:03.470068Z" } }, "outputs": [], "source": [ "from cf_xarray.datasets import anc\n", "\n", "anc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What attributes have been discovered?\n", "\n", "The criteria for identifying variables using CF attributes are listed\n", "[here](../contributing.rst)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.528010Z", "start_time": "2020-07-28T03:18:03.498427Z" } }, "outputs": [], "source": [ "ds.lon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ds.lon` has attributes `axis: X`. This means that `cf_xarray` can identify the\n", "`'X'` axis as being represented by the `lon` variable.\n", "\n", "It can also use the `standard_name` and `units` attributes to infer that `lon`\n", "is \"Longitude\". To see variable names that `cf_xarray` can infer, use `ds.cf`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.546588Z", "start_time": "2020-07-28T03:18:03.529734Z" } }, "outputs": [], "source": [ "ds.cf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For `pop`, only `latitude` and `longitude` are detected, not `X` or `Y`. Please\n", "comment here: https://github.com/xarray-contrib/cf-xarray/issues/23 if you have\n", "opinions about this behaviour." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.565233Z", "start_time": "2020-07-28T03:18:03.547756Z" } }, "outputs": [], "source": [ "pop.cf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For `multiple`, multiple `X` and `Y` coordinates are detected" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.595027Z", "start_time": "2020-07-28T03:18:03.566478Z" } }, "outputs": [], "source": [ "multiple.cf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Accessing coordinate variables\n", "\n", "`.cf` implements `__getitem__` to allow easy access to coordinate and axis\n", "variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:26:55.667604Z", "start_time": "2020-07-28T03:26:55.475896Z" } }, "outputs": [], "source": [ "ds.cf[\"X\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing with a scalar key raises an error if the key maps to multiple variables\n", "names" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:27:00.228638Z", "start_time": "2020-07-28T03:27:00.209975Z" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "multiple.cf[\"X\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:28:23.253247Z", "start_time": "2020-07-28T03:28:23.213761Z" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "pop.cf[\"longitude\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get back all variables associated with that key, pass a single element list\n", "instead." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:28:31.727351Z", "start_time": "2020-07-28T03:28:31.676094Z" } }, "outputs": [], "source": [ "multiple.cf[[\"X\"]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:28:35.927895Z", "start_time": "2020-07-28T03:28:35.867609Z" } }, "outputs": [], "source": [ "pop.cf[[\"longitude\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DataArrays return DataArrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:28:50.626127Z", "start_time": "2020-07-28T03:28:50.526332Z" } }, "outputs": [], "source": [ "pop.UVEL.cf[\"longitude\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Dataset.cf[...]` returns a single `DataArray`, parsing the `coordinates`\n", "attribute if present, so we correctly get the `TLONG` variable and not the\n", "`ULONG` variable" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:28:08.240638Z", "start_time": "2020-07-28T03:28:08.170572Z" }, "scrolled": true }, "outputs": [], "source": [ "pop.cf[\"TEMP\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Dataset.cf[...]` also interprets the `ancillary_variables` attribute. The\n", "ancillary variables are returned as coordinates of a DataArray" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:28:09.922569Z", "start_time": "2020-07-28T03:28:09.866452Z" }, "scrolled": true }, "outputs": [], "source": [ "anc.cf[\"q\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Accessing variables by standard names" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.776855Z", "start_time": "2020-07-28T03:18:01.700Z" } }, "outputs": [], "source": [ "pop.cf[[\"sea_water_potential_temperature\", \"UVEL\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that ancillary variables are included as coordinate variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.777981Z", "start_time": "2020-07-28T03:18:01.700Z" } }, "outputs": [], "source": [ "anc.cf[\"specific_humidity\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Utility functions\n", "\n", "There are some utility functions to allow use by downstream libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.779098Z", "start_time": "2020-07-28T03:18:01.700Z" } }, "outputs": [], "source": [ "pop.cf.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can test for presence of these keys" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.780223Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "\"sea_water_x_velocity\" in pop.cf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also get out the available Axis names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.cf.axes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or available Coordinate names. Same for cell measures (`.cf.cell_measures`) and\n", "standard names (`.cf.standard_names`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pop.cf.coordinates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Although it is possible to assign additional coordinates,\n", "`.cf.coordinates` only returns a subset of\n", "`(\"longitude\", \"latitude\", \"vertical\", \"time\")`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Rewriting property dictionaries\n", "\n", "`cf_xarray` will rewrite the `.sizes` and `.chunks` dictionaries so that one can\n", "index by a special CF axis or coordinate name" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.781216Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "ds.cf.sizes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the duplicate entries above:\n", "\n", "1. One for `X`, `Y`, `T`\n", "1. and one for `longitude`, `latitude` and `time`.\n", "\n", "An error is raised if there are multiple `'X'` variables (for example)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.782014Z", "start_time": "2020-07-28T03:18:01.800Z" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "multiple.cf.sizes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.782723Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "multiple.v1.cf.sizes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Renaming variables\n", "\n", "`cf_xarray` lets you rewrite variables in one dataset to like variables in\n", "another dataset.\n", "\n", "In this example, a one-to-one mapping is not possible and the coordinate\n", "variables are not renamed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "da = pop.cf[\"TEMP\"]\n", "da.cf.rename_like(ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we exclude all axes (variables with `axis` attribute), a one-to-one mapping\n", "is possible. In this example, `TLONG` and `TLAT` are renamed to `lon` and `lat`\n", "i.e. their counterparts in `ds`. Note the the `coordinates` attribute is\n", "appropriately changed." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.783410Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "da.cf.rename_like(ds, skip=\"axes\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Rewriting arguments\n", "\n", "`cf_xarray` can rewrite arguments for a large number of xarray functions. By\n", "this I mean that instead of specifying say `dim=\"lon\"`, you can pass `dim=\"X\"`\n", "or `dim=\"longitude\"` and `cf_xarray` will rewrite that to `dim=\"lon\"` based on\n", "the attributes present in the dataset.\n", "\n", "Here are a few examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.783972Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "ds.air.cf.isel(T=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing works will expand a single key like `X` to multiple dimensions if those\n", "dimensions are tagged with `axis: X`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.784592Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "multiple.cf.isel(X=1, Y=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reductions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.785151Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "ds.air.cf.mean(\"X\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Expanding to multiple dimensions is also supported" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.785772Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "# takes the mean along [\"x1\", \"x2\"]\n", "multiple.cf.mean(\"X\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.786551Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "ds.air.cf.isel(time=1).cf.plot(x=\"X\", y=\"Y\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.787186Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "ds.air.cf.isel(T=1, Y=[0, 1, 2]).cf.plot(x=\"longitude\", hue=\"latitude\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`cf_xarray` can facet" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.787907Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "seasonal = (\n", " ds.air.groupby(\"time.season\").mean().reindex(season=[\"DJF\", \"MAM\", \"JJA\", \"SON\"])\n", ")\n", "seasonal.cf.plot(x=\"longitude\", y=\"latitude\", col=\"season\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Resample & groupby" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.788564Z", "start_time": "2020-07-28T03:18:01.800Z" } }, "outputs": [], "source": [ "ds.cf.resample(T=\"D\").mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`cf_xarray` also understands the \"datetime accessor\" syntax for groupby" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.789223Z", "start_time": "2020-07-28T03:18:01.900Z" } }, "outputs": [], "source": [ "ds.cf.groupby(\"T.month\").mean(\"longitude\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rolling & coarsen" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.789803Z", "start_time": "2020-07-28T03:18:01.900Z" } }, "outputs": [], "source": [ "ds.cf.rolling(X=5).mean()" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-06-12T20:31:46.708776Z", "start_time": "2020-06-12T20:31:46.525974Z" } }, "source": [ "`coarsen` works but everything later will break because of xarray bug\n", "https://github.com/pydata/xarray/issues/4120\n", "\n", "`ds.isel(lon=slice(50)).cf.coarsen(Y=5, X=10).mean()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: mix \"special names\" and variable names" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.790416Z", "start_time": "2020-07-28T03:18:01.900Z" } }, "outputs": [], "source": [ "ds.cf.groupby(\"T.month\").mean([\"lat\", \"X\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Weight by Cell Measures\n", "\n", "`cf_xarray` can weight by cell measure variables if the appropriate attribute is\n", "set" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.791021Z", "start_time": "2020-07-28T03:18:01.900Z" } }, "outputs": [], "source": [ "# Lets make some weights (not sure if this is right)\n", "ds.coords[\"cell_area\"] = (\n", " np.cos(ds.air.cf[\"latitude\"] * np.pi / 180)\n", " * xr.ones_like(ds.air.cf[\"longitude\"])\n", " * 105e3\n", " * 110e3\n", ")\n", "# and set proper attributes\n", "ds[\"cell_area\"].attrs = dict(standard_name=\"cell_area\", units=\"m2\")\n", "ds.air.attrs[\"cell_measures\"] = \"area: cell_area\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-07-28T03:18:03.791751Z", "start_time": "2020-07-28T03:18:01.900Z" } }, "outputs": [], "source": [ "ds.air.cf.weighted(\"area\").mean([\"latitude\", \"time\"]).cf.plot(x=\"longitude\")\n", "ds.air.mean([\"lat\", \"time\"]).cf.plot(x=\"longitude\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Cell boundaries and vertices\n", "\n", "`cf_xarray` can infer cell boundaries (for rectilinear grids) and convert\n", "CF-standard bounds variables to vertices." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_bnds = ds.cf.add_bounds([\"lat\", \"lon\"])\n", "ds_bnds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also convert each bounds variable independently with the top-level\n", "functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lat_bounds = ds_bnds.cf.get_bounds(\"latitude\")\n", "\n", "lat_vertices = cfxr.bounds_to_vertices(lat_bounds, bounds_dim=\"bounds\")\n", "lat_vertices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Or we can convert _all_ bounds variables on a dataset\n", "ds_crns = ds_bnds.cf.bounds_to_vertices()\n", "ds_crns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature: Add canonical CF attributes\n", "\n", "`cf_xarray` can add missing canonical CF attributes consistent with the official\n", "[CF standard name table](https://cfconventions.org/standard-names.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_canonical = ds.cf.add_canonical_attributes(verbose=True)\n", "ds_canonical" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }