Encoding and decoding
cf_xarray aims to support encoding and decoding variables using CF conventions not yet implemented by Xarray.
Geometries
See Geometries for more.
Compression by gathering
The “compression by gathering” convention could be used for either pandas.MultiIndex objects or pydata/sparse arrays.
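The idea behind the convention can be sketched with plain NumPy: only the points that hold data are stored, together with a list of their flattened positions in the full grid. The grid, coordinate values, and variable names below are made up for illustration and are not taken from cf_xarray.

```python
import numpy as np

# Hypothetical 2 x 3 (lat, lon) grid in which only three points hold data.
lat = np.array([10.0, 20.0])
lon = np.array([0.0, 5.0, 10.0])
full = np.array([[np.nan, 1.5, np.nan],
                 [2.5, np.nan, 3.5]])

# Gather: keep the valid values and record their flattened (C-order) positions.
# This index list plays the role of the variable carrying the "compress" attribute.
landpoint = np.flatnonzero(~np.isnan(full))
compressed = full.ravel()[landpoint]

# Each flat index maps back to one (lat, lon) pair.
ilat, ilon = np.unravel_index(landpoint, full.shape)
point_lat, point_lon = lat[ilat], lon[ilon]

# Scatter: rebuild the full grid from the compressed values.
restored = np.full(full.shape, np.nan)
restored.flat[landpoint] = compressed

print(landpoint)   # [1 3 5]
print(compressed)  # [1.5 2.5 3.5]
```

Storing `landpoint` and `compressed` is enough to reconstruct `full`, which is what makes the representation attractive for sparsely populated grids.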
MultiIndex
cf_xarray provides encode_multi_index_as_compress() and decode_compress_to_multi_index() to encode MultiIndex-ed dimensions using “compression by gathering”.
Here’s a test dataset:
import numpy as np
import pandas as pd
import xarray as xr

import cf_xarray as cfxr

# Wrap the MultiIndex explicitly so xarray does not have to promote it implicitly.
midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("lat", "lon"))
coords = xr.Coordinates.from_pandas_multiindex(midx, "landpoint")
ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    coords,
)
ds
<xarray.Dataset> Size: 128B
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) object 32B MultiIndex
  * lat        (landpoint) object 32B 'a' 'a' 'b' 'b'
  * lon        (landpoint) int64 32B 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 32B 1.389 0.2402 1.15 0.1171
First encode (note the "compress" attribute on the landpoint variable):
encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
encoded
<xarray.Dataset> Size: 96B
Dimensions:    (landpoint: 4, lat: 2, lon: 2)
Coordinates:
  * lat        (lat) object 16B 'a' 'b'
  * lon        (lon) int64 16B 1 2
  * landpoint  (landpoint) int64 32B 0 1 2 3
Data variables:
    landsoilt  (landpoint) float64 32B 1.389 0.2402 1.15 0.1171
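To see what the integer landpoint values mean, the following is a minimal sketch of how a MultiIndex can be flattened into gather indices and rebuilt, using only NumPy and pandas. It illustrates the convention and is not necessarily how cf_xarray implements it; the three-point index is a made-up example chosen so that one (lat, lon) combination is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical MultiIndex covering three of the four possible (lat, lon) pairs.
mindex = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 2)], names=("lat", "lon")
)

# Encode: one integer per point, its flattened position in the lat x lon grid.
flat = np.ravel_multi_index(tuple(mindex.codes), mindex.levshape)

# Decode: recover the per-level positions, then look up the level values.
codes = np.unravel_index(flat, mindex.levshape)
arrays = [level[code] for level, code in zip(mindex.levels, codes)]
rebuilt = pd.MultiIndex.from_arrays(arrays, names=mindex.names)

print(flat.tolist())           # [0, 1, 3]
print(rebuilt.equals(mindex))  # True
```

The missing pair ("b", 1) would sit at flat position 2, which is why that value is absent from the index list.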
At this point, we can write encoded to a CF-compliant netCDF file using xarray.Dataset.to_netcdf(), for example.
After reading that file, decode using:
decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")
decoded
<xarray.Dataset> Size: 128B
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) object 32B MultiIndex
  * lat        (landpoint) object 32B 'a' 'a' 'b' 'b'
  * lon        (landpoint) int64 32B 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 32B 1.389 0.2402 1.15 0.1171
The decoded dataset is identical to the original, so the roundtrip is exact:
ds.identical(decoded)
True
Sparse arrays
This is currently unsupported, but a pull request is welcome!