Encoding and decoding
cf_xarray aims to support encoding and decoding variables using CF conventions not yet implemented by Xarray.
Compression by gathering
The “compression by gathering” convention can be used for either pandas.MultiIndex objects or pydata/sparse arrays.
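As a rough illustration of the idea behind the convention, the sketch below uses plain NumPy (not cf_xarray) to gather the valid points of a small grid and scatter them back. The array names and values are hypothetical, and the sketch assumes zero-based, row-major flattening of the uncompressed dimensions.

```python
import numpy as np

# Hypothetical 2 x 3 (lat, lon) field; NaN marks points that are not stored.
field = np.array([[np.nan, 1.5, np.nan],
                  [2.5, np.nan, 3.5]])

# Gather: record the flattened (row-major) position of each valid point ...
landpoint = np.flatnonzero(~np.isnan(field))
# ... and keep only the values at those positions.
compressed = field.ravel()[landpoint]

# Scatter: rebuild the full grid from the index list and the compressed values.
restored = np.full(field.shape, np.nan)
lat_idx, lon_idx = np.unravel_index(landpoint, field.shape)
restored[lat_idx, lon_idx] = compressed
```

Only the index list and the compressed values need to be stored; the full grid is recoverable from them together with the sizes of the uncompressed dimensions.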
MultiIndex
cf_xarray provides encode_multi_index_as_compress() and decode_compress_to_multi_index() to encode MultiIndex-ed dimensions using “compression by gathering”.
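One way to picture the encoding is as a mapping between the per-level integer codes of a MultiIndex and a single flattened position. The sketch below shows that mapping with pandas and NumPy only; it is an illustration under that assumption, not necessarily how cf_xarray implements these functions internally.

```python
import numpy as np
import pandas as pd

# Same kind of index as the test dataset in this section.
mindex = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("lat", "lon"))

# Size of each level, i.e. the shape of the uncompressed (lat, lon) grid.
shape = tuple(len(level) for level in mindex.levels)

# Encode: integer codes per level -> one flattened, zero-based index per point.
flat = np.ravel_multi_index(tuple(mindex.codes), shape)

# Decode: flattened index -> per-level codes -> MultiIndex.
codes = np.unravel_index(flat, shape)
rebuilt = pd.MultiIndex(levels=mindex.levels, codes=codes, names=mindex.names)
```

Because the index here is a full product of its levels, the flattened positions are simply 0 through 3; an index covering only some combinations would yield a subset of positions.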
Here’s a test dataset:
import cf_xarray as cfxr
import numpy as np
import pandas as pd
import xarray as xr

# One data variable on a "landpoint" dimension indexed by a (lat, lon) MultiIndex
ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)
ds
<xarray.Dataset>
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) object MultiIndex
  * lat        (landpoint) object 'a' 'a' 'b' 'b'
  * lon        (landpoint) int64 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 -0.7556 0.6229 -0.7909 -0.06848
First, encode the dataset (note the "compress" attribute on the landpoint variable):
encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
encoded
<xarray.Dataset>
Dimensions:    (landpoint: 4, lat: 2, lon: 2)
Coordinates:
  * lat        (lat) object 'a' 'b'
  * lon        (lon) int64 1 2
  * landpoint  (landpoint) int64 0 1 2 3
Data variables:
    landsoilt  (landpoint) float64 -0.7556 0.6229 -0.7909 -0.06848
At this point, we can write encoded to a CF-compliant file using xarray.Dataset.to_netcdf(), for example.
After reading that file back, decode using:
decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")
decoded
<xarray.Dataset>
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) object MultiIndex
  * lat        (landpoint) object 'a' 'a' 'b' 'b'
  * lon        (landpoint) int64 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 -0.7556 0.6229 -0.7909 -0.06848
The dataset round-trips perfectly:
ds.identical(decoded)
True
Sparse arrays
This is currently unsupported, but a pull request is welcome!