Encoding and decoding#

cf_xarray aims to support encoding and decoding variables using CF conventions not yet implemented by Xarray.

Compression by gathering#

The “compression by gathering” convention could be used for either pandas.MultiIndex objects or pydata/sparse arrays.
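To make the idea concrete, here is a minimal NumPy-only sketch of gathering and scattering. It does not use cf_xarray, and the grid, the NaN mask, and the variable names are assumptions made for illustration. The index list holds zero-based, row-major positions in the uncompressed grid.

```python
import numpy as np

# Hypothetical 2 x 3 (lat, lon) grid; NaN marks points without data (e.g. ocean).
full = np.array([[1.0, np.nan, 2.0],
                 [np.nan, 3.0, np.nan]])

# Gather: record the flat (row-major) positions of the valid points
# and keep only the data at those positions.
landpoint = np.flatnonzero(~np.isnan(full))  # plays the role of the index list
compressed = full.ravel()[landpoint]         # 1-D data along the compressed dimension

# Scatter: rebuild the full grid from the compressed data and the index list.
restored = np.full(full.size, np.nan)
restored[landpoint] = compressed
restored = restored.reshape(full.shape)
```

Only the three valid values and their positions need to be stored; the full grid is recovered from the index list and the grid shape.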

MultiIndex#

cf_xarray provides encode_multi_index_as_compress() and decode_compress_to_multi_index() to encode and decode MultiIndex-ed dimensions using “compression by gathering”.
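Conceptually, each entry of a MultiIndex can be replaced by a single flat index into the grid spanned by its levels. The following is a sketch of that mapping using only pandas and NumPy. It illustrates the idea and is not meant as a description of cf_xarray's internals.

```python
import numpy as np
import pandas as pd

mindex = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("lat", "lon"))

# codes: integer position of every entry within each level.
# levshape: number of unique values per level, here (2, 2).
flat = np.ravel_multi_index(tuple(mindex.codes), mindex.levshape)
print(flat.tolist())  # [0, 1, 2, 3]
```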

Here’s a test dataset:

import cf_xarray as cfxr
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)
ds
<xarray.Dataset>
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) object MultiIndex
  * lat        (landpoint) object 'a' 'a' 'b' 'b'
  * lon        (landpoint) int64 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 -0.7556 0.6229 -0.7909 -0.06848

First, encode the dataset. The encoded landpoint variable carries a "compress" attribute, which the dataset summary does not display:

encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
encoded
<xarray.Dataset>
Dimensions:    (landpoint: 4, lat: 2, lon: 2)
Coordinates:
  * lat        (lat) object 'a' 'b'
  * lon        (lon) int64 1 2
  * landpoint  (landpoint) int64 0 1 2 3
Data variables:
    landsoilt  (landpoint) float64 -0.7556 0.6229 -0.7909 -0.06848

At this point, we can write encoded to a CF-compliant file, for example with xarray.Dataset.to_netcdf(). After reading that file back, decode using

decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")
decoded
<xarray.Dataset>
Dimensions:    (landpoint: 4)
Coordinates:
  * landpoint  (landpoint) object MultiIndex
  * lat        (landpoint) object 'a' 'a' 'b' 'b'
  * lon        (landpoint) int64 1 2 1 2
Data variables:
    landsoilt  (landpoint) float64 -0.7556 0.6229 -0.7909 -0.06848
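The reverse mapping can be sketched in the same spirit: each flat index is unravelled into one position per level, and the level values at those positions form the MultiIndex again. This uses only pandas and NumPy; the arrays are typed in by hand from the encoded dataset above, and the sketch is not a description of cf_xarray's internals.

```python
import numpy as np
import pandas as pd

# Values copied from the encoded dataset above.
lat = np.array(["a", "b"], dtype=object)
lon = np.array([1, 2])
landpoint = np.array([0, 1, 2, 3])

# Split every flat index into a (lat position, lon position) pair.
i_lat, i_lon = np.unravel_index(landpoint, (lat.size, lon.size))

# Look up the level values and rebuild the MultiIndex.
rebuilt = pd.MultiIndex.from_arrays(
    [lat[i_lat], lon[i_lon]], names=("lat", "lon")
)
```

Here rebuilt contains the same four (lat, lon) pairs as the original landpoint index.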

The round trip is lossless:

ds.identical(decoded)
True

Sparse arrays#

This is currently unsupported, but a pull request is welcome!