
WIP: Add kwargs to InferenceData.to_netcdf() #2410

Open: wants to merge 3 commits into base: main from

@cowirihy commented Jan 17, 2025

Relates to #2298; the solution broadly follows the approach sketched out in that issue.

Adds **kwargs to the InferenceData.to_netcdf() method so that any parameter accepted by xarray.Dataset.to_netcdf() can be passed through.
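To illustrate the pass-through, here is a minimal sketch of the pattern. It assumes (as I understand the existing method to work) that each group is written with its own Dataset.to_netcdf() call, the first in mode "w" and the rest appended in mode "a". The function name to_netcdf_sketch and the injected write_group callable are hypothetical; write_group stands in for Dataset.to_netcdf so the flow can be exercised without writing a file.

```python
from typing import Any, Callable, List, Mapping


def to_netcdf_sketch(
    groups: Mapping[str, Any],
    filename: str,
    write_group: Callable[..., None],
    **kwargs: Any,
) -> List[str]:
    """Write each group, forwarding **kwargs unchanged to every write call.

    Hypothetical illustration: write_group plays the role of
    xarray.Dataset.to_netcdf, called as write_group(dataset, filename, ...).
    """
    written = []
    mode = "w"  # the first group creates (or overwrites) the file
    for name, dataset in groups.items():
        # mode and group are set here, so callers must not pass them in kwargs
        write_group(dataset, filename, mode=mode, group=name, **kwargs)
        written.append(name)
        mode = "a"  # later groups are appended to the same file
    return written
```

With real xarray datasets, write_group could be lambda ds, path, **kw: ds.to_netcdf(path, **kw). Forwarding **kwargs verbatim keeps the wrapper agnostic to xarray's parameter list, at the cost of leaving validation of unknown keywords to xarray.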

For example, in my use case I define an encoding dict, encoding={"var_A": {"dtype": "int16", "scale_factor": 0.1}}, so that var_A samples are stored as 16-bit integers with a precision of 0.1 (one decimal place). This economises on file size with an inconsequential loss of precision. Note that the encoding is applied to var_A in every group in which it appears, e.g. both the posterior and prior groups if present.
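The precision trade-off can be checked with some quick arithmetic. Assuming the CF packing convention that xarray follows (stored = round(value / scale_factor), decoded = stored * scale_factor, with no add_offset), the worst-case rounding error is scale_factor / 2 = 0.05, and int16 limits the representable values to roughly -3276.8 to 3276.7. The pure-Python sketch below emulates that packing; pack_int16 and unpack are hypothetical helpers, not xarray functions.

```python
from typing import Iterable, List

INT16_MIN, INT16_MAX = -32768, 32767


def pack_int16(values: Iterable[float], scale_factor: float) -> List[int]:
    """Emulate CF-style packing into int16 (no add_offset, no _FillValue)."""
    packed = []
    for x in values:
        q = round(x / scale_factor)  # nearest integer; exact ties go to even
        if not INT16_MIN <= q <= INT16_MAX:
            # This sketch fails loudly instead of letting the value wrap around.
            raise OverflowError(
                f"{x} does not fit in int16 with scale_factor={scale_factor}"
            )
        packed.append(q)
    return packed


def unpack(packed: Iterable[int], scale_factor: float) -> List[float]:
    """Decode stored integers back to floats, as a CF-aware reader would."""
    return [q * scale_factor for q in packed]
```

The explicit OverflowError is a choice made in this sketch only; I would not rely on the writer to range-check values for you, so it is worth confirming that all var_A samples fit in the int16 range before choosing this encoding.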

I've added a placeholder where a new unit test could go, but I'm not confident about how to define it. What I envisage, and have tested in a separate script on my end, is the following:

  • Load data into an InferenceData instance by reading from a netCDF file, as other unit tests already do
  • Define some customisation, e.g. encoding settings for a couple of the random variables (RVs) in the model the data relates to
  • Write to a new netCDF file, passing encoding (and/or other parameters that alter the behaviour of Dataset.to_netcdf())
  • Read the second file back in and verify that approximately the same data is recovered, with the expected loss of precision
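For the last step, one possible check is sketched below, assuming an encoding with a scale_factor and no add_offset, as in the var_A example. roundtrip_within_precision is a hypothetical helper: it requires every recovered value to be within scale_factor / 2 of the original and to lie on the scale_factor grid, so the test fails if the encoding was silently ignored (an exact round trip would otherwise pass a pure tolerance check).

```python
import math
from typing import Sequence


def roundtrip_within_precision(
    original: Sequence[float],
    recovered: Sequence[float],
    scale_factor: float,
) -> bool:
    """Check recovered values match originals to within half a quantization step.

    Hypothetical test helper: packing with scale_factor rounds to the nearest
    multiple, so no error should exceed scale_factor / 2 (a small absolute
    slack absorbs floating-point round-off).
    """
    if len(original) != len(recovered):
        return False
    half_step = scale_factor / 2
    for x, y in zip(original, recovered):
        if abs(x - y) > half_step + 1e-9:
            return False
        # Recovered values should sit on the quantization grid; if they don't,
        # the requested encoding was probably not applied.
        steps = y / scale_factor
        if not math.isclose(steps, round(steps), abs_tol=1e-6):
            return False
    return True
```

In the actual unit test the recovered values would come from reading the second file back in (e.g. via az.from_netcdf); numpy.testing.assert_allclose(recovered, original, atol=scale_factor / 2) would cover the tolerance half of this check, but not the on-grid half.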

Help setting up this test would be welcome! It would also be worth verifying via tests that the handling code I've included (which populates the kwargs dict from the compress and engine parameters, as before) works as intended and remains backwards compatible; it should! Perhaps the existing tests are already adequate to show this?
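One way to make the kwargs handling testable in isolation is sketched below. build_group_kwargs is hypothetical: with no extra kwargs it reproduces the previous behaviour (zlib compression for every variable when compress=True, plus the engine), it merges any user encoding per variable with the user's settings taking precedence, and it drops encoding entries for variables absent from the group being written. That last step assumes, as I recall, that xarray raises a ValueError when encoding names a variable missing from the dataset, which matters because var_A may appear in posterior and prior but not in, say, observed_data.

```python
from typing import Any, Dict, Iterable, Mapping


def build_group_kwargs(
    group_vars: Iterable[str],
    compress: bool,
    engine: str,
    user_kwargs: Mapping[str, Any],
) -> Dict[str, Any]:
    """Assemble keyword arguments for one group's Dataset.to_netcdf() call.

    Hypothetical helper: compress and engine mirror the existing parameters,
    user_kwargs stands for the new **kwargs pass-through.
    """
    group_vars = list(group_vars)
    # Copy everything except encoding, which needs per-variable merging.
    kwargs = {k: v for k, v in user_kwargs.items() if k != "encoding"}
    kwargs["engine"] = engine

    encoding: Dict[str, Dict[str, Any]] = {}
    if compress:
        # Previous behaviour (assumed): zlib for every variable in the group.
        # The real code may restrict this to compressible dtypes.
        encoding = {name: {"zlib": True} for name in group_vars}
    # User settings win key by key, but only for variables in this group
    # (assumption: xarray rejects encoding for variables not in the dataset).
    for name, settings in (user_kwargs.get("encoding") or {}).items():
        if name in group_vars:
            encoding[name] = {**encoding.get(name, {}), **settings}
    if encoding:
        kwargs["encoding"] = encoding
    return kwargs
```

Asserting that a call with no extra kwargs returns exactly the old arguments is a cheap backwards-compatibility test that needs no file I/O.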

Checklist

  • Follows official PR format
  • [n/a] Includes a sample plot to visually illustrate the changes (only for plot-related functions)
  • [n/a] New features are properly documented (with an example if appropriate)?
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog

📚 Documentation preview 📚: https://arviz--2410.org.readthedocs.build/en/2410/

@amaloney

Thanks @cowirihy, I'll try my best to review this week.
