Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PyTensor deterministic operations as observations #7656

Merged
merged 14 commits into from
Feb 27, 2025

Conversation

wd60622
Copy link
Contributor

@wd60622 wd60622 commented Jan 24, 2025

Description

Will need some help on this implementation and how to best test this.

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7656.org.readthedocs.build/en/7656/

@github-actions github-actions bot added the bug label Jan 24, 2025
@ricardoV94
Copy link
Member

We need to tell InferenceData converter how to get the data values to put in constant_data, and observed_data. It had some hardcode logic assuming you could only due Casts

@ricardoV94
Copy link
Member

I don't consider this a bugfix, it's behavior that was explicitly forbidden for conservative reasons (more like a NotImplementedError)

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

We need to tell InferenceData converter how to get the data values to put in constant_data, and observed_data. It had some hardcode logic assuming you could only due Casts

Where does this happen in the code?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

I think the MiniBatch tests will still fail. Any thoughts on how that should behave?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

Also, feel free to choose a better title! I couldn't express it too well

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 25, 2025

We need to tell InferenceData converter how to get the data values to put in constant_data, and observed_data. It had some hardcode logic assuming you could only due Casts

Where does this happen in the code?

obs_data = extract_obs_data(aux_obs)

def extract_obs_data(x: TensorVariable) -> np.ndarray:

We should be able to just use constant_fold (also in pytensor) for it

@ricardoV94 ricardoV94 changed the title Support ops to pm.Data in observed variables Support pytensor deterministic operations as observations Jan 25, 2025
@ricardoV94 ricardoV94 changed the title Support pytensor deterministic operations as observations Support PyTensor deterministic operations as observations Jan 25, 2025
@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

We should be able to just use constant_fold (also in pytensor) for it

I have an implementation before this suggestion. I need a little help understanding this. Feel free to put suggestion

@ricardoV94
Copy link
Member

We should be able to just use constant_fold (also in pytensor) for it

I have an implementation before this suggestion. I need a little help understanding this. Feel free to put suggestion

What do you mean? My suggestion is to simply call constant_fold on the observed variable

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

I am not able to make this work with either constant_fold(x) or constant_fold([x]). Both return errors. Is the constant_fold from pymc.pytensorf or from pytensor?

@ricardoV94
Copy link
Member

I am not able to make this work with either constant_fold(x) or constant_fold([x]). Both return errors. Is the constant_fold from pymc.pytensorf or from pytensor?

From pymc.pytensorf. There's a raise_if_not_constant flag you can set, but what cases is it failing?

@ricardoV94
Copy link
Member

Ah if it's a SharedVariable like pm.Data of course constant_fold is not going to work... Dummy me.

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 28, 2025

from pytensor.compile.mode import Mode
import pymc as pm

with pm.Model(coords={"date": [0, 1, 2]}) as m:
    data = pm.Data("data", [0, 1, 2], dims="date")
    x = pm.Normal("x")
    y = pm.Normal("y", x, observed=data, dims="date")

cheap_eval_mode = Mode(linker="py", optimizer=None)
m.rvs_to_values[y].eval(mode=cheap_eval_mode)
# array([0., 1., 2.])

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

Thanks for the suggestion. I've used mode on the random variable directly. Is it different to get the rvs_to_values from the model? it currently isn't accessed from the function at the moment.

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

Could you point me to similar tests for this?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

I've just added to associated pytensorf test suite

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 29, 2025

There are some failing tests related to the Mini batch also using the is_valid_observed function as well. How should those be handled. Are those test tests no longer valid?

@ricardoV94
Copy link
Member

There are some failing tests related to the Mini batch also using the is_valid_observed function as well. How should those be handled. Are those test tests no longer valid?

Yeah sounds like the test should be changed to the more strict kind of stuff that we still don't allow, like having a pt.random.normal in it.

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 29, 2025

Looks like a random test failure?

Will address rest of feedback when I have chance

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 29, 2025

Looks like a random test failure?

Yeah that one is flaky, we should probably try/except on that error and just skip when it happens

@cetagostini
Copy link

Are we far from merge here?

@wd60622
Copy link
Contributor Author

wd60622 commented Feb 8, 2025

Hi @ricardoV94
Any modifications or additional tests needed for this?

Copy link

codecov bot commented Feb 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.65%. Comparing base (358b825) to head (2269bd6).
Report is 18 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7656      +/-   ##
==========================================
- Coverage   92.70%   92.65%   -0.06%     
==========================================
  Files         107      107              
  Lines       18391    18327      -64     
==========================================
- Hits        17050    16981      -69     
- Misses       1341     1346       +5     
Files with missing lines Coverage Δ
pymc/data.py 84.39% <100.00%> (-4.83%) ⬇️
pymc/pytensorf.py 89.76% <100.00%> (-0.91%) ⬇️

... and 1 file with indirect coverage changes

@ricardoV94 ricardoV94 merged commit 450e7f6 into pymc-devs:main Feb 27, 2025
25 checks passed
@ricardoV94
Copy link
Member

Thanks @wd60622 and apologies for the delay

@wd60622 wd60622 deleted the data-as-observed branch February 27, 2025 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: data as observed in RV
3 participants