-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ParetoNBDModel #177
Add ParetoNBDModel #177
Conversation
Codecov Report
@@ Coverage Diff @@
## main #177 +/- ##
==========================================
+ Coverage 94.12% 94.17% +0.05%
==========================================
Files 17 19 +2
Lines 919 1168 +249
==========================================
+ Hits 865 1100 +235
- Misses 54 68 +14
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
I could use some help resolving an array broadcasting issue in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress @ColtAllen! Some comments for now and I can revisit once you mark this as ready for review
# TODO: Edit docstrings | ||
def expected_purchases( | ||
self, | ||
future_t: Union[float, np.ndarray, pd.Series, TensorVariable], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we use t
in the BetaGeoModel
. Would future_t
be more explicit? Not opposed to the nomenclature, but it would be good to also change the other methods in corresponding models.
self._customer_id = customer_id | ||
self._frequency = frequency | ||
self._recency = recency | ||
self._T = T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that our other models have the same instance variable names, but without the underscore:
pymc-marketing/pymc_marketing/clv/models/beta_geo.py
Lines 119 to 122 in d489a88
self.customer_id = customer_id | |
self.frequency = frequency | |
self.recency = recency | |
self.T = T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underscores indicates these are internal class attributes. Not strictly required unless one is a stickler for Python PEP conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! The mypy
errors should be easy to fix. Most of them come from:
- Need to add
None
insideUnion
if they can beNone
- Make sure signatures are consistent, especially if variables can (or should not) be
None
. - Explicitly set the variable types before unpacking them.
😄
@juanitorduz Thanks! I've whittled the
This comes up with
This comes up because the predictive methods call |
Yay! mypy and test are green 🟢 🙌 |
@ColtAllen I changed the order of I included the rewrite that speeds up the gradient evaluation when doing NUTS, making it useable. The developer notebook is now running NUTS (the pymc-experimental one, the manual impl still runs Slice), and I removed the I think your PR may have picked commits from main that do not belong here (looking at the file diff on Github). If that's the case, the best is perhaps to open a new PR with the final changes in a single commit. In the future I suggest you rebase from main instead of merging, to avoid this issue. It's also better practice to keep squashing useless commits instead of letting the PR grow to 100s of commits. Each commit in the end should correspond to one logical self-contained change (eg. all tests should in theory pass after each commit, or in other words, the state of the PR should not be broken in intermediate commits). That's just a suggestion for future PRs of course. For now, do you mind opening a new PR with only the final changes? If I am mistaken and this PR is not picking extra commits from And as always, great work! |
Thanks @ricardoV94! One last thing before I create a new PR - the CodeCov check is failing for the new additions, and I'm not quite sure how to go about writing unit tests for these |
The codecov is sometimes flaky, there's no way the new PyTensor stuff isn't being tested. |
Could this be due to the
Do you know which commits from |
OMG sorry if I messed up 😩! I think the suggested changes where small. |
Added pytest fixtures for cdnow sample and master summaries removed adhoc dataset creation script added uml diagrams to docs/source/_static/ All ParetoNBDModel methods added and TODOs created Added ParetoNBDRV and renamed ParetoNBDAggregate Added ParetoNBD distribution class Tests added for ParetoNBD distro class FAIL flake8: Test boilerplate and TODOs added for draft PR Updated docstring references Removed UML files dev notebook WIP fixed logp in notebook Notebook commit Revert "Updated docstring references" This reverts commit 81aa384. Revert all commits on distributions.py Revert "Tests added for ParetoNBD distro class" This reverts commit 2984459. Reverting all distributions.py commits Revert "Added ParetoNBD distribution class" This reverts commit 05ef483. Revert all distributions.py commits Revert "Added ParetoNBDRV and renamed ParetoNBDAggregate" This reverts commit da6bd1c. Revert all distributions.py commits Update docstring references Updated todos and added future methods removed new methods for future PR WIP test framework Fix missing underscore prefix Rename control coordinate Fix Gamma-Gamma example Implement CLV base method `_check_prior_ndim` and `_process_prior` Add Individual Shifted Beta Geometric (sBG) model Relax matplotlib dependency (#179) * Relax matplotlib dependency * ensure no conflict with numpy * simplify requirements bump version to 0..04 Update Makefile syntax revisions rewrote probability_alive test framework edits Remove unnecessary dependency on pymc test util changed method names and WIP MAP testing MMM `data_df` param renamed to `data` (#186) Fixing minor typos in README (#195) Update README.md (#194) Rename fitting_method to fit_method Speedup BetaGeoModel tests Test MAP convergence for BetaGeoModel Make docs footer smaller and more responsive Use numerically stable logp for BG/NBD rename in gamma-gamma model WIP revised tests and priors min_max_scaler -> max_abs_scaler for target variable Change ROAS scale in MMM example Implement vectorized adstock transformations Add more detail to the guide for contributors (#229) Extend documentation Add rtd-link-preview github action Bump PyMC dependency Bump version to 0.1.0 Removed CDNOW datasets WIP purchase_probability Fix broadcasting issue in purchase_probability Add xarray-einstats as a dependency Add all-checks job to facilitate branch protection rules Test on oldest PyMC version Add links from CLV/MMM intros to relevant notebooks (#238) Updating PyMC requirement in pyproject.toml and ci Fix LaTeX representation tests w.r.t. PyMC and PyTensor updates add contribution curves over time (#247) * add contribution curves over time * remove unused variable * unit test for new breakdown plot * added example * black * another testcase * black add ref (#249) Install via conda-forge package Update installation instructions in docs scale contributions to original scale and allow custom colors add original scale flag mypy init improve hints mmm fix mmm types mmm last fixes update black exclude folders mypy add some clv type hints some clv type hints improvements utils types done base type hints final fixes add more type tests revert doctring changes fix test where param is none Improve installation instructions in README Fix the discrepancy introduced in #253 between README.md and docs/source/index.md fixed xarray-einstats dependency version in toml Added t=0 test case for purchase_probability Reduced redundant code with _process_customers and _logp Added test for unique customer_id mypy check importable from root clv module Added mypy type overrides to distro blocks Notebook testing notebook testing Cleaned up dev notebook WIP docstrings, tests, param names add yearly seasonality to mmm proposal update nb and fix imlpementation add colourful tests ;) fix tests add more test fit cases refactor component plots method re-run example nb fix mypy fix test to avoid unnecessary fitting improve nb fix bug plotting functon improve notebook Improve docstrings on CLV `freq` argument Set `freq="W"` in CLV quickstart notebook changed from _ to - in pypi.yml Fixed test_model_convergence Add PyMC Marketing logo to the repository and documentation (#261) * add logo to README * remove horizontal line * add light and dark logo to readthedocs fix wrong filename for logo in README (#262) Added example Pandas code to dev notebook Docstring edits Notebook and docstring edits, experimental warning Added lifetimes citations Removed resolved TODOs Docs build fix attempt method names, type hinting, docstring and test edits ignored edge case numerical errors in purchase_prob fixed most mypy errors docs code example attempted fix fixed merge conflicts in clv/distributions another docs fix attempt docstring edits and fixed mypy errors changed default prior params Reorder frequency and recency arguments to match other models Add rewrite to speedup Hyp2F1 gradient
Ah, that was my crude resolution of a merge conflict. I can't make any changes until I fix my dev environment per #305, but I'll go ahead and get this PR re-opened. |
Seems I screwed up the rebase 😕 Here's what I did: In the
@ricardoV94 Should I have done something with |
I am not sure what you can do when you get to this stage. In the future to avoid getting here, my suggested workflow is:
When you get to this stage, I suggest you just copy paste the changes in a fresh branch from |
For git reflog: https://stackoverflow.com/a/10099285 |
Let's go with the nuclear option then. I'll open a new PR with the backup branch I created, but if the changes still need to be copy/pasted over manually, I also created |
This PR closes #127.
Outstanding Issues
mypy
CI errorsUML Diagram
Important Changes
I've marked this model as experimental because NUTS is having convergence errors due to
pytensor.Tensor.hyp2f1
in the modellogp
. @ricardoV94 has been trying out some improvements for that function.I renamed the predictive methods in this model to be more concise:
expected_purchases
expected_purchases_new_customer
expected_probability_alive
expected_purchase_probability
However, these are not consistent with the equivalent method names in
BetaGeoModel
. These names need to be standardized for plotting functionality, so it's important we're in agreement on the naming conventions.xarray-einstats>=0.5.1
is now a required library dependency.Customer input data arrays are also assigned to class attributes as default predictive arguments in this model, making life easier for end users unless they want to run predictions on new customers.
I also added some additional utility methods which can be used in other CLV models if moved to
CLVModel
, and would go towards resolving #182.Unlike the other CLV models,
ParetoNBDModel
uses a dedicatedParetoNBD
distribution block for thelogp
. My rationale for this is that the distribution class will append anobserved
column for customer frequency and recency inputs to the returnedarviz.InferenceData
object, and enable additional diagnostic plots like the Bayesian p-value and PPC plots once those methods are added:https://python.arviz.org/en/stable/examples/index.html#Model%20Checking
I've documented some future additions to this model in #183 since this PR already has enough to review.