Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V0.4.2 #61

Merged
merged 9 commits into from
Jan 23, 2019
Merged

V0.4.2 #61

merged 9 commits into from
Jan 23, 2019

Conversation

pzivich
Copy link
Owner

@pzivich pzivich commented Jan 11, 2019

Updating to version 0.4.2

  • TMLE calculates all implemented measures (RD, RR, OR) at once. Saves computation time if the user wants multiple measures and is fitting a time-intensive ML algorithm TMLE additions #39

  • Fixing standard_mean_difference for categorical variables. Will need to use patsy to parse out categorical variables via C(...), otherwise "I" cannot know which variables are categorical Update standardized difference calculator #59

  • Adding stochastic treatments for binary exposures for TimeFixedGFormula via fit_stochastic() Allows for both unconditional and conditional probablities Stochastic Treatment g-formula #58

  • Add continuous outcomes to TimeFixedGFormula but this time actually get it to work as intended. Need to think through the set up for the two potential interventions (threshold vs. shift) delaying this again... Support for continuous treatments in TimeFixedGFormula #49

  • Also learn ReadTheDocs for the Reference papge (still not doing what I hoped it would)

p=g[self._weights]/np.sum(g[self._weights]))
else: # Conditional probabilities based on eval()
# TODO Checking conditionals are mutually exclusive
if self._weights is None:
Copy link
Collaborator

@CamDavidsonPilon CamDavidsonPilon Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I squint as these two pieces of code, they are about identical. I don't know if you have additional plans for these branches, but what about:

treated = []
for c, prop in zip(conditional, p):
    gs = g.loc[eval(c)].copy()
    p = gs[self._weights] / np.sum(gs[self._weights]) if self._weights is not None else None
    tr = np.random.choice(gs.index, size=int(prop*gs.shape[0]), replace=False, p=p)
    treated.extend(tr)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto with the conditional above too

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh good call. I remember now that is how lifelines is set up. It is much more condensed and avoids my needless repitition

if self._weights is None:
treated = []
for c, prop in zip(conditional, p):
gs = g.loc[eval(c)].copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on this: it's going to fail if your conditional doesn't use the magic "g" variable. Maybe you want that? It's not obvious to the user though. An option to consider, if you haven't already, is the native query method in Pandas. That has a) documentation that people can rely on b) probably some cleaner error handling/debugging for users.

There could be a even better solution that doesn't involve passing in strings, but nothing is obvious to me.

Copy link
Owner Author

@pzivich pzivich Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so the magic-g is something baked into custom treatments for both TimeFixedGFormula and TimeVaryGFormula but I agree that it is not immediately obvious. It can get confusing in TimeVaryGFormula (see the Keil et al 2014 replication for how the magic-g is consistently used across the process).

query looks to be what I am looking for. I was previously unaware of its existence. I think that is useful for what I am doing

There could be a even better solution that doesn't involve passing in strings, but nothing is obvious to me.

I haven't found one especially since I would like to retain maximal flexibility in the potential interventions that could be used in the g-formula. For example, you should be able to specify a treatment that only is applied to females at least 30 years old.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #63 for this issue. As it currently stands, all the custom treatments/stochastic treatments use this functionality. I plan on looking into better options. My bandage for now is to make the documentation better and try to handle errors better. pd.query does not return exactly what I need. pd.eval looks promising though

This would be updated as part of a more major update (maybe 0.5.0 if pd.eval is what I want)

@pzivich
Copy link
Owner Author

pzivich commented Jan 22, 2019

Standardized differences now calculate for all variables #59. However, the online docs and examples need to be updated....

@pzivich
Copy link
Owner Author

pzivich commented Jan 22, 2019

Build failure is due to table 1 generator function. Not related to IPTW or other updates

@pzivich pzivich merged commit 58d4f98 into master Jan 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants