Issue with the estimand generated #167

Open
srtaheri opened this issue Aug 28, 2023 · 4 comments

@srtaheri
Collaborator

srtaheri commented Aug 28, 2023

The estimand generated for a simple graphical example is incorrect. Here is the code:

from y0.dsl import X, Y
from y0.graph import NxMixedGraph
from y0.algorithm.identify import identify, Identification

graph = NxMixedGraph.from_str_edges(
    directed=[
        ("Z", "X"),
        ("Z", "Y"),
        ("X", "M"),
        ("M", "Y"),
    ],
)

graph.draw()

estimand = identify(
    Identification.from_parts(
        graph=graph, outcomes={Y}, treatments={X}
    )
)
estimand

Here is the output:

$$\sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) \sum \sum_{M,X,Y} P(M,X,Y,Z)$$

The issues are:

  1. The value of Y should not be summed over.

  2. The estimand should be either the back-door estimand, which does not contain $M$:

$$\sum_{Z} P(Y|X,Z) P(Z)$$

or the front-door estimand, which does not use $Z$:

$$\sum_{M} P(M|X) \sum_{X'} P(Y|X', M) P(X')$$

I don't think the estimand should contain both $Z$ and $M$ in the same formula.
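
As a sanity check on the two proposed estimands, here is a minimal sketch in plain Python (it does not use y0). It assumes binary variables and randomly drawn conditional probability tables for the DAG above; the helper names `random_cpt`, `bern`, `prob`, `cond`, `truth`, `back_door`, and `front_door` are made up for illustration. It compares both estimands against the interventional distribution obtained by truncated factorization, i.e. dropping the factor $P(X|Z)$ from the joint.

```python
import itertools
import random

random.seed(0)
VALUES = (0, 1)
NAMES = ("Z", "X", "M", "Y")


def random_cpt(n_parents):
    """Map each parent assignment to a random P(child = 1 | parents)."""
    return {
        parents: random.uniform(0.1, 0.9)
        for parents in itertools.product(VALUES, repeat=n_parents)
    }


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Hypothetical CPTs for the DAG Z -> X, Z -> Y, X -> M, M -> Y.
p_z = random_cpt(0)  # P(Z = 1)
p_x = random_cpt(1)  # P(X = 1 | Z)
p_m = random_cpt(1)  # P(M = 1 | X)
p_y = random_cpt(2)  # P(Y = 1 | M, Z)

# Observational joint P(Z, X, M, Y), keyed in the order given by NAMES.
joint = {
    (z, x, m, y): bern(p_z[()], z)
    * bern(p_x[(z,)], x)
    * bern(p_m[(x,)], m)
    * bern(p_y[(m, z)], y)
    for z, x, m, y in itertools.product(VALUES, repeat=4)
}


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(Z=0, X=1)."""
    return sum(
        p
        for key, p in joint.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )


def cond(target, given):
    """Conditional probability P(target | given) computed from the joint."""
    return prob(**target, **given) / prob(**given)


def truth(x, y):
    """P(Y=y | do(X=x)) by truncated factorization (drop P(X | Z))."""
    return sum(
        bern(p_z[()], z) * bern(p_m[(x,)], m) * bern(p_y[(m, z)], y)
        for z, m in itertools.product(VALUES, repeat=2)
    )


def back_door(x, y):
    """sum_Z P(Y | X, Z) P(Z)."""
    return sum(cond({"Y": y}, {"X": x, "Z": z}) * prob(Z=z) for z in VALUES)


def front_door(x, y):
    """sum_M P(M | X) sum_X' P(Y | X', M) P(X')."""
    return sum(
        cond({"M": m}, {"X": x})
        * sum(cond({"Y": y}, {"X": x2, "M": m}) * prob(X=x2) for x2 in VALUES)
        for m in VALUES
    )


for x, y in itertools.product(VALUES, repeat=2):
    print(x, y, truth(x, y), back_door(x, y), front_door(x, y))
```

Under these assumptions the three columns should agree up to floating-point error, which is consistent with either estimand being acceptable for this graph.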

@cthoyt
Member

cthoyt commented Aug 29, 2023

@srtaheri I wonder if you are using an old version of y0; we fixed this empty sum problem in #159. My result is:

$\sum_{M, Z} P(Y | M, X, Z) \sum_{M, X, Y} P(M, X, Y, Z) P(M | X, Z)$

I wonder if there are parts of the ID algorithm where we can do some bookkeeping to eliminate intermediate variables. Is it possible to show through symbolic manipulation, starting from the equation I just wrote, that it is the same as the one you propose?

@srtaheri
Collaborator Author

srtaheri commented Sep 11, 2023

@cthoyt A modified version of the formula that you provided is correct:

$$ \sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) \sum_{M,X,Y} P(Z,M,X,Y) $$

Which is equal to:

$$ \sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) P(Z) = \sum_{Z} P(Y|X,Z) P(Z) $$

The last expression is the back-door estimand.
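
The two equalities above can be checked numerically. The sketch below is plain Python rather than y0, and it assumes an arbitrary strictly positive joint over four binary variables; the helpers `prob`, `cond`, `long_form`, and `back_door` are hypothetical names. The inner sum is treated as having its own bound variables (written $M', X', Y'$ in the comments), so it marginalizes to $P(Z)$ independently of the outer $M$.

```python
import itertools
import random

random.seed(1)
VALUES = (0, 1)
NAMES = ("Z", "X", "M", "Y")

# An arbitrary strictly positive joint P(Z, X, M, Y); no graph is assumed here.
weights = {
    key: random.uniform(0.1, 1.0)
    for key in itertools.product(VALUES, repeat=4)
}
total = sum(weights.values())
joint = {key: weight / total for key, weight in weights.items()}


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(Z=0, X=1)."""
    return sum(
        p
        for key, p in joint.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )


def cond(target, given):
    """Conditional probability P(target | given) computed from the joint."""
    return prob(**target, **given) / prob(**given)


def long_form(x, y):
    """sum_{M,Z} P(Y|M,X,Z) P(M|X,Z) sum_{M',X',Y'} P(Z,M',X',Y')."""
    result = 0.0
    for m, z in itertools.product(VALUES, repeat=2):
        # The inner sum ranges over its own copies of M, X, and Y.
        inner = sum(
            joint[z, x2, m2, y2]
            for x2, m2, y2 in itertools.product(VALUES, repeat=3)
        )
        result += (
            cond({"Y": y}, {"M": m, "X": x, "Z": z})
            * cond({"M": m}, {"X": x, "Z": z})
            * inner
        )
    return result


def back_door(x, y):
    """sum_Z P(Y | X, Z) P(Z)."""
    return sum(cond({"Y": y}, {"X": x, "Z": z}) * prob(Z=z) for z in VALUES)


max_gap = max(
    abs(long_form(x, y) - back_door(x, y))
    for x, y in itertools.product(VALUES, repeat=2)
)
print(max_gap)
```

Because the joint here is arbitrary, the agreement reflects a probability identity rather than anything specific to this graph.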

Since $M$ is summed out, we shouldn't keep it in the formula and return a more complex estimand when the value of $M$ does not matter and does not appear in the final estimand. I suggest printing the final, simplified estimand.

@cthoyt
Member

cthoyt commented Sep 18, 2023

Can you clarify the rules you used to collapse that sum down?

Adding canonicalize takes care of the first sum simplification, but not the second. See:

from y0.graph import NxMixedGraph
from y0.algorithm.identify import identify, Identification
from y0.dsl import X, Y
from y0.mutate.canonicalize_expr import canonicalize


graph = NxMixedGraph.from_str_edges(
    directed=[
        ("Z", "X"),
        ("Z", "Y"),
        ("X", "M"),
        ("M", "Y"),
    ],
)

estimand = identify(
    Identification.from_parts(
        graph=graph, outcomes={Y}, treatments={X}
    )
)
canonicalize(estimand)

Gives $\sum\limits_{M, Z} P(M | X, Z) P(Y | M, X, Z) P(Z)$

@srtaheri
Collaborator Author

srtaheri commented Sep 20, 2023

So I think we can compute it like this:

$\sum\limits_{M, Z} P(M | X, Z) P(Y | M, X, Z) P(Z)$

$= \sum_{Z} P(Z) \sum_{M} P(M|X,Z) P(Y|M,X,Z)$

$= \sum_{Z} P(Z) \sum_{M} \frac{P(M,X,Z)}{P(X,Z)} \frac{P(Y,M,X,Z)}{P(M,X,Z)}$

$= \sum_{Z} \frac{P(Z)}{P(X,Z)} \sum_{M} P(Y,M,X,Z)$

$= \sum_{Z} \frac{P(Z)}{P(X,Z)} P(Y,X,Z)$

$= \sum_{Z} \frac{P(Z)}{P(X,Z)} P(Y|X,Z) P(X,Z)$

$= \sum_{Z} P(Z) P(Y|X,Z)$
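
The derivation above amounts to two rewrite rules: the chain rule, $P(A|B,C)\,P(B|C) = P(A,B|C)$, followed by marginalization, $\sum_{B} P(A,B|C) = P(A|C)$. Below is a rough sketch of how such a rule could be expressed, using a toy representation rather than y0's expression classes. The names `P`, `p`, `chain_merge`, `marginalize`, and `simplify_sum` are all hypothetical and are not part of y0.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class P(NamedTuple):
    """Toy conditional probability P(children | parents) over variable names."""

    children: FrozenSet[str]
    parents: FrozenSet[str]


def p(children: str, parents: str = "") -> P:
    """Build a term from space-separated names, e.g. p("Y", "M X Z")."""
    return P(frozenset(children.split()), frozenset(parents.split()))


def chain_merge(a: P, b: P) -> Optional[P]:
    """Chain rule: P(A | B, C) * P(B | C) -> P(A, B | C), else None."""
    if a.parents == b.children | b.parents and not (a.children & b.children):
        return P(a.children | b.children, b.parents)
    return None


def marginalize(term: P, variable: str) -> P:
    """Marginalization: sum_V P(A, V | C) -> P(A | C)."""
    if variable not in term.children or len(term.children) < 2:
        raise ValueError(f"cannot sum {variable} out of {term}")
    return P(term.children - {variable}, term.parents)


def simplify_sum(variable: str, factors: List[P]) -> Optional[List[P]]:
    """Try to eliminate `variable` from sum_variable prod(factors).

    Returns the remaining factors, or None when this rule does not apply
    and the caller should keep the sum unchanged.
    """
    using = [f for f in factors if variable in f.children | f.parents]
    rest = [f for f in factors if f not in using]
    if len(using) == 2:
        a, b = using
        merged = chain_merge(a, b) or chain_merge(b, a)
        if merged is not None and variable in merged.children and len(merged.children) > 1:
            return rest + [marginalize(merged, variable)]
    return None


# The canonicalized estimand: sum_{M, Z} P(M | X, Z) P(Y | M, X, Z) P(Z).
factors = [p("M", "X Z"), p("Y", "M X Z"), p("Z")]
result = simplify_sum("M", factors)
print(result)
```

In this sketch, summing out $M$ leaves the factors $P(Z)$ and $P(Y|X,Z)$ under the remaining sum over $Z$, which is the back-door estimand. A real implementation would also need to handle terms that appear in more than two factors, fractions, and nested sums.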
