Calculate CATE by using Q and F models from DML #946
Comments
Answering on behalf of Solis V: Your approach makes sense in terms of computational efficiency, but there are a few important considerations to keep in mind:
Strengths of Your Approach
Potential Issues & Limitations
⚠ Dependence on Correct Residualization – The performance of your simple regression (Y_res ~ T_res) depends on how well the residualization process removes confounding. If the full-sample models don’t account well for slice-specific effects, you might get misleading CATE estimates.
⚠ Weak Instrumentation in Some Slices – Some slices may have weak treatment variation (e.g., low overlap in propensity scores), leading to poor estimates of the residuals. You may want to check the strength of residualized treatment variation within each slice before proceeding.
Suggestions for Improvement
Final Verdict
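The check on residualized treatment variation mentioned above could look roughly like the sketch below. It uses only NumPy and assumes you already have an array of treatment residuals and a slice label per row. The function name `slice_variation_report` and the `min_ratio` cutoff are hypothetical, chosen for illustration; they are not part of EconML.

```python
import numpy as np


def slice_variation_report(t_res, slice_labels, min_ratio=0.1):
    """Summarize residualized-treatment variation within each slice.

    min_ratio is an illustrative cutoff (an assumption, not a standard value):
    a slice is flagged when its residual variance is below
    min_ratio * the variance over the full sample.
    """
    t_res = np.asarray(t_res, dtype=float)
    slice_labels = np.asarray(slice_labels)
    overall_var = t_res.var()
    report = {}
    for label in np.unique(slice_labels):
        tr = t_res[slice_labels == label]
        var = tr.var()
        report[str(label)] = {
            "n": int(tr.size),
            "t_res_var": float(var),
            "low_variation": bool(var < min_ratio * overall_var),
        }
    return report


# Synthetic demo: slice "b" has almost no residual treatment variation.
rng = np.random.default_rng(0)
demo_t_res = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(0.0, 0.05, 500)])
demo_labels = np.array(["a"] * 500 + ["b"] * 500)
demo_report = slice_variation_report(demo_t_res, demo_labels)
print(demo_report)
```

A flagged slice does not prove the estimate is wrong, only that the per-slice regression has little signal to work with, so its θ is likely to be noisy.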
Hi Team,
I am working with an observational dataset and aim to estimate the causal impact. However, since I need to compute the causal effects for various data slices, re-training the DML (Double Machine Learning) model for each slice is not computationally feasible.
To address this, I am considering the following approach and would appreciate your feedback:
Train the DML model on the full dataset, passing all relevant features through the W argument (controls). The X argument will be set to None because no effect modifiers (heterogeneity features) are explicitly defined.
From the trained DML model, extract the first-stage Q and F models (the outcome and treatment models) as described in the formal methodology (https://econml.azurewebsites.net/spec/estimation/dml.html#overview-of-formal-methodology).
For each data slice, use the pre-trained Q and F models to compute the Conditional Average Treatment Effect (CATE). Specifically, I plan to regress Y_res on T_res within the respective slice to estimate θ (the causal effect).
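For concreteness, here is a minimal sketch of the three steps above, written with scikit-learn and NumPy rather than EconML's internals. It assumes a continuous treatment, random forests as the first-stage models, and out-of-fold predictions via `cross_val_predict` as a stand-in for DML's cross-fitting. The helper names `residualize` and `slice_theta` and the synthetic data are hypothetical and only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict


def residualize(Y, T, W, n_splits=5, seed=0):
    """Return out-of-fold residuals (Y - q_hat(W), T - f_hat(W))."""
    model_y = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=seed)
    model_t = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=seed)
    q_hat = cross_val_predict(model_y, W, Y, cv=n_splits)  # estimate of E[Y | W]
    f_hat = cross_val_predict(model_t, W, T, cv=n_splits)  # estimate of E[T | W]
    return Y - q_hat, T - f_hat


def slice_theta(y_res, t_res, mask):
    """No-intercept OLS slope of y_res on t_res, restricted to one slice."""
    yr, tr = y_res[mask], t_res[mask]
    return float(tr @ yr / (tr @ tr))


# Synthetic data (assumed for the demo): the true effect is 1.0 in
# segment 0 and 3.0 in segment 1, and the segment flag is one of the controls.
rng = np.random.default_rng(0)
n = 3000
controls = rng.normal(size=(n, 3))
segment = rng.integers(0, 2, size=n)
W = np.column_stack([controls, segment])
T = controls[:, 0] + rng.normal(size=n)
true_theta = np.where(segment == 1, 3.0, 1.0)
Y = true_theta * T + controls[:, 1] + rng.normal(size=n)

# Steps 1-2: residualize once on the full sample.
y_res, t_res = residualize(Y, T, W)

# Step 3: one cheap regression per slice, with no retraining.
theta_by_slice = {s: slice_theta(y_res, t_res, segment == s) for s in (0, 1)}
print(theta_by_slice)
```

The residuals are computed once, so each additional slice costs only a dot product. The sketch includes the slice indicator among the controls; if it were left out, the outcome model could not capture slice-specific structure and the per-slice slopes could be biased.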
Does this workflow seem sound to you? Are there any potential issues or limitations with this approach?
Thank you in advance for your insights!