
Calculate CATE by using Q and F models from DML #946

Open
vnwbd opened this issue Jan 17, 2025 · 1 comment

Comments

@vnwbd

vnwbd commented Jan 17, 2025

Hi Team,

I am working with an observational dataset and aim to estimate the causal impact. However, since I need to compute the causal effects for various data slices, re-training the DML (Double Machine Learning) model for each slice is not computationally feasible.

To address this, I am considering the following approach and would appreciate your feedback:

Train the DML model on the full dataset, passing all relevant features through the W argument. The X argument will be set to None, since no heterogeneity features (effect modifiers) are explicitly specified.
From the trained DML model, extract the q and f nuisance models as described in the formal methodology (https://econml.azurewebsites.net/spec/estimation/dml.html#overview-of-formal-methodology).

For each data slice, leverage the pre-trained q and f models to compute the Conditional Average Treatment Effect (CATE). Specifically, I plan to fit a simple regression, regressing Y_res on T_res for the respective slice to estimate θ (the causal effect).
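To make the intent concrete, here is a minimal self-contained sketch of the residualize-once, regress-per-slice idea, using plain scikit-learn cross-fitting rather than EconML internals (the data, the `RandomForestRegressor` nuisance choice, and the slice definition are illustrative assumptions, not the actual pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data standing in for the observational dataset
rng = np.random.default_rng(0)
n = 4000
W = rng.normal(size=(n, 5))                  # controls passed via the W argument
T = W[:, 0] + rng.normal(size=n)             # treatment confounded by W
theta = 2.0                                  # true constant effect
Y = theta * T + W[:, 0] + rng.normal(size=n)

# Cross-fitted nuisance predictions, trained once on the full sample:
# q(W) ~= E[Y | W] and f(W) ~= E[T | W]
q_hat = cross_val_predict(
    RandomForestRegressor(n_estimators=100, random_state=0), W, Y, cv=5)
f_hat = cross_val_predict(
    RandomForestRegressor(n_estimators=100, random_state=0), W, T, cv=5)
Y_res, T_res = Y - q_hat, T - f_hat

# Per-slice theta: subset the cached residuals and run the final regression
mask = W[:, 1] > 0                           # an arbitrary example slice
theta_slice = LinearRegression().fit(
    T_res[mask].reshape(-1, 1), Y_res[mask]).coef_[0]
print(theta_slice)
```

The key point is that the nuisance models are fit only once; each slice just subsets the cached residuals before the final-stage regression.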
Does this workflow seem sound to you? Are there any potential issues or limitations with this approach?

Thank you in advance for your insights!

@kosiew

kosiew commented Jan 20, 2025

Answering on behalf of Solis V:

Your approach makes sense in terms of computational efficiency, but there are a few important considerations to keep in mind:

Strengths of Your Approach
✅ Computational Efficiency – Training the DML model once and reusing the nuisance models (q, f) across slices avoids redundant training, which is crucial when working with large datasets.
✅ Conceptual Validity – If the q (outcome) and f (treatment) nuisance models estimate the conditional means E[Y|W] and E[T|W] well, using their residuals for CATE estimation across slices should, in principle, work.

Potential Issues & Limitations
⚠ Misspecification Risk – If treatment effects vary significantly across slices and your pre-trained q and f models do not sufficiently capture slice-specific structure, you may introduce bias in the CATE estimates. The models are trained on the full dataset, so they might not adapt well to specific slices.

⚠ Dependence on Correct Residualization – The performance of your simple regression (Y_res ~ T_res) depends on how well the residualization process removes confounding. If the full-sample models don’t account well for slice-specific effects, you might get misleading CATE estimates.

⚠ Weak Treatment Variation in Some Slices – Some slices may have little residual treatment variation (e.g., low overlap in propensity scores), leading to an unstable final-stage regression. You may want to check the strength of residualized treatment variation within each slice before proceeding.
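A cheap diagnostic for the last point is to look at the spread of T_res inside each slice before running the final regression. A minimal sketch (the `min_std` threshold and the helper name are hypothetical, to be tuned to the treatment's scale):

```python
import numpy as np

def residual_treatment_strength(t_res, mask, min_std=0.1):
    """Return the std of residualized treatment within a slice and whether
    it clears a (hypothetical) minimum threshold; slices that fail have
    too little treatment variation left for a stable Y_res ~ T_res fit."""
    s = float(np.std(t_res[mask]))
    return s, s >= min_std

# Example: a slice where nearly all residual treatment variation is gone
rng = np.random.default_rng(1)
t_res = rng.normal(size=1000)
t_res[:500] *= 0.01                       # first half: almost deterministic treatment
print(residual_treatment_strength(t_res, np.arange(1000) < 500))   # weak slice
print(residual_treatment_strength(t_res, np.arange(1000) >= 500))  # healthy slice
```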

Suggestions for Improvement
🔹 Diagnostics & Validation – Before fully committing, you might want to compare the slice-specific CATE estimates obtained via your method to those from a fully re-trained DML model (for a few key slices) to check for discrepancies.
🔹 Flexible G & Q Models – If feasible, consider using models that allow for interactions between covariates and treatment effects to better capture heterogeneity.
🔹 Alternative Estimation Methods – Instead of simple regression (Y_res ~ T_res), you could explore non-parametric approaches like kernel regression or local linear regression to ensure robustness.
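As one concrete instance of the last suggestion, a kernel-weighted final-stage regression estimates a local slope θ(x0) along a continuous slicing variable instead of a single OLS slope per hard slice. This is an illustrative sketch, not an EconML API (function name, Gaussian kernel, and bandwidth are assumptions):

```python
import numpy as np

def local_linear_cate(y_res, t_res, x, x0, bandwidth=0.3):
    """Kernel-weighted regression of Y_res on T_res around x0:
    points with slicing variable x near x0 get higher weight,
    yielding a smooth estimate of theta(x0)."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)    # Gaussian kernel weights
    X = np.column_stack([np.ones_like(t_res), t_res])
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y_res))
    return beta[1]                                    # local slope = theta(x0)

# Heterogeneous effect theta(x) = 1 + x, recovered pointwise from residuals
rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(-2, 2, size=n)
t_res = rng.normal(size=n)
y_res = (1 + x) * t_res + 0.5 * rng.normal(size=n)
print(local_linear_cate(y_res, t_res, x, x0=0.0))
print(local_linear_cate(y_res, t_res, x, x0=1.0))
```

Compared with hard slices, this avoids the small-sample instability of narrow slices at the cost of a bandwidth choice.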

Final Verdict
Your approach is reasonable given computational constraints, but it comes with risks related to model generalization across slices. If you ensure that your nuisance models (q, f) sufficiently capture the confounding structure and validate your estimates against a baseline, it could be a practical solution. 🚀
