DoubleGen - Debiased Generative Modeling of Counterfactuals

Best AI papers explained - A podcast by Enoch H. Kang

The academic paper introduces **DoubleGen**, a doubly robust framework for adapting standard generative models, such as diffusion models, flow matching, and autoregressive language models, to generate **counterfactual data**. Existing methods are only singly robust: they become biased if their auxiliary models are misspecified. DoubleGen, by contrast, remains valid if either the propensity score or the outcome model is correctly specified. The research targets **confounding** in observational data, where a naively trained model can absorb skewed relationships between treatment and outcome and therefore produce inaccurate counterfactual predictions (e.g., predicting outcomes if everyone received a new treatment). The authors provide **theoretical guarantees**, including minimax rate optimality for DoubleGen diffusion models, and demonstrate the framework's effectiveness and **robustness to misspecification** through experiments that generate counterfactual celebrity faces and product reviews.
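To make "valid if either the propensity score or the outcome model is correctly specified" concrete, here is a minimal sketch of the classic augmented inverse propensity weighting (AIPW) estimator for a single number, the mean outcome if everyone were treated, E[Y(1)]. This is not DoubleGen itself. DoubleGen carries the doubly robust idea over to generative models, while this sketch shows only the scalar version. The data-generating process, the function names (`simulate`, `aipw_treated_mean`) and the deliberately wrong models are hypothetical choices made for illustration. In the toy data, larger `x` makes treatment more likely and also raises the treated outcome, so the true E[Y(1)] is 2 but the naive treated-group average is inflated.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int, seed: int = 0):
    """Draw (x, a, y) from a hypothetical confounded observational study."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < sigmoid(1.5 * x) else 0  # treatment depends on x
        y1 = 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)  # potential outcome if treated
        y0 = x + rng.gauss(0.0, 1.0)              # potential outcome if untreated
        data.append((x, a, y1 if a == 1 else y0))  # only one outcome is observed
    return data


def aipw_treated_mean(data, propensity, outcome_model):
    """Doubly robust (AIPW) estimate of E[Y(1)].

    Outcome-model prediction plus an inverse-propensity-weighted residual
    correction computed on the treated units.
    """
    total = 0.0
    for x, a, y in data:
        mu = outcome_model(x)
        total += mu + a * (y - mu) / propensity(x)
    return total / len(data)


def true_e(x):    # correct propensity score P(A=1 | x)
    return sigmoid(1.5 * x)


def wrong_e(x):   # misspecified propensity: ignores confounding
    return 0.5


def true_mu(x):   # correct outcome model E[Y(1) | x]
    return 2.0 + 3.0 * x


def wrong_mu(x):  # misspecified outcome model
    return 0.0


data = simulate(200_000, seed=0)
treated = [y for _, a, y in data if a == 1]
naive = sum(treated) / len(treated)                        # biased by confounding
est_e_only = aipw_treated_mean(data, true_e, wrong_mu)     # only propensity correct
est_mu_only = aipw_treated_mean(data, wrong_e, true_mu)    # only outcome model correct
est_neither = aipw_treated_mean(data, wrong_e, wrong_mu)   # both wrong
print(f"naive={naive:.2f} e_only={est_e_only:.2f} "
      f"mu_only={est_mu_only:.2f} neither={est_neither:.2f}")
```

With one of the two models correct, either estimate should land near the true value of 2. When both models are wrong, the estimate is as biased as the naive average. That failure mode is what "doubly" robust does not protect against.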