Patrick Burauel

postdoctoral scholar

California Institute of Technology

Biography

I’m a postdoc at Caltech working on causality with Frederick Eberhardt. I’m interested in how to address confounding in non-experimental data and how to aggregate causal variables. Though the methods we develop are domain-general, I focus on applications in economics. In my doctoral dissertation, I explored synergies between research on causality in machine learning and in economics. You can reach me at pburauel [at] caltech [dot] edu.

You can download my CV here.

Interests

  • Machine Learning for Causal Inference
  • Causal Structure Discovery
  • Causal Representation Learning

Education

  • PhD in Economics, 2020

    Free University Berlin / Berlin School of Economics

  • MSc in Economics, 2015

    Paris School of Economics and Ecole Polytechnique

Current Projects

Deconfounding using the Information Bottleneck principle

Developing methods to disentangle causal from spurious factors of variation in observational data by leveraging machine learning tools (artificial neural networks and the information bottleneck principle) together with known markers of confounding.
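
As a rough illustration of the kind of objective this line of work builds on, here is a minimal sketch of a standard variational information bottleneck loss. It is generic textbook boilerplate, not the deconfounding method described above; the function name, tensor shapes, and the beta value are illustrative assumptions.

```python
# A minimal sketch of a standard variational information-bottleneck loss,
# NOT the specific deconfounding method described above; names, shapes, and
# the beta value are illustrative assumptions.
import torch
import torch.nn.functional as F

def vib_loss(mu, logvar, logits, targets, beta=1e-3):
    """Prediction term plus a KL 'compression' term.

    mu, logvar : parameters of the encoder q(z|x), shape (batch, latent_dim)
    logits     : decoder predictions for the target, shape (batch, n_classes)
    targets    : ground-truth labels, shape (batch,)
    beta       : trade-off between prediction and compression
    """
    # Lower-bounds I(Z;Y): how well the bottleneck variable predicts the target
    prediction = F.cross_entropy(logits, targets)
    # Upper-bounds I(X;Z): KL(q(z|x) || N(0, I)) penalizes information kept about x
    compression = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    )
    return prediction + beta * compression
```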


Novel Formalizations of the Principle of Independent Mechanisms

Developing a novel, formal interpretation of the Principle of Independent Mechanisms (causal, unlike spurious, relations are modular) to measure confounding in observational data. Joint work with Michel Besserve (Max Planck Institute for Intelligent Systems).

Job Market Paper

Evaluating Instrument Validity using the Principle of Independent Mechanisms

published in the Journal of Machine Learning Research

Abstract: The validity of instrumental variables to estimate causal effects is typically justified narratively and often remains controversial. Critical assumptions are difficult to evaluate since they involve unobserved variables. Building on Janzing and Schölkopf’s (2018) method to quantify a degree of confounding in multivariate linear models, we develop a test that evaluates instrument validity without relying on Balke and Pearl’s (1997) inequality constraints. Instead, our approach is based on the Principle of Independent Mechanisms, which states that causal models have a modular structure. Monte Carlo studies show that the procedure is highly accurate. We apply our method to two empirical studies: first, we can corroborate the narrative justification given by Card (1995) for the validity of college proximity as an instrument for educational attainment to estimate financial returns to education. Second, we cannot reject the validity of past savings rates as an instrument for economic development to estimate its causal effect on democracy (Acemoglu et al., 2008).
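
For readers less familiar with the setting, the toy simulation below illustrates why instrument validity matters: with a single instrument, the just-identified 2SLS (Wald) estimate recovers the true effect only when the instrument is independent of the unobserved confounder. This is a generic illustration of the problem the test addresses, not the independence-of-mechanisms test itself; all parameter values are arbitrary.

```python
# Toy illustration of the instrumental-variable setting: 2SLS is biased when
# the "instrument" is correlated with the unobserved confounder. This is NOT
# the proposed test; all coefficients are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, beta_true = 100_000, 2.0

u = rng.normal(size=n)                      # unobserved confounder
z_valid = rng.normal(size=n)                # instrument independent of u
z_invalid = 0.8 * u + rng.normal(size=n)    # "instrument" contaminated by u

def simulate_and_estimate(z):
    x = 1.0 * z + 1.5 * u + rng.normal(size=n)        # treatment
    y = beta_true * x + 2.0 * u + rng.normal(size=n)  # outcome
    # Just-identified 2SLS / Wald estimate: Cov(z, y) / Cov(z, x)
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print("valid instrument:  ", simulate_and_estimate(z_valid))    # close to 2.0
print("invalid instrument:", simulate_and_estimate(z_invalid))  # biased
```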


Working Papers

Controlling for discrete unmeasured confounding in nonlinear causal models

with Frederick Eberhardt and Michel Besserve

Abstract: Unmeasured confounding is a major challenge for identifying causal relationships from non-experimental data. Here, we propose a method that can accommodate unmeasured discrete confounding. Extending recent identifiability results in deep latent variable models, we show theoretically that confounding can be detected and corrected under the assumption that the observed data is a piecewise affine transformation of a latent Gaussian mixture model and that the identity of the mixture components is confounded. We provide a flow-based algorithm to estimate this model and perform deconfounding. Experimental results on synthetic and real-world data provide support for the effectiveness of our approach.

preprint available here
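
The sketch below illustrates the kind of generative model assumed in the paper: a latent Gaussian mixture whose component identity plays the role of the confounder, pushed through a piecewise affine map (here a random leaky-ReLU network). It does not reproduce the flow-based estimation or deconfounding procedure; dimensions, weights, and mixture parameters are illustrative assumptions.

```python
# A minimal sketch of the assumed generative model: latent Gaussian mixture
# (component identity = confounder) pushed through a piecewise affine map.
# The flow-based estimation/deconfounding procedure itself is not reproduced;
# all sizes, weights, and component parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, latent_dim, obs_dim, n_components = 5_000, 2, 5, 3

# Confounder: which mixture component each sample comes from
c = rng.integers(n_components, size=n)
means = rng.normal(scale=3.0, size=(n_components, latent_dim))
z = means[c] + rng.normal(size=(n, latent_dim))    # latent Gaussian mixture

def piecewise_affine(z, rng):
    """Piecewise affine map: two random linear layers with a leaky ReLU in between."""
    w1 = rng.normal(size=(latent_dim, obs_dim))
    w2 = rng.normal(size=(obs_dim, obs_dim))
    h = z @ w1
    h = np.where(h > 0, h, 0.1 * h)                # leaky ReLU -> piecewise affine
    return h @ w2

x = piecewise_affine(z, rng)                       # observed data
```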


Data-driven definitions of macro-economic concepts: causal representation learning applied to economic complexity

with Frederick Eberhardt

Abstract: Hidalgo and Hausmann (H&H) developed the Economic Complexity Index (ECI) as an aggregate measure of the diversity of a country’s productive capabilities. Based on data from the Atlas of Economic Complexity, ECI is used to predict an economy’s long-run growth performance. We make three points: (1) Far simpler data-driven models achieve similar or better in- and out-of-sample predictive accuracy, while grouping economies differently than ECI does. (2) These simple models also generalize to predictions of inequality, with predictive accuracy similar to ECI’s. (3) To the extent that ECI should be used to underpin policy advice, it fails to distinguish between the cause of growth and the gradient of the cause of growth. We provide an analysis using simple data-driven techniques that could provide the basis for policy recommendations. Overall, our analysis calls into question the standards by which ECI is considered to be a genuine macroeconomic quantity, and illustrates a framework that could be used to identify and evaluate data-driven macroeconomic quantities.

preprint available here


Testability of Reverse Causality Without Exogenous Variation

with Christoph Breunig, submitted to The Econometrics Journal

Abstract: This paper shows that reverse causality is testable even in the absence of exogenous variation, such as in the form of instrumental variables. Instead of relying on exogenous variation, we achieve testability by imposing relatively weak model restrictions and exploiting the fact that a dependence between the residual and the purported cause is informative about the causal direction. Our main assumption is that the true functional relationship is nonlinear and that error terms are additively separable. We extend previous results by incorporating control variables and allowing for heteroskedastic errors. We build on reproducing kernel Hilbert space (RKHS) embeddings of probability distributions to test conditional independence and demonstrate the method’s efficacy in detecting the causal direction in both Monte Carlo simulations and an application to German survey data.

link to arXiv version
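
The following sketch conveys the additive-noise intuition behind the test: fit a nonlinear regression in each direction and check whether the residuals depend on the purported cause. It uses a generic (biased) HSIC statistic and an off-the-shelf regressor rather than the paper's RKHS-based conditional independence test; the regression model and kernel bandwidths are arbitrary choices.

```python
# Schematic sketch of the additive-noise idea: regress in each direction and
# check residual-cause dependence. Generic illustration only (biased HSIC,
# off-the-shelf regressor), not the paper's RKHS-based test.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def rbf_gram(a, bandwidth=1.0):
    d2 = (a[:, None] - a[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2))

def hsic(a, b):
    """Biased HSIC estimate between two 1-d samples (larger = more dependent)."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n
    k, l = rbf_gram(a), rbf_gram(b)
    return np.trace(h @ k @ h @ l) / (n - 1) ** 2

def residual_dependence(cause, effect):
    """Regress effect on cause nonlinearly, return HSIC(residuals, cause)."""
    model = GradientBoostingRegressor().fit(cause.reshape(-1, 1), effect)
    residuals = effect - model.predict(cause.reshape(-1, 1))
    return hsic(residuals, cause)

# Toy data with true direction X -> Y: nonlinear function, additive noise
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.tanh(2 * x) + 0.3 * rng.normal(size=500)

# In the true direction residuals look independent of the cause (small HSIC);
# in the reverse direction the dependence shows up (larger HSIC).
print("X -> Y:", residual_dependence(x, y))
print("Y -> X:", residual_dependence(y, x))
```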


The German Minimum Wage and Wage Growth: Heterogeneous Treatment Effects Using Causal Forests

with Carsten Schröder

Abstract: Previous research suggests that minimum wages induce heterogeneous treatment effects on wages across different groups of employees. This research usually defines groups ex ante. We analyze to what extent effect heterogeneities can be discerned in a data-driven manner by adapting the generalized random forest implementation of Athey et al. (2019) to a difference-in-differences setting. Such a data-driven methodology makes it possible to detect whether heterogeneities found in ex ante chosen subgroups are potentially spurious. The 2015 introduction of a minimum wage in Germany is the institutional background, with data from the Socio-Economic Panel serving as our empirical basis. Our analysis not only reveals considerable treatment effect heterogeneity, it also shows that previously documented effect heterogeneities can be explained by interactions of other covariates.

link to SSRN
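
As a rough companion to the abstract, the sketch below estimates heterogeneous treatment effects with a causal forest on synthetic data, using econml's CausalForestDML (a Python port of the generalized random forest of Athey et al.). It does not reproduce the paper's difference-in-differences design; the synthetic outcome merely stands in for a differenced wage-growth measure, and all variable names and settings are illustrative assumptions.

```python
# Minimal sketch of data-driven heterogeneous-effect estimation with a causal
# forest (econml's CausalForestDML). NOT the paper's diff-in-diff design: the
# synthetic outcome stands in for a differenced wage-growth measure, and all
# names and settings are illustrative assumptions.
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n = 4_000
X = rng.normal(size=(n, 3))                 # worker covariates (e.g. age, tenure, region)
T = rng.binomial(1, 0.5, size=n)            # treatment: bound by the minimum wage
tau = 0.5 + 0.8 * (X[:, 0] > 0)             # heterogeneous true effect
Y = tau * T + X[:, 1] + rng.normal(size=n)  # outcome, e.g. wage-growth difference

est = CausalForestDML(
    model_y=RandomForestRegressor(),
    model_t=RandomForestClassifier(),
    discrete_treatment=True,
    n_estimators=500,
    random_state=0,
)
est.fit(Y, T, X=X)
cate = est.effect(X)                        # per-worker effect estimates
print("mean CATE by subgroup:",
      cate[X[:, 0] > 0].mean(), cate[X[:, 0] <= 0].mean())
```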