## How to know the unknowable in observational studies

1. Introduction
2. Problem Setup
2.1. Causal Graph
2.2. Model With and Without Z
2.3. Strength of Z as a Confounder
3. Sensitivity Analysis
3.1. Goal
3.2. Robustness Value
4. PySensemakr
5. Conclusion
6. Acknowledgements
7. References

The specter of unobserved confounding (aka omitted variable bias) is a notorious problem in observational studies. In most observational studies, unless we can reasonably assume that treatment assignment is as-if random as in a natural experiment, we can never be truly certain that we controlled for all possible confounders in our model. As a result, our model estimates can be severely biased if we fail to control for an important confounder, and we wouldn’t even know it since the unobserved confounder is, well, unobserved!

Given this problem, it is important to assess how sensitive our estimates are to possible sources of unobserved confounding. In other words, it is a helpful exercise to ask ourselves: how much unobserved confounding would there have to be for our estimates to change drastically (e.g., for the treatment effect to no longer be statistically significant)? Sensitivity analysis for unobserved confounding is an active area of research, and there are several approaches to tackling this problem. In this post, I will cover a simple linear method [1] based on the concept of partial R² that is applicable to a wide range of cases.

## 2.1. Causal Graph

Let us assume that we have four variables:

• Y: outcome
• D: treatment
• X: observed confounder(s)
• Z: unobserved confounder(s)

This is a common setting in many observational studies where the researcher is interested in knowing whether the treatment of interest has an effect on the outcome after controlling for possible treatment-outcome confounders.

In our hypothetical setting, the relationships between these variables are such that X and Z both affect D and Y, but D has no effect on Y. In other words, we are describing a scenario where the true treatment effect is null. As will become clear in the next section, the purpose of sensitivity analysis is to be able to reason about this treatment effect when we have no access to Z, as we normally won’t since it’s unobserved. Figure 1 visualizes our setup.

Figure 1: Problem Setup
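
One way to write this setup down is as a pair of linear structural equations. This is only a sketch: the linear form, the coefficient names (α, β) and the noise terms (ε) are my assumptions for illustration and are not taken from the original simulation.

$$
\begin{aligned}
D &= \alpha_X X + \alpha_Z Z + \varepsilon_D \\
Y &= 0 \cdot D + \beta_X X + \beta_Z Z + \varepsilon_Y
\end{aligned}
$$

The zero coefficient on D in the second equation encodes the null treatment effect, while nonzero α_Z and β_Z are what make Z a confounder of the D–Y relationship.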

## 2.2. Model With and Without Z

To demonstrate the problem that our unobserved Z can cause, I simulated some data in line with the problem setup described above. You can refer to this notebook for the details of the simulation.

Since Z would be unobserved in real life, the only model we can normally fit to data is Y~D+X. Let us see what results we get if we run that regression.
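
For readers who want to experiment without the notebook, here is a minimal sketch of a simulation of this kind. It is not the notebook's code: the sample size, the coefficients (0.5 and 0.8), the standard normal noise, and the helper `ols_coefs` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical sample size

# Hypothetical data-generating process: X and Z affect both D and Y,
# while D has no effect on Y (true treatment effect is zero).
x = rng.normal(size=n)
z = rng.normal(size=n)
d = 0.5 * x + 0.8 * z + rng.normal(size=n)
y = 0.0 * d + 0.5 * x + 0.8 * z + rng.normal(size=n)


def ols_coefs(outcome, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


# Coefficient on D when Z is omitted (Y ~ D + X) ...
coef_d_without_z = ols_coefs(y, d, x)[1]
# ... and when Z is included (Y ~ D + X + Z), which is infeasible in practice.
coef_d_with_z = ols_coefs(y, d, x, z)[1]

print(f"D coefficient without Z: {coef_d_without_z:.3f}")
print(f"D coefficient with Z:    {coef_d_with_z:.3f}")
```

Under these assumed coefficients, the coefficient on D in the model without Z converges to 0.64 / 1.64 ≈ 0.39 rather than the true value of zero, while the model that includes Z recovers an estimate close to zero. The exact numbers in the notebook will differ, since its parameters are not reproduced here.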
