How to know the unknowable in observational studies
- Problem Setup
2.1. Causal Graph
2.2. Model With and Without Z
2.3. Strength of Z as a Confounder
- Sensitivity Analysis
3.2. Robustness Value
The specter of unobserved confounding (aka omitted variable bias) is a notorious problem in observational studies. In most observational studies, unless we can reasonably assume that treatment assignment is as-if random as in a natural experiment, we can never be truly certain that we controlled for all possible confounders in our model. As a result, our model estimates can be severely biased if we fail to control for an important confounder–and we wouldn’t even know it since the unobserved confounder is, well, unobserved!
Given this problem, it is important to assess how sensitive our estimates are to possible sources of unobserved confounding. In other words, it is a helpful exercise to ask ourselves: how much unobserved confounding would there have to be for our estimates to drastically change (e.g., treatment effect no longer statistically significant)? Sensitivity analysis for unobserved confounding is an active area of research, and there are several approaches to tackling this problem. In this post, I will cover a simple linear method  based on the concept of partial R² that is widely applicable to a large spectrum of cases.
2.1. Causal Graph
Let us assume that we have four variables:
- Y: outcome
- D: treatment
- X: observed confounder(s)
- Z: unobserved confounder(s)
This is a common setting in many observational studies where the researcher is interested in knowing whether the treatment of interest has an effect on the outcome after controlling for possible treatment-outcome confounders.
In our hypothetical setting, the relationship between these variables are such that X and Z both affect D and Y, but D has no effect on Y. In other words, we are describing a scenario where the true treatment effect is null. As will become clear in the next section, the purpose of sensitivity analysis is being able to reason about this treatment effect when we have no access to Z, as we normally won’t since it’s unobserved. Figure 1 visualizes our setup.
Figure 1: Problem Setup
2.2. Model With and Without Z
To demonstrate the problem that our unobserved Z can cause, I simulated some data in line with the problem setup described above. You can refer to this notebook for the details of the simulation.
Since Z would be unobserved in real life, the only model we can normally fit to data is Y~D+X. Let us see what results we get if we run that regression.
Based on these results, it seems like D has a statistically significant effect of 0.2686 (p<0.001) per one unit change on Y, which we know isn’t true based on how we generated the data (no D effect).
Now, let’s see what happens to our D estimate when we control for Z as well. (In real life, we of course won’t be able to run this additional regression since Z is unobserved but our simulation setting allows us to peek behind the curtain into the true data generation process.)
As expected, controlling for Z correctly removes the D effect by shrinking the estimate towards zero and giving us a p-value that is no longer statistically significant at the 𝛼=0.05 threshold (p=0.059).
2.3. Strength of Z as a Confounder
At this point, we have established that Z is strong enough of a confounder to eliminate the spurious D effect since the statistically significant D effect disappears when we control for Z. What we haven’t discussed yet is exactly how strong Z is as a confounder. For this, we will utilize a useful statistical concept called partial R², which quantifies the proportion of variation that a given variable of interest can explain that can’t already be explained by the existing variables in a model. In other words, partial R² tells us the added explanatory power of that variable of interest, above and beyond the other variables that are already in the model. Formally, it can be defined as follows
where RSS_reduced is the residual sum of squares from the model that doesn’t include the variable(s) of interest and RSS_full is the residual sum of squares from the model that includes the variable(s) of interest.
In our case, the variable of interest is Z, and we would like to know what proportion of the variation in Y and D that Z can explain that can’t already be explained by the existing variables. More precisely, we are interested in the following two partial R² values
where (1) quantifies the proportion of variance in Y that can be explained by Z that can’t already be explained by D and X (so the reduced model is Y~D+X and the full model is Y~D+X+Z), and (2) quantifies the proportion of variance in D that can be explained by Z that can’t already be explained by X (so the reduced model is D~X and the full model is D~X+Z).
Now, let us see how strongly associated Z is with D and Y in our data in terms of partial R².
It turns out that Z explains 16% of the variation in Y that can’t already be explained by D and X (this is partial R² equation #1 above), and 20% of the variation in D that can’t already be explained by X (this is partial R² equation #2 above).
As we discussed in the previous section, unobserved confounding poses a problem in real research settings precisely because, unlike in our simulation setting, Z cannot be observed. In other words, we are stuck with the model Y~D+X, having no way to know what our results would have been if we could run the model Y~D+X+Z instead. So, what can we do?
Intuitively, a reasonable sensitivity analysis approach should be able to tell us that if a Z such as the one we have in our data were to exist, it would nullify our results. Remember that our Z explains 16% of the variation in Y and 20% of the variation in D that can’t be explained by observed variables. Therefore, we expect sensitivity analysis to tell us that a hypothetical Z-like confounder of similar strength would be enough to eliminate the statistically significant D effect.
But how can we calculate that the unobserved confounder’s strength should be in this 16–20% range in the partial R² scale without ever having access to it? Enter robustness value.
3.2. Robustness Value
Robustness value (RV) formalizes the idea we mentioned above of determining the necessary strength of a hypothetical unobserved confounder that could nullify our results. The usefulness of RV emanates from the fact that we only need our observable model Y~D+X and not the unobservable model Y~D+X+Z to be able to calculate it.
Formally, we can write down as follows the RV that quantifies how strong unobserved confounding needs to be to change our observed statistical significance of the treatment effect (if the notation is too much to follow, just remember the key idea that the RV is a measure of the strength of confounding needed to change our results)
- 𝛼 is our chosen significance level (generally set to 0.05 or 5%),
- q determines the percent reduction q*100% in significance that we care about (generally set to 1, since we usually care about confounding that would reduce statistical significance by 1*100%=100% hence rendering it not statistically significant),
- t_betahat_treat is the observed t-value of our treatment from the model Y~D+X (which is 8.389 in this case as can be seen from the regression results above),
- df is our degrees of freedom (which is 1000–3=997 in this case since we simulated 1000 samples and are estimating 3 parameters including the intercept), and
- t*_alpha,df-1 is the t-value threshold associated with a given 𝛼 and df-1 (1.96 if 𝛼 is set to 0.05).
We are now ready to calculate the RV in our own data using only the observed model Y~D+X (res_ydx).
It is by no struck of luck that our RV (18%) falls right in the range of the partial R² values we calculated for Y~Z|D,X (16%) and D~Z|X (20%) above. What the RV is telling us here is that, even without any explicit knowledge of Z, we can still reason that any unobserved confounder needs, on average, at least 18% strength in the partial R² scale vis-à-vis both the treatment and the outcome to be able to nullify our statistically significant result.
The reason why the RV isn’t 16% or 20% but falls somewhere in between (18%) is that it is designed to be a single number that summarizes the necessary strength of the confounder with both the outcome and the treatment, so 18% makes perfect sense given what we know about the data. You can think about it like this: since the method doesn’t have access to the actual numbers 16% and 20% when calculating the RV, it is doing its best to quantify the strength of the confounder by assigning 18% to both partial R² values (Y~Z|D,X and D~Z|X), which isn’t too far off from the truth at all and actually does a great job summarizing the strength of the confounder.
Of course, in real life we won’t have the Z variable to double check that our RV is correct, but seeing how the two results align here should at least give you some confidence in the method. Finally, once we calculate the RV, we should think about whether an unobserved confounder of that strength is plausible. In our case, the answer is ‘yes’ because we have access to the data generation process, but for your specific real-life application, the existence of such a strong confounder might be an unreasonable assumption. This would be good news for you since no realistic unobserved confounder could drastically change your results.
The sensitivity analysis technique described above has already been implemented with all of its bells and whistles as a Python package under the name PySensemakr (R, Stata, and Shiny App versions exist as well). For example, to get the exact same result that we manually calculated in the previous section, we can simply run the following code chunk.
Note that “Robustness Value, q = 1 alpha = 0.05” is 0.184, which is exactly what we calculated above. In addition to the RV for statistical significance, the package also provides the RV that is needed for the coefficient estimate itself to shrink to 0. Not surprisingly, unobserved confounding needs to be even larger for this to happen (0.233 vs 0.184).
The package also provides contour plots for the two partial R² values, which allows for an intuitive visual display of sensitivity to possible levels of confounding with the treatment and the outcome (in this case, it shouldn’t be surprising to see that the x/y-axis value pairs that meet the red dotted line include 0.18/0.18 as well as 0.20/0.16).
One can even add benchmark values to the contour plot as proxies for possible amounts of confounding. In our case, since we only have one observed covariate X, we can set our benchmarks to be 0.25x, 0.5x and 1x as strong as that observed covariate. The resulting plot tells us that a confounder that is half as strong as X should be enough to nullify our statistically significant result (since the “0.5x X” value falls right on the red dotted line).
Finally, I would like to note that while the simulated data in this example used a continuous treatment variable, in practice the method works for any kind of treatment variable including binary treatments. On the other hand, the outcome variable technically needs to be a continuous one since we are operating in the OLS framework. However, the method can still be used even with a binary outcome if we model it using OLS (this is called a LPM ).
The possibility that our effect estimate may be biased due to unobserved confounding is a common danger in observational studies. Despite this potential danger, observational studies are a vital tool in data science because randomization simply isn’t feasible in many cases. Therefore, it is important to know how we can address the issue of unobserved confounding by running sensitivity analyses to see how robust our estimates are to potential such confounding.
The robustness value method by Cinelli and Hazlett discussed in this post is a simple and intuitive approach to sensitivity analysis formulated in a familiar linear model framework. If you are interested in learning more about the method, I highly recommend taking a look at the original paper and the package documentation where you can learn about many more interesting applications of the method such as ‘extreme scenario’ analysis.
There are also many other approaches to sensitivity analysis for unobserved confounding, and I would like briefly mention some of them here for readers who would like to continue learning more on this topic. One versatile technique is the E-value developed by VanderWeele and Ding that formulates the problem in terms of risk ratios  (implemented in R here). Another technique is the Austen plot developed by Veitch and Zaveri based on the concepts of partial R² and propensity score  (implemented in Python here), and yet another recent approach is by Chernozhukov et al  (implemented in Python here).
I would like to thank Chad Hazlett for answering my question related to using the method with binary outcomes and Xinyi Zhang for providing a lot of valuable feedback on the post. Unless otherwise noted, all images are by the author.
 C. Cinelli and C. Hazlett, Making Sense of Sensitivity: Extending Omitted Variable Bias (2019), Journal of the Royal Statistical Society
 J. Murray, Linear Probability Model, Murray’s personal website
 T. VanderWeele and P. Ding, Sensitivity Analysis in Observational Research: Introducing the E-Value (2017), Annals of Internal Medicine
 V. Veitch and A. Zaveri, Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding (2020), NeurIPS
 V. Chernozhukov, C. Cinelli, W. Newey, A. Sharma, and V. Syrgkanis, Long Story Short: Omitted Variable Bias in Causal Machine Learning (2022), NBER