Chapter 5 Observational Studies

We occasionally try to estimate causal effects using data from non-randomized, observational sources.

When we do so, we follow the principle of “design before analysis” and apply the techniques in Rubin (2007). These techniques include designing the study with no outcomes in sight and pre-registering analyses where possible.

5.1 Matching

There are many algorithms for producing matched samples.⁷ By default, we implement several matching procedures and select the one that yields the best balance for our analysis. We developed this approach for our evaluation of a high school internship program; see here for details of that application.

5.1.1 Methods

By default, we implement:

optimal full nearest-neighbor matching;
optimal full nearest-neighbor matching with calipers
(if either optimal method above cannot be implemented, we instead use generalized full matching);
genetic matching using the Mahalanobis distance, without replacement, and with a 1-1 treatment-control ratio;
genetic matching using the Mahalanobis distance, without replacement, and with a 1-2 treatment-control ratio;
genetic matching including the propensity score as a covariate, using the Mahalanobis distance, with calipers, without replacement, and with a 1-1 treatment-control ratio;
genetic matching including the propensity score as a covariate, using the Mahalanobis distance, with calipers, without replacement, and with a 1-2 treatment-control ratio.

If the propensity score model cannot be estimated, we exclude variables that are highly correlated with others, as needed. Our default procedure for doing so is to

include all of the covariates of interest;
if the propensity score model cannot be estimated because of highly correlated variables, calculate the pairwise correlations between all covariates;
using those pairwise correlations, identify the most correlated pair, randomly exclude one of the two covariates, and re-estimate the propensity score model;
continue until enough variables are excluded to successfully estimate the propensity score.
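The exclusion loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: `fit_propensity` is a hypothetical callable standing in for whatever propensity score model is used, and we assume it signals a failed fit by raising `ValueError`. The function names, the dictionary layout of the covariates, and the use of absolute Pearson correlation to rank pairs are also assumptions made for the example.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def prune_until_estimable(covariates, fit_propensity, rng=None):
    """Drop covariates one at a time until the propensity model fits.

    covariates: dict mapping covariate name -> list of values.
    fit_propensity: hypothetical callable that takes such a dict and
        raises ValueError when the model cannot be estimated (assumption).
    Returns (fitted_model, kept_names, dropped_names).
    """
    rng = rng or random.Random()
    kept = dict(covariates)  # step 1: start with all covariates of interest
    dropped = []
    while True:
        try:
            return fit_propensity(kept), list(kept), dropped
        except ValueError:
            if len(kept) < 2:
                raise  # no pair left to prune; surface the failure
            # Steps 2-3: find the most correlated pair (absolute value)
            # and randomly exclude one of its two members.
            pair = max(
                itertools.combinations(kept, 2),
                key=lambda p: abs(pearson(kept[p[0]], kept[p[1]])),
            )
            victim = rng.choice(pair)
            dropped.append(victim)
            del kept[victim]
            # Step 4: loop back and re-estimate.
```

Passing a seeded `random.Random` as `rng` makes the random exclusions reproducible, which is useful when the procedure is pre-registered.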

5.1.2 Selecting the Method for Analysis

For the primary analysis, we select the matching method, and the parameters for that method, that minimize the worst imbalance on any covariate. We assess balance on each covariate using the standardized mean difference, with the control group standard deviation as the denominator.
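As a minimal sketch of this balance measure, the Python function below computes a standardized mean difference for one covariate, dividing by the control group standard deviation. The function name is hypothetical, and two details are assumptions the text does not specify: the example uses the sample (n - 1) standard deviation and ignores any matching weights.

```python
import statistics


def standardized_mean_difference(treated, control):
    """Difference in means, scaled by the control group standard deviation.

    Assumptions for this sketch: unweighted observations and the sample
    (n - 1) standard deviation of the control group.
    """
    sd_control = statistics.stdev(control)
    return (statistics.fmean(treated) - statistics.fmean(control)) / sd_control


# Treated mean 4, control mean 2, control SD 1.
print(standardized_mean_difference([2, 4, 6], [1, 2, 3]))  # 2.0
```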

For example, comparing two methods for three covariates with imbalance scores (where lower scores indicate better balance) of

Table 5.1: Comparing Matching Methods using Variables’ Balance Scores
Method	x1	x2	x3
Method 1	0.6	0.9	0.5
Method 2	0.8	0.7	0.6

we select Method 2. Its worst imbalance (0.8 on x1) is smaller than the worst imbalance of Method 1 (0.9 on x2).
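The selection rule in this example can be written as a short Python sketch. The function name and the nested-dictionary layout are assumptions made for illustration; the scores are those in Table 5.1.

```python
def select_method(balance):
    """Pick the method whose worst covariate imbalance is smallest.

    balance: dict mapping method name -> dict of covariate -> imbalance score.
    Returns (best_method, worst_score_by_method).
    """
    # Worst (largest absolute) imbalance for each method.
    worst = {
        method: max(abs(score) for score in scores.values())
        for method, scores in balance.items()
    }
    # Minimax choice: the method with the smallest worst imbalance.
    best = min(worst, key=worst.get)
    return best, worst


balance = {
    "Method 1": {"x1": 0.6, "x2": 0.9, "x3": 0.5},
    "Method 2": {"x1": 0.8, "x2": 0.7, "x3": 0.6},
}
best, worst = select_method(balance)
print(best, worst[best])  # Method 2 0.8
```

Note that this rule looks only at each method's single worst covariate. Method 1 has better balance on x1 and x3, but its imbalance on x2 rules it out.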

5.1.3 Calipers

5.1.4 Alternatives

Depending on our data structures, we may consider

propensity score matching (Rosenbaum and Rubin 1983),
Mahalanobis distance matching (Rubin 1980),
genetic matching (Diamond and Sekhon 2013), or
coarsened exact matching (Iacus, King, and Porro 2012).

References

Diamond, Alexis, and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95 (3): 932–45.

Iacus, Stefano M., Gary King, and Giuseppe Porro. 2012. “Causal Inference Without Balance Checking: Coarsened Exact Matching.” Political Analysis 20 (1): 1–24.

Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.

Rubin, Donald B. 1980. “Bias Reduction Using Mahalanobis-Metric Matching.” Biometrics, 293–98.

———. 2007. “The Design Versus the Analysis of Observational Studies for Causal Effects: Parallels with the Design of Randomized Trials.” Statistics in Medicine 26: 20–36.

⁷ There are also many ways to estimate the conditional expectation function, of which ordinary least squares is only one.