Causal Mediation: A Brief Overview

Oisín Fitzgerald, March 2021

I recently worked on the meta-analysis of a set of randomised control trials (RCT), and as part of this carried out a series of causal mediation analyses. I'd never thought about mediation in an RCT setting before. It was interesting to realise that even in a perfectly designed experiment with full randomisation of the treatment there is a need to think carefully about risk of confounding.

Introduction

As described by Judea Pearl causal mediation is an attempt to explain how nature works. It attempts to quantify the extent to which the effect of an action (the treatment) on a outcome of interest can be explained by a particular mechanism (the mediator). Of course, the extent to which we are "explaining nature" is necessarily limited by the data available. As an example exercise may reduce an individuals risk of dementia because it reduces blood pressure. A directed acyclic graph that encodes this structure is shown below. This graph says that exercise impacts risk of dementia indirectly through changes in blood pressure and also directly, possibly through some other unmeasured mechanism.

 Example of mediation.
Example of mediation.

There are several effects of interest in a mediation analysis, relating to which pathway (direct/indirect) and node (treatment/mediator) we wish to consider intervening on and if we want to imagine keeping some aspect of the treatment fixed at a baseline/control level. Some notation, there are two treatment levels A{0,1}A \in \{0,1\} with an outcome YY, and potential outcome Y(a)Y(a), the outcome observed if we set A=aA = a. By consistency, in our observed data Y=Y(1)Y = Y(1) if A=1A = 1 and similarly for A=0A = 0. The mediator MM also has potential outcomes M(0)M(0) and M(1)M(1). Within mediation analysis there is a second potential outcome Y(a,m)Y(a,m) that arises if we consider setting both AA and MM to particular values. This potential outcome also allows us consider questions such as: what value would the outcome take if an individual is treated A=1A=1 but the treatment-mediator pathway is "broken" M=M(0)M=M(0) denoted as Y(1,M(0))Y(1,M(0)). Other variables will be denoted XX, ZZ, ... as required. I generally assume some level of familiarity with causal inference, a good introduction is Hernan and Robin's book.

Some other examples of the type of questions for which causal mediation analysis is useful:

 Three examples of mediation.
Three examples of mediation.

Quantities of interest

There are several quantities (statistical/causal estimands) of interest in a mediation analysis. The naming conventions are different depending on the literature I've read and here I stick with Pearl (2014).

Total effect

The total effect is the change in the outcome if we flip the treatment switch and aren't concerned with the mechanism of action. It is the usual average treatment effect figure we might expect to see in the headline results of an RCT. We could collapse the graph below into AYA \rightarrow Y.

TE=E(Y(1)Y(0))\text{TE} = E(Y(1) - Y(0))
 Total effect
Total effect

Direct effect

The natural direct effect is the effect of flipping the treatment switch if we imagine that the indirect pathway is no longer operational.

DE=E(Y(1,M(0))Y(0,M(0)))\text{DE} = E(Y(1,M(0)) - Y(0,M(0)))
 Direct effect
Direct effect

Indirect effect

The natural direct effect (or average mediated effect) is the effect of flipping the treatment switch if we imagine that the direct pathway is no longer operational.

IDE=E(Y(0,M(1))Y(0,M(0)))\text{IDE} = E(Y(0,M(1)) - Y(0,M(0)))
 Indirect effect
Indirect effect

Each of these effects may be useful for different purposes. For example, the total effect may guide immediate decision making and policy - if a treatment works and is immediately needed the mechanism of action is less important. The size of the indirect effect is useful information for considering alternative (e.g. cheaper) treatments that target the mediator. The value IDE/TE\text{IDE}/\text{TE} is considered the percentage of the total effect explained by the mediator.

Mediation and exchangeability

One of the important considerations in any causal analysis is exchangeability/ignorability. Also referred to as unconfoundedness, we can think of exchangeability as meaning that individuals in either treatment arm are literally a-priori exchangeable or "swappable", with the conditionaly exchangeability meaning that individuals are swappable within strata of a covariates X. We want our analysis to be comparable to a RCT, you could have ended up in either treatment arm. What exchangeability achieves is a lack of dependence between the treatment assignment and the potential outcome under that treatment Y(a)AY(a) \perp A. Otherwise you end up with flawed analysis.

A more detailed example (if you want)

For example, assume that in truth exercise AA reduces risk of hospitalisations due to asthma YY, and we wish in practise to investigate the link using survey of all asthmatics who have attended a clinic. However, only mild asthmatics do any exercise training (A=1A=1) and already have fairly low risk of hospitalisations. So a naive analysis might find that exercise increases risk of hospitalisations. We have set up a scenario where our naive treatment estimator cannot equal the true treatment effect E(YA=1)E(YA=0)E(Y(1)Y(0))E(Y|A=1) - E(Y|A=0) \ne E(Y(1) - Y(0)). Clearly severity of asthma XX would be an important adjustment and we might be happy to consider treatment assigment random within levels of an asthma severity measure XX leading to conditional exchangeability Y(a)AX=xY(a) \perp A | X = x.

Confounding in RCTs

Now that we've recapped exchangeability in general lets consider it for mediation. In particular I'm going to talk about RCTs and so will assume the initial treatment AA is fully randomised and unconfounded. An issue here is rather simply that we've randomised only the treatment and not the mediator, and so any mediation analysis can still be confounded. For example, we could randomise exercise training to assess if that reduces asthma hospitalisations, with the potential mechanism of interest being a reduction in inflammation. However, maybe our study is in a district with poor industrial pollution controls. Some individuals in our study happen to live near a factory that is unbeknownst to them leaking a pollutant that raises lung inflammtion and increasing their our risk of asthma hospitalisation. As a result we have a partially confounded analysis, there is an unrecorded factor - proximity to the factory - that we won't account for in the analysis. What will happen then is that estimates of the indirect and direct effects will be biased away from the true effect.

 Mediation with confounding
Mediation with confounding

Simulations

Let's investigate this issue around confounding and mediation analysis using some simple linear forms for our data generation process and models. Feel free to skim the maths, all that matters is that the effects of interest turn out to be coefficients we can easily extract from a linear model.

(0) An unconfounded mediation model

First lets clarify what we are attempting to estimate.

 Linear mediation model (structural equation model): scenario 0
Linear mediation model (structural equation model): scenario 0

Assuming no confounding in our generative model we can estimate the total, direct and indirect effects using estimators of the following quantities, for the total effect we have

TE=E(Y(1)Y(0))=E(YA=1)E(YA=0)=ME(YM=m,A=1)p(MA=1)ME(YM=m,A=0)p(MA=0)=γE(MA=1)+βγE(MA=0)=γα+β\begin{aligned} \text{TE} &= E(Y(1) - Y(0)) \\ &= E(Y|A=1) - E(Y|A=0) \\ &= \int_{\mathcal{M}} E(Y|M=m,A=1)p(M|A=1) - \int_{\mathcal{M}} E(Y|M=m,A=0)p(M|A=0) \\ &= \gamma*E(M|A=1) + \beta - \gamma*E(M|A=0) \\ &= \gamma*\alpha + \beta \end{aligned}

And the direct effect we have

DE=E(Y(1,M(0))Y(0,M(0)))=E(Y(1,M(0)))E(Y(0,M(0)))=ME(YM=m,A=1)p(MA=0)dmME(YM=m,A=0)p(MA=0)dm=M(γm+β)p(MA=0)dmM(γm)p(MA=0)dm=β\begin{aligned} \text{DE} &= E(Y(1,M(0)) - Y(0,M(0))) \\ &= E(Y(1,M(0))) - E(Y(0,M(0))) \\ &= \int_{\mathcal{M}} E(Y|M=m,A=1)p(M|A=0) dm - \int_{\mathcal{M}} E(Y|M=m,A=0)p(M|A=0) dm \\ &= \int_{\mathcal{M}} (\gamma*m + \beta) p(M|A=0) dm - \int_{\mathcal{M}} (\gamma*m) p(M|A=0) dm \\ &= \beta \end{aligned}

And the indirect effect we have

IDE=E(Y(0,M(1))Y(0,M(0)))=E(Y(0,M(1)))E(Y(0,M(0)))=ME(YM=m,A=0)p(MA=1)dmME(YM=m,A=0)p(MA=0)dm=M(γm)p(MA=1)dmM(γm)p(MA=0)dm=γE(MA=1)=γα\begin{aligned} \text{IDE} &= E(Y(0,M(1)) - Y(0,M(0))) \\ &= E(Y(0,M(1))) - E(Y(0,M(0))) \\ &= \int_{\mathcal{M}} E(Y|M=m,A=0)p(M|A=1) dm - \int_{\mathcal{M}} E(Y|M=m,A=0)p(M|A=0) dm \\ &= \int_{\mathcal{M}} (\gamma*m) p(M|A=1) dm - \int_{\mathcal{M}} (\gamma*m) p(M|A=0) dm \\ &= \gamma * E(M|A=1) \\ &= \gamma * \alpha \end{aligned}

We'll estimate these coefficients using R's lm function. Obviously in reality we'd need to worry about whether a linear model is the appropriate functional form for our analyses, see Pearl (2014) for details more general versions of these formulas. From the graph below we see that in the unconfounded case we have unbiased estimates of our parameters - as expected!

mediation_scen0 <- function(N,alpha,beta,gamma) {
  A <- rbinom(N,1,0.5)
  M <- alpha*A + rnorm(N)
  Y <- beta*A + gamma*M + rnorm(N)
  
  alpha_ <- mean(M[A==1]) - mean(M[A==0])
  mod <- lm(Y ~ A + M)
  beta_ <- as.numeric(coef(mod)["A"])
  gamma_ <- as.numeric(coef(mod)["M"])
  tau_ <-  mean(Y[A==1]) - mean(Y[A==0])
  
  c("total" = beta_ + alpha_*gamma_,
    "direct" = beta_,
    "indirect" = alpha_*gamma_)
}

(1) A confounded mediation model

We now make M and Y to be shared caused of another variable UU. This results in biased estimates of the direct and indirect effect as seen in the graph below.

 Linear mediation model with confounding (structural equation model): scenario 1
Linear mediation model with confounding (structural equation model): scenario 1

In this case the our estimate of the indirect effect is biased, too high, i.e. generally α^γ^>αγ\hat{\alpha}\hat{\gamma} > \alpha\gamma while the direct effect is too low. If UU is >0 (<0) then both M and Y are more likely to take a higher (lower) value which gets absorbed into the α^γ^\hat{\alpha}\hat{\gamma} estimate.

mediation_scen1 <- function(N,alpha,beta,gamma) {
  A <- rbinom(N,1,0.5)
  U <- rnorm(N)
  M <- alpha*A + U + rnorm(N)
  Y <- beta*A + gamma*M + U + rnorm(N)
  
  alpha_ <- mean(M[A==1]) - mean(M[A==0])
  mod <- lm(Y ~ A + M)
  beta_ <- as.numeric(coef(mod)["A"])
  gamma_ <- as.numeric(coef(mod)["M"])
  tau_ <-  mean(Y[A==1]) - mean(Y[A==0])
  
  c("total" = beta_ + alpha_*gamma_,
    "direct" = beta_,
    "indirect" = alpha_*gamma_)
}

(2) Measured confounding

If we knew there was confounding of MM and YY by UU and we measured UU we could estimate the direct and indirect effects while controlling for UU. This would fix our biases - see the graph!

mediation_scen2 <- function(N,alpha,beta,gamma) {
  A <- rbinom(N,1,0.5)
  U <- rnorm(N)
  M <- alpha*A + U + rnorm(N)
  Y <- beta*A + gamma*M + U + rnorm(N)
  
  alpha_ <- mean(M[A==1]) - mean(M[A==0])
  mod <- lm(Y ~ A + M + U)
  beta_ <- as.numeric(coef(mod)["A"])
  gamma_ <- as.numeric(coef(mod)["M"])
  tau_ <-  mean(Y[A==1]) - mean(Y[A==0])
  
  c("total" = beta_ + alpha_*gamma_,
    "direct" = beta_,
    "indirect" = alpha_*gamma_)
}

Conclusion

This post was a quick introduction to mediation analysis and one of the potential issues that can crop up - confounding of the mediator and outcome. Measure your confounders, achieve anything. Thanks for reading.

References

  • Pearl, J. (2014). Interpretation and identification of causal mediation. Psychological methods, 19(4), 459: https://ftp.cs.ucla.edu/pub/stat_ser/r389.pdf