Stata code for performing the Preacher and Hayes bootstrapped test of mediation

The well-known Preacher and Hayes macros for performing a bootstrapped test of mediation are designed to work only with SPSS or SAS. Those statistics packages are fine, but I prefer Stata. In this post, I provide Stata code for performing the Preacher and Hayes test.

Among journal reviewers in my field, there was a time when the most accepted test for mediation was that of Baron and Kenny (1986). That method is now passé (Zhao et al. 2010). The currently accepted approach is to test the total effect, direct effect, and indirect effect (i.e., mediated effect) of an independent variable directly. Regression-based tests of a (the effect of the independent variable on the mediator), b (the effect of the mediator on the dependent variable, controlling for the independent variable), the direct effect, and the total effect are acceptable. However, the statistical tests of the indirect effect described in Baron and Kenny (i.e., Sobel and its variants) are now considered inappropriate, because they treat the sampling distribution of the indirect effect (the product a × b) as normal when it is typically skewed. A commonly cited solution is the bootstrapping method of Preacher and Hayes (2004), who provide SPSS and SAS macros that perform a bootstrapped test of the indirect effect. As Zhao et al. (2010) describe, the Preacher and Hayes test is now the preferred approach.
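The relationship among these effects can be seen with ordinary regressions. Below is a minimal Stata sketch on simulated data; the variable names x, m, and y, the coefficients, and the seed are hypothetical choices for illustration, not part of the original analysis. a is the coefficient on x when m is regressed on x; b and the direct effect (often written c′) come from regressing y on m and x together; and the total effect c comes from regressing y on x alone. With OLS on the same observations, c equals c′ + a × b exactly, which is why the indirect effect is the product a × b.

```stata
* Hypothetical simulated data: x affects y directly and through m
clear
set seed 2010
set obs 100
generate x = rnormal()
generate m = 0.5*x + rnormal()
generate y = 0.4*m + 0.2*x + rnormal()

* Path a: regress the mediator on the independent variable
quietly regress m x
scalar a_path = _b[x]

* Path b and the direct effect c': regress y on the mediator and x together
quietly regress y m x
scalar b_path = _b[m]
scalar direct = _b[x]

* Total effect c: regress y on x alone
quietly regress y x
scalar total = _b[x]

* Indirect (mediated) effect is the product a*b
scalar indirect = a_path*b_path
display "indirect = " indirect "  direct = " direct "  total = " total
display "total - (direct + indirect) = " total - (direct + indirect)
```

The last line should print a difference of essentially zero (floating-point noise only), confirming the decomposition on this sample.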

However, the Preacher and Hayes macros will only work with SPSS and SAS. How can an equivalent test be performed with Stata?

A partial answer to the problem can be found in the sgmediation program, written by Philip Ender at UCLA. In its basic form, sgmediation reports only normal-based tests of the indirect effect (e.g., Sobel), but Stata can readily be instructed to bootstrap the estimates. The difficulty is that Stata offers a wide variety of bootstrapping options and approaches that could be applied to sgmediation. Which options and approaches replicate the Preacher and Hayes test?
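For reference, the Sobel statistic divides the estimated indirect effect a × b by a first-order (delta-method) standard error and compares the ratio with the standard normal distribution; that normality assumption is what bootstrapping avoids. The sketch below computes the textbook Sobel formula by hand on hypothetical simulated data (variable names, coefficients, and seed are assumptions). It illustrates the formula, not sgmediation's internal code.

```stata
* Hypothetical simulated data with a mediated effect of x on y through m
clear
set seed 2004
set obs 100
generate x = rnormal()
generate m = 0.5*x + rnormal()
generate y = 0.4*m + rnormal()

* Path a and its standard error
quietly regress m x
scalar a_path = _b[x]
scalar se_a = _se[x]

* Path b (controlling for x) and its standard error
quietly regress y m x
scalar b_path = _b[m]
scalar se_b = _se[m]

* Sobel: z = a*b / sqrt(b^2*se_a^2 + a^2*se_b^2), two-sided normal p-value
scalar sobel_se = sqrt(b_path^2*se_a^2 + a_path^2*se_b^2)
scalar sobel_z = a_path*b_path/sobel_se
scalar sobel_p = 2*normal(-abs(sobel_z))
display "Sobel z = " sobel_z "   p = " sobel_p
```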

In essence, to replicate Preacher and Hayes, Stata should be set up to bootstrap with case resampling (drawing observations with replacement from the original data) and to report a percentile confidence interval. After installing the sgmediation program (for example, by typing findit sgmediation in Stata and following the installation link), the following Stata code will perform a Preacher and Hayes test:

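```stata
* Regression-based tests of a, b, the direct effect, and the total effect
sgmediation DEP_VAR, iv(INDEP_VAR) mv(MED_VAR)
* Bootstrap the indirect effect r(ind_eff) with case resampling, 5,000 replications
bootstrap r(ind_eff), reps(5000): sgmediation DEP_VAR, iv(INDEP_VAR) mv(MED_VAR)
* Report the percentile confidence interval for the bootstrapped indirect effect
estat bootstrap, percentile
```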

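The words in ALL CAPS are placeholders for your own variable names: DEP_VAR is the dependent variable, INDEP_VAR is the independent variable, and MED_VAR is the mediator. The first statement performs the regression-based tests of a, b, the direct effect, and the total effect. The second statement bootstraps the indirect effect that sgmediation returns in r(ind_eff), using 5,000 case-resampled replications, and the third statement reports the percentile confidence interval for the bootstrapped estimate. See here for an annotated example of the output.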
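To see what the bootstrap prefix is doing, the same analysis can be sketched without sgmediation. The program indeff below is a hypothetical helper (its name and the simulated data are assumptions for illustration): it computes a × b from two OLS regressions and returns it in r(ind_eff). bootstrap then re-runs it on 5,000 copies of the data resampled with replacement, and estat bootstrap, percentile reports the 2.5th and 97.5th percentiles of those replicates.

```stata
* Hypothetical helper: returns the indirect effect a*b in r(ind_eff)
capture program drop indeff
program define indeff, rclass
    args yvar xvar mvar
    tempname a
    quietly regress `mvar' `xvar'
    scalar `a' = _b[`xvar']
    quietly regress `yvar' `mvar' `xvar'
    return scalar ind_eff = `a'*_b[`mvar']
end

* Hypothetical simulated data
clear
set seed 12345
set obs 100
generate x = rnormal()
generate m = 0.5*x + rnormal()
generate y = 0.4*m + rnormal()

* Case resampling: each replicate draws 100 observations with replacement
bootstrap r(ind_eff), reps(5000): indeff y x m
estat bootstrap, percentile

* Percentile limits are also stored in e(ci_percentile): row 1 lower, row 2 upper
matrix ci = e(ci_percentile)
display "95% percentile CI: [" el(ci,1,1) ", " el(ci,2,1) "]"
```

Because indeff is not an estimation command, Stata may warn that it cannot determine e(sample) and will resample all observations; with no missing data, as here, that is the intended behavior.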

Given the variety of ways that bootstrapping can be implemented in Stata, I wanted to be sure the approach I’ve described here yields the same end result as the Preacher and Hayes test. To verify the code, I created a synthetic dataset (n = 100) with three variables: var1 (the independent variable), var2 (the mediator), and var3 (the dependent variable). I drew the sample from a synthetic population with known relationships: the total effect of var1 on var3 was positive in the population, the direct effect of var1 on var3 was zero, and the indirect effect of var1 on var3 was positive. Therefore, the population effect of var1 on var3 is indirect only (i.e., fully mediated through var2).
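A dataset with this structure can be simulated as follows. The coefficients (a = 0.5, b = 0.5, direct effect = 0), the seed, and the standard normal errors are assumptions for illustration; the population values used in the original check are not reported. In this hypothetical population the indirect and total effects both equal 0.5 × 0.5 = 0.25, and the direct effect is zero.

```stata
* Hypothetical population: var1 -> var2 -> var3, with no direct var1 -> var3 path
clear
set seed 1986
set obs 100
generate var1 = rnormal()
generate var2 = 0.5*var1 + rnormal()
generate var3 = 0.5*var2 + rnormal()

* Sample estimates of the total effect (var3 on var1 alone)
regress var3 var1
* ... and of b and the direct effect (var3 on var2 and var1)
regress var3 var2 var1
```

The three sgmediation/bootstrap commands shown earlier can then be run with var3 as DEP_VAR, var1 as INDEP_VAR, and var2 as MED_VAR.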

This was not a full-blown Monte Carlo study; a single sample from the population was enough to check whether the Stata code yields the same results as the Preacher and Hayes test. However, comparing bootstrapping results can be tricky: each run draws its own random resamples, so estimates can differ from run to run even on the same data. Indeed, running the Preacher and Hayes SPSS macro more than once would probably produce slightly different results each time. To reduce this problem, I specified 500,000 bootstrap replications rather than the commonly used 5,000. I ran the bootstrapping analysis with the Preacher and Hayes SPSS macro and again with the Stata code. Because the number of replications was so large, each run was a “run it overnight” sort of analysis.
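Why more replications help can be made concrete. Conditional on the data, each percentile confidence limit is a sample quantile of the B bootstrap replicates, and the standard large-sample approximation for the standard error of a sample quantile shrinks with the square root of B. Here f is the density of the bootstrap distribution of the indirect effect at its p-th quantile q_p, with p = 0.025 and 0.975 for a 95% interval:

$$\operatorname{SE}(\hat q_p) \approx \frac{1}{f(q_p)}\sqrt{\frac{p(1-p)}{B}}, \qquad \frac{\operatorname{SE}(\hat q_p \mid B = 5{,}000)}{\operatorname{SE}(\hat q_p \mid B = 500{,}000)} = \sqrt{\frac{500{,}000}{5{,}000}} = 10.$$

Moving from 5,000 to 500,000 replications therefore reduces the run-to-run variation in each limit by roughly a factor of ten, though it does not eliminate it.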

The results showed that the Stata code above yielded estimates that were practically the same as those from the Preacher and Hayes SPSS macro. For the lower limit of the 95% confidence interval, the Stata code estimated 0.0257 and the SPSS macro 0.0259; for the upper limit, the Stata code estimated 0.2469 and the SPSS macro 0.2467. Each limit differs by only 0.0002, a discrepancy attributable to the random resampling variation inherent in bootstrapping.

To check that this result could be replicated, I repeated the process with another synthetic dataset, this one drawn from a population of three orthogonal (mutually uncorrelated) variables, in which the true indirect effect is zero. The estimates (lower limit of the 95% confidence interval = -0.0469, upper limit = 0.0254) were identical to four decimal places in Stata and SPSS.
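A population of three orthogonal variables can be simulated with Stata's drawnorm command, which by default draws independent standard normal variables (means 0, standard deviations 1, identity correlation matrix). The seed below is an arbitrary assumption, and this is not the dataset used in the post. Because a and b are both zero in such a population, the true indirect effect is zero, so a 95% interval that spans zero, like the one reported above, is the expected outcome.

```stata
* Hypothetical null population: three independent standard normal variables
clear
drawnorm var1 var2 var3, n(100) seed(2010)

* Sample correlations should be close to zero
correlate var1 var2 var3

* Sample estimates of a and b, whose population values are both zero
regress var2 var1
regress var3 var2 var1
```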

The similarity of these results indicates that the Stata code shown above does, indeed, yield the same bootstrapping analysis as the Preacher and Hayes SPSS macro.

– Eric DeRosia

References

Baron, Reuben M., and David A. Kenny (1986), “The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations,” Journal of Personality and Social Psychology, 51 (6), 1173-1182.

Preacher, Kristopher J., and Andrew F. Hayes (2004), “SPSS and SAS Procedures for Estimating Indirect Effects in Simple Mediation Models,” Behavior Research Methods, Instruments, & Computers, 36 (4), 717-731.

Zhao, Xinshu, John G. Lynch Jr., and Qimei Chen (2010), “Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis,” Journal of Consumer Research, 37 (2), 197-206.