SPSS code for calculating the confidence interval of a Pearson correlation

For some reason, SPSS does not offer an option to calculate the confidence interval of an observed Pearson correlation. SAS does it, and so does Stata. SPSS will calculate the correlation itself, of course, but not its confidence interval.

The SPSS syntax below calculates the confidence interval. I’ve drawn on some code from the SPSS website; I’ve made the code easier to use and the results easier to interpret. I have verified the calculations against what I get using Stata, and the syntax calculates the confidence intervals correctly.

This code is not a convenient workaround. The user must enter — by hand! — the correlations and sample sizes into the code. Ridiculous! SPSS should be embarrassed for not providing a way to calculate r’s CI directly from a dataset.

Before the SPSS code, a comment: If you see an asymmetric confidence interval, there’s nothing wrong. Unlike a confidence interval for a univariate mean (where the observed mean sits in the “center” of the confidence interval), the confidence interval for a bivariate correlation does not necessarily extend the same distance above and below the observed correlation.
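The asymmetry comes from the Fisher z transformation that the syntax below relies on. As a sketch in standard notation (the symbols z, SE_z, and z_crit are introduced here for illustration and correspond to fz, sez, and critz in the code):

```latex
\[
z = \tfrac{1}{2}\ln\frac{1+r}{1-r}, \qquad
SE_z = \frac{1}{\sqrt{n-3}}, \qquad
z_{\text{lo}},\, z_{\text{hi}} = z \mp z_{\text{crit}}\, SE_z
\]
\[
r_{\text{lo}} = \frac{e^{2 z_{\text{lo}}} - 1}{e^{2 z_{\text{lo}}} + 1} = \tanh(z_{\text{lo}}), \qquad
r_{\text{hi}} = \frac{e^{2 z_{\text{hi}}} - 1}{e^{2 z_{\text{hi}}} + 1} = \tanh(z_{\text{hi}})
\]
```

The interval is symmetric around z, but tanh flattens as it approaches 1 or -1, so after back-transformation the limit nearer zero lies farther from r than the limit nearer 1 or -1. With r = .75 and n = 500 at the 95% level, the interval works out to roughly .709 to .786: about .041 below the observed correlation but only about .036 above it. At r = 0 the interval is symmetric.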

* This syntax is based on syntax from the SPSS website:
* http://www-01.ibm.com/support/docview.wss?uid=swg21478368.


data list free / r n conflev .
* Below, after "begin data," enter descriptions of the observed correlations.
* Each row describes one correlation. A confidence interval will be reported
*      for each row you enter.
* The first number in the row should be the correlation you have observed.
* The second number in the row should be the sample size.
* The third number in the row should be the confidence level (typically .95).
* In this example, four correlations (r = .75, r = .50, r = .25, and r = 0)
*      were observed with n = 500.
begin data .
.75  500  .95
.50  500  .95
.25  500  .95
0    500  .95
end data.
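* Fisher z transformation of r and its standard error.
compute fz = .5*ln((1+r)/(1-r)).
compute sez = 1/sqrt(n-3).
* Two-tailed critical value from the standard normal distribution.
compute critz = abs(idf.normal((1 - conflev)/2,0,1)).
* Confidence limits on the z scale.
compute lo_fz = fz - critz*sez .
compute hi_fz = fz + critz*sez .
* Back-transform the limits to the correlation scale.
compute lo_r = (exp(2*lo_fz) - 1)/(exp(2*lo_fz) + 1).
compute hi_r = (exp(2*hi_fz) - 1)/(exp(2*hi_fz) + 1).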

formats r conflev to hi_r (f10.4) / n (f8).
list r n conflev lo_r hi_r .

* In the listing above,
* "r" and "n" and "conflev" repeat the information that was
* provided by the user in the syntax code.
* "lo_r" indicates the lower bound of the calculated confidence interval, and
* "hi_r" indicates the upper bound of the calculated confidence interval.
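
As an independent cross-check outside SPSS, the same arithmetic can be sketched in Python using only the standard library. This is a minimal sketch and not part of the original syntax: the function name fisher_ci and the rows list are hypothetical names chosen for illustration, and statistics.NormalDist assumes Python 3.8 or later. It should print the same lo_r and hi_r columns as the SPSS listing.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, conflev=0.95):
    """Confidence interval for a Pearson r via the Fisher z transformation.

    Hypothetical helper mirroring the SPSS compute statements above.
    """
    fz = math.atanh(r)                  # same as .5*ln((1+r)/(1-r))
    sez = 1.0 / math.sqrt(n - 3)        # standard error on the z scale
    # Two-tailed critical value from the standard normal distribution.
    critz = abs(NormalDist().inv_cdf((1.0 - conflev) / 2.0))
    lo_r = math.tanh(fz - critz * sez)  # same as (exp(2z)-1)/(exp(2z)+1)
    hi_r = math.tanh(fz + critz * sez)
    return lo_r, hi_r


# Same four rows as the SPSS example: r, n, confidence level.
rows = [(0.75, 500, 0.95), (0.50, 500, 0.95), (0.25, 500, 0.95), (0.0, 500, 0.95)]

for r, n, conflev in rows:
    lo_r, hi_r = fisher_ci(r, n, conflev)
    print(f"{r:10.4f} {n:8d} {conflev:10.4f} {lo_r:10.4f} {hi_r:10.4f}")
```

Using math.atanh and math.tanh instead of the explicit logarithm and exponential forms is only a convenience; the two are algebraically identical.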

– Eric DeRosia