Introduction

  • Correlation: Measures the strength and direction of the relationship between two variables.

  • Partial Correlation: Measures the strength and direction of the relationship between two variables while controlling for the effect of one or more additional variables.

Use-case examples

  • Medicine: Understanding the relationship between a treatment and an outcome, controlling for age or other risk factors.

  • Economics: Analyzing the impact of one economic variable on another while controlling for other influencing factors.

  • Psychology: Studying the relationship between two psychological traits while controlling for external variables.

Mathematical definition

  • The partial correlation between \(X\) and \(Y\) given a controlling variable \(Z\), denoted \(r_{XY \cdot Z}\), is defined as:

\[ r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}r_{YZ}}{\sqrt{(1-r_{XZ}^2)(1-r_{YZ}^2)}} \]
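
  • As a quick check of this formula, the following sketch computes \(r_{XY \cdot Z}\) directly from the three pairwise correlations (simulated data; the seed and generating process are assumptions):

set.seed(1)           # assumed seed, for reproducibility
z <- rnorm(100)
x <- z + rnorm(100)   # x and y both depend on z
y <- z + rnorm(100)

r_xy <- cor(x, y)
r_xz <- cor(x, z)
r_yz <- cor(y, z)

# Partial correlation of x and y given z, from the formula above
(r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))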

Relationship to correlation

  • The partial correlation can be smaller than, equal to, or larger than the simple correlation, depending on how the controlling variable relates to the two variables of interest.

  • If the controlling variable \(Z\) strongly influences both \(X\) and \(Y\), the partial correlation can differ substantially from the simple correlation.

  • If \(Z\) is only weakly related to \(X\) and \(Y\), the partial correlation will be close to the simple correlation.

  • In some cases, controlling for \(Z\) removes masking variation and reveals a stronger direct relationship between \(X\) and \(Y\), so the partial correlation exceeds the simple correlation.

\(r_{XY \cdot Z}\) smaller than \(r_{XY}\)

  • If \(X\) and \(Y\) have a strong positive correlation and \(Z\) is positively correlated with both \(X\) and \(Y\), the partial correlation between \(X\) and \(Y\) controlling for \(Z\) will typically be smaller than the simple correlation between \(X\) and \(Y\). This is because \(Z\) explains some of the variation in both \(X\) and \(Y\).

\(r_{XY \cdot Z}\) equal to \(r_{XY}\)

  • If \(Z\) is uncorrelated with both \(X\) and \(Y\), then controlling for \(Z\) will not affect the correlation between \(X\) and \(Y\). In this case, the partial correlation will be equal to the simple correlation.

\(r_{XY \cdot Z}\) larger than \(r_{XY}\)

  • If \(Z\) is related to both \(X\) and \(Y\), controlling for it usually reduces the observed correlation between them. But in some situations, notably when \(Z\) contributes unrelated noise to \(X\) while being unrelated to \(Y\), controlling for \(Z\) removes that noise and reveals a stronger connection between \(X\) and \(Y\) (a suppressor effect; see the simulation sketch below, which covers all three cases).
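
  • A simulation sketch covering all three cases (the data-generating processes and seed are assumptions):

set.seed(42)   # assumed seed
n <- 10000

# Case 1: z is a common cause of x and y -> partial smaller than simple
z1 <- rnorm(n); x1 <- z1 + rnorm(n); y1 <- z1 + rnorm(n)

# Case 2: z is unrelated to x and y -> partial roughly equal to simple
z2 <- rnorm(n); x2 <- rnorm(n); y2 <- x2 + rnorm(n)

# Case 3: z contributes noise to x but is unrelated to y -> partial larger
z3 <- rnorm(n); y3 <- rnorm(n); x3 <- y3 + z3

partial_cor <- function(x, y, z) {
  r_xy <- cor(x, y); r_xz <- cor(x, z); r_yz <- cor(y, z)
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

c(simple = cor(x1, y1), partial = partial_cor(x1, y1, z1))
c(simple = cor(x2, y2), partial = partial_cor(x2, y2, z2))
c(simple = cor(x3, y3), partial = partial_cor(x3, y3, z3))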

Getting partial correlation in R

  • The ppcor package provides a convenient function pcor() to compute partial correlations.

  • Suppose we have a data.table d:
##                x           y            z
##            <num>       <num>        <num>
##   1: -0.71524219 -0.43117711 -1.175102496
##   2: -0.75268897 -1.54499591 -2.142541029
##   3: -0.93853870 -1.10401762  0.005506896
##   4: -1.05251328 -0.55509819 -0.052744433
##   5: -0.43715953  0.45211620 -1.501688203
##   6:  0.33117917 -1.48495696 -0.672036343
##   7: -2.01421050 -1.35685949 -2.581482816
##   8:  0.21198043  0.86239666 -1.533562526
##   9:  1.23667505  0.07952836  0.808221836
##  10:  2.03757402  1.24607893  1.562614765
##  11:  1.30117599  1.14281657  1.124627012
##  12:  0.75677476  0.64622240  0.917651122
##  13: -1.72673040 -0.21010752 -0.085953795
##  14: -0.60150671 -0.42346201 -0.306886857
##  15: -0.35204646 -0.58969974 -1.087308942
##  16:  0.70352390 -2.29138700 -1.528730801
##  17: -0.10567133 -0.14577669 -0.257526803
##  18: -1.25864863 -0.19903962 -0.418827136
##  19:  1.68443571  1.37761669  0.491345849
##  20:  0.91139129 -0.09958271  0.221595424
##  21:  0.23743027  1.89821805  2.035091419
##  22:  1.21810861  0.89547872  0.948513576
##  23: -1.33877429 -0.54307129 -1.639343454
##  24:  0.66082030  1.60267693  0.855803445
##  25: -0.52291238 -0.97992241  0.363231152
##  26:  0.68374552 -0.10846586  0.837683791
##  27: -0.06082195  2.36704150  2.389785574
##  28:  0.63296071  0.32760954  0.619382986
##  29:  1.33551762  2.30132723  2.228697518
##  30:  0.00729009 -1.43486160 -1.272242667
##  31:  1.01755864  0.31826252  1.273281246
##  32: -1.18843404 -0.21579311 -1.208447117
##  33: -0.72160444 -0.06076368 -1.811749562
##  34:  1.51921771 -0.24602740  0.764588119
##  35:  0.37738797  0.20795326  2.238521835
##  36: -2.05222282 -2.10353206 -1.276963102
##  37: -1.36403745  0.03068460  0.498576964
##  38: -0.20078102  0.98438458  0.750657506
##  39:  0.86577940 -1.79209799 -1.071716473
##  40: -0.10188326  1.18475183  0.339193434
##  41:  0.62418747 -0.92895076 -0.425629751
##  42:  0.95900538  0.93427196  0.477938888
##  43:  1.67105483  1.49543005  2.287409725
##  44:  0.05601673 -0.17188146 -1.255295866
##  45: -0.05198191 -0.67110491  0.504822724
##  46: -1.75323736 -0.71129766 -0.368115023
##  47:  0.09932759  0.48848250 -0.904717313
##  48: -0.57185006  0.59737779  0.652255864
##  49: -0.97400958 -2.53934178  0.673550973
##  50: -0.17990623 -1.72633238 -1.510334789
##  51:  1.01494317  1.93787393  2.321312792
##  52: -1.99274849  0.05025460 -1.753448789
##  53: -0.42727929  0.22164931  1.007896427
##  54:  0.11663728  0.77349705  0.694891886
##  55: -0.89320757  0.47057113  1.440597173
##  56:  0.33390294 -2.49397133 -2.539004921
##  57:  0.41142992  1.31599206  0.812413103
##  58: -0.03303616 -0.50150568 -0.794196094
##  59: -2.46589819 -1.00233227 -1.931380099
##  60:  2.57145815  0.99057127  1.151435966
##  61: -0.20529926  0.76931533 -0.551835547
##  62:  0.65119328 -0.02287581  0.892881111
##  63:  0.27376649  0.65538701 -0.623003963
##  64:  1.02467323  0.12165164  2.057193368
##  65:  0.81765945 -0.68395749 -1.119355605
##  66: -0.20979317  1.10511392  0.548739528
##  67:  0.37816777  0.92998390  1.187065122
##  68: -0.94540883  1.25155782  0.739809835
##  69:  0.85692301  0.49361544  0.373522561
##  70: -0.46103834  0.89448358  0.296224614
##  71:  2.41677335  3.18380573  3.761553694
##  72: -1.65104890 -1.10700656 -2.835493646
##  73: -0.46398724 -1.55494473 -1.791205700
##  74:  0.82537986  0.17333836  0.819761428
##  75:  0.51013255  0.04102503 -0.169203188
##  76: -0.58948104 -0.14306001  1.003733467
##  77: -0.99678074  1.21391461  0.781820795
##  78:  0.14447570 -0.25390604  0.017451585
##  79: -0.01430741  0.36585095 -1.331985550
##  80: -1.79028124 -1.12282468 -1.430452735
##  81:  0.03455107  0.03772624 -0.280277214
##  82:  0.19023032  0.40917282  0.197355055
##  83:  0.17472640  1.41557789 -0.386407081
##  84: -1.05501704 -0.40619014 -0.231945549
##  85:  0.47613328  0.95090896 -0.325435322
##  86:  1.37857014  1.46814510  1.197135636
##  87:  0.45623640  1.14289147  1.180989767
##  88: -1.13558847 -1.14218879 -1.922404418
##  89: -0.43564547  1.40905848  1.069697911
##  90:  0.34610362 -0.20790493 -1.247411057
##  91: -0.64704563 -0.42930698 -3.347950986
##  92: -2.15764634  0.32522710 -0.451241627
##  93:  0.88425082  1.73620932  2.150769895
##  94: -0.82947761 -1.50473068 -1.452949569
##  95: -0.57356027 -1.15985114 -0.362579449
##  96:  1.50390061 -0.60612875 -0.707030598
##  97: -0.77414493 -0.20522527 -0.616833708
##  98:  0.84573154  0.58770664 -1.224799287
##  99: -1.26068288 -0.26622675  0.417726076
## 100: -0.35454240  0.37488651  1.870082916
##                x           y            z
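
  • The code that generated this table is not shown; data with a similar structure could be simulated along these lines (a sketch with an assumed seed and assumed coefficients, so the exact values above will not be reproduced):

library(data.table)

set.seed(123)              # assumed seed
n <- 100
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)    # assumed dependence of x on z
y <- 0.8 * z + rnorm(n)    # assumed (stronger) dependence of y on z
d <- data.table(x = x, y = y, z = z)
d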

  • We can compute the partial correlation between x and y controlling for z (pcor() returns the full matrix of pairwise partial correlations, each controlling for the remaining variable):
library(ppcor)
pcor(d)

## $estimate
##           x         y         z
## x 1.0000000 0.1550061 0.3128204
## y 0.1550061 1.0000000 0.6004240
## z 0.3128204 0.6004240 1.0000000
## 
## $p.value
##             x            y            z
## x 0.000000000 1.255277e-01 1.618733e-03
## y 0.125527685 0.000000e+00 5.063869e-11
## z 0.001618733 5.063869e-11 0.000000e+00
## 
## $statistic
##          x        y        z
## x 0.000000 1.545310 3.243719
## y 1.545310 0.000000 7.394803
## z 3.243719 7.394803 0.000000
## 
## $n
## [1] 100
## 
## $gp
## [1] 1
## 
## $method
## [1] "pearson"

  • estimate: A matrix of the pairwise partial correlation coefficients

  • p.value: A matrix of p values for the corresponding significance tests

  • statistic: A matrix of the test statistics

  • n: The number of samples

  • gp: The number of given (controlled-for) variables

  • method: The correlation method used
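
  • Individual results can be extracted from the returned list in the usual way; for example, the partial correlation of x and y given z and its p value (a usage sketch):

res <- pcor(d)
res$estimate["x", "y"]   # partial correlation of x and y given z
res$p.value["x", "y"]    # p value of the corresponding test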

Partial correlation from a regression perspective

  • Partial correlation between \(X\) and \(Y\) controlling for \(Z\) can be calculated by fitting two simple linear regressions:

\[ X = \beta_{0}^{(X)} + \beta_{1}^{(X)}Z + \epsilon_{X} \]

\[ Y = \beta_{0}^{(Y)} + \beta_{1}^{(Y)}Z + \epsilon_{Y} \]

  • The partial correlation between \(X\) and \(Y\) controlling for \(Z\) is the correlation between the residuals of these two regressions.

Partial correlation from a regression perspective

# Regress each of x and y on the control variable z
fm_xz <- lm(x ~ z, data=d)
fm_yz <- lm(y ~ z, data=d)

# Residuals: x and y with the linear effect of z removed
resid_xz <- resid(fm_xz)
resid_yz <- resid(fm_yz)

# The correlation of the residuals is the partial correlation
cor(resid_xz, resid_yz)
## [1] 0.1550061
pcor(d)$estimate
##           x         y         z
## x 1.0000000 0.1550061 0.3128204
## y 0.1550061 1.0000000 0.6004240
## z 0.3128204 0.6004240 1.0000000

  • The (x, y) entry of the estimate matrix matches the correlation between the residuals exactly.

Pearson correlation and simple linear regression

  • The Pearson correlation between \(X\) and \(Y\) tells you how strongly the two variables are linearly related. If we fit a simple linear regression:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • Then the slope \(\beta_1\) equals the Pearson correlation \(r_{XY}\), provided both \(X\) and \(Y\) are standardized (verified in the sketch after this list).

  • Pearson \(r\): A symmetric measure of association between \(X\) and \(Y\).

  • Regression \(\beta_1\): A directional prediction from \(X\) to \(Y\).
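
  • This equivalence can be verified by standardizing both variables before fitting (a sketch reusing the data.table d from above):

d_std <- as.data.frame(scale(d))     # standardize all columns (mean 0, sd 1)
coef(lm(y ~ x, data = d_std))["x"]   # slope on standardized data
cor(d$x, d$y)                        # equals the Pearson correlation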

Partial correlation and multiple regression

  • The partial correlation between \(X\) and \(Y\), controlling for \(Z\), tells you how strongly \(X\) and \(Y\) are related after removing the effects of \(Z\) from both. In multiple regression:

\[ Y = \beta_0 + \beta_1 X + \beta_2 Z + \epsilon \]

  • The standardized regression coefficient \(\beta_1\) reflects the unique contribution of \(X\) to predicting \(Y\), after accounting for \(Z\).

  • Partial correlation \(r_{XY \cdot Z}\) tells you how \(X\) and \(Y\) relate when \(Z\) is held constant for both. It is a symmetric measure.

  • Multiple regression \(\beta_1\): A directional prediction from \(X\) to \(Y\), after accounting for \(Z\).

Regression coefficients in terms of correlation

  • Simple linear regression with standardized variables:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

\[ \beta_1 = r_{XY} \]

  • Multiple linear regression with standardized variables:

\[ Y = \beta_0 + \beta_1 X + \beta_2 Z + \epsilon \]

\[ \beta_1 = r_{XY \cdot Z} \cdot \sqrt{ \frac{1 - r_{YZ \cdot X}^2 }{1 - r_{XZ \cdot Y}^2} } \]
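
  • This last identity can be checked numerically (a sketch reusing d and the pcor() output from above; with three variables, each entry of the estimate matrix controls for the remaining variable):

pc <- pcor(d)$estimate   # matrix of pairwise partial correlations
r_xy_z <- pc["x", "y"]   # r_{XY.Z}
r_yz_x <- pc["y", "z"]   # r_{YZ.X}
r_xz_y <- pc["x", "z"]   # r_{XZ.Y}

# Standardized coefficient of x in the multiple regression of y on x and z
b1 <- coef(lm(y ~ x + z, data = as.data.frame(scale(d))))["x"]

b1                                               # direct estimate
r_xy_z * sqrt((1 - r_yz_x^2) / (1 - r_xz_y^2))   # from the formula above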