johzu

Linear correlation


Instructions

Enter the \(x\) and \(y\) coordinates of each point in the table. Check the boxes to display the elements of the linear regression.


Explanation

The linear equation is

$$y = mx + b$$

where \(x\) is the independent variable, \(y\) is the dependent variable, \(m\) is the slope, and \(b\) is the \(y\)-intercept. In a linear regression, the slope can be calculated as:

$$m = \frac{\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right) \left( y_{i} - \bar{y} \right)}{\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2} }$$
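As a minimal sketch of the formula above, the slope can be computed directly from the deviations about the means. The function name `slope_from_deviations` and the sample points are illustrative assumptions, not part of the original page.

```python
def slope_from_deviations(xs, ys):
    """Slope m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of the deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    return sxy / sxx


# Hypothetical sample points, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(slope_from_deviations(xs, ys))
```

For these points the means are \(\bar{x} = 3\) and \(\bar{y} = 4\), the numerator is 6 and the denominator is 10, so the slope is 0.6.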

The expressions on the right are equivalent to those on the left and are often easier to calculate:

$$\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2} = \sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}$$

$$\sum_{i=1}^{n} \left( y_{i} - \bar{y} \right)^{2} = \sum_{i=1}^{n} y_{i}^{2} - n\bar{y}^{2}$$

$$\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right) \left( y_{i} - \bar{y} \right) = \sum_{i=1}^{n} x_{i}y_{i} - n\bar{x}\bar{y}$$
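To see where these identities come from, the first one can be obtained by expanding the square and using \(\sum_{i=1}^{n} x_{i} = n\bar{x}\). The other two follow in the same way.

$$\begin{split} \sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2} &= \sum_{i=1}^{n} x_{i}^{2} - 2\bar{x}\sum_{i=1}^{n} x_{i} + n\bar{x}^{2} \\ &= \sum_{i=1}^{n} x_{i}^{2} - 2n\bar{x}^{2} + n\bar{x}^{2} \\ &= \sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2} \end{split}$$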

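Substituting these identities into the slope formula, and then using \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}\) and \(\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}\), gives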

$$\begin{split} m &= \frac{\sum_{i=1}^{n} x_{i}y_{i} - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}} \\ &= \frac{\sum_{i=1}^{n} x_{i}y_{i} - \frac{1}{n}\sum_{i=1}^{n}x_{i}\sum_{i=1}^{n}y_{i}}{\sum_{i=1}^{n} x_{i}^{2} - \frac{1}{n} \left(\sum_{i=1}^{n}x_{i}\right)^{2}} \cdot \frac{n}{n} \\ &= \frac{ n \sum_{i=1}^{n} x_{i}y_{i} - \sum_{i=1}^{n}x_{i}\sum_{i=1}^{n}y_{i}}{ n \sum_{i=1}^{n} x_{i}^{2} - \left(\sum_{i=1}^{n}x_{i}\right)^{2}} \end{split}$$

$$\boxed{ \therefore m = \frac{ n \sum_{i=1}^{n} x_{i}y_{i} - \sum_{i=1}^{n}x_{i}\sum_{i=1}^{n}y_{i}}{ n \sum_{i=1}^{n} x_{i}^{2} - \left(\sum_{i=1}^{n}x_{i}\right)^{2}} }$$

Since the regression line passes through the point \((\bar{x}, \bar{y})\), the \(y\)-intercept \(b\) is calculated as

$$\begin{split}b &= \bar{y} - m \bar{x} \\ &= \bar{y} - \left( \frac{\sum_{i=1}^{n} x_{i}y_{i} - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}} \right) \bar{x} \\ &= \bar{y} - \frac{\bar{x}\sum_{i=1}^{n} x_{i}y_{i} - n\bar{x}^{2}\bar{y}}{\sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}} \\ &= \frac{\bar{y}\sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}\bar{y} + n\bar{x}^{2}\bar{y} - \bar{x} \sum_{i=1}^{n}x_{i}y_{i}}{\sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}} \\ &= \frac{ \frac{1}{n}\sum_{i=1}^{n} y_{i}\sum_{i=1}^{n} x_{i}^{2} - \frac{1}{n}\sum_{i=1}^{n}x_{i} \sum_{i=1}^{n}x_{i}y_{i}}{\sum_{i=1}^{n} x_{i}^{2} - n\bar{x}^{2}} \cdot \frac{n}{n} \\ &= \frac{\sum_{i=1}^{n} y_{i} \sum_{i=1}^{n} x^{2}_{i} - \sum_{i=1}^{n} x_{i} \sum_{i=1}^{n} x_{i}y_{i}}{n \sum_{i=1}^{n} x^{2}_{i} - \left( \sum_{i=1}^{n} x_{i} \right)^{2}} \end{split}$$

$$\boxed {\therefore b = \frac{\sum_{i=1}^{n} y_{i} \sum_{i=1}^{n} x^{2}_{i} - \sum_{i=1}^{n} x_{i} \sum_{i=1}^{n} x_{i}y_{i}}{n \sum_{i=1}^{n} x^{2}_{i} - \left( \sum_{i=1}^{n} x_{i} \right)^{2}} }$$
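The two boxed formulas can be sketched in code using only the running sums \(\sum x_{i}\), \(\sum y_{i}\), \(\sum x_{i}y_{i}\) and \(\sum x_{i}^{2}\). This is a minimal illustration, not the code used by the page; the function name `linear_fit` and the sample points are assumptions.

```python
def linear_fit(xs, ys):
    """Return (m, b) of the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Both boxed formulas share the same denominator
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the line is vertical")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return m, b


# Hypothetical sample points, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = linear_fit(xs, ys)
print(m, b)
```

With these points the sums are \(\sum x_{i} = 15\), \(\sum y_{i} = 20\), \(\sum x_{i}y_{i} = 66\) and \(\sum x_{i}^{2} = 55\), so \(m = 30/50 = 0.6\) and \(b = 110/50 = 2.2\), which agrees with \(b = \bar{y} - m\bar{x} = 4 - 0.6 \cdot 3\).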

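The Pearson correlation coefficient is calculated by dividing the covariance by the square root of the product of the variances of both variables, that is, by the product of their standard deviations. The factors of \(\frac{1}{n}\) in the covariance and in the variances cancel, leaving only sums of deviations.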

$$\begin {split}r &= \frac{\operatorname{cov}(X,Y)}{\sigma_x \sigma_y} \\ &= \frac{\sum_{i=1}^{n} \left[ \left( x_{i} - \bar{x} \right) \left( y_{i} - \bar{y} \right) \right]}{\sqrt{\sum_{i=1}^{n} \left( x_{i} - \bar{x} \right)^{2}\sum_{i=1}^{n} \left( y_{i} - \bar{y} \right)^{2}}}\end{split}$$

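The denominator is always positive, so the sign of \(r\) is the sign of the covariance. A product \(\left( x_{i} - \bar{x} \right) \left( y_{i} - \bar{y} \right)\) is negative when its two factors have different signs. When such products outweigh the positive ones, the covariance is negative and so is the correlation: \(y\) tends to decrease as \(x\) increases.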
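The correlation coefficient can be sketched in the same way as the formula for \(r\), computing the sum of deviation products and the two sums of squared deviations. The function name `pearson_r` and the sample points are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample points, chosen only for illustration
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 5, 4, 5]))  # y tends to increase with x: r > 0
print(pearson_r(xs, [5, 4, 5, 4, 2]))  # y tends to decrease with x: r < 0
```

For the first set of points the sum of deviation products is 6 and the sums of squared deviations are 10 and 6, so \(r = 6/\sqrt{60} \approx 0.775\). Reversing the order of the \(y\) values flips the sign of the numerator and gives \(r \approx -0.775\).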