Why does a correlation of 0.20 only explain 4% of the observed variance?

The correlation coefficient (r) measures the strength and direction of a linear relationship between two variables. However, to determine the proportion of the variance in the dependent variable that is predictable from the independent variable(s), we use the coefficient of determination (r-squared).

In simple linear regression (one predictor, fitted with an intercept), r-squared is exactly the square of the correlation coefficient. So for a correlation of 0.20, r-squared is 0.20 * 0.20 = 0.04, or 4%. This means that only 4% of the total variation in y is accounted for by the linear relationship between x and y; the remaining 96% is not.
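As a rough check of the arithmetic, the sketch below simulates data with a population correlation of 0.20 and compares r squared with the share of variance explained by a fitted line, computed as 1 - SS_res / SS_tot. The helper names (`pearson_r`, `r_squared_from_fit`), the seed, and the sample size are illustrative choices, not part of the original question.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_fit(x, y):
    """Fit y = intercept + slope * x by least squares; return 1 - SS_res / SS_tot."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Synthetic data (an assumption for illustration): x and the noise are standard
# normal, and y is scaled so that the population correlation is 0.20.
random.seed(0)
rho = 0.20
x = [random.gauss(0, 1) for _ in range(5000)]
y = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x]

r = pearson_r(x, y)
r2 = r_squared_from_fit(x, y)

print(f"sample r           = {r:.4f}")
print(f"r squared          = {r ** 2:.4f}")
print(f"1 - SS_res/SS_tot  = {r2:.4f}")
```

The sample r will land near 0.20 rather than exactly on it, but the last two printed values agree with each other up to floating-point rounding, because the identity holds exactly for a least-squares line with an intercept.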

It is important to note that a low r-squared value does not necessarily mean there is no relationship between the variables or that a model is not useful. It simply means that most of the variation in y remains unexplained by x using a linear model.
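A small constructed example (not taken from the question) shows how a low r-squared can coexist with a strong relationship: here y is determined entirely by x, yet the linear correlation is zero because the dependence is a symmetric curve rather than a line. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# y is an exact function of x (y = x^2), but the relationship is not linear.
x = list(range(-5, 6))
y = [a * a for a in x]

r = pearson_r(x, y)

# r is zero (up to floating-point rounding), so r squared is zero as well,
# even though knowing x tells you y exactly.
print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
```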

Also, while r-squared indicates how well a model fits the data it was trained on, it should not be used alone to assess model fit or to compare models. In ordinary least squares it never decreases when another predictor is added, so it rewards complexity and cannot by itself reveal overfitting.
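One common way to account for model complexity is adjusted r-squared, which penalizes each additional predictor. The sketch below uses the standard formula; the function name and the sample sizes are hypothetical values chosen for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted r-squared for n observations and p predictors (excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: the same r-squared of 0.04 from 50 observations,
# reached with 1 predictor versus 10 predictors.
one_predictor = adjusted_r_squared(0.04, n=50, p=1)
ten_predictors = adjusted_r_squared(0.04, n=50, p=10)

print(f"adjusted r-squared, 1 predictor:   {one_predictor:.3f}")
print(f"adjusted r-squared, 10 predictors: {ten_predictors:.3f}")
```

With the same raw r-squared, the model that needed more predictors gets the lower adjusted value, and the adjusted value can fall below zero when the predictors add little beyond what chance would give.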