This is the 2007 version.

Correlation and Prediction

The evidence produced by observational research is called correlational data. Correlations are patterns in the data. For example, Stern found that mothers who synchronized their movements with a baby's movements had better rapport with their babies, with rapport defined as less crying and more laughing and smiling by the baby. Mothers with serious mental problems such as depression often did not show the mother-baby dance. They were literally "out of synch" with their own children. Their children were more likely to be irritable and to suffer from developmental problems.

What is a correlation?

The technical term for such a coincidence is a correlation. "Co-relation" means essentially the same thing as "co-incidence" or things occurring together. Correlations, observed patterns in the data, are the only type of data produced by observational research. Correlations make it possible to use the value of one variable to predict the value of another. For example, using Stern's finding, one might predict that mothers who fail to do the mother-baby dance will have babies who fuss and cry significantly more than other babies.

Why is correlational data so useful?

The predictive power of correlations can be very great, if the correlation is a strong one. Consider this figure, based on data from a 1992 study at the University of Illinois. Researchers asked 56,000 students about their drinking habits and grades to see how drinking might correlate with performance in school. The results seem clear: the more a student drank, the lower the student's grade-point average. B students (those with a 3.0 GPA) averaged 5 drinks per week, while D students (with a 1.0 GPA) averaged 10 drinks per week. Using the correlation shown in this graph, you could predict that a person who drank a six-pack of beer every day (42 drinks per week) would be likely to flunk out of school.
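To make the prediction step concrete, here is a minimal sketch that treats the two group averages above (5 drinks per week for a 3.0 GPA, 10 drinks per week for a 1.0 GPA) as points on a straight line and reads predictions off that line. The straight-line shape and the function names fit_line and predict_gpa are assumptions for illustration; the original figure is not reproduced here, and the real relationship need not be linear. Predictions are clamped to a 0.0 to 4.0 GPA scale, also an assumption.

```python
# Two group averages reported in the text: (drinks per week, GPA).
B_STUDENTS = (5.0, 3.0)
D_STUDENTS = (10.0, 1.0)

def fit_line(p1, p2):
    """Return (slope, intercept) of the straight line through two points.

    ASSUMPTION: the drinking/GPA relationship is treated as a straight line.
    """
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

def predict_gpa(drinks_per_week, slope, intercept):
    """Predict GPA from drinks per week, clamped to an assumed 0.0-4.0 scale."""
    gpa = intercept + slope * drinks_per_week
    return max(0.0, min(4.0, gpa))

slope, intercept = fit_line(B_STUDENTS, D_STUDENTS)
print(f"slope = {slope:.2f} GPA points per weekly drink")
print(f"7.5 drinks/week -> predicted GPA {predict_gpa(7.5, slope, intercept):.1f}")
print(f"42 drinks/week (a six-pack a day) -> predicted GPA "
      f"{predict_gpa(42, slope, intercept):.1f}")
```

Under this assumption each extra weekly drink goes with a GPA about 0.4 points lower, and a six-pack a day falls far below the bottom of the GPA scale. That is consistent with the "likely to flunk out" prediction, although it also extrapolates well beyond the drinking levels the two averages describe.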

Negative Correlation

This is a negative correlation, which means that one variable goes up as the other goes down. As the amount of alcohol consumed goes up on the graph, the corresponding GPA goes down, producing a downhill slope. (A positive correlation is one in which variables go up or down together, producing an uphill slope.)
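The direction of a correlation is usually summarized by the sign of a correlation coefficient. The sketch below uses Pearson's r, one common choice (the text does not say which statistic the Illinois researchers used). Pearson's r runs from -1 for a perfect downhill pattern to +1 for a perfect uphill pattern. The numbers are toy values made up for illustration, not data from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: -1 is a perfect downhill line, +1 a perfect uphill line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: positive when x and y move in the same direction.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy patterns invented for illustration (not study data).
x = [1, 2, 3, 4, 5]
downhill = [9, 7, 6, 4, 1]  # y goes down as x goes up: negative correlation
uphill = [1, 3, 4, 6, 9]    # y goes up as x goes up: positive correlation

print(f"downhill pattern: r = {pearson_r(x, downhill):+.2f}")
print(f"uphill pattern:   r = {pearson_r(x, uphill):+.2f}")
```

The two toy patterns are mirror images, so their coefficients have the same size and opposite signs. The size of r reflects how tightly the points follow a line, and the sign reflects only the direction of the slope.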

Why should the label on the X axis be changed?

To be more accurate, we should change the label on the X-axis of the graph to "Self-report of drinks consumed per week," because this was undoubtedly self-report data (nobody was observing the drinking habits of 56,000 people). We do not know if self-reports of drinking accurately reflect real levels of drinking. Perhaps people who get good grades do not want to admit they drink, so they lie about it. The data does not rule out such an explanation. All we really know is what people said about how much they drank.

Must you have an accurate cause-effect analysis to make predictions?

Any type of correlation can be used to make a prediction. However, a correlation does not tell us about the underlying cause of a relationship. We do not know from the Illinois data whether drinking was correlated with lower grades because (1) alcohol makes people stupid, or (2) the students who tend to drink tend to be poorer students to begin with, or (3) people who are hung-over from a drinking binge tend to skip class, or (4) students in academic trouble drink in order to drown their sorrows, or some other reason. There can be hundreds of possible explanations for a correlation: the number is limited only by your imagination and ingenuity in thinking up possible reasons for a relationship between two variables.

For purposes of making a prediction, the underlying reason for a correlation may not matter. As long as the correlation is stable (lasting into the future), one can use it to make predictions, whether or not one understands it. One does not need an accurate cause-effect explanation. What a correlation does not tell you is why two things tend to go together. In the Illinois study, maybe alcohol consumption is not the root cause of bad grades. Perhaps the students who drank never opened their books and never studied, and that is why they got bad grades. We do not know. The study did not seek to explore the underlying causes of the correlation.

A similar study at Virginia Tech found a negative correlation between drinking and grades, but it was somewhat smaller than the one found at the University of Illinois a few years earlier. The Virginia Tech group also gathered self-report data indicating that students who engaged in drinking binges were sometimes too hung over the next day to attend class, and their data showed a correlation between drinking and absenteeism. So the Virginia Tech study began to investigate the factors underlying the correlation between drinking and low grades.

How can replication help to clarify factors underlying a correlation?

It is also possible that different factors are important at different schools, or in different countries. In France, drinking might not correlate with bad grades at all. A typical report of a correlation is based on one group of people, at one time, in one place. It does not necessarily reveal a universal truth. This is another reason replication is important. When an important finding is replicated at different places and times, or with different groups of people, we find out how robust or dependable the correlation is. We may also get hints about the factors that underlie a correlation.

For example, we might find that students in France like a glass of wine with dinner, so most of them categorize themselves as drinkers. We might also find (in this fictional example) that very few of them engage in binge drinking or suffer from hangovers. If their grades do not suffer, this would support the Virginia Tech researchers' suggestion that drunkenness or absenteeism, not drinking per se, is the cause of academic difficulties among students who drink. This type of detail is likely to be revealed by attempts to replicate interesting correlations in new situations or among new groups of people.

What "clear distinction" should students make?

One should make a clear distinction between the usefulness of correlations for predictions (which requires no theory) and the testing of speculations or theories about why the correlations exist (which may require many research studies). Correlations are very important, because they allow prediction. However, a correlation does not tell you about causality. In other words, it does not tell you about the underlying events that created the relationship.

Which is more important for making a prediction, validity or reliability?

Correlations are useful even if we have no theory to explain them. In the example of drinking and grades, we might not know why the correlation exists. That does not matter if all we want to do is make predictions. All we need is a reliable correlation. We do not even care whether the self-reports of drinking are accurate or valid, as long as people are consistent in their self-reports. If the observed relationship between self-reported drinking and grades lasts into the future (if it is reliable), then we can make a prediction based on people's self-reports: "People who say they drink X amount will end up with a grade-point average of about Y."
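Why consistency matters more than accuracy here can be sketched numerically. The sketch assumes that every student under-reports by the same proportion: each one reports exactly half of what they actually drink. That is a hypothetical, idealized kind of consistent misreporting. Under that assumption the self-reports are inaccurate, yet the correlation with GPA is unchanged, because Pearson's r is not affected by multiplying one variable by a positive constant. All numbers below are invented for illustration and are not data from the Illinois or Virginia Tech studies.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# HYPOTHETICAL values, invented for illustration only (not study data).
actual_drinks = [2, 4, 6, 8, 12, 15]
gpas = [3.6, 3.1, 3.0, 2.4, 1.9, 1.2]

# ASSUMPTION: every student consistently reports half of actual drinking.
reported_drinks = [0.5 * d for d in actual_drinks]

r_actual = pearson_r(actual_drinks, gpas)
r_reported = pearson_r(reported_drinks, gpas)
print(f"r using actual drinking:   {r_actual:.3f}")
print(f"r using reported drinking: {r_reported:.3f}")
```

Both coefficients come out the same, so predictions of the form "people who say they drink X amount" work equally well from the inaccurate reports. If misreporting were inconsistent, for example some students exaggerating and others concealing, the correlation would generally weaken, and so would the predictions.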


Copyright © 2007-2011 Russ Dewey