I originally wrote this article in 2014 to discuss whether correlation implies causation. Not that I expect everyone to read and remember this one article, but it’s frustrating when I see a conversation where people who deny science and accept science both misuse correlation and how it relates to causation. So, I decided to update this article and republish it as a reminder that the relationship of correlation to causation isn’t as easy as a simple trope or meme.
One of the foundational questions in biomedical science is whether correlation implies causation. Anti-vaccine activists often conflate or misunderstand the two, rejecting or accepting correlation as it fits their narrative. The “correlation implies causation” story is often abused, misused, and confused by many writers.
One thing we do know about correlation is that if you can’t establish correlation, despite numerous attempts, it is nearly impossible to claim causation. But even if you do observe correlation, that alone doesn’t imply causation.
But there are methods, grounded in powerful science, to establish causation from observations of correlation. So sometimes correlation does not imply causation. But sometimes correlation implies causation.
This article will help show how we may be able to establish causation from observations of correlation. And, like all science, this is hard stuff.
A little background
Like I mentioned above, before we can discuss causation, we first need to establish correlation. For example, we have looked in hundreds of different ways to see if there is a correlation between vaccines and autism – there is no correlation.
So no matter what the vaccine deniers claim, without establishing correlation, they cannot establish causation. They can claim thimerosal is linked to autism. They can claim aluminum is linked to autism. They can claim that too many injections are linked to autism. They can claim anything they want, but we cannot see any association between vaccines and autism.
The pro-science/pro-vaccine world should dismiss a claim that correlation implies causation unless a list of supporting points can be checked off. In that case, we can start to accept that correlation, in fact, does imply causation.
Conflating causation and correlation is somewhat different from the logical fallacy of post hoc ergo propter hoc, where an observer believes that because a second event followed a first event, the first event must have caused the second.
I’m sure all good luck charms and superstitions, like walking under a ladder, are related to the post hoc fallacy. So if I walk under a ladder, then trip on a black cat, then crash into a mirror, I don’t immediately blame the initial act of walking under the ladder. I just assume I’m clumsy.
In science, we may be able to show a correlation statistically. For example, there may be an increase in broken bones in children after vaccination. But does that mean the vaccine caused the broken bone? If we show that the rate of broken bones is the same with or without vaccines, there’s no causation. If we cannot show a plausible physiological reason why vaccines would have some influence over bone strength, we reject causation. In other words, showing correlation gives us only half the story. Real science is necessary to show us causality.
Parents may observe autism after a vaccination, then blame the vaccine for it. But the events are almost always coincidental. Humans aren’t very good at overlooking coincidence. The post hoc fallacy seems to be ingrained into our behavior, possibly as a result of evolution: there may have been positive natural selection for seeing patterns in nature, because doing so may have been useful in avoiding danger.
However, correlation and causation are critical parts of scientific research. Basically, correlation is the statistical relationship between two sets of data. The closer the relationship, the higher the correlation. However, without further data, the correlation may not imply causation, meaning that one set of data has some influence over the other.
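To make “the closer the relationship, the higher the correlation” concrete, here is a minimal Python sketch of the Pearson correlation coefficient, one common way (though not the only one) to put a number on that relationship. The function name `pearson_r` and the sample numbers are invented for illustration and are not taken from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers: y rises in lockstep with x, z falls in lockstep with x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

r_up = pearson_r(x, y)    # close to +1: strong positive correlation
r_down = pearson_r(x, z)  # close to -1: strong negative correlation
print(f"r(x, y) = {r_up:.2f}, r(x, z) = {r_down:.2f}")
```

A value near +1 or -1 only says the two sets of numbers move together. Nothing in the calculation says which one, if either, influences the other.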
Example of correlation implies causation. Or not.
Let’s invent a massive study to investigate car accidents after vaccinations. In our imaginary study, we find that the rate of automobile accidents with a child in the back seat after a child is vaccinated is higher than the background rate of automobile accidents with children in the back seat who aren’t vaccinated.
Does the vaccination itself cause a higher rate of accidents? Well, I suppose you could make an argument that a post-vaccinated child is still screaming or something, distracting the driver, but that variable could happen with unvaccinated children just screaming because they didn’t get their GMO-free, organic, free-range ice cream cone.
But did the vaccine itself cause the accident? Or is it some other factor? Like the driver being stressed because of going to the pediatrician for the vaccine because she read all those lies from the anti-vaccine groupies? Or because her child is a bit fussy after vaccination because it happens? In other words, we have data that seems to show correlation, but it really has no meaning without establishing a reasonable level of causality.
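The “some other factor” problem can be sketched with a few lines of Python. Everything here is hypothetical: the counts are invented so that driver stress, not vaccination, drives the accident rate, and the names `COUNTS`, `crude_rate`, and `rate_ratio_by_stratum` are mine. It is only a toy illustration of confounding, not an analysis of real data.

```python
# Hypothetical (trips, accidents) counts, invented purely for illustration.
# Stressed drivers crash more often, and in this toy data they are also
# more likely to be driving home from a vaccination appointment.
COUNTS = {
    "stressed driver": {"after_vaccine": (800, 40), "no_vaccine": (200, 10)},
    "calm driver": {"after_vaccine": (200, 2), "no_vaccine": (800, 8)},
}

def accident_rate(trips, accidents):
    return accidents / trips

def crude_rate(counts, exposure):
    """Accident rate for one exposure group, ignoring driver stress."""
    trips = sum(groups[exposure][0] for groups in counts.values())
    accidents = sum(groups[exposure][1] for groups in counts.values())
    return accident_rate(trips, accidents)

def rate_ratio_by_stratum(counts):
    """Rate ratio (after vaccine vs. no vaccine) within each stress level."""
    return {
        stratum: accident_rate(*groups["after_vaccine"])
        / accident_rate(*groups["no_vaccine"])
        for stratum, groups in counts.items()
    }

crude_ratio = crude_rate(COUNTS, "after_vaccine") / crude_rate(COUNTS, "no_vaccine")
stratum_ratios = rate_ratio_by_stratum(COUNTS)

print(f"ignoring stress: rate ratio {crude_ratio:.2f}")
for stratum, ratio in stratum_ratios.items():
    print(f"{stratum}: rate ratio {ratio:.2f}")
```

In this made-up data the crude comparison shows more than twice the accident rate after vaccination, yet within each stress level the rates are identical. The apparent link comes entirely from the lurking third variable.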
So when you read an article in one of the anti-vaccine websites that X number of girls died because of the HPV vaccine, you might assume that correlation implies causation. Except real science says they are not causally related.
Or if you read that because the rate of autism has increased while the number of vaccines has increased, obviously, correlation implies causation. Except we have no evidence that vaccines are correlated with autism, let alone causally related.
The problem with these cases is that stating correlation implies causation oversimplifies the complex science of epidemiology, which essentially determines whether there is correlation, and then causation, between certain factors and the risk of certain medical conditions. Like everything in science, there is more to the understanding of the relationship between correlation and causation than simply dismissing it.
Evaluating correlation implies causation
So how do we know if correlation does or does not imply causation? There are seven additional tests of observational data that could be used to move a finding of a correlation to one of actual causation.
To determine whether correlation implies causation, many scientists rely upon the Bradford Hill criteria. English statistician Sir Austin Bradford Hill was interested in developing a set of objective criteria that could be used to provide epidemiological evidence of causality between a presumed cause and an observed effect. The criteria serve as a sort of checklist for scientists, who can take data that establishes correlation and then logically determine if that supports causality.
Let’s take a look at those seven tests for causation:
1. The data must be strong
If one observes correlation, the next step is to establish whether the data is strong enough to support a causal link. For example, if we’re examining an increase in the risk of something, the numbers have to be substantial.
In the early research in the links between smoking and lung cancer, it was found that the risk of cancer was 5-10X higher in smokers. If we’re looking at vaccinations, to begin to show causation, you’d need to show a substantial increase in the risk of some event compared to a control population. The larger the increase in risk, the better your data supports causation. In addition, the data must show rates of risk that far surpass the background (general population) risk.
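As a rough illustration of what “5-10X higher” means in practice, here is a small Python sketch that computes a relative risk and an approximate 95% confidence interval using the standard log-based formula. The function name `relative_risk` is my own, and the counts are hypothetical, chosen only so the arithmetic is easy to follow. They are not the figures from the smoking studies.

```python
import math

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Relative risk with an approximate confidence interval (log method).

    z=1.96 corresponds to a 95% interval under a normal approximation.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative-incidence (cohort) data.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical cohort: 50 cases among 1,000 exposed people,
# 10 cases among 1,000 unexposed people.
rr, lower, upper = relative_risk(50, 1000, 10, 1000)
print(f"relative risk {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

A relative risk well above 1, with a confidence interval that stays clear of 1, is the kind of “strong” data this criterion asks for. A ratio barely above 1 with an interval that straddles 1 is not.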
2. The data must be consistent
If one shows data that could imply causation, it must be consistent across a number of studies with different populations (gender, ethnicity, income, age). Again, going back to the early research in smoking and lung cancer, the first two studies looking at the link were done separately on two different continents (and this being the 1940s and 50s, information sharing was limited at best) but showed nearly identical results.
3. The data must be specific
The data must predict causality, very precisely. Again, back to lung cancer and smoking, the data showed that smoking was linked to one type of cancer, in the lungs, at the precise location where smoking enters the body. One cannot show causality with general data, simply because the data is too imprecise.
If we’re going to look at vaccines, personal anecdotes, dumpster diving in VAERS, or bad research are not a replacement for large, high-quality case-control studies that show a statistically significant correlation between vaccines and some claimed adverse event.
4. The data must be temporal
To show causality, the medical event (whether adverse or beneficial) must follow the proposed cause within a relatively short period of time. As the length of time grows between the cause and the event, confounding variables become more and more difficult to separate from the original cause. Moreover, adverse events that occur at their normal background rate may be wrongly attributed to a vaccine or some other medical procedure, even though there is no relationship.
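The background-rate point is easy to underestimate, so here is a back-of-the-envelope Python sketch. It assumes a constant background rate and simply counts how many cases would be expected to show up in the days after vaccination by chance alone. The function name and every number are hypothetical.

```python
def expected_coincidental_cases(people_vaccinated, background_rate_per_person_year, window_days):
    """Cases expected in the post-vaccination window from the background rate alone.

    Assumes the condition occurs at a constant rate that has nothing to do
    with vaccination (a simplifying assumption for illustration).
    """
    person_years = people_vaccinated * window_days / 365
    return person_years * background_rate_per_person_year

# Hypothetical: 1,000,000 people vaccinated, a condition that normally strikes
# 1 in 10,000 people per year, and a 30-day window after the shot.
cases = expected_coincidental_cases(1_000_000, 1 / 10_000, 30)
print(f"expected by coincidence alone: about {cases:.0f} cases")
```

Under these made-up assumptions, several people would develop the condition within a month of vaccination even if the vaccine did nothing at all. That is why a raw count of events after vaccination says little until it is compared with the background rate.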
5. The data must possess a dose-response effect
That is, one can show that as you consume or receive more of X, the specific Y response increases. Back to smoking: the more cigarettes smoked, the higher the risk of lung cancer. The temporal effect mentioned above and the dose-response effect are often interrelated. So, if we were to examine a specific adverse event attributed to vaccines, then the rate of that specific event must increase with additional numbers of vaccines.
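A dose-response check can be sketched very simply in Python: compute the risk in each dose group and see whether it climbs as the dose climbs. The helper names and the counts below are hypothetical, and a real analysis would use a formal trend test rather than this crude comparison.

```python
def risks_by_dose(groups):
    """groups: iterable of (dose, cases, people). Returns [(dose, risk)] sorted by dose."""
    return [(dose, cases / people) for dose, cases, people in sorted(groups)]

def has_increasing_gradient(groups):
    """True if risk rises at every step up in dose (a crude dose-response check)."""
    risks = [risk for _, risk in risks_by_dose(groups)]
    return all(later > earlier for earlier, later in zip(risks, risks[1:]))

# Hypothetical data: (cigarettes per day, cases, people in the group).
smoking_groups = [
    (0, 10, 10_000),
    (10, 50, 10_000),
    (20, 100, 10_000),
    (40, 200, 10_000),
]

# Hypothetical data with no gradient: risk is flat no matter the dose.
flat_groups = [
    (0, 10, 10_000),
    (1, 10, 10_000),
    (2, 10, 10_000),
]

for dose, risk in risks_by_dose(smoking_groups):
    print(f"dose {dose}: risk {risk:.4f}")
print("gradient in smoking data:", has_increasing_gradient(smoking_groups))
print("gradient in flat data:", has_increasing_gradient(flat_groups))
```

If the risk does not budge as exposure increases, as in the flat example, this criterion is not met, no matter how suggestive a single comparison looks.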
6. The implication of causality must be biologically plausible
If we look at smoking again, the early researchers could show a plausible mechanistic link between an inhaled carcinogen and malignant changes in the lung cells. This is an important facet of determining causality in biological systems. When one argues that GMO foods may cause cancer, how plausible is that? Is there some plausible mechanism between the GMO food and one of 250 different cancers? To be factual, there are precious few environmental factors that cause cancers, and those that are known are not implausible.
When someone says that vaccines cause autism, is it even biologically plausible? Is there some biological mechanism between the vaccine, its ingredients, or the vaccination itself that can be plausibly linked to an adverse event, like autism? It is clear that this is where the anti-vaccine religion fails to make its case.
They have attempted, over and over, to show a biologically plausible connection between vaccines and autism, but they fail to do so without making the observer stand on their head and squint to see biological plausibility. Occam’s Razor has a purpose in science, which is to tell us that the best science requires the simplest explanation – not inventing a long, complex, convoluted process to get from a vaccine to autism.
Plausibility doesn’t mean we take the easy way and just say, “well, just because we don’t know of a mechanism doesn’t mean there isn’t one.” Actually, we shouldn’t say that. We know a lot about human physiology. It’s not a giant mystery wrapped around an enigma. Human physiology is complex and detailed, but it is possible to determine what may or may not be plausible.
7. The data must be coherent
Other types of evidence, like experimental ones in other models, ought to support the causality. Going back to smoking, the tar from cigarettes was painted on the back of mice, which induced tumors. Moreover, there was other evidence being found in the 1950s and 1960s that smoking was associated with increases in cancers of the lip, throat, tongue, and esophagus.
So correlation by itself does not imply causation. But when one gathers other evidence, that requires separate studies and analysis, the correlation becomes one of the fundamental pieces of evidence that establish causality.
Like I wrote previously, research isn’t easy. One just can’t state that they see an observed correlation, then immediately state that one causes the other. Assuming that correlation implies causation without supporting data is simply foolish. On the other hand, rejecting causation from observations of correlation is also wrong.
One can’t see an increase in the autism rate, alongside an increase in the number of vaccinations, then state, after looking at those numbers for an hour, that vaccines cause autism. Without further, more complex, research data, we cannot state that correlation implies causation.
In fact, one needs each (or at least most) of those seven additional criteria listed above to show causality.
Again, those who try to oversimplify the process are the ones with the agenda. Those who try to make it easy are the ones who are trying to find data that proves their dogma and beliefs, rather than trying to determine what the data actually states. The data should drive the conclusion, as opposed to taking the easy course of searching for data to establish a preconceived conclusion.
Research is hard work. And if a researcher, or some random person on the internet, wants to establish causation from correlation, then they need to provide a lot more evidence than some numbers. It’s not easy, but it can be done.
Correlation and causation aren’t related just because they both start with the letter “C.” They are related only if you can show that relationship.
Editor’s note: This article was originally published in May 2014. It has been substantially revised and updated to include more comprehensive information, to improve readability and to add current research.
- Levin ML, Goldstein H, Gerhardt PR. Cancer and tobacco smoking; a preliminary report. J Am Med Assoc. 1950 May 27;143(4):336-8. PubMed PMID: 15415261.
- Mukherjee S. The emperor of all maladies: a biography of cancer. Scribner, New York. 2010. ISBN 978-1-4391-0795-9.