Hierarchy of scientific evidence – keys to scientific skepticism

I am a scientific skeptic. It means that I pursue published scientific evidence to support or refute a scientific or medical principle. I am not a cynic, although cynicism is often conflated with skepticism. I don’t have an opinion about these ideas. Scientific skepticism depends on the quality and quantity of evidence that supports a scientific idea. And examining the hierarchy of scientific evidence can be helpful in deciding what is good data and what is bad, what can be used to form a conclusion and what is useless.

That’s how science is done. And I use the hierarchy of scientific evidence to weigh the quality along with the quantity of evidence in reaching a conclusion. I am generally offended by those who push pseudoscience – they generally try to find evidence that supports their predetermined beliefs. That’s not science, that’s the opposite of good science.

Unfortunately, today’s world of instant news, with memes and 140-character analyses flying across social media, can be overwhelming. Sometimes we create an internal false balance, assuming that headlines (often written to be clickbait) on one side are somehow equivalent to those on another side. So, we think there’s a scientific debate, when there isn’t one.

I attempt to write detailed, thoughtful and nuanced articles about scientific ideas. I know they can be complex and long-winded, but I know science is hard. It’s difficult. Sorry about that, but if it were so easy, everyone on the internet would be doing science. Unfortunately, there are too many people writing on the internet who think they are talking science, but they fail to differentiate between good and bad evidence.

But there is a way to make this easier. Not easy, just easier. This is my guide for amateurs (and, if I do a good job, professionals) to a method for evaluating the quality of scientific research across the internet.

[Image: hierarchy of scientific evidence infographic. ©Compound Interest, 2015. compoundchem.com.]


The hierarchy of scientific evidence

I’ve written about this topic before, but I ran across an article (and the accompanying image above) on an interesting website called CompoundChem. Lots of good stuff in there, and this infographic (it certainly isn’t a meme) convinced me to revisit my own ideas about scientific evidence. I need to add a few things, like completely useless sources, but it is a good graphic to separate good evidence from bad.

It’s important to know what is at the top (and conversely, at the bottom) of what constitutes good quality scientific evidence. Again, some people tend to treat all evidence published in a medical or scientific journal as equivalent. Nothing could be further from the truth.

Cherry-picking a source to confirm your pre-ordained beliefs, without critiquing it for quality, is simply not good science. In fact, it’s one of the sure signs you’re engaging in pseudoscience.

If you’re pushing an agenda denying global warming or claiming GMO dangers, then it makes for better reading to attempt to show that there’s some equivalence in all of the “scientific” sources. The writer can then proclaim “the debate is still ongoing. Next up, have we found sasquatch?”

The BBC has stated that it “wishes to emphasise the importance of attempting to establish where the weight of scientific agreement may be found and make that clear to audiences.” They want to give due weight to the scientific consensus and move away from perpetuating the belief that being impartial means giving equal credence to both “sides” of an issue. One win for science.

In fact, some evidence is better than other evidence. Some of it is actually a lot better. So let’s start at the bottom, and work our way up. (Please note that most of this discussion focuses on medical research, so if the discussion is about anthropogenic climate change, the hierarchy is somewhat different.)


13. Sources of no quality whatsoever

We’ll start with Natural News written by Mike Adams, aka the Health Ranger. Let me make this clear: Natural News is a foul, fetid, putrid, nuclear sewer of information, which befouls any real science with lies. There is almost nothing published there that would qualify as scientifically accurate or responsible. It’s too bad that Google’s banning of Natural News was only temporary.

The antisemitic, anti-science, anti-knowledge website, Whale.to, similarly reeks of malodorous smells of pure garbage. GreenMedInfo, run by the junk medicine pushing Sayer Ji, is also a popular website that belongs in this group.

Anyone who uses these types of sites as a source for anything in a discussion about medical science loses the discussion without any further deliberation. In other words, if you ever use a link or quote from these types of websites, there is no further discussion. We are done.


12. Anecdotes

Simply, anecdotes are stories that are used to support a belief. They may be truthful and factual, but they don’t represent scientifically valid information. People might claim that something happened after a specific vaccine, a post hoc fallacy, but one person’s experience cannot and does not represent real evidence.

One of the methods used to claim that vaccines are dangerous is dumpster diving through the VAERS (Vaccine Adverse Event Reporting System) database, a system used to capture patient and physician reports of adverse events after vaccination. The problem with the database is that it is not scientific. Anyone can report anything, and the system cannot distinguish between causality, correlation and randomness. As the VAERS website itself states:

VAERS is a passive reporting system, meaning that reports about adverse events are not automatically collected, but require a report to be filed to VAERS. VAERS reports can be submitted voluntarily by anyone, including healthcare providers, patients, or family members. Reports vary in quality and completeness. They often lack details and sometimes can have information that contains errors.

In simple scientific terms, these are observations, or anecdotes. To be fair, VAERS could be used as an observational tool to uncover potential issues, but it can only be used to form a hypothesis, which needs to be tested, and then confirmed or refuted, by controlled clinical or epidemiological studies. And just because there are 100 anecdotes that say something might be wrong with vaccines, there could be 1 million anecdotes that say nothing is wrong.
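The point about raw report counts can be made concrete with a little arithmetic. The sketch below estimates how many events would be expected to follow vaccination by coincidence alone, given a background rate of the condition. Every number here is hypothetical and chosen only for illustration; none of them comes from VAERS or from any study.

```python
def expected_coincidental_events(doses_given, annual_background_rate, window_days):
    """Events expected in the window after a dose by chance alone,
    assuming the condition occurs at the same rate with or without the vaccine."""
    return doses_given * annual_background_rate * (window_days / 365.0)


# Hypothetical inputs, for illustration only.
doses = 1_000_000          # doses administered
background_rate = 0.001    # 1 case per 1,000 people per year in the general population
window = 42                # days after vaccination in which an event gets reported

expected = expected_coincidental_events(doses, background_rate, window)
reports = 100              # hypothetical number of anecdotal reports

print(f"Expected by coincidence alone: {expected:.0f}")
print(f"Reports received: {reports}")
if reports > expected:
    print("More reports than the background rate predicts: a hypothesis worth testing.")
else:
    print("No more reports than coincidence would predict.")
```

Even when reports exceed the expected count, this only generates a hypothesis. It takes a controlled study to confirm or refute it.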

Real research needs to be completed, if there’s some observed trend in VAERS anecdotes, before stating some sort of evidence-based change in vaccination policy. Real scientists do examine the database, and so far, little conclusive information has been found.


11. Appeal to authority

In the world of scientific discussion, a lot depends on statements from a leading authority or, more frequently than I would like, a false authority.

For example, Tetyana Obukhanych is the darling of the anti-vaccine crowd. Although she appears to have sterling credentials in immunology, she hasn’t published anything in any journal that would support her negative opinions about vaccines. But she uses her credentials to continue to argue about vaccines, rather than evidence.

Another example is Peter Doshi, an HIV-AIDS denier who is a non-epidemiologist, non-virologist, and non-immunologist critic of flu vaccines. He is used by the antivaccine crowd as their “authority” on the flu vaccine because he has published in medical journals, but he isn’t a scientist, so his research borders on opinion rather than real science.

Both Obukhanych and Doshi are perfect examples of the logical fallacy of Appeal to False Authority. People use their opinions in an argument about vaccines as if their authority and credentials matter more than published evidence. The only thing that matters is evidence. So if an “authority” provides high quality evidence that contradicts the scientific consensus, then the evidence matters, not the authority.

But there are good experts whose public comments can provide weight to a discussion. In an outstanding account in the blog, Science or Not, the writer gives us a checklist of what makes a good expert, or authority, in published research.

A real expert is one that:

  • has superior factual knowledge in the field and exceptional understanding of how to apply the facts to produce concrete results.
  • has a better ability than her peers to differentiate between similar, but not identical cases.
  • shows consistency in judgements that is better than her peers.
  • has been identified by her peers in the field as an expert.

And generally, this expert or authority has a long list of published articles that provide the backdrop for their statements. They don’t invent knowledge out of thin air, but derive it from evidence produced by the scientific method.

Science or Not also provided a “checklist” of red flags, or warning signs, of a non-expert.

But in general, we shouldn’t waste time with “experts.” There are at least 10 kinds of evidence that are far superior to authorities.


10. Press releases or news reports.

Conclusions stated in a press release, even when they are from a major research university, are very weak evidence, because they haven’t been peer reviewed. I have a feed filled with press releases from major research universities, and I’ve found errors in interpretation from the university’s public relations office relative to the real research. However, it is possible to use a press release to chase down the actual published, and peer reviewed study – it’s not completely worthless. But some press releases don’t mention the journal, publication date, title of the paper, or anything to help track down the paper.


9. Meeting abstract or poster presentation.

These are presentations made at meetings/conventions of a scientific society. There are hundreds done every year, and preliminary research is presented in poster sessions or formal presentations. Usually, there are a lot of questions and answers, which of course don’t show up in a link to the abstract or poster, that can help explain the data. Moreover, these types of presentations vary from nearly ready for publication to pure speculation.

They are generally not peer-reviewed, that is, the system where scientific peers (usually well-known experts in the field) review the article for scientific accuracy and quality. They are not published formally in any journal. And they often do not contain enough explanation of methods, data, analysis, and other issues to evaluate properly.

In an article in JAMA, the authors found that within 3 years after abstracts were presented at meetings, only about 50% were published in high-impact journals, 25% in low-impact journals, and 25% remained unpublished.

Interestingly, the 39 abstracts that received front-page coverage in newspapers had a publication rate almost identical to the overall publication rate. The authors concluded that, “abstracts at scientific meetings receive substantial attention in the high-profile media. A substantial number of the studies remain unpublished, precluding evaluation in the scientific community.” In other words, until the research is published in peer reviewed journals, it’s hard to evaluate their quality and importance.

Mostly, abstracts give us an insight into where research is going in a particular field. But one needs to wait until the research is published with a full explanation of methods, results, statistics, and conclusion.


8. Case reports.

I generally dislike these types of articles, although I know physicians rely upon them to assist in diagnosis or treatment. These types of articles often show up in high quality medical journals, but they are usually observations of one or two patients. They are generally considered to be anecdotal reports, and generally lack statistical analysis. That being said, occasionally a case report will lead to new insight into a disease or condition that, with further research, can be added to the body of science that becomes a part of evidence- or science-based medicine.

Unfortunately, science deniers will cherry pick a published case report, and use it to support their pre-established conclusions. But a true scientific skeptic should never use them as evidence, and as a result, case reports sit near the bottom of the hierarchy of scientific evidence.


7. Animal or cell culture studies.

This is basic primary research which may or may not be applicable to clinical medicine. Too much credence is given to these studies, which may be 20 or more years away from having any clinical importance, if any at all.

Someone publishes that cannabis “has some effect on” breast cancer in mice, and that is used to support a claim that smoking pot can “cure cancer.” Except there’s no evidence of a significant human clinical effect.

There is an old joke in medical research circles: science has cured cancer in mice for decades, so the value of these types of studies is limited, until a significant clinical effect can be shown in real studies. Furthermore, the same standards of evaluation must be applied to animal or cell culture studies that are used for clinical studies–real experts who publish in peer reviewed, high impact factor journals.

There is a tendency to over-dramatize results from animal studies, when only a small percentage of compounds that are tested in animals or cell cultures ever make it into human clinical trials (let alone are approved by the FDA). The National Cancer Institute has screened over 400,000 compounds for treating cancer, and maybe 20-30,000 have even made it to early clinical trials, and of those just a handful are used in modern medicine.

Be extremely skeptical of any news source, press release, or meme that refers to an animal or cell culture study–they rarely lead to real clinical results. And this is why these primary studies are at the lower end of the hierarchy of scientific evidence.


6. Cross-sectional studies.

Cross sectional studies (or sometimes surveys) are one of the basic tools of epidemiology. Essentially, these are surveys where the researcher observes a single point of data in a large population. A cross-sectional study provides little information about correlation or causality, but is used to determine what proportion of an observed population may have a disease at a single point in time.

The major problem with these types of studies is that they cannot eliminate confounding factors (that is, factors that were not assessed but may have had influence over the results), can be highly biased depending on how the data was gathered, and generally cannot answer a specific hypothesis.


5. Case-control studies. 

This is a type of analytical study which compares individuals who have a specific disease (“cases”) with a group of individuals without the disease (“controls”). The proportion of each group having a history of a particular exposure or characteristic of interest is then compared. An association between the hypothesized exposure and the disease being studied will be reflected in a greater proportion of the cases being exposed to the factor of interest.

For example, one could take a group of individuals that have lung cancer and compare them to a group that do not. The researchers would pick a hypothesized factor, say smoking, and determine the rate of smoking in each group. From that data, the researchers could determine the difference in lung cancer risk between smokers and non-smokers.
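The comparison described above is usually summarized as an odds ratio, since a case-control study starts from disease status rather than from exposure. Here is a minimal sketch of that calculation, with an approximate 95% confidence interval using the standard log-based (Woolf) method. The counts are hypothetical and are not taken from any real lung cancer study.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Approximate 95% confidence interval computed on the log scale."""
    estimate = odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    se_log = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases + 1 / exposed_controls + 1 / unexposed_controls
    )
    return (
        math.exp(math.log(estimate) - z * se_log),
        math.exp(math.log(estimate) + z * se_log),
    )


# Hypothetical counts: 100 people with lung cancer, 100 without.
smokers_with_cancer, nonsmokers_with_cancer = 80, 20
smokers_without_cancer, nonsmokers_without_cancer = 40, 60

estimate = odds_ratio(
    smokers_with_cancer, nonsmokers_with_cancer,
    smokers_without_cancer, nonsmokers_without_cancer,
)
low, high = odds_ratio_ci(
    smokers_with_cancer, nonsmokers_with_cancer,
    smokers_without_cancer, nonsmokers_without_cancer,
)
print(f"Odds ratio: {estimate:.1f} (95% CI {low:.1f} to {high:.1f})")
```

An interval that excludes 1.0 suggests an association, but it says nothing about confounding factors that were never measured.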

At this point in the hierarchy of scientific evidence, we are beginning to receive evidence of such a quality that it could be used to support or refute a clinical claim. But case-control studies have a few weaknesses that keep them from being at the top of the hierarchy – the studies are observational in nature and thus do not provide the same level of evidence as randomized controlled trials. Moreover, the results may not identify confounding factors.

There are examples of individual case-control studies that provided evidence for a claim but were eventually contradicted by new research, or by a systematic review (which we’ll discuss below).


4. Cohort studies (prospective or retrospective studies).

These studies utilize one or more samples (called cohorts) which are followed prospectively over time. Subsequent status evaluations with respect to a disease or outcome are conducted to determine which initial participants’ exposure characteristics (risk factors) are associated with it.

In more simple terms, the researchers follow one group who may be exposed to something and compare them to a group that is not. From this data, the study can tell us what the absolute risk may be from exposure for certain diseases. A cohort study is often undertaken to obtain evidence to try to refute the existence of a suspected association between cause and effect.
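Because a cohort study follows exposed and unexposed groups forward, it can report risks directly. The sketch below computes the absolute risk in each group, the relative risk, and the risk difference. The cohort sizes and event counts are hypothetical, picked only to make the arithmetic easy to follow.

```python
def cohort_summary(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Absolute risks in each group plus the two usual comparisons between them."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }


# Hypothetical cohort: 10,000 exposed and 10,000 unexposed people.
summary = cohort_summary(
    exposed_events=30, exposed_total=10_000,
    unexposed_events=10, unexposed_total=10_000,
)
print(f"Risk if exposed:   {summary['risk_exposed']:.4f}")
print(f"Risk if unexposed: {summary['risk_unexposed']:.4f}")
print(f"Relative risk:     {summary['relative_risk']:.1f}")
print(f"Risk difference:   {summary['risk_difference']:.4f}")
```

A relative risk of 3 sounds dramatic, while the risk difference here is 2 extra cases per 1,000 people. Both numbers are needed to judge the result.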

For example, it is inappropriate to perform a double-blind randomized controlled trial with vaccines because it would be unethical to withhold a lifesaving vaccine from the placebo group. One could argue that subjects could volunteer to be in the placebo group, but that would break both the randomization and the double-blinding, because some unintentional bias would be introduced.

Thus, cohort studies (and sometimes, case-control studies mentioned above), usually with tens of thousands up to millions of subjects, can provide high quality data refuting or supporting a cause and effect. We have massive cohort studies that have concluded that vaccines do not cause autism.


3. Randomized controlled trials with non-definitive results.

A randomized controlled trial (sometimes called a double-blind clinical trial or randomized clinical trial) randomly places participants in two or more groups to test the effect of a medical treatment against a placebo or standard treatment. This type of study uses a relatively small patient population, usually with large confidence intervals which may only suggest a clinically significant effect. They have value, but the small population of subjects can skew the results. For example, patient selection may be biased, because it’s not large enough to make confounding variables less important.

I found it difficult to place this type of study at the #3 spot, because the quality of these studies varies all over the place. But randomized clinical trials, even small ones, are more useful than animal studies by a large amount, so they are placed near the top of the hierarchy of scientific evidence.


2. Large randomized controlled trials.

These are randomized clinical trials that include fairly large numbers (in general, I like to see >1,000 subjects in each arm), with confidence intervals (errors) that do not overlap, and show a clinically significant effect. The results are definitive, and are published in high quality journals.
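To see why arm size matters, here is a rough sketch that computes a 95% confidence interval for the response rate in each arm and checks whether the two intervals overlap. It uses the simple normal-approximation (Wald) interval, which is an assumption of this example rather than the method any particular trial uses, and the response rates are hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def intervals_overlap(ci_a, ci_b):
    """True when two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical trial: 60% respond on treatment, 50% on placebo, at two arm sizes.
for n in (50, 1000):
    treatment = proportion_ci(round(0.60 * n), n)
    placebo = proportion_ci(round(0.50 * n), n)
    verdict = "overlap" if intervals_overlap(treatment, placebo) else "do not overlap"
    print(
        f"n={n} per arm: treatment {treatment[0]:.3f}-{treatment[1]:.3f}, "
        f"placebo {placebo[0]:.3f}-{placebo[1]:.3f}, intervals {verdict}"
    )
```

The same 10-point difference is ambiguous with 50 subjects per arm and clear with 1,000. Note that non-overlapping intervals are a conservative test: overlapping intervals do not by themselves prove there is no difference.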


1. Systematic reviews.

These secondary research articles include meta-reviews and review articles. One of the most frequently cited sources of systematic reviews is the Cochrane Collaboration. A systematic review is a critical assessment and evaluation of all research studies that address the effect of pharmaceuticals, devices, standards of care, and other issues related to a particular clinical condition. Researchers who produce systematic reviews use an organized method of collecting and analyzing a body of literature on a particular topic using a set of specific criteria.

A systematic review may examine the quality of research in each of the papers, describe the results qualitatively, and find bias and errors. A published systematic review usually includes a description of the findings of the collection of research studies. Many systematic reviews also include a quantitative pooling of data, which is called a meta analysis. All meta analyses are systematic reviews, but not all systematic reviews contain meta analyses.
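The "quantitative pooling" can be illustrated with the simplest common approach, a fixed-effect, inverse-variance weighted average. This is only a sketch of one technique among several (random-effects models are also widely used), and the three study results below are hypothetical.

```python
import math


def pooled_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average: more precise studies get more weight."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical log relative risks and their standard errors from three studies.
log_relative_risks = [-0.30, -0.10, -0.20]
standard_errors = [0.20, 0.10, 0.15]

pooled, pooled_se = pooled_fixed_effect(log_relative_risks, standard_errors)
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)
print(f"Pooled relative risk: {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
```

The pooled interval is narrower than that of any single study, which is why a well-done meta analysis can settle a question that individual trials leave open. It also shows why biased study selection is so damaging: the arithmetic will happily pool bad data.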

Although I, and others, consider meta reviews to be the pinnacle of scientific research, they are not perfect and are not above criticism. For example, Cochrane produces outstanding systematic reviews and meta analyses, but they have occasionally published some awful reviews. One article relied upon authors who have axes to grind about their personal beliefs. The worst thing that can happen is when biased researchers have a predetermined conclusion, then use a systematic review to confirm their conclusions. We see that frequently with Cochrane’s reviews of acupuncture, which seem to act like cheerleaders to the pseudoscience of acupuncture.

Systematic reviews require a fine hand and strong statistical skills. And there are plenty of published systematic reviews that are not from Cochrane. Many other high quality journals publish meta analyses. Systematic reviews are the pinnacle of great science, the top of the hierarchy of scientific evidence, and often are the basis of a scientific consensus, but that does not mean that they get to skate by without a critique.


Impact factor and peer review

One of the better ways to ascertain scientific research quality is to examine the quality of the journal where the research was published. Articles in high quality journals are cited more often, because those journals attract the best scientific articles (which are cited more).

Yes, it’s a self-fulfilling system, but that’s not all bad. Some of the most prestigious journals have been around for a century or more, and their reputation is deserved.

Obviously, the best articles are sent to these journals partially because of the prestige of the journal, but also because the peer review is so thorough. Journals use a metric called “impact factor” that essentially states how many times an average article is cited by other articles in an index (in this case for all medical journals).

The impact factor could range from 0 (no one ever cites it) to some huge number, but the largest is in the 50-70 range. One of the highest impact factor journals is the Annual Review of Immunology, which is traditionally in the 50’s. This means that the average article in the Annual Review of Immunology is cited over 50 times by other articles. This becomes a fairly good indicator of the importance of articles published in the journal, because other articles in other journals need to refer to it in advancing the science.
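For readers who want the arithmetic, the commonly reported two-year impact factor is the number of citations a journal receives in a given year to items it published in the previous two years, divided by the number of citable items it published in those two years. The sketch below shows that calculation with hypothetical counts; the function name is made up for this example.

```python
def two_year_impact_factor(citations_this_year, citable_items_prior_two_years):
    """Citations received this year to the prior two years' items, per citable item."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_this_year / citable_items_prior_two_years


# Hypothetical journal: 60 review articles over two years, cited 3,000 times this year.
print(two_year_impact_factor(3000, 60))

# Hypothetical journal: 2,000 articles over two years, cited 5,000 times this year.
print(two_year_impact_factor(5000, 2000))
```

A journal that publishes a small number of heavily cited review articles can therefore score much higher than a large journal that receives more citations in total.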

Impact factor does have its weaknesses. For example, a highly esoteric, but high quality, journal may have a moderate impact factor, but it might still be a prestigious and extraordinary journal. Also, a new journal might have an artificially low impact factor, but still be high quality, because it takes time for scientists to read its articles and then cite them. In this case, watching steady growth in the impact factor might be a good indication of its quality.

Almost every journal provides its impact factor (unless they have reasons to not do so, like the number is so low as to be laughable) in the masthead or “about” page on the web. Just Google “XYZ journal impact factor” to find the impact factor, but be careful about the journal name, since the Journal of XYZ may be completely different than the XYZ Journal. It happens a lot.

As an independent, objective method to judge the quality of published research, impact factors are useful, but flawed. A journal with an impact factor of 5.0 may or may not be better than a journal with an impact factor of 4.5.

However, larger differences can be useful in judging the quality of the underlying article. Frankly, I consider anything over 10 to be high quality, especially in very specialized fields of research in the biomedical sciences. Anything between 5-10 is still high quality, especially if the journal publishes a lot of articles that are widely cited. One of the problems with Impact Factor is that it is an average, so many journals publish popular articles that get hundreds of cites, while also publishing very esoteric and focused articles that don’t get cited often.
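The problem with averages is easy to demonstrate. In the hypothetical journal below, two articles collect nearly all of the citations, so the mean (which is what an impact factor reports) looks nothing like the typical article. The citation counts are invented for illustration.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal.
citations_per_article = [300, 250, 5, 3, 2, 1, 1, 0, 0, 0]

average = mean(citations_per_article)    # what an impact-factor-style average reports
typical = median(citations_per_article)  # what the middle article actually receives

print(f"Mean citations per article:   {average}")
print(f"Median citations per article: {typical}")
```

This is one reason to judge an individual article on its own merits rather than only by the journal it appears in.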

Journals with impact factors of 2-5 can be high quality, especially if the journal is new, or if it focuses on a very specialized field of science, but you should remain wary of the quality of evidence, especially if there are no other studies confirming their conclusions.

Using journals with impact factors lower than 2 for evidence to answer a question, especially if it’s the only evidence available, is problematic. Maybe the evidence is good, but as discussed previously, it’s the body of evidence that matters, while single studies can be anything from well-done and insightful to poorly designed and biased. Science depends on lots of repetition, and a one-off study is way off of the radar on the hierarchy of scientific evidence.

One last thing–predatory journals, those that claim “open access” (which should not be conflated with public access) and charge the author to publish, populate the lower impact factor layers. These journals have cursory, if even that, peer-review. Many cherry pickers, who look for research that supports their beliefs, will often find “evidence” in these lower levels of impact factors.


Clinical significance

We could do all of the research, finding nothing but meta-analyses in high impact factor journals. But the only thing that matters is if the results show meaningful and significant clinical results.

If taking XYZ drug for influenza decreases the course of the disease from 10 days to 9.5 days, is there any value to the drug? Even if it’s statistically significant, is it clinically significant?

David Gorski, one of the editors at Science Based Medicine, made some important points about clinical relevance:

In human studies, the problem appears to be different. There’s another saying in medicine that statistical significance doesn’t necessarily mean that a finding will be clinically significant. In other words, we find small differences in treatment effect or associations between various biomarkers and various diseases that are statistically significant all the time. However, they are often too small to be clinically significant. Is, for example, an allele whose presence means a risk of a certain condition that is increased by 5% clinically significant? It might be if the risk in the population is less than 5%, but if the risk in the population is 50%, much less so.

We ask this question all the time in oncology when considering whether or not a “positive” finding in a clinical trial of adjuvant chemotherapy is clinically relevant. For example, if chemotherapy increases the five year survival by 2% in a tumor that has a high likelihood of survival after surgery clinically relevant? Or is an elevated lab value that is associated with a 5% increase in the risk of a condition clinically relevant? Yes, it’s a bit of a value judgment, but small benefits that are statistically significant aren’t always clinically relevant.

Now, as Gorski states, this is a bit of guesswork and instinct, but none of us should probably accept results as meaningful if they are tiny, even if statistically significant. I see this a lot, especially with alternative medicine studies that try to “prove” that they have some benefit beyond placebo. Often the differences are in single digit percentages, probably no different than random.
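Returning to the influenza example above, the gap between the two kinds of significance can be shown with a quick calculation. The sketch below runs a simple two-sample z-test on a 10-day versus 9.5-day illness and then compares the difference against a minimal clinically important difference. The standard deviation, the number of patients, and the one-day threshold are all assumptions made up for this example.

```python
import math


def two_sample_z_test(mean_a, mean_b, sd, n_per_arm):
    """Two-sided z-test for a difference in means, assuming equal SD and arm size."""
    standard_error = sd * math.sqrt(2 / n_per_arm)
    z = (mean_a - mean_b) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def is_clinically_significant(difference, minimal_important_difference):
    """True only when the effect is at least as large as the chosen threshold."""
    return abs(difference) >= minimal_important_difference


# Hypothetical trial: illness lasts 10 days on placebo and 9.5 days on the drug.
placebo_days, drug_days = 10.0, 9.5
z, p_value = two_sample_z_test(placebo_days, drug_days, sd=3.0, n_per_arm=2000)

print(f"z = {z:.2f}, p = {p_value:.2g}")
print("Statistically significant:", p_value < 0.05)
print(
    "Clinically significant:",
    is_clinically_significant(placebo_days - drug_days, minimal_important_difference=1.0),
)
```

With enough patients, a half-day difference produces a tiny p-value. Whether half a day matters to a patient is the value judgment Gorski describes, and no statistical test can make it for us.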



I realize it’s really hard work to decide what is and isn’t good science. Maybe that’s why people think that there is a debate about climate change, or vaccines, or evolution. It takes time to determine if an expert really is an expert, or to see if the body of evidence is solidly on one side or another. In a scientific discussion, one must give weight to not only more evidence, but critically to the quality of evidence. If I go into a debate against a science denier, carrying 5 or 10 published systematic reviews that all revolve around a scientific consensus, and the other “side” has poorly done studies or, and this happens a lot, quotes from Natural News, it’s no longer a debate. It is simply a shutout (to use a sports metaphor), with the other side barely able to walk on the same field.

People want simple answers to complex questions. They don’t even want complex questions, they want simple ones. Unfortunately, scientific questions are complex and demand complex evidence. And science doesn’t give up its answers in one hour. It takes years.

It’s been over 150 years since Charles Darwin had an epiphany and gave us the theory that evolution was a result of natural selection. And during that ensuing 150 years, science has accumulated more data, fine-tuned Darwin’s original ideas, discovered DNA and genes – now, we have the modern synthesis of evolutionary biology. That took a lot of time and a lot of work.

And we know what science is not. It is not providing “experts” who deny the vast body of evidence. It is not cherrypicking research to support a predetermined conclusion. It is not giving equal weight to every statement out there. It is not spinning a conspiracy theory about scientific consensus.

When I write here, I try not to have an opinion – instead, I strive to have the scientific evidence on my side. I use the hierarchy of scientific evidence to find the most robust evidence, then see where it leads me.

Let’s be clear – science is not some magical, irrational process. In fact, science is actually a logical method or process to answer a question about the natural universe. Science is not a dogma, it constantly evolves as more data is found. Science progresses through constant analysis, criticism and accumulation of data. That does not mean I think there’s a magic piece of data that will show that we were all wrong about evolution or vaccines. But if a real scientist publishes data showing that we were wrong, it’s confirmed by more research, and a similar mountain of evidence accumulates that overturns evolution or vaccines (or almost anything else in science), then it’s time for change.

But using rhetoric, logical fallacies, conspiracy theories, lies, misinformation, and faith in your opinions? That’s not going to change anyone’s mind, unless they have the same level of ignorance of science. If you are stating something that is not factually supported by real evidence, then your statements are your opinion and you are simply wrong. Read that again – you are wrong.

If you continue to claim that vaccines cause autism. You are wrong. If you claim that humans do not cause climate change. You are wrong. If you think that the earth is 6000 years old. You are wrong. If you think that GMOs are dangerous. You are wrong.

I can say that, because I have the scientific evidence on my side. I win on these points before a debate can even start. And that’s not my opinion based on arrogance or conceit. It’s not my belief that is dependent upon ignorance and fallacies. My statements are based on what the scientific consensus says, and that matters more than your opinions and beliefs. You get to attack me and everyone I know about vaccines or climate change or GMOs or evolution if you have evidence that hits the topmost level of the hierarchy of scientific evidence – until then, you are wrong.

It’s important to remember that if the science doesn’t support your point of view, it means your point of view is wrong. Get over it–it’s all right for science to conquer ignorance and belief.


The TL;DR version

  1. Only high quality articles, published in peer reviewed, high impact factor scientific journals count in a discussion about science.
  2. Expertise matters, but an expert is not an expert by acclamation or a few letters after the expert’s name. It’s established by the amount and depth of respect from peers, by the quality of research performed, and by the amount and quality of evidence that supports the expert’s statements.
  3. Not all science is equal. And the best research is a systematic review which shows a clinically significant effect.
  4. In biomedical research, clinical significance of results matters more than simple statistical significance.
  5. Giving false balance or undue weight to fringe beliefs that are unsupported by the vast breadth of research in that area is inappropriate, and those beliefs can be ignored.
  6. If someone thinks their opinion matters more than accumulated scientific evidence, they would be wrong. They should give up.

Editor’s note – This article was first published in August 2015. It has been updated with new links, order, and writing. 


The Original Skeptical Raptor
Chief Executive Officer at SkepticalRaptor
Lifetime lover of science, especially biomedical research. Spent years in academics, business development, research, and traveling the world shilling for Big Pharma. I love sports, mostly college basketball and football, hockey, and baseball. I enjoy great food and intelligent conversation. And a delicious morning coffee!