The hierarchy of scientific evidence – keys to skepticism

I am a scientific skeptic. It means that I pursue published scientific evidence to support or refute a scientific or medical principle. I am not a cynic, although cynicism is often conflated with skepticism. I don’t have an opinion about these ideas. I gather the evidence, published worldwide, and try to see where that evidence takes me. That’s how science is done.

I am generally offended by those who push pseudoscience–they generally try to find evidence that supports their predetermined beliefs. That’s not science, not even close.

Unfortunately, today’s world of instant news, with memes and 140-character analyses flying across social media, can be overwhelming. Sometimes we create an internal false balance, assuming that headlines (often written to be clickbait) on one side are somehow equivalent to those on another side. So, we think there’s a scientific debate, when there isn’t one.

I attempt to write detailed, thoughtful and nuanced articles about scientific ideas. I know they can be complex and long-winded, but I know science is hard. It’s difficult. Sorry about that, but if it were so easy, everyone on the internet would be doing science. Wait! What?

But there is a way to make this easier. Not easy, just easier. Scientific skepticism depends on the quality and quantity of evidence that supports a scientific idea. And examining the hierarchy of scientific evidence can be helpful in deciding what is good data and what is bad. What can be used to form a conclusion, and what is useless.

This is my guide to an amateur (and, if I do a good job, professional) method for evaluating the quality of scientific research across the internet.

©Compound Interest, 2015.


The hierarchy of scientific evidence


I’ve written about this topic before, but I ran across an article (and the accompanying image above) on an interesting website called CompoundChem. Lots of good stuff in there, and this chart (it certainly isn’t a meme) convinced me to revisit my own ideas about scientific evidence. I need to add a few things, like completely useless sources, but it is a good chart to get info quickly.

It’s important to know what is at the top (and conversely, at the bottom) of what constitutes good quality scientific evidence. Again, science and medicine journalists, and lay readers, tend to think all evidence is equal.

If you’re pushing an agenda denying global warming or claiming GMO dangers, then it makes for better reading to attempt to show that there’s some equivalence in all of the “scientific” sources. The writer can then proclaim “the debate is still ongoing.” Next up, have we found Sasquatch?

The BBC has stated that it “wishes to emphasise the importance of attempting to establish where the weight of scientific agreement may be found and make that clear to audiences.” They want to give due weight to the scientific consensus and move away from perpetuating the belief that being impartial means giving equal credence to both “sides” of an issue. One win for science.

In fact, some evidence is better than other evidence. Some of it is actually a lot better. So let’s start at the bottom, and work our way up. (Please note that most of this discussion focuses on medical research, so if the discussion is about anthropogenic climate change, the hierarchy is somewhat different.)


13. Sources of no quality whatsoever


We’ll start with Natural News, written by Mike Adams, aka the Health Ranger. Let me make this clear: Natural News is a foul, fetid, putrid sewer of information, which befouls any real science with lies. There is almost nothing published there that would qualify as scientifically accurate or responsible.

The antisemitic, anti-science, anti-knowledge website,, similarly reeks of malodorous smells of pure garbage. GreenMedInfo, run by the junk medicine pushing Sayer Ji, is also a popular website that belongs in this group.

Anyone who uses these types of sites as a source for anything in a discussion about medical science loses the discussion without any further debate. We’re done at that point.


12. Anecdotes


Simply, anecdotes are stories that are used to support a belief. They may be truthful and factual, but they don’t represent scientifically valid information. People might claim that something happened after a specific vaccine, but one person’s experience does not represent real evidence.

One source used to claim that vaccines are dangerous is dumpster diving through the VAERS (Vaccine Adverse Event Reporting System) database, a system used to capture patient and physician reports about vaccines.

The problem with the database is that it is not scientific. Anyone can report anything, and the system cannot distinguish between causality, correlation and randomness. As the VAERS website states:

VAERS is a passive reporting system, meaning that reports about adverse events are not automatically collected, but require a report to be filed to VAERS. VAERS reports can be submitted voluntarily by anyone, including healthcare providers, patients, or family members. Reports vary in quality and completeness. They often lack details and sometimes can have information that contains errors.

In simple scientific terms, these are observations, or anecdotes. To be fair, VAERS could be used as a tool to see potential issues, but it can only be used to form a hypothesis, which needs to be confirmed or refuted by controlled clinical or epidemiological studies. And just because there are 100 anecdotes that say something might be wrong with vaccines, there could be 1 million anecdotes that say nothing is wrong.

If there’s some trend in VAERS anecdotes, real research needs to be completed before proposing any evidence-based change in vaccination policy. Real scientists do examine the database, and so far, little conclusive information has been found.

11. Appeal to authority


In the world of scientific discussion, a lot depends on statements from a leading authority or, more frequently than I would like, a false authority.

For example, Peter Doshi, an HIV-AIDS denier, is a non-epidemiologist, non-virologist, and non-immunologist critic of flu vaccines. He is used by the antivaccine crowd as their “authority” on the flu vaccine, despite his total lack of scientific credentials. Doshi is a perfect example of the logical fallacy of Appeal to False Authority.

The only thing that matters is evidence. So if an “authority” provides high quality evidence that contradicts the scientific consensus, then the evidence matters, not the authority.

But there are good experts whose public comments can provide weight to a discussion. In an outstanding account in the blog, Science or Not, the writer gives us a checklist of what makes a good expert, or authority, in published research.

A real expert is one that:

  • has superior factual knowledge in the field and exceptional understanding of how to apply the facts to produce concrete results.
  • has a better ability than her peers to differentiate between similar, but not identical cases.
  • shows consistency in judgements that is better than her peers.
  • has been identified by her peers in the field as an expert.

And generally, this expert or authority has a long list of published articles that provide the backdrop for their statements. They don’t invent knowledge out of thin air, but from evidence derived from the  scientific method.

Science or Not also provides a “checklist” of red flags, or warning signs, of a non-expert.

But in general, we shouldn’t waste time with “experts.” There are at least 10 better kinds of evidence that are far superior to authorities.


10. Press releases or news reports.


Do not accept the conclusions stated in a press release from a major research university, because press releases haven’t been peer reviewed. I have a feed filled with press releases from major research universities, and I’ve found errors in interpretation from the university’s public relations office relative to the real research. However, it is possible to use a press release to chase down the actual published, peer reviewed study. So, it’s not completely worthless.

In the false debate (but let’s make it happen anyways), one must give weight to not only more evidence, but critically to the quality of evidence. If I go into such a debate with 5 or 10 published systematic reviews that all revolve around a scientific consensus, and the other “side” has poorly done studies or, and this happens a lot, quotes from Natural News, it’s no longer a debate. It is simply a shutout (to use a sports metaphor), with the other side barely able to walk on the same field.

9. Meeting abstract or poster presentation.


These are presentations made at meetings/conventions of a scientific society. There are hundreds done every year, and preliminary research is presented in poster sessions or formal presentations. Usually, there are a lot of questions and answers that can help explain the data, but of course those don’t show up in a link to the abstract or poster. Moreover, these types of presentations vary from nearly ready for publication to pure speculation.

They are generally not peer-reviewed, that is, the system where scientific peers (usually well-known experts in the field) review the article for scientific accuracy and quality. They are not published formally in any journal. And they often do not contain enough explanation of methods, data, analysis, and other issues to evaluate properly.

In an article in JAMA, the authors found that within 3 years after abstracts were presented at meetings, only about 50% were published in high-impact journals, 25% in low-impact journals, and 25% remained unpublished.

Interestingly, the 39 abstracts that received front-page coverage in newspapers had a publication rate almost identical to the overall publication rate. The authors concluded that, “abstracts at scientific meetings receive substantial attention in the high-profile media. A substantial number of the studies remain unpublished, precluding evaluation in the scientific community.” In other words, until the research is published in peer reviewed journals, it’s hard to evaluate their quality and importance.


8. Animal or cell culture studies.


This is basic primary research which may or may not be applicable to clinical medicine. Too much credence is given to these studies, which may be 20 or more years away from having any clinical importance, if any at all.

Someone publishes that cannabis “has some effect on” breast cancer in mice, and that is used to support a claim that smoking pot can “cure cancer.” Except there’s no evidence of a significant human clinical effect.

There is an old joke in medical research circles: science has cured cancer in mice for decades, so the value of these types of studies is limited, until a significant clinical effect can be shown in real studies. Furthermore, the same standards of evaluation must be applied to animal or cell culture studies that are used for clinical studies–real experts who publish in peer reviewed, high impact factor journals.

There is a tendency to over-dramatize results from animal studies, when only a small percentage of compounds that are tested in animals or cell cultures ever make it into human clinical trials (let alone are approved by the FDA). The National Cancer Institute has screened over 400,000 compounds for treating cancer, and maybe 20-30,000 have even made it to early clinical trials, and of those just a handful are used in modern medicine.

Be extremely skeptical of any news source, press release, or meme that refers to an animal or cell culture study–they rarely lead to real clinical results.


7. Case reports.


I generally dislike these types of articles, although I know physicians rely upon them to assist in diagnosis or treatment. This type of research often shows up in high quality medical journals, but it usually consists of observations of one or two patients.

Those in medicine know their purpose, which is to give a “heads up” to an observation. But they don’t provide any evidence in support of anything. They really have no scientific validity beyond observational.

Frequently, science deniers will cherry pick a published case report, and use it to support their preordained conclusions. But a true scientific skeptic should never use them to support a scientific point.


6. Cross-sectional studies.


Cross-sectional studies (or sometimes surveys) are one of the basic tools of epidemiology. Essentially, these are surveys where the researcher observes one single point of data in a large population. A study of this type provides little information about correlation or causality, but is used to determine what proportion of an observed population may have a disease at a single point in time.

The major problem with these types of studies is that they cannot eliminate confounding factors (that is, factors that were not assessed but may have had influence over the results), can be highly biased depending on how the data was gathered, and generally cannot answer a specific hypothesis.


5. Case-control studies. 


This is a type of analytical study which compares individuals who have a specific disease (“cases”) with a group of individuals without the disease (“controls”). The proportion of each group having a history of a particular exposure or characteristic of interest is then compared. An association between the hypothesized exposure and the disease being studied will be reflected in a greater proportion of the cases being exposed to the factor of interest.

At this point, we are beginning to have the quality of evidence that could be used to support or refute a clinical claim. But case-control studies have a few weaknesses that keep them from being at the top of the list–the studies are observational in nature and thus do not provide the same level of evidence as randomized controlled trials; and the results may not identify confounding factors.

There are examples of individual case-control studies that provided evidence for a claim, but were eventually contradicted by new research, or by a systematic review (which we’ll discuss below).
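The comparison described above, the proportion of cases with an exposure versus the proportion of controls with it, is usually summarized as an odds ratio. Here is a minimal sketch of that arithmetic. The counts are invented purely for illustration, and the function names are my own, not from any study or library.

```python
def exposed_proportion(exposed, unexposed):
    """Share of a group (cases or controls) that had the exposure."""
    return exposed / (exposed + unexposed)


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = cases_exposed / cases_unexposed
    odds_in_controls = controls_exposed / controls_unexposed
    return odds_in_cases / odds_in_controls


# Hypothetical 2x2 table: 100 cases and 100 controls (made-up numbers).
cases_exposed, cases_unexposed = 40, 60
controls_exposed, controls_unexposed = 20, 80

case_share = exposed_proportion(cases_exposed, cases_unexposed)
control_share = exposed_proportion(controls_exposed, controls_unexposed)
ratio = odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed)

print(f"Exposed among cases:    {case_share:.0%}")
print(f"Exposed among controls: {control_share:.0%}")
print(f"Odds ratio:             {ratio:.2f}")
```

An odds ratio above 1 only says the exposure is more common among cases. It says nothing about confounders that were never measured, which is exactly the weakness described above.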


4. Cohort studies (prospective or retrospective studies).


These studies utilize one or more samples (called cohorts) which are followed prospectively. Subsequent status evaluations with respect to a disease or outcome are conducted to determine which of the initial participants’ exposure characteristics (risk factors) are associated with it.

In simpler terms, the researchers follow one group who may be exposed to something and compare them to a group that was not exposed. From this data, the study can tell us what the absolute risk may be from exposure for certain diseases. A cohort study is often undertaken to obtain evidence to try to refute the existence of a suspected association between cause and effect.

For example, it is inappropriate to perform a double-blind randomized controlled trial with vaccines because it would be unethical to withhold a lifesaving vaccine from the placebo group. One could argue that subjects could volunteer to be in the placebo group, but that would break both randomization and blinding, because the subjects would know which group they were in.

Thus, cohort studies (and sometimes, case-control studies mentioned above), usually with tens of thousands up to millions of subjects, can provide high quality data refuting or supporting a cause and effect. We have massive cohort studies that have concluded that vaccines do not cause autism.
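To make “absolute risk” concrete, here is a small sketch of how risk is compared between an exposed cohort and an unexposed one. The cohort sizes and event counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def risk(events, cohort_size):
    """Absolute risk: the fraction of a cohort that developed the outcome."""
    return events / cohort_size


# Hypothetical cohorts followed over the same period (made-up numbers).
exposed_events, exposed_size = 30, 10_000
unexposed_events, unexposed_size = 10, 10_000

risk_exposed = risk(exposed_events, exposed_size)
risk_unexposed = risk(unexposed_events, unexposed_size)

# Risk difference: extra cases per person attributable to the exposure.
risk_difference = risk_exposed - risk_unexposed
# Relative risk: how many times more likely the outcome is in the exposed cohort.
relative_risk = risk_exposed / risk_unexposed

print(f"Risk in exposed cohort:   {risk_exposed:.4f}")
print(f"Risk in unexposed cohort: {risk_unexposed:.4f}")
print(f"Risk difference:          {risk_difference:.4f}")
print(f"Relative risk:            {relative_risk:.1f}")
```

Note how a relative risk of about 3 can sound alarming while the absolute difference is 2 extra cases per 1,000 people. Both numbers are needed to judge what a cohort study found.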


3. Randomized controlled trials with non-definitive results.


A randomized controlled trial (sometimes called a double-blind clinical trial or randomized clinical trial) randomly places participants in two or more groups to test the effect of a medical treatment against a placebo or standard treatment.

This type of study uses a relatively small patient population, usually with large confidence intervals which may only suggest a clinically significant effect. They have value, but the small population of subjects can skew the results. For example, patient selection may be biased, because it’s not large enough to make confounding variables less important.

I found it difficult to place this type of study at the #3 spot, because the quality of these studies varies all over the place. But randomized clinical trials, even small ones, are more useful than animal studies by a large amount.


2. Large randomized controlled trials.


These are randomized clinical trials that include fairly large numbers (in general, I like to see >1,000 subjects in each arm), with confidence intervals (errors) that do not overlap, and show a clinically significant effect. The results are definitive, and are published in high quality journals.
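The difference between a small trial and a large one shows up directly in the width of the confidence intervals. The sketch below assumes a simple normal-approximation (Wald) 95% interval for a response rate, which is my choice for illustration and not necessarily what any given trial would use. The response counts are invented.

```python
import math


def wald_interval(responders, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = responders / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Same hypothetical response rates (60% vs 40%) at two trial sizes.
small_treatment = wald_interval(30, 50)
small_control = wald_interval(20, 50)
large_treatment = wald_interval(600, 1000)
large_control = wald_interval(400, 1000)

small_overlaps = intervals_overlap(small_treatment, small_control)
large_overlaps = intervals_overlap(large_treatment, large_control)

print("50 per arm:   intervals overlap?", small_overlaps)
print("1000 per arm: intervals overlap?", large_overlaps)
```

With 50 subjects per arm the two intervals overlap, so the result is only suggestive. With 1,000 per arm and the same response rates, the intervals separate cleanly.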


1. Systematic reviews.


These secondary research articles include meta-reviews and review articles. One of the most frequently cited sources of systematic reviews is the Cochrane Collaboration.

A systematic review is a critical assessment and evaluation of all research studies that address the effect of pharmaceuticals, devices, standards of care, and other issues related to a particular clinical condition. Researchers who produce systematic reviews use an organized method of collecting and analyzing a body of literature on a particular topic using a set of specific criteria.

A systematic review may examine the quality of research in each of the papers, describe the results qualitatively, and find bias and errors.

A published systematic review usually includes a description of the findings of the collection of research studies. Many systematic reviews also include a quantitative pooling of data, which is called a meta analysis. All meta analyses are systematic reviews, but not all systematic reviews contain meta analyses.
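For readers curious about what “quantitative pooling” looks like, here is a minimal sketch of one common approach, a fixed-effect, inverse-variance weighted average. Real meta analyses often use more elaborate models (random effects, heterogeneity checks), so treat this as an illustration of the idea only. The effect sizes and standard errors are made up.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of study effects and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Three hypothetical studies: effect estimate and standard error for each.
study_effects = [0.30, 0.10, 0.20]
study_errors = [0.20, 0.10, 0.05]  # smaller error means a more precise study

pooled_effect, pooled_error = pool_fixed_effect(study_effects, study_errors)

print(f"Pooled effect:  {pooled_effect:.3f}")
print(f"Standard error: {pooled_error:.3f}")
```

The most precise study dominates the pooled estimate, and the pooled standard error is smaller than that of any single study. That is why a well-done meta analysis carries more weight than the individual trials it combines.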

Although I, and others, consider meta reviews to be the pinnacle of scientific research, they are not perfect. For example, Cochrane produces outstanding systematic reviews and meta analyses, but it has occasionally published some awful reviews. One article relied upon authors who have axes to grind about their personal beliefs. The worst thing that can happen is when biased researchers have a predetermined conclusion, then use a systematic review to confirm that conclusion.

Systematic reviews require a fine hand and strong statistical skills. And there are plenty of published systematic reviews that are not from Cochrane. Many other high quality journals publish meta analyses.

Systematic reviews are the pinnacle of great science and often are the basis of a scientific consensus, but that does not mean that they get to skate by without a critique.


Impact factor and peer review


One of the better ways to ascertain scientific research quality is to examine the quality of the journal where the research was published. Articles in high quality journals are cited more often, because those journals attract the best scientific articles (which are cited more).

Yes, it’s a self-fulfilling system, but that’s not all bad. Some of the most prestigious journals have been around for a century or more, and their reputation is deserved.

Obviously, the best articles are sent to these journals partially because of the prestige of the journal, but also because the peer review is so thorough. Journals use a metric called “impact factor” that essentially states how many times an average article is cited by other articles in an index (in this case for all medical journals).

The impact factor could range from 0 (no one ever cites it) to some huge number, but the largest is in the 50-70 range. One of the highest impact factor journals is the Annual Review of Immunology, which is traditionally in the 50’s. This means that the average article in the Annual Review of Immunology is cited over 50 times by other articles. This becomes a fairly good indicator of the importance of articles published in the journal, because other articles in other journals need to refer to it in advancing the science.
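As a rough sketch of the arithmetic, the commonly reported two-year impact factor divides the citations a journal received in one year (to articles it published in the previous two years) by the number of citable articles it published in those two years. The two-year window is my addition here, since the description above only says “average article,” and the numbers below are hypothetical.

```python
def two_year_impact_factor(citations_this_year, citable_items_prior_two_years):
    """Citations received this year to the prior two years' articles, per article."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("The journal must have published at least one citable item.")
    return citations_this_year / citable_items_prior_two_years


# Hypothetical journal: 100 articles over two years, cited 5,200 times this year.
hypothetical_if = two_year_impact_factor(5200, 100)
print(f"Impact factor: {hypothetical_if:.1f}")
```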

Impact factor does have its weaknesses. For example, a highly esoteric, but high quality journal, may have a moderate impact factor, but it still might be a prestigious and extraordinary journal. Also, a new journal might have an artificially low impact factor, but still be high quality, because it takes time for scientists to read its articles and then cite them. In this case, watching steady growth in the Impact Factor might be a good indication of its quality.

Almost every journal provides its impact factor (unless they have reasons to not do so, like the number is so low as to be laughable) in the masthead or “about” page on the web. Just Google “XYZ journal impact factor” to find the impact factor, but be careful about the journal name, since the Journal of XYZ may be completely different than the XYZ Journal. It happens a lot.

As an independent, objective method to judge the quality of published research, impact factors are useful, but flawed. A journal with an impact factor of 5.0 may or may not be better than a journal with an impact factor of 4.5.

However, larger differences can be useful in judging the quality of the underlying article. Frankly, I consider anything over 10 to be high quality, especially in very specialized fields of research in the biomedical sciences. Anything between 5-10 is still high quality, especially if the journal publishes a lot of articles that are widely cited. One of the problems with Impact Factor is that it is an average, so many journals publish popular articles that get hundreds of cites, while also publishing very esoteric and focused articles that don’t get cited often.
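The “it is an average” problem is easy to see with numbers. In this sketch, the citation counts for ten articles in one journal are invented to show how a couple of heavily cited papers can carry the average.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal (made-up numbers).
citations_per_article = [250, 180, 3, 2, 1, 1, 0, 0, 0, 0]

average_citations = mean(citations_per_article)
typical_citations = median(citations_per_article)

print(f"Mean citations per article:   {average_citations:.1f}")
print(f"Median citations per article: {typical_citations:.1f}")
```

Two popular articles pull the mean above 40, while the typical article in this made-up journal was cited once. So a high impact factor says more about the journal than about any individual article in it.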

Journals with impact factors of 2-5 can be high quality, especially if the journal is new, or if it focuses on a very specialized field of science.

Using journals with impact factors lower than 2 for evidence to answer a question, especially if it’s the only evidence available, is problematic. Maybe the evidence is good, but as discussed previously, it’s the body of evidence that matters, while single studies can be anything from well-done and insightful to poorly designed and biased.

One last thing–predatory journals, those that claim “open access” (which should not be conflated with public access) and charge the author to publish, populate the lower impact factor layers. These journals have cursory, if even that, peer-review. Many cherry pickers, who look for research that supports their beliefs, will often find “evidence” in these lower levels of impact factors.


Clinical significance


We could do all of the research, finding nothing but meta-analyses in high impact factor journals. But the only thing that matters is if the results show meaningful and significant clinical results.

If taking XYZ drug for influenza shortens the course of the disease from 10 days to 9.5 days, is there any value to the drug? Even if it’s statistically significant, is it clinically significant?

David Gorski, editor-in-snark at Science Based Medicine, made some important points about clinical relevance:

In human studies, the problem appears to be different. There’s another saying in medicine that statistical significance doesn’t necessarily mean that a finding will be clinically significant. In other words, we find small differences in treatment effect or associations between various biomarkers and various diseases that are statistically significant all the time. However, they are often too small to be clinically significant. Is, for example, an allele whose presence means a risk of a certain condition that is increased by 5% clinically significant? It might be if the risk in the population is less than 5%, but if the risk in the population is 50%, much less so.

We ask this question all the time in oncology when considering whether or not a “positive” finding in a clinical trial of adjuvant chemotherapy is clinically relevant. For example, if chemotherapy increases the five year survival by 2% in a tumor that has a high likelihood of survival after surgery clinically relevant? Or is an elevated lab value that is associated with a 5% increase in the risk of a condition clinically relevant? Yes, it’s a bit of a value judgment, but small benefits that are statistically significant aren’t always clinically relevant.

Now, as Gorski states, this is a bit of guesswork and instinct, but none of us should probably accept results as meaningful if they are tiny, even if statistically significant. I see this a lot, especially with alternative medicine studies that try to “prove” that they have some benefit beyond placebo. Often the differences are in single digit percentages, probably no different than random.
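The influenza example (10 days versus 9.5 days) can be run as a quick sketch to show how a tiny effect becomes statistically significant once a trial is large enough. Everything other than the 10 and 9.5 day figures is an assumption made for illustration: the standard deviation, the number of patients per arm, the simple two-sample z-test, and the one-day threshold for a clinically meaningful difference.

```python
import math


def two_sided_p_value(mean_difference, sd, n_per_arm):
    """Two-sample z-test p-value, assuming equal SD and equal arm sizes."""
    standard_error = sd * math.sqrt(2 / n_per_arm)
    z = mean_difference / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# From the example above: illness lasts 10 days untreated, 9.5 days treated.
difference_in_days = 10.0 - 9.5

# Assumptions for illustration only.
assumed_sd_days = 2.0
assumed_patients_per_arm = 5000
assumed_meaningful_difference_days = 1.0

p_value = two_sided_p_value(difference_in_days, assumed_sd_days, assumed_patients_per_arm)
statistically_significant = p_value < 0.05
clinically_meaningful = difference_in_days >= assumed_meaningful_difference_days

print(f"p-value: {p_value:.2e}")
print("Statistically significant?", statistically_significant)
print("Clinically meaningful?    ", clinically_meaningful)
```

Under these assumptions the p-value is vanishingly small, yet the patient still gains only half a day. Whether that is worth the cost and side effects of a drug is the value judgment Gorski describes.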




I realize it’s really hard work to decide what is and isn’t good science. Maybe that’s why people think that there is a debate about climate change, or vaccines, or evolution. It takes time to determine if an expert really is an expert, or to see if the body of evidence is solidly on one side or another.

People want simple answers to complex questions. They don’t even want complex questions, they want simple ones. Unfortunately, scientific questions are complex and demand complex evidence. And science doesn’t give up its answers in one hour. It takes years.

It’s been over 150 years since Charles Darwin had an epiphany and gave us the theory that evolution was a result of natural selection. And during that ensuing 150 years, science has accumulated more data, fine-tuned Darwin’s original ideas, discovered DNA and genes–now, we have the modern synthesis of evolutionary biology. That took a lot of time and a lot of work.

And we know what science is not. It is not providing “experts” who deny the vast body of evidence. It is not cherrypicking research to support a predetermined conclusion. It is not giving equal weight to every statement out there. It is not spinning a conspiracy theory about scientific consensus.

When I write here, I try not to have an opinion–instead, I try to have the scientific evidence on my side. I use the hierarchy of scientific evidence to find the most robust evidence, then see where it leads me.

Let’s be clear–science is not some magical, irrational process. In fact, science is actually a logical method or process to answer a question about the natural universe. Science is not a dogma; it constantly evolves as more data is found. Science progresses through constant analysis, criticism and accumulation of data. That does not mean I think there’s a magic piece of data that will show that we were all wrong about evolution or vaccines. But if a real scientist publishes data that shows we were wrong, and it’s confirmed by more research until a similar mountain of evidence overturns evolution or vaccines (or almost anything else in science), then it’s time for change.

But using rhetoric, logical fallacies, conspiracy theories, lies, misinformation, and faith in your opinions? That’s not going to change anyone’s mind, unless they have the same level of ignorance of science. If you are stating something that is not factually supported by real evidence, then your statements are your opinion and you are simply wrong. Read that again–you are wrong.

If you continue to claim that vaccines cause autism. You are wrong. If you claim that humans do not cause climate change. You are wrong. If you think that the earth is 6000 years old. You are wrong. If you think that GMOs are dangerous. You are wrong.

I can say that, because I have the scientific evidence on my side. I win on these points before a debate can even start. And that’s not my opinion based on arrogance or conceit. It’s not my belief that is dependent upon ignorance and fallacies.

My statements are based on what the scientific consensus says, and that matters more than your opinions and beliefs. You get to attack me and everyone I know about vaccines or climate change or GMOs or evolution if you have evidence that hits the topmost level of the hierarchy of scientific evidence – until then, you are wrong.

It’s important to remember that if the science doesn’t support your point of view, it means your point of view is wrong. Get over it–it’s all right for science to conquer ignorance and belief.


The TL;DR version


  1. Only high quality articles, published in peer reviewed, high impact factor scientific journals count in a discussion about science.
  2. Expertise matters, but an expert is not an expert by acclamation or a few letters after the expert’s name. It’s established by the amount and depth of respect from peers, by the quality of research performed, and by the amount and quality of evidence that supports the expert’s statements.
  3. Not all science is equal. And the best research is a systematic review which shows a clinically significant effect.
  4. In biomedical research, clinical significance of results matters more than simple statistical significance.
  5. Giving false balance or undue weight to fringe beliefs that are unsupported by the vast breadth of research in that area is inappropriate and can be ignored.
  6. If someone thinks their opinion matters more than accumulated scientific evidence, they would be wrong. They should give up.



The Original Skeptical Raptor
Chief Executive Officer at SkepticalRaptor
Lifetime lover of science, especially biomedical research. Spent years in academics, business development, research, and traveling the world shilling for Big Pharma. I love sports, mostly college basketball and football, hockey, and baseball. I enjoy great food and intelligent conversation. And a delicious morning coffee!