Skeptical Raptor's Blog hunting pseudoscience in the internet jungle

How to evaluate the quality of scientific research

One of the most tiresome discussions that a scientific skeptic has when debunking and refuting pseudoscience or junk science (slightly different variations of the same theme) is what constitutes real evidence. You’d think that would be easy: “scientific evidence” should be the gold standard. But really, there is a range of evidence from garbage to convincing. So this is my guide for amateurs (and if I do a good job, professionals) to evaluating scientific evidence. This is a major update of my original article on this topic, with less emphasis on Wikipedia, and more detail about scientific authority and hierarchy of evidence.

In today’s world of instant news, memes and 140 character analyses fly across social media, pretending to present knowledge in a manner that makes it appear authoritative. Even detailed, 2000 word articles that I write are often considered to be too long, and people only read the title or the concluding paragraph. This happens all the time, especially in amateur science circles. For example, many people only read the abstract of a scientific article and, even there, only the conclusion of the abstract.

The most popular article I ever wrote on this blog was one that thoroughly refuted a crazy meme that bananas kill cancer (which is an update of the original banana article). Hilariously, the meme was based on a complete misunderstanding of a study by “Japanese scientists” (the lead author was, in fact, an American, but whenever you see something that tries to claim authority by using unnamed, but smart sounding, scientists, be wary). Moreover, the conclusion made by the meme-author was based on ignorance about a growth factor called “tumor necrosis factor” (TNF), which doesn’t do what it sounds like it does; about how a banana would never produce TNF; about how TNF would be broken down in the digestive system; about how it couldn’t be transported from the digestive system to the blood; and about how, even if bananas did contain TNF, you would have to eat more bananas than you possibly could to get a biological effect, and the TNF effects would make you really sick. The banana meme did not have one single accurate assumption. None.

But still, it’s a popular belief. Just go to Facebook, and you’ll find someone promoting it. Like all anti-science memes, it’s a zombie, it reanimates from the dead and spreads its pseudoscience every few months, and I get thousands of hits from people trying to confirm the meme. Well, that’s actually good. I did the hard work of digging into the article and trying to figure out if this whole TNF thing was real.

So, how does one look at a claim skeptically? How does one critique a meme or claim that makes no sense? Maybe there’s a Facebook page that claims marijuana cures cancer or that vaccines kill children. Or another one that says take 14 vitamin C tablets (only $20 for a bottle by clicking on the link) and the flu will disappear. What’s the process to confirm or debunk a claim? Is it hard?

Wikipedia is one place that can either be an outstanding resource for science or medicine, or just a horrible mess with citations that link to pseudoscience and junk medicine pushers. For example, Wikipedia’s article on Alzheimer’s disease is probably one of the best medical articles in the “encyclopedia”. It is laid out in a logical manner, with an excellent summary, a discussion of causes, pathophysiology, mechanisms, treatments, and other issues. It may not be at the level of a medical review meant for a medical student or researcher, but it would be a very good start for a scientifically inclined college researcher or someone who has a family member afflicted with the disease.

Nearly everything in the article is supported by a recent peer-reviewed journal article. Furthermore, the article does its best to avoid primary sources (ones in which the authors directly participated in the research or documented their personal experiences) while utilizing secondary sources (which summarize one or more primary or secondary sources, usually to provide an overview of the current understanding of a medical topic, to make recommendations, or to combine the results of several studies). The reason secondary sources are so valuable is that they combine the works of several authors (and presumably locations), reducing the biases of one laboratory or one study. Secondary sources also include repetition of experiments that support or refute a hypothesis. As I’ve said many times, trust your secondary sources over just about anything.

Another method to cut down on research time is to read better blogs on science and medicine. Since I focus almost exclusively on medicine, I tend to go to Science Based Medicine, written by a group of scientist/physicians who use the perfect combination of snark, skepticism, and scientific evidence to expose bad science that pervades medicine and popular beliefs these days. But it is not a primary source or even secondary source, it’s not peer reviewed, and you need to trust the authors. I do, but then again, I’ve interacted with them enough to know that they generally don’t push bad science.

So let’s assume you, the reader, want to do this. Let’s go through a logical process of how to evaluate good scientific evidence.

Impact factor and peer review

One of the better ways to ascertain the quality of published research is to look at the quality of the journal. Articles in high quality journals are cited more often, because those journals attract the best scientific articles (which are cited more). Yes, it’s a self-fulfilling system, but that’s not all bad. Some of the most prestigious journals have been around for a century or more, and their reputation is deserved.

Obviously, the best articles are sent to these journals partially because of the prestige of the journal, but also because the peer review is so thorough. Journals use a metric called “impact factor” that essentially states how many times an average article is cited by other articles in an index (in this case for all medical journals). The impact factor could range from 0 (no one ever cites it) to some huge number, but the largest is in the 50-70 range. One of the highest impact factor journals is the Annual Review of Immunology, which is traditionally in the 50’s. This means that the average article in the Annual Review of Immunology is cited over 50 times by other articles. This becomes a fairly good indicator of the importance of articles published in the journal, because other articles in other journals need to refer to it in advancing the science.
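To make the arithmetic concrete, here is a minimal sketch of the standard two-year impact factor: the citations a journal receives in a given year to the items it published in the previous two years, divided by the number of citable items it published in those two years. The function name and the numbers are made up for illustration; they are not real figures for any journal.

```python
def two_year_impact_factor(citations_this_year: int,
                           citable_items_prior_two_years: int) -> float:
    """Citations received this year to articles from the prior two years,
    divided by the number of citable articles published in those two years."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("journal must have published at least one citable item")
    return citations_this_year / citable_items_prior_two_years


# Hypothetical journal: 60 citable articles over the prior two years,
# which were cited 3,120 times this year.
print(two_year_impact_factor(3120, 60))  # prints 52.0
```

A small review journal that publishes a few dozen heavily cited articles can therefore post a very large number, which is worth keeping in mind when comparing it to a journal that publishes thousands of papers a year.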

Impact factor does have its weaknesses. For example, a highly esoteric but high quality journal may have a moderate Impact Factor, but it still might be a prestigious and extraordinary journal. Also, a new journal might have an artificially low impact factor but still be high quality, because it takes time for more scientists to read its articles and then cite them. In this case, watching steady growth in the Impact Factor might be a good indication of its quality. Some journals also have a relatively low Impact Factor, but still are cited tens of thousands of times every year. This might be an indication of their quality.

Almost every journal provides its impact factor (unless they have reasons to not do so, like the number is so low as to be laughable) in the masthead or “about” page on the web. Just Google “XYZ journal impact factor” to find the impact factor, but be careful about the journal name, since the Journal of XYZ may be completely different than the XYZ Journal. It happens a lot.

As an independent, objective method to judge the quality of published research, Impact Factor is one of the best available. Frankly, I consider anything over 10 to be high quality, especially in very specialized fields of research in the biomedical sciences. Anything between 5-10 is still high quality, especially if the journal publishes a lot of articles that are widely cited. One of the problems with Impact Factor is that it is an average, so many journals publish popular articles that get hundreds of cites, while also publishing very esoteric and focused articles that don’t get cited often.
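The point about averages is easy to see with a few made-up numbers. The sketch below uses hypothetical citation counts for ten articles in a single journal (two blockbusters and eight narrowly focused papers) and compares the mean, which is what an impact-factor-style average reports, with the median.

```python
from statistics import mean, median

# Hypothetical citation counts for ten articles in one journal:
# two widely cited papers and eight rarely cited, specialized ones.
citations = [400, 250, 6, 5, 4, 3, 3, 2, 1, 1]

print(mean(citations))    # prints 67.5
print(median(citations))  # prints 3.5
```

The average looks impressive, but the typical article in this imaginary journal is cited only a handful of times. That is why the journal’s number says something about the journal, and less about any single paper in it.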


Journals with impact factors of 2-5 can be high quality, especially if the journal is new or focuses on a very specialized field of science. Anything less than 2 probably means that your review of a particular question in science needs to be more broadly based. As science comes to a consensus on an idea, hypothesis or theory, it gets published more and more often in higher and higher Impact Factor journals. Researching various pseudoscience claims, I run across the cherry picking of research by those who are trying to support their claims with “real science.” Unfortunately, they almost always utilize articles from journals with impact factors of less than 1-2, because higher impact journals reject the research. Of course, this leads to the laughable conspiracy theories claiming that the higher prestige journals block innovative science because, you know, conspiracy. The fact is that prestigious journals are prestigious sometimes because they do present new thinking and innovation–they just want it to be high quality.

Impact factor can also be one of the indicators of the quality of peer-review (an assumed standard for any prestigious journal). Better and stronger peer-review, the process whereby a submitted article is reviewed by several other researchers, all done anonymously to reduce bias, usually means better and stronger papers are published in the journal. There is a large range in quality of peer review, from simply checking for spelling (seriously, for low impact factor, predatory publishers, this is their standard) to repeating statistics, checking citations, reviewing the grammar and spelling (yes, it’s necessary), and critiquing the methods. The better journals generally have tougher standards of peer review with better reviewers. It’s not a perfect method of sorting the bad science from the good, but it does block out outrageous claims that lack evidence.

Authorities or experts

The quality of research can often be confirmed (or repudiated I suppose) based on the credentials and backgrounds of the scientists involved in the research presented. In an outstanding account in the blog, Science or Not, the writer gives us a checklist of what makes a good expert, or authority, in published research. Take the time to read his (or her) article, but I’ll condense it here.

A real expert is one that:

  • has superior factual knowledge in the field and exceptional understanding of how to apply the facts to produce concrete results.
  • has a better ability than her peers to differentiate between similar, but not identical cases.
  • shows consistency in judgements that is better than her peers.
  • has been identified by her peers in the field as an expert.

Yes, someone could be an expert and not meet these standards, because they are just starting in research. But even if they’re new, and they have startling new information, other researchers will talk about, mention, criticize, and even compliment them. But also being an expert requires experience and knowledge, something gained over time.

Science or Not also provided a “checklist” of Red Flags or warning signs of a non-expert:

So there you are, a couple of lists that you can use to check off who has outstanding research and who doesn’t. This is difficult and is subject to some amount of “judgement calls.” On PubMed, you can click on the author’s name, and get a listing of all articles written by them (just be warned that those with common names, like Dr. John Smith, may give some bad results). You can determine if their articles are in high quality journals, you can see if they’re cited by other similar articles, and you can determine if their current research matches what they’ve done for a while. Neil deGrasse Tyson is a world renowned expert on science. But if he suddenly started publishing papers in immunology, I’m pretty certain I’d not be impressed, especially if they wandered from the mainstream of immunology.

Hierarchy of evidence

I have covered the first two steps of evaluating the quality of scientific research. First, the quality of journal where it’s published. Second, the quality of expert who is presenting the evidence. Now we move to the third, what evidence is the best.

I’m going to focus on medical research, but this list can be used to objectively rank the quality of any scientific evidence. News sites and many blogs utilize a “false balance” in presenting scientific information. They give us a debate about the science, when there isn’t a debate. So in this false debate, one side will use scientific evidence, often the consensus of nearly 100% of the scientists in the field, but the other “side” of the debate will contribute evidence of the lowest quality. In a real debate, let’s say over climate change, a proper balance would be one side with 1000 scientists each with a room full of evidence in the form of papers, books, and data, and the other side with 1 person (probably not an expert as defined above) with petitions, polls on what people believe about global warming, and papers published in junk science journals of low impact quality.

Of course, news services (and many many blogs) just don’t have the time or space to do it. And if you’re pushing an agenda about global warming or GMO dangers, then it makes for better TV to have one real scientist “debate” one pseudoscientist, with the news host declaring “the debate is still ongoing. Next up, have we found sasquatch?”

Recently, the BBC stated that it “wishes to emphasise the importance of attempting to establish where the weight of scientific agreement may be found and make that clear to audiences.” They want to give due weight to the scientific consensus and move away from perpetrating the belief that being impartial means giving equal credence to both “sides” of an issue. One win for science.

Therefore, what are the highest quality types of evidence? In order of best to worst,

  1. Secondary reviews published in peer-reviewed, high-impact journals. These secondary research articles include meta-reviews, review articles, and Cochrane Collaboration reviews. These studies essentially roll up the data from possibly dozens of other research articles, while attempting to remove obviously poor quality research or biased data. They are mostly useful for examining numerous randomized clinical trials, providing the reader with a higher number of data points, usually with better statistical analysis. These are almost always published in high impact factor peer-reviewed journals. Occasionally, low quality meta-reviews, especially those that cherry pick only primary research that supports the hypothesis of the author, are published in low quality journals (and infrequently, in major journals), but are of little use. I once held a position that Cochrane was the gold, if not platinum, standard for systematic reviews, but I’ve noticed a lot of bias slipping into their research, which relies upon authors who have axes to grind about their personal beliefs. Systematic reviews require a fine hand and strong statistical skills. And there are plenty of journals other than Cochrane that publish systematic reviews. Thus, secondary systematic reviews are the pinnacle of great science, but that does not mean that they get to skate by without a critique.
  2. High quality randomized controlled trials with definitive results. These are studies that include fairly large numbers (in general, I like to see >1,000 subjects in each arm), with confidence intervals (errors) that do not overlap, that show a clinically significant effect, and that are published in high impact journals.
  3. Randomized controlled trials with non-definitive results. These are small trials, usually with large confidence intervals, which only suggest a clinically significant effect. At some point, these types of studies border on the level of observational studies that need to be confirmed (or refuted) by larger studies.
  4. Cohort studies (retrospective studies). These analyze risk factors by following a group of people who do not have the disease, and use correlations to determine the absolute risk of the subjects contracting a disease. A cohort study is often undertaken to obtain evidence to try to refute the existence of a suspected association between cause and effect. Because it is inappropriate to perform a double-blind randomized controlled trial with vaccines (it would be unethical to withhold a lifesaving vaccine from the placebo group, even if they volunteered to be in that group, because that would break the randomization), cohort studies (and sometimes, case-control studies mentioned next), usually with tens of thousands up to millions of subjects, can provide definitive data refuting a cause and effect. For example, vaccines do not cause autism.
  5. Case-control studies. This is a type of analytical study which compares individuals who have a specific disease (“cases”) with a group of individuals without the disease (“controls”). The proportion of each group having a history of a particular exposure or characteristic of interest is then compared. An association between the hypothesized exposure and the disease being studied will be reflected in a greater proportion of the cases being exposed.
  6. Cross sectional surveys. Essentially, these are surveys where the researcher observes one single point of data in a large population. They provide little information about correlation or causality, but are used to determine what proportion of an observed population may have a disease at a single point in time.
  7. Case reports. I dislike these types of articles. They often show up in high quality medical journals, but they are usually observations of one or two patients. Those in medicine know their purpose, which is to give a “heads up” to an observation. They have no scientific validity beyond observational. Unfortunately, science deniers will pull a case report published, and use it to condemn a whole field of medicine. Don’t use them to support your argument, one way or another.
  8. Animal or cell culture studies. This is basic primary research which may or may not be applicable to human biomedical science. Too much credence is given to studies that may be 20 years away from having any applicability, if any at all. Someone publishes that cannabis “cures” breast cancer in mice, and that is used to support the smoking of pot to “cure cancer”. Except there’s no evidence of a significant human clinical effect. There is an old joke in medical research circles: science has cured cancer in mice for decades. So the value of these types of studies is limited until a significant clinical effect can be shown in real studies. Furthermore, the same standards of evaluation that are used for clinical studies must be applied to animal or cell culture studies–real experts who publish in peer reviewed, high impact factor journals. There is a tendency to over-dramatize results from animal studies, when only a small percentage of compounds that are tested in animals or cell cultures ever make it into human clinical trials (let alone are approved by the FDA). The National Cancer Institute has screened over 400,000 compounds for treating cancer, and maybe 20-30,000 have even made it to early clinical trials, and of those just a handful are used in modern medicine. You have to be extremely skeptical when reading an article that is sourced to a press release that might overstate the results (or even one that refers directly to such a primary study).
  9. Meeting abstract or poster presentation. These are presentations made at meetings/conventions of a scientific society. There are hundreds made every year, and preliminary research is presented in poster sessions or formal presentations. Usually, there are a lot of questions and answers that can help explain the data, but those of course don’t show up in a link to the abstract or poster. Moreover, these types of presentations vary from nearly ready for publication to pure speculation. They have not been peer-reviewed (although peer review can unintentionally happen through tough questions). They are not published formally. And they often do not contain enough explanation of methods, data, analysis, and other issues to evaluate properly. I would not consider this type of information as anything but “observational” in a scientific sense. In an article in JAMA, the authors found that within 3 years after abstracts were presented at meetings, 50% were published in high-impact journals, 25% in low-impact journals, and 25% remained unpublished. Interestingly, the 39 abstracts that received front-page coverage in newspapers had a publication rate almost identical to the overall publication rate. The authors concluded that, “abstracts at scientific meetings receive substantial attention in the high-profile media. A substantial number of the studies remain unpublished, precluding evaluation in the scientific community.” In other words, until the research is published in peer reviewed journals, it’s hard to evaluate its quality and importance.
  10. Press releases or news reports. Do not accept the conclusions stated by a press release from a major research university; they haven’t been peer reviewed. I have a feed filled with press releases from major research universities, and I’ve found errors in interpretation from the university’s public relations office relative to the real research. However, it is possible to use a press release to chase down the actual published and peer reviewed study. So, it’s not completely worthless.
  11. Natural News or Mike Adams, the Health Ranger. Let me make this clear, Natural News is a foul, fetid, putrid sewer of information, which befouls any real science with lies. The antisemitic, anti-science, anti-knowledge website, Whale.to, similarly reeks of malodorous smells of pure garbage. Anyone who uses either site as a source for anything in medicine loses the discussion without any further debate. The pseudoscience pushers who claim that they’ve done extensive “research” into a subject, and quote either of these websites, are beneath contempt, and support the hypothesis that those who support anti-science are intellectually lazy charlatans.
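For readers who like to see a ranking written down explicitly, the list above can be sketched as a simple ordered type. This is only an illustration: the class name, member names, and the sample sources are my own inventions, and the rank captures the type of evidence only, not the quality of an individual study within that type.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Lower value = stronger evidence, following the order of the list."""
    SYSTEMATIC_REVIEW = 1
    DEFINITIVE_RCT = 2
    NON_DEFINITIVE_RCT = 3
    COHORT_STUDY = 4
    CASE_CONTROL_STUDY = 5
    CROSS_SECTIONAL_SURVEY = 6
    CASE_REPORT = 7
    ANIMAL_OR_CELL_STUDY = 8
    MEETING_ABSTRACT = 9
    PRESS_RELEASE = 10
    NATURAL_NEWS = 11


def strongest_first(sources):
    """Sort (label, Evidence) pairs from strongest to weakest evidence type."""
    return sorted(sources, key=lambda pair: pair[1])


# Hypothetical sources cited in an online argument.
cited = [
    ("mouse study", Evidence.ANIMAL_OR_CELL_STUDY),
    ("university press release", Evidence.PRESS_RELEASE),
    ("meta-analysis of trials", Evidence.SYSTEMATIC_REVIEW),
    ("single patient report", Evidence.CASE_REPORT),
]

for label, level in strongest_first(cited):
    print(level.value, level.name, label)
```

Sorting the sources this way makes it obvious which one should carry the argument, and which ones are only leads to chase down.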

In the false debate (but let’s make it happen anyways), one must give weight to not only more evidence, but critically to the quality of evidence. If I go into such a debate with 5 or 10 published systematic reviews that all revolve around a scientific consensus, and the other “side” has poorly done studies or, and this happens a lot, quotes from Natural News, it’s no longer a debate. It is simply a shutout (to use a sports metaphor), with the other side barely able to walk on the same field.

There is also something that must be highlighted with regards to published research, and it is one I mentioned several times above. Results that don’t show a significant clinical difference are not worth much. If you take vitamin XYZ and it, on average, reduces the length of a cold by 10 minutes, how important is that result? It’s only marginally better than doing nothing and just waiting out the cold by binging Mad Men episodes on Netflix.
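Here is a rough sketch of how a trivial effect like that 10 minutes can still come out “statistically significant” if the trial is big enough. It uses a simple two-sided z-test for two equal-sized arms. Every number in it (the 7-day cold, the 2-day spread between people, the one million subjects per arm) is an assumption I picked for illustration, not data from any study.

```python
from math import erfc, sqrt


def two_sided_p_value(mean_difference, sd, n_per_arm):
    """Two-sided p-value from a z-test comparing two equal-sized arms
    that share the same standard deviation."""
    standard_error = sd * sqrt(2 / n_per_arm)
    z = mean_difference / standard_error
    return erfc(abs(z) / sqrt(2))


# All numbers below are hypothetical.
cold_length_min = 7 * 24 * 60   # a 7-day cold, in minutes
sd_min = 2 * 24 * 60            # spread of 2 days between people
benefit_min = 10                # vitamin XYZ shortens the cold by 10 minutes

p = two_sided_p_value(benefit_min, sd_min, n_per_arm=1_000_000)
print(f"p-value: {p:.3f}")
print(f"share of the cold avoided: {benefit_min / cold_length_min:.2%}")
```

With these made-up inputs the p-value falls below the usual 0.05 threshold, yet the benefit is about a tenth of one percent of the illness. The statistics say the difference is probably real; they say nothing about whether it matters.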

David Gorski, editor-in-snark at Science Based Medicine, made some important points about clinical significance:

In human studies, the problem appears to be different. There’s another saying in medicine that statistical significance doesn’t necessarily mean that a finding will be clinically significant. In other words, we find small differences in treatment effect or associations between various biomarkers and various diseases that are statistically significant all the time. However, they are often too small to be clinically significant. Is, for example, an allele whose presence means a risk of a certain condition that is increased by 5% clinically significant? It might be if the risk in the population is less than 5%, but if the risk in the population is 50%, much less so. We ask this question all the time in oncology when considering whether or not a “positive” finding in a clinical trial of adjuvant chemotherapy is clinically relevant. For example, if chemotherapy increases the five year survival by 2% in a tumor that has a high likelihood of survival after surgery clinically relevant? Or is an elevated lab value that is associated with a 5% increase in the risk of a condition clinically relevant? Yes, it’s a bit of a value judgment, but small benefits that are statistically significant aren’t always clinically relevant.

Now, as Gorski states, this is a bit of guesswork and instinct, but none of us should probably accept results as meaningful if they are tiny, even if statistically significant. I see this a lot, especially with alternative medicine studies that try to “prove” that they have some benefit beyond placebo. But if there’s a new therapy, or “eat XYZ and it will reduce ABC by 5%”, those results may just be no different than random.
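Gorski’s allele example is simple arithmetic, and a quick sketch shows why the baseline matters. I am assuming here that the “5%” is an absolute increase of five percentage points, which is how the quote reads to me; the function name and the two baselines (4% and 50%) are my own choices for illustration.

```python
def risk_ratio(baseline_risk, absolute_increase):
    """How many times higher the risk becomes after adding the increase."""
    return (baseline_risk + absolute_increase) / baseline_risk


# Assumption: the 5% in the quote is 5 percentage points of absolute risk.
for baseline in (0.04, 0.50):
    ratio = risk_ratio(baseline, 0.05)
    print(f"baseline {baseline:.0%}: risk is {ratio:.2f} times the baseline")
```

Against a 4% baseline, the same five points more than doubles a person’s risk. Against a 50% baseline, it raises the risk by about a tenth, which is much harder to call clinically meaningful.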

I realize it’s really hard work to decide what is and isn’t good science. Maybe that’s why Americans think that there is a debate about global warming, or vaccines, or evolution. Sometimes you have to trust real experts. And quantity and quality of those experts matter. But it is important to remember that not all science is equal. Cherrypicking research that confirms your bias–that’s not science. Giving equal weight to everything that is written–that’s not science. Thinking that a researcher is an expert because they have a degree or qualifications in another area–that’s not science.

So, after much study, over 30 years, reading literally thousands of research articles, thousands of books, and spending years of my life in biomedical laboratories, I’ll lay it out for you. Evolution is a fact. Human caused global warming is a fact. The outstanding safety and incredible effectiveness of vaccines are facts. That GMOs are safe for humans, animals and the environment is a fact. And these aren’t up for debate, because there is no debate, unless someone thinks that undue weight, given to bad science, is considered important evidence.

I have the science on my side, the highest possible quality and amount. Science is not some magical, irrational process, it’s actually a logical method to answer a question. Because I have the science, I win on these points before a debate can even start. And that’s not my opinion based on arrogance or conceit. It’s not my belief that relies on ignorance and fallacies. My statements are based on what the scientific consensus says, and that matters more than your opinions and beliefs, unless you’ve read this article and bring me high quality evidence that gives due weight to your opinions.

It’s important to remember that just because the science doesn’t support your point of view doesn’t mean it’s wrong. It means your point of view is wrong. Get over it–it’s all right for science to conquer ignorance and belief.

The TL;DR version:

  1. Only high quality articles, published in peer reviewed, high impact factor scientific journals count in a discussion about science.
  2. Expertise matters, and it’s not self-proclaimed nor is it established by a few letters after your name. It’s established by the amount and depth of respect from peers.
  3. Not all science is equal. And the best research is a systematic review which shows a clinically significant effect.
  4. Giving false balance or undue weight to fringe beliefs that are unsupported by the vast breadth of research in that area is inappropriate and can be ignored.
