Ed Yong, a scientist and contributor to Discover Magazine, wrote a blog post, “What does it mean to say that something causes 16% of cancers?”, discussing a news report that stated that 16% of cancers around the world were caused by infections. Here are some excerpts:
A few days ago, news reports claimed that 16 per cent of cancers around the world were caused by infections. This isn’t an especially new or controversial statement, as there’s clear evidence that some viruses, bacteria and parasites can cause cancer (think HPV, which we now have a vaccine against). It’s not inaccurate either. The paper that triggered the reports did indeed conclude that “of the 12.7 million new cancer cases that occurred in 2008, the population attributable fraction (PAF) for infectious agents was 16·1%”.
But for me, the reports aggravated an old itch. I used to work at a cancer charity. We used to get frequent requests for such numbers (e.g. how many cancers are caused by tobacco?). However, whenever such reports actually came out, we got a lot of confused questions and comments. The problem is that many (most?) people have no idea what it actually means to say that X% of cancers are caused by something, where those numbers come from, or how they should be used.
Formally, these numbers – the population attributable fractions (PAFs) – represent the proportion of cases of a disease that could be avoided if something linked to the disease (a risk factor) was avoided. So, in this case, we’re saying that if no one caught HPV or any other cancer-causing infection, then 16.1% of cancers would never happen. That’s around 2 million cases attributable to these causes. From answering enquiries and talking to people, I reckon that your average reader believes that we get these numbers because keen scientists examined lots of medical records, and did actual tallies. We used to get questions like “How do you know they didn’t get cancer because of something else?” and “What, did they actually count the people who got cancer because of [insert risk factor here]?”
No, they didn’t. Those numbers are not counts.
Those 2 million cases don’t correspond to actual specific people. I can’t tell you their names. Instead, PAFs are the results of statistical models that mash together a lot of data from previous studies, along with many assumptions. At a basic level, the models need a handful of ingredients. You need to know how common the risk factor is – so, for example, what proportion of cancer patients carry the relevant infections? You need to know how big the effect is – if someone is infected, their risk of cancer goes up by how many times? If you have these two figures, you can calculate a PAF as a percentage. If you also know the incidence of a cancer in a certain population during a certain year, you can convert that percentage into a number of cases.
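To make the arithmetic in the excerpt concrete, here is a minimal sketch of how those two ingredients could turn into a PAF and then into a number of cases. It assumes one common textbook formula that works from the prevalence among cases, PAF = p_c × (RR − 1) / RR; the excerpt does not say which formula the paper used, so treat that choice as an assumption. The function names and the 20% and 5× inputs are made up for illustration. Only the 16.1% and 12.7 million figures come from the excerpt.

```python
def paf_from_cases(prevalence_in_cases: float, relative_risk: float) -> float:
    """Population attributable fraction from the share of cases exposed.

    Assumed formula (not confirmed by the excerpt): p_c * (RR - 1) / RR.
    """
    if not 0.0 <= prevalence_in_cases <= 1.0:
        raise ValueError("prevalence_in_cases must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    return prevalence_in_cases * (relative_risk - 1.0) / relative_risk


def attributable_cases(paf: float, total_cases: float) -> float:
    """Convert a PAF (as a fraction) into a number of cases."""
    return paf * total_cases


# Hypothetical inputs: 20% of patients carry the infection, and infection
# multiplies the risk of cancer by 5.
example_paf = paf_from_cases(prevalence_in_cases=0.20, relative_risk=5.0)

# Figures quoted in the excerpt: a PAF of 16.1% applied to 12.7 million
# new cancer cases in 2008.
reported_cases = attributable_cases(paf=0.161, total_cases=12_700_000)

print(f"Hypothetical PAF: {example_paf:.1%}")
print(f"Cases attributable at 16.1%: {reported_cases:,.0f}")
```

The second calculation is where the "around 2 million cases" comes from: it is a percentage multiplied by an incidence figure, not a tally of patients.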
There’s always a certain degree of subjectivity. Consider the size of the effect – different studies will produce different estimates, and the value you choose to put into the model has a big influence on the numbers that come out. And people who do these analyses will typically draw their data from dozens if not hundreds of sources.
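A rough sketch of that sensitivity, again assuming the case-based formula PAF = p_c × (RR − 1) / RR and a made-up 20% prevalence among patients: the only thing that changes between runs is the effect size the analyst decides to plug in. The candidate relative risks below are hypothetical, not taken from any study.

```python
def paf_from_cases(prevalence_in_cases: float, relative_risk: float) -> float:
    """Assumed formula (not confirmed by the excerpt): p_c * (RR - 1) / RR."""
    return prevalence_in_cases * (relative_risk - 1.0) / relative_risk


PREVALENCE_IN_CASES = 0.20   # hypothetical share of patients infected
TOTAL_CASES = 12_700_000     # new cancer cases in 2008, from the excerpt

# Hypothetical effect sizes that different studies might report.
candidate_relative_risks = [1.5, 2.0, 5.0]

# Same prevalence, different effect size: compare the resulting PAFs.
sensitivity = {
    rr: paf_from_cases(PREVALENCE_IN_CASES, rr)
    for rr in candidate_relative_risks
}

for rr, paf in sensitivity.items():
    print(f"RR = {rr:>4}: PAF = {paf:.1%}, about {paf * TOTAL_CASES:,.0f} cases")
```

With these inputs the estimate more than doubles between the smallest and largest relative risk, which is the kind of swing the excerpt is warning about.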
In the infection example, some sources are studies that compare cancer rates among people with or without the infections. Others measure proteins or antibodies in blood samples to see who is infected. Some are international registries of varying quality. The new infection paper alone combines data from over 50 papers and sources, and some of these are themselves analyses of many earlier papers. Bung these all into one statistical pot, simmer gently with assumptions and educated guesses, and voila – you have your numbers.
This is not to say that these methods aren’t sound (they are) or that these analyses aren’t valuable (they can tell public health workers about the scale of different challenges). But it’s important to understand what’s actually been done, because it shows us why PAFs can be so easily misconstrued.
This is another case where sound scientific research still needs to be read carefully. Someone will quote this “16%” as if it were a simple fact, yet once you look at how the number was produced, it is much less clear what it tells you. At a general level, we can say that infections do cause “cancer”, but what kind? How? Which infections cause which cancers?
Always analyze any article to determine its validity. Sometimes it’s hard, but you don’t want to walk around thinking that “if I never get an infection, I’ll reduce my chances of getting cancer by 16%.” Well, never getting an infection is impossible, and you’d be ignoring all the other environmental and biological causes of “cancer”. You could stay out of the sun, never smoke, drink water from a pristine glacier in the Sierras, and never get an infection, but your house is filled with radon gas. Oops.