This article, about America’s Frontline Doctors anonymous anti-vaccine affidavit, was written by Kelsey S Hollenback, a Ph.D. student in Systems & Information Engineering at the Center for Risk Management of Engineering Systems, Department of Engineering Systems and Environment, School of Engineering and Applied Sciences, University of Virginia.
On reviewing America’s Frontline Doctors anonymous affidavit, part of a recent lawsuit over the COVID-19 vaccines, my first and most important takeaway is that, while Jane Doe apparently wishes to assert that she has discovered excess mortality associated with COVID-19 vaccines, what she describes in her affidavit bears absolutely no resemblance to how such an analysis is actually conducted. Not even a little bit.
My second takeaway, which almost doesn’t matter given the first, is that, insofar as it’s possible to determine the methodology she used from the extremely limited description provided, that methodology is…flawed.
Flawed in what ways? It counted more deaths associated with a COVID-19 vaccine than there are total deaths recorded in VAERS. It doesn’t account for differences between the CMS patient population and the general population. Depending on what criteria Jane Doe used to query the CMS claims database, she may have pulled patients receiving any vaccine, not just the COVID-19 vaccine; she may have pulled only patients receiving Moderna, or only Pfizer, or only Johnson & Johnson; she may have failed to pull patients receiving a no-cost or government-provided vaccine; and/or she may have oversampled patients at higher risk generally for all-cause mortality.
My third and final takeaway is that, while America’s Frontline Doctors affidavit does not in any way describe a valid method for identifying excess mortality associated with the COVID-19 vaccine, does nothing at all to establish causality, and certainly doesn’t expose any cover-up, it does possibly reveal a serious HIPAA violation.
America’s Frontline Doctors affidavit – excess mortality
Excess mortality is called excess mortality because you are looking for the number of deaths in excess of what would be expected in the usual run of things. In the usual run of things, people die. If you follow any large group of people over time, even a short period of time, some of them will die. With excess mortality, the question you want to ask is not, “How many people died?” but, “Was the number of people who died significantly higher than expected?”
To answer that, you need to know what was expected. You must first develop and validate a model for predicting the expected number of deaths in a population and then compare that expected number to the observed number of deaths. If the observed number is indeed higher than the expected number, you must then determine if that difference is statistically significant.
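As a minimal sketch of that last step, assuming the expected count has already come from a validated model and that deaths can be treated as Poisson-distributed (a simplification), a one-sided exact Poisson test in base R asks whether an observed count is higher than chance alone would explain. The numbers below are hypothetical and are not taken from the affidavit, VAERS, or CMS.

```r
# Hypothetical numbers for illustration only.
expected_deaths <- 1000  # assumed output of a validated baseline model
observed_deaths <- 1040  # assumed count for the same population and period

# One-sided exact Poisson test: is the observed count greater than expected?
result <- poisson.test(observed_deaths, T = 1, r = expected_deaths,
                       alternative = "greater")

excess <- observed_deaths - expected_deaths
cat("Excess deaths:", excess, "\n")
cat("p-value:", signif(result$p.value, 3), "\n")
```

Without an expected value there is nothing to pass as `r`, which is the core problem with reporting a raw count of deaths.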
For example, the Centers for Medicare and Medicaid Services (CMS) tracks specific hospital mortality-related outcome measures, e.g. deaths occurring within 30 days of a hospital inpatient admission for acute myocardial infarction (heart attack). CMS has a very specific methodology for calculating these measures, developed in partnership with Yale-New Haven Health Services Corporation/Center for Outcomes Research and Evaluation, and that methodology is reviewed and updated annually.
A hierarchical generalized linear model is fitted for each outcome measure. Then, CMS calculates each hospital’s “predicted” deaths to “expected” deaths ratio, which is multiplied by the national observed mortality rate to assign hospitals risk-standardized mortality rates. The measures account for differences among hospitals by adjusting for patient-specific characteristics (e.g. demographics, comorbidities), and hospital-specific characteristics (e.g. case mix, geographic location, type of hospital). Certain patients, such as those on hospice or with terminal diagnoses, are excluded. All model parameters are clearly defined, and the relevant billing codes are given.
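The final arithmetic of that calculation can be sketched as follows. This is only the ratio step described above; the hierarchical model that produces the predicted and expected counts is not reproduced, and the function name `rsmr` and all hospital figures are hypothetical.

```r
# Risk-standardized mortality rate as described in the text:
# (predicted deaths / expected deaths) * national observed mortality rate.
rsmr <- function(predicted, expected, national_rate) {
  stopifnot(all(expected > 0))
  (predicted / expected) * national_rate
}

# Hypothetical hospitals; 'predicted' and 'expected' would come from the
# fitted model, which is assumed here rather than implemented.
hospitals <- data.frame(
  hospital  = c("A", "B"),
  predicted = c(12, 45),
  expected  = c(10, 50),
  stringsAsFactors = FALSE
)
national_rate <- 0.13  # hypothetical national 30-day mortality rate

hospitals$rsmr <- rsmr(hospitals$predicted, hospitals$expected, national_rate)
print(hospitals)
```

A ratio above 1 pushes a hospital's rate above the national figure and a ratio below 1 pulls it under, regardless of how many raw deaths the hospital reported.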
The reason why CMS adjusts for hospital, patient, and population characteristics is simple: it wants to ensure, as much as possible, that it is comparing apples to apples.
An obvious example would be if Hospital A and Hospital B report the same number of deaths – but Hospital A has 24 beds, and Hospital B has 600 beds. Clearly, reporting the number of deaths without adjusting for Hospital A’s much smaller number of total patients would be misleading. Or perhaps Hospital A serves a population of primarily wealthy young professionals in an area where no one is more than a five-minute ambulance drive away; and Hospital B serves a community of mostly blue-collar and agricultural workers, with many residents living below the poverty line, who must drive an hour or more to reach Hospital B, and for whom transportation is unreliable. Or Hospital A’s wealthy patients are the ones who live an hour from the hospital, whereas Hospital B’s working-class patients have a quick ambulance ride.
In the America’s Frontline Doctors Jane Doe affidavit, there is good reason to suspect that the general U.S. population is Hospital A and the population she analyzed is Hospital B and that she failed to recognize that there was a distinction. The CMS claims database is limited to Medicare and Medicaid patients. It may have included uninsured and/or undocumented patients, whose COVID-19 vaccinations are being covered under CMS. This patient population may be expected to trend older (Medicare) and/or be from a group recognized to experience health disparities (Medicaid, uninsured, undocumented), both in general and related to COVID-19 specifically.
Nobody looks at the death rate in a retirement community, compares it to the death rate in the state, and concludes that retirement communities are a health hazard that must immediately be shut down. But according to her affidavit, Jane Doe did not conduct even that level of analysis.
There was no basis for comparison established. There was no statistical modeling to estimate an expected mortality rate. Instead, what Jane Doe states she did is as if she’d counted the number of people in a retirement community who died over a three-day period and concluded that, because people did die, living in a retirement community killed them.
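To make the retirement community comparison concrete, here is a rough sketch of indirect standardization: apply a reference population's age-specific death rates to the community's age structure to get an expected count, then compare that with what was observed. Every rate, resident count, and death count below is invented for illustration.

```r
# Hypothetical age-specific annual death rates for a reference population.
reference <- data.frame(
  age_group = c("18-49", "50-64", "65-79", "80+"),
  rate      = c(0.002, 0.008, 0.030, 0.110),
  stringsAsFactors = FALSE
)

# Hypothetical retirement community: residents per age group.
community <- data.frame(
  age_group = c("18-49", "50-64", "65-79", "80+"),
  residents = c(0, 100, 600, 300),
  stringsAsFactors = FALSE
)
observed_deaths <- 55  # hypothetical deaths in the community over one year

# Expected deaths if the community experienced the reference rates.
merged <- merge(community, reference, by = "age_group")
expected_deaths <- sum(merged$residents * merged$rate)

# Standardized mortality ratio (observed / expected) versus the crude rate.
smr <- observed_deaths / expected_deaths
crude_rate <- observed_deaths / sum(community$residents)

cat("Crude death rate:", crude_rate, "\n")
cat("Expected deaths:", expected_deaths, "\n")
cat("SMR:", round(smr, 2), "\n")
```

In this toy case the crude rate looks alarming next to a general population, yet the ratio of observed to expected deaths is close to 1 once the age mix is taken into account.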
Flawed methodology in the Jane Doe affidavit
In the America’s Frontline Doctors affidavit, Jane Doe asserts that she is a subject matter expert in healthcare data analytics, specifically in fraud detection. However, from her subsequent description, she did not employ any data mining/machine learning techniques at all. Instead, it seems that she downloaded freely and publicly available VAERS datasets, queried a CMS claims database to pull a dataset from there, possibly performed some merges, joins, and filtering, and…counted.
And she didn’t even do that well.
From the affidavit: “I verified these numbers by collating all of the data from VAERS myself, not relying on a third party to report them.”
The VAERS data is readily available for download. It’s easy to find as it’s linked on the main VAERS webpage, and the main VAERS webpage and the data download page are the top two Google hits for “VAERS COVID-19 dataset.”
The VAERS data includes a disclaimer at the top of the page clearly explaining the limitations of the dataset. There are a lot of limitations. You must agree that you read and understand the disclaimer before you can download the dataset. This disclaimer is repeated in many places throughout the VAERS website, including in the VAERS Data Use Guide, which explains the contents of the datasets. Per the VAERS Data Use Guide, “Accumulations of events reported to a passive surveillance system do not allow incidence rate calculations [emphasis mine] due to the generally unknown extent of under-reporting as well as lack of information on the number of people being vaccinated.”
I downloaded the VAERSData and VAERSVAX datasets for 2021, which are current through July 9, 2021, as per Jane Doe’s affidavit. There are more entries in the VAERSVAX dataset than in the VAERSData dataset because VAERSVAX has one entry per vaccine dose administered, e.g. if an individual received two COVID-19 vaccine doses, they appear in VAERSVAX twice; or if an individual received one influenza vaccine, one DTaP vaccine, and one pneumococcal pneumonia vaccine, they appear three times. VAERSData lists whether the patient died but doesn’t list the type of vaccine; VAERSVAX lists the type of vaccine but not whether the patient died.
Using the R statistical programming language, I examined the datasets and counted the number of unique deaths due to all vaccines for 2021, as recorded in VAERSData. There are 5,530 total reported unique deaths associated with all vaccines.
I then performed a left join by VAERS_ID to merge the VAERSData and VAERSVAX datasets and counted unique VAERS_ID entries receiving a COVID-19 vaccine who died. This produced a count of 4,836 deaths associated with the COVID-19 vaccine in the VAERS system.
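A toy version of that join shows why the count has to be taken over unique VAERS_ID values rather than rows. The two small data frames below are invented stand-ins for the real files, and the sketch uses base R `merge` rather than the `left_join` call in my script.

```r
# Toy stand-ins for the two VAERS files; all values are invented.
vaers_data <- data.frame(
  VAERS_ID = c(1, 2, 3, 4),
  DIED     = c("Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)
vaers_vax <- data.frame(
  VAERS_ID = c(1, 1, 2, 3, 4, 4),
  VAX_TYPE = c("COVID19", "COVID19", "FLU4", "COVID19", "COVID19", "FLU4"),
  stringsAsFactors = FALSE
)

# Left join from the vaccine table (base R equivalent of left_join).
joined <- merge(vaers_vax, vaers_data, by = "VAERS_ID", all.x = TRUE)

# Keep rows for deaths with a COVID-19 vaccine; which() drops the NA rows.
covid_deaths <- joined[which(joined$DIED == "Y" &
                             joined$VAX_TYPE == "COVID19"), ]

rows_counted   <- nrow(covid_deaths)                     # counts vaccine rows
unique_counted <- length(unique(covid_deaths$VAERS_ID))  # counts reports

cat("Rows:", rows_counted, " Unique reports:", unique_counted, "\n")
```

Here three rows correspond to only two reports, because one person with two doses appears twice after the join.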
Of these 4,836, there were 561 individuals who received more than one vaccine. I did not check to see if the additional vaccines were second COVID-19 vaccine doses or were non-COVID-19 vaccines. I also did not check for inconsistencies such as the date of death prior to the date of vaccination, nor did I filter for deaths occurring within three days of vaccine administration, as Jane Doe states she did.
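For anyone who wants to add the checks I skipped, a sketch along these lines may be a starting point. It assumes the vaccination and death dates sit in columns named `VAX_DATE` and `DATEDIED` and have already been parsed into R dates; the four rows are invented.

```r
# Toy data; column names are assumed to match VAERSData, values are invented.
deaths <- data.frame(
  VAERS_ID = c(1, 2, 3, 4),
  VAX_DATE = as.Date(c("2021-02-01", "2021-03-10", "2021-04-05", NA)),
  DATEDIED = as.Date(c("2021-02-03", "2021-03-01", "2021-05-20", "2021-06-01"))
)

# Days from vaccination to death; NA when either date is missing.
deaths$days_to_death <- as.numeric(deaths$DATEDIED - deaths$VAX_DATE)

# Flag impossible sequences (death recorded before vaccination).
deaths$suspect <- !is.na(deaths$days_to_death) & deaths$days_to_death < 0

# Deaths within three days of vaccination; which() drops missing dates.
within_three <- deaths[which(deaths$days_to_death >= 0 &
                             deaths$days_to_death <= 3), ]

print(deaths)
print(within_three$VAERS_ID)
```

Reports with a missing date cannot be placed inside or outside the three-day window, so how they are handled should be stated alongside any count.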
The 9,048 number Jane Doe gives in the America’s Frontline Doctors affidavit is incorrect. It overcounts the number of unique deaths following a COVID-19 vaccine dose by 4,212 and overcounts the number of unique deaths following any vaccine dose by 3,518. I am completely at a loss to explain this.
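The two overcount figures are simple differences, written out here so the arithmetic can be checked against the counts above:

```r
affidavit_count <- 9048  # figure given in the affidavit
covid_deaths    <- 4836  # unique deaths with a COVID-19 vaccine (my count)
all_vax_deaths  <- 5530  # unique deaths with any vaccine (my count)

over_covid <- affidavit_count - covid_deaths    # 4212
over_all   <- affidavit_count - all_vax_deaths  # 3518

cat("Overcount vs COVID-19 vaccine deaths:", over_covid, "\n")
cat("Overcount vs all-vaccine deaths:", over_all, "\n")
```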
The R script that I used to examine the VAERS datasets is available at the end of this post, in case others are interested in repeating these results or refining the code, perhaps to check for non-COVID-19 vaccines administered with a COVID-19 vaccine or to remove entries with suspect dates, e.g. where the date of death precedes the date of vaccine administration. This was a quick back-of-the-envelope bit of code, and I welcome anyone who would like to build or improve upon it.
From the affidavit: “I queried data from CMS medical claims with regard to vaccines and patient deaths.”
It is not possible to evaluate Jane Doe’s statements re: the dataset she pulled from the CMS claims database without knowing the exact query syntax she used. There are several reasons why the query may have produced inaccurate results depending on the criteria.
ICD-10. If providers vaccinated patients for COVID-19, and the vaccine was provided at no cost, then the provider will only submit a claim for the administration, ICD-10 code Z23 (“Encounter for immunization”). This code does not specify the type of vaccine. The ICD-10 code would therefore have to be combined with some other criteria to accurately identify patients receiving the COVID-19 vaccine and not, say, the DTaP vaccine.
CPT. There are three CPT codes for COVID-19 vaccines: one for Moderna, one for Pfizer, and one for Johnson & Johnson. A CPT code is only used if the vaccine was not provided at no cost. The exception is if the vaccine was administered in the home setting, in which case CMS instructs providers to always include the CPT code.
Because of the patient criteria for administering the vaccine in the home, these patients may be reasonably expected to have a higher mortality rate generally. A query using CPT code alone would underestimate the total number of COVID-19 vaccines administered (possibly by a lot, if you didn’t include all three codes), as the majority of doses were supplied by the government at no cost, and would disproportionately sample these home patients.
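The effect of the query criteria can be illustrated with a toy claims table. Everything in it is invented: the codes `CPT_A`, `CPT_B`, `CPT_C`, and `CPT_OTHER` are placeholders rather than real CPT codes, and the `vaccine` column is a ground-truth label that a real claims table would not hand you directly.

```r
# Toy claims table; every value is invented for illustration.
claims <- data.frame(
  patient    = 1:8,
  icd10      = rep("Z23", 8),
  cpt        = c(NA, NA, NA, NA, "CPT_OTHER", "CPT_A", "CPT_B", "CPT_C"),
  home_visit = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  vaccine    = c("COVID19", "COVID19", "COVID19", "COVID19", "DTaP",
                 "COVID19", "COVID19", "COVID19"),
  stringsAsFactors = FALSE
)

truth  <- claims[claims$vaccine == "COVID19", ]  # what we want to find
by_icd <- claims[claims$icd10 == "Z23", ]        # ICD-10 code alone
by_cpt <- claims[claims$cpt %in% c("CPT_A", "CPT_B", "CPT_C"), ]  # CPT alone

cat("True COVID-19 vaccinations:", nrow(truth), "\n")
cat("Found by Z23 alone:", nrow(by_icd), "(includes other vaccines)\n")
cat("Found by CPT alone:", nrow(by_cpt), "\n")
cat("Home-visit share, truth:", round(mean(truth$home_visit), 2), "\n")
cat("Home-visit share, CPT query:", round(mean(by_cpt$home_visit), 2), "\n")
```

In this toy table the ICD-10 query sweeps in a non-COVID-19 vaccination, while the CPT query misses most doses and returns a sample in which home patients are heavily over-represented.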
MATCHING CLAIMS. If the death occurred during an inpatient hospital stay, it is unlikely that the patient received the COVID-19 vaccine during that same stay. Standard practice is to vaccinate just prior to discharge, and only if the patient is clinically stable. (Obvious exceptions are vaccines that are a part of the treatment, such as tetanus vaccine to a trauma patient, or rabies vaccine to a patient with an animal bite.) Here is the CDC guidance on this specific to the COVID-19 vaccine, and here is a COVID-19 vaccination nursing workflow from UCSF. As you can see, both have the vaccination occurring at discharge.
So, to identify a patient who received a COVID-19 vaccine outpatient and then died inpatient, Jane Doe would have had to match the outpatient claim with the inpatient claim. Charges for outpatient and inpatient services do not appear on the same claim. To verify that this was done correctly, exactly how Jane Doe performed this matching would have to be described, and it’s likely that it would have had to be done by a unique personal identifier.
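A rough sketch of what such matching involves, using two invented claim tables and a hypothetical `patient_id` column as the shared identifier (I do not know what Jane Doe actually used):

```r
# Toy outpatient and inpatient claims; all identifiers and dates are invented.
outpatient <- data.frame(
  patient_id = c("P1", "P2", "P3"),
  vax_date   = as.Date(c("2021-03-01", "2021-03-02", "2021-03-05")),
  stringsAsFactors = FALSE
)
inpatient <- data.frame(
  patient_id = c("P2", "P3", "P4"),
  death_date = as.Date(c("2021-03-04", "2021-06-30", "2021-03-10")),
  stringsAsFactors = FALSE
)

# Inner join on the identifier: only patients present in both claim sets.
matched <- merge(outpatient, inpatient, by = "patient_id")
matched$days <- as.numeric(matched$death_date - matched$vax_date)

# Deaths within three days of the outpatient vaccination claim.
within_three <- matched[matched$days >= 0 & matched$days <= 3, ]

print(matched)
print(within_three$patient_id)
```

The join only works because both tables carry the same per-patient key, which is why this step implies access to identifiable data.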
That brings me to my third and final takeaway.
Protected health information (PHI) is personal and private, and it belongs to the patient. When I worked in the hospital, I could access it to perform my assigned job functions because the patient had given informed consent for that purpose. If I had wanted to use PHI for research, then either I would have had to be granted permission and the data would have had to be scrubbed of all unique personal identifiers before I could look at it, or my research proposal would have had to be reviewed and approved by my organization’s Institutional Review Board (IRB). The IRB may have determined that the research I wanted to do required further informed consent from the patients. It may have required a formal review and informed consent even if the data was anonymized.
It would not have mattered how noble my purpose, nor how pure my intentions were. Under no circumstances would I have been allowed to just dive in. There are ethical and legal standards that must be met. Patients have a right to know what is being done with their PHI, and they have the right to refuse to allow their PHI to be used.
Most claims datasets are scrubbed prior to being made available to researchers. While I am not familiar with the exact contents of a Medicare/Medicaid claim, unique personal identifiers that may appear on an unscrubbed health insurance claim include full name, date of birth, address, Social Security number, and a patient identifier such as a Medicare Beneficiary Identifier; healthcare provider, the location where medical services were received, and the date of service; and codes for medications administered, tests and procedures done, symptoms experienced, and illnesses diagnosed. A claim can show that I had sores in places you don’t want sores and blood drawn for STD screening. It can show tests that reveal that my provider is checking for something specific. It can show that I’m sick. It can show that I’m very sick.
This is potentially a serious HIPAA breach. Access to PHI is limited to the minimum necessary to fulfill assigned job duties. When I access electronic health record systems, a HIPAA disclaimer appears on sign-in, and I must click to acknowledge that I have read and understood it before the system allows me to log on.
To maintain my access, I am required to complete HIPAA training every year. I am also absolutely sure that CMS databases that store PHI are for authorized use only. I click on a disclaimer to that effect every time I access a government database, and I have to complete security training at least once a year for those.
Under HIPAA, organizations are required to notify the Department of Health and Human Services, affected patients, and, in certain circumstances, the media for breaches involving more than 500 individuals, and this notification must take place within 60 days of discovering the breach.
To notify the affected patients, CMS will have to be informed as to whose records Jane Doe accessed. Since Jane Doe claims that she identified approximately 45,000 patients who died within three days of receiving a COVID-19 vaccine, it’s reasonable to assume she pulled at least 45,000 patients’ records.
CMS also needs to know on what computer Jane Doe conducted her analysis and where she stored the dataset that she pulled from its claims database. If either or both systems fail to meet HITECH standards for data security, then CMS will have to seriously consider that another party may have been able to access that PHI. In any case, affected patients must be notified, and they must be advised to take steps to protect themselves against identity theft. Again, it is their personal, private information. They have a right to know.
Organizations that experience a HIPAA breach are required to submit a plan for preventing future breaches. They are then monitored for compliance with that plan. I cannot imagine any acceptable plan in this case that does not include rescinding Jane Doe’s access.
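library(readr)  # read_csv
library(dplyr)  # count, filter, left_join, %>%

VAERSData <- read_csv("/filepath/2021VAERSData.csv", show_col_types = FALSE)
VAERSVax <- read_csv("/filepath/2021VAERSVAX.csv", show_col_types = FALSE)

count(VAERSData, DIED == "Y") # reported deaths, all vaccines
# count(VAERSData, DIED != "NA") # check that Y or NA only
# Assumed: each uniqueness check counts rows per VAERS_ID before filtering
VAERSData %>% count(VAERS_ID) %>%
  filter(n > 1) # check for unique entries
VAERSVax %>% count(VAERS_ID) %>%
  filter(n > 1) # check for unique entries

VAERS <- left_join(VAERSVax, VAERSData, by = "VAERS_ID")
VAERSEx <- filter(VAERS, DIED == "Y") # keep reports where the patient died
VAERSEx %>% count(VAERS_ID) %>%
  filter(n > 1) # check for unique entries
VAERSEx %>%
  filter(VAX_TYPE == "COVID19") %>%
  count(VAERS_ID) %>% # assumed: one row per report, n = COVID-19 doses listed
  filter(n == 1)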