Introduction

Gun violence is a significant public health problem. Every day, 96 people in America die from a gun-related injury (CDC, 2018), while hundreds more are injured and survive (CDC, 2019). Nearly two-thirds of these deaths are suicides (CDC, 2018), and the US gun-related homicide rate is 25 times higher than that of other developed nations (Grinshteyn & Hemenway, 2016). To address this threat to the public health, we require a public health surveillance system to facilitate the prevention and control of gun-related injury and death. Unfortunately, gun data are limited because incidents are reported incompletely or not at all. We propose the use of capture–recapture methods to provide public health practitioners with the basic information they need to prevent gun-related violence.

In the U.S. there is no single authoritative data source that provides accurate, timely, and complete data on the incidence of gun-related violence. Although the National Violent Death Reporting System has made great strides, it is limited to deaths and does not exist in every state (National Research Council, 2005). Instead, what is known about gun-related injuries comes from a patchwork of data sources, such as administrative institutional data, that suffer from clinical bias, undercounts, or sampling biases. National sources of gun-related injury data include the National Crime Victimization Survey, the General Social Survey, Uniform Crime Reports, the National Incident-Based Reporting System, and the Bureau of Alcohol, Tobacco, Firearms and Explosives, but each source has limitations. The most current data on gun-related violence rely heavily on a single source: law enforcement reporting. Yet survey, medical examiner, death certificate, police, and healthcare records may all significantly undercount the incidence of gun-related violence (Gabor, 2016; Kleck, 2017a, b; Pah et al., 2017; Straus, 2017).

We designed the current pilot study to demonstrate both the need for indirect estimation techniques and a method for applying them, given the dearth of complete gun-related injury data (Hemenway, 1996; Kellermann et al., 1993). Data collected from emergency departments (EDs), police departments (PDs), and medical examiners (MEs) do not include all gun-related injuries. If a person is killed by a gun and never transported to the hospital, the body will be moved to the morgue and will not be identified or counted in the ED electronic health record. People injured in unintentional shootings may present to the hospital without notifying the police or emergency medical services (EMS) and are therefore missed in those datasets. Even though firearm-related injuries should be reported to the police, some patients seen for firearm-related injuries may remain undetected in police records.

To address these limitations, we use multiple sources of data, match cases across sources, and operationalize the overlap using capture–recapture (CR) rather than simply adding the sources together. We estimate the number of gun-related injury events and then correct for dependencies between data sources using log-linear modeling. Originally developed in wildlife biology to estimate the population size of deer and fish, this powerful analytical method has been used in epidemiology and demography to estimate invisible or undercounted populations (Bohning et al., 2017; Charette & van Koppen, 2016; Feldman et al., 2017; Miller et al., 2016; Post et al., 2011; Rothman et al., 2017). Furthermore, CR has been successfully employed to estimate the incidence of gun-related death in Western European countries (Duquet & Alstein, 2015).

Methods

Study design

This study relies on CR and log-linear modeling. CR assumes incomplete case ascertainment and uses the overlap among sources to estimate the number of cases missed by all of them. Our approach requires a minimum of three sources. When only two data sources are used, dependencies between the sources cannot be modeled and can lead to over- or underestimation. However, when three or more sources are included, log-linear modeling can control for positive and negative dependencies (Buckland & Morgan, 2016). For example, if the police respond to a gun-related violence incident, they will call for an ambulance, creating a positive dependency between police and emergency medical prehospital services (ambulance). Because the two sources then overlap more than independent sources would, an uncorrected estimate of gun-related injuries is biased downward. Conversely, a person who dies as a result of a gun-related injury will be transported to the morgue, enumerated in the medical examiner data, and absent from the ED health record. The ED and ME therefore have a negative dependency: they overlap less than independent sources would, which biases an uncorrected estimate upward.
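To make the two-source case concrete, the following minimal sketch computes the classical Lincoln–Petersen estimate and Chapman's small-sample variant. The counts and function names are hypothetical illustrations, not study data. It shows how the estimate responds to the size of the overlap, which is why dependence between two sources biases an uncorrected estimate.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source estimate of total cases: n1 * n2 / m.

    n1, n2: cases found by source 1 and source 2; m: cases found by both.
    """
    if m == 0:
        raise ValueError("no overlap between sources; estimate is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's variant, which is less biased when counts are small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: source 1 finds 30 cases, source 2 finds 20.
# With 60 true cases and independent sources, the expected overlap is 10.
independent = lincoln_petersen(30, 20, 10)  # overlap as expected under independence
positive_dep = lincoln_petersen(30, 20, 15)  # more overlap -> smaller estimate
negative_dep = lincoln_petersen(30, 20, 5)   # less overlap -> larger estimate

print(independent, positive_dep, negative_dep)
print(round(chapman(30, 20, 10), 2))
```

With only two sources there is no way to tell from the data which of these overlap scenarios applies, which motivates adding further sources and modeling the dependencies.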

Study population

We restricted our study to firearm-related injuries in New Haven, Connecticut, from August 1, 2013 through December 31, 2013 to determine whether CR is feasible, to ascertain whether CR estimates are accurate, and to identify a gold standard for gun-related injury surveillance. Using a smaller city and a limited time frame allowed us to achieve these objectives. We had to allow time to pass for medical examiner cases and death certificate data to clear. Furthermore, some people with gun-related injuries are revived but die days to months later, changing a gun-related injury to a gun-related death that will be reflected in the police report, the death certificate, and the medical examiner record. Finally, the necessary assumption of a stable population is met by using a short time frame.

In 2013, the population of New Haven was 130,748 (US Census Bureau, 2019). Of note, the demographic profile of New Haven differs from the demographic profile of each data source, which demonstrates that gun-related violence disproportionately affects people based on their age, race, and gender. Racial minorities, males, and younger people are more likely to be victims of gun-related violence (see Table 1). The median age of New Haven residents was 29 years. Twenty-three percent of the population was under 18 years old. The population was 48% male and 52% female. Whites accounted for 43% of the population. African Americans accounted for 35%. Regardless of race, 27% of the population identified as Hispanic or Latino. Eighty-two percent of the population had at least a high school education. Twenty-seven percent of the population was living under the poverty line (US Census Bureau, 2019).

Table 1 Gunshot injuries by data source in New Haven, Connecticut, August 1, 2013–December 31, 2013 (n = 49 total incidents)

New Haven was chosen for two reasons. First, it is a location in which a sufficient number of gun-related injuries occur to perform meaningful analyses using multiple data sources (Garcia, 2011). Second, there are only three EDs in the city, and all three are administered and coordinated by the same health system, as are the emergency medical services. As a result, all gun-related injuries that came through the adult and pediatric EDs are captured in a single electronic medical record system, and patients are transported by American Medical Response (AMR).

Gun-related injury

We included injuries due to discharge of a firearm and striking of the victim by the discharged projectile (bullet). Both fatal and non-fatal injuries were included. Intentional and unintentional injuries were included.

Data sources

We collected gun-related injury data from five independent sources: the Emergency Department (ED) at Yale-New Haven Hospital, the New Haven Police Department (NHPD), American Medical Response (AMR, the EMS provider), the Connecticut Medical Examiner (ME), and the news media. To collect data from the news media, we conducted a systematic search of Google, Google News Archive, and local newspaper websites for results that included the term “New Haven” combined with the terms “shooting,” “gun,” “gunfire,” “gunshot,” or “firearm.” To gather data from the ED, we queried the hospital’s electronic health records for all patient encounters that included the term “gun” or “GSW” (a common abbreviation for gunshot wound) in the fields Diagnosis, Chief Complaint, Reason for Visit, Arrival Complaint, Injury Type, and Weapon/Type of Assault. From the NHPD, we requested data for all incidents classified as “Shooting” or “Murder by firearm.” AMR records included penetrating trauma cases but excluded stabbings, animal bites, and other non-firearm causes. Medical examiner data included homicides and suicides involving ballistic penetrating trauma only. Importantly, capture–recapture estimates remain valid even when a source has a known undercount, as long as the cases it includes are true cases.

Data processing

We examined each gun-related injury data set to eliminate cases that did not meet the inclusion criteria in terms of geography, time period, and case definition. We combined all data sets and linked records through manual matching. Variation in the spelling of names across data sets was allowed as long as individual records could be unambiguously matched on other data elements, such as date of shooting, date of birth, sex, address, and type of injury.
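The linkage in this study was done manually. Purely as an illustration of the end product, the sketch below links hypothetical records deterministically on date of shooting, date of birth, and sex, ignoring name spelling, and builds one capture history per unique case. The record fields, values, and function names are assumptions made for the example. Real records (media reports in particular) often lack some of these fields, which is one reason manual review was needed.

```python
from collections import defaultdict

SOURCES = ("ED", "NHPD", "AMR", "ME", "Media")


def match_key(record: dict) -> tuple:
    """Linkage key; names are excluded because spelling varies by source."""
    return (record["shooting_date"], record["dob"], record["sex"])


def capture_histories(records_by_source: dict) -> dict:
    """Map each unique case to a 0/1 tuple, one entry per source in SOURCES."""
    seen = defaultdict(set)
    for source, records in records_by_source.items():
        for rec in records:
            seen[match_key(rec)].add(source)
    return {key: tuple(int(s in found) for s in SOURCES)
            for key, found in seen.items()}


# Hypothetical records: the same person appears under three name spellings.
case_a = {"shooting_date": "2013-08-03", "dob": "1990-01-15", "sex": "M"}
case_b = {"shooting_date": "2013-09-10", "dob": "1985-06-02", "sex": "F"}
records = {
    "ED": [dict(case_a, name="Jon Smith")],
    "NHPD": [dict(case_a, name="John Smith"), dict(case_b, name="A. Doe")],
    "AMR": [dict(case_a, name="John Smyth")],
    "ME": [],
    "Media": [dict(case_b, name="A. Doe")],
}

histories = capture_histories(records)
for key, history in sorted(histories.items()):
    print(key, history)
```

The tabulated capture histories (how many cases share each 0/1 pattern) are the input to the capture–recapture models.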

Analysis

Analyses were conducted in R version 3.1.2 (R Foundation for Statistical Computing, Vienna, Austria) using the package Rcapture (Rivest & Baillargeon, 2014). Classical capture–recapture analysis, e.g., with two sources, assumes that the sources are independent; to correct for dependencies among sources, we applied log-linear modeling. The necessary assumptions are zero dependencies between data sources; a stable population, meaning the population remains unchanged during the study; and equal catchability, meaning every case is equally likely to be identified (Gold et al., 2015). We modeled gun-related injury data as a closed population and selected optimal models based on the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and boxplots of the Pearson residuals. Modeling dependence and heterogeneity in this way helps correct the bias that arises when these assumptions are violated.
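The study's models were fit in R with Rcapture. As a language-neutral illustration of the underlying idea, the sketch below fits a Poisson log-linear model to the observed capture-history counts for three sources and takes the exponentiated intercept as the estimated number of cases missed by every source. It is a simplified stand-in, not the Rcapture implementation: the counts are hypothetical, only a main-effects model (source-specific capture probabilities, as in “Mt”) and one pairwise dependence term are shown, and heterogeneity models such as “Mth Chao” are not covered.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def design_row(history, interactions):
    """Intercept, one main effect per source, then requested pairwise terms."""
    row = [1.0] + [float(x) for x in history]
    row += [float(history[i] * history[j]) for i, j in interactions]
    return row


def fit_loglinear(counts, interactions=()):
    """Poisson log-linear fit by Newton's method on the observed histories."""
    histories = sorted(counts)
    X = [design_row(h, interactions) for h in histories]
    y = [float(counts[h]) for h in histories]
    p = len(X[0])

    def gram(w):
        return [[sum(wi * r[j] * r[k] for wi, r in zip(w, X)) for k in range(p)]
                for j in range(p)]

    def xt(v):
        return [sum(vi * r[j] for vi, r in zip(v, X)) for j in range(p)]

    def fitted(beta):
        return [math.exp(sum(b * x for b, x in zip(beta, r))) for r in X]

    # Starting values: least squares on log(count + 0.5).
    beta = solve(gram([1.0] * len(y)), xt([math.log(v + 0.5) for v in y]))
    for _ in range(50):
        mu = fitted(beta)
        step = solve(gram(mu), xt([yi - mi for yi, mi in zip(y, mu)]))
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    mu = fitted(beta)
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1.0)
                 for yi, mi in zip(y, mu))
    # exp(intercept) is the fitted count for the unobserved (0, 0, 0) history.
    return {"N_hat": sum(y) + math.exp(beta[0]), "AIC": 2 * p - 2 * loglik}


# Hypothetical counts per capture history (source 1, source 2, source 3).
counts = {(1, 1, 1): 40, (1, 1, 0): 160, (1, 0, 1): 60, (1, 0, 0): 240,
          (0, 1, 1): 40, (0, 1, 0): 160, (0, 0, 1): 60}

independent = fit_loglinear(counts)
dependent = fit_loglinear(counts, interactions=[(0, 1)])
print(independent)
print(dependent)
```

The model with the lower AIC is preferred. Here the hypothetical counts were constructed from independent sources, so the extra dependence term is penalized without improving the fit.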

Results

The ED, ME, and PD each captured injuries that did not appear in any other source. Furthermore, each data source had a unique demographic profile (Table 1) that differed from those of the other sources. The overlap between sources can best be viewed as an area-proportional Venn diagram (Fig. 1). This figure demonstrates that there is substantial redundancy among the NHPD, AMR, and the news media. However, the ME and the ED captured different segments of the gun-injured population. The overlap in this diagram was subsequently modeled using log-linear methods.

Fig. 1 Venn diagram of data source overlap proportional to data

The complete data set from all sources included 46 unique firearm-related injuries. Table 1 shows the demographic profile of each data source. Two of the EDs (the pediatric and adult EDs of the Yale New Haven York campus) are counted in one electronic health record. The EDs have cases not present in other sources. All data sources have higher percentages of males than females (79–89%), with the EDs having more female victims. The ME data have the youngest population (average 24 years), with an age range of 18–36 years compared to 15–77 years in every other source. The ED had the oldest average age, 30.3 years. The news media and AMR reported the largest percentage of victims < 18 years old (6%). The most striking finding is that there were zero non-Hispanic white firearm-related victims in the NHPD, news media, AMR, and ME data, whereas the two EDs reported Latino white victims (7% and 10%, respectively). There were 9 fatalities during the study period that were picked up by the NHPD, news media, and medical examiner, whereas the ED had only 1 fatality, suggesting that bodies bypassed the ED and went straight to the morgue.

Exploratory graphing for heterogeneity showed a non-linear form, indicating heterogeneity in capture probabilities. This is indicative of dependencies that require correction through modeling. The closed log-linear models were then evaluated by AIC, BIC, and residual boxplots. AIC and boxplots both indicated that the model “Mth Chao,” one of several models available, fit best (Table 2 and Fig. 2). The BIC indicated a marginally better fit for the model “Mt.” However, we selected “Mth Chao” as the best model because, in addition to having better AIC and residuals, it allowed for heterogeneity among capture occasions as well as among individuals captured. This model estimated the abundance of firearm injuries at 49.7, with a 95% confidence interval of 49 to 52.3. Because the model selected affects the estimate, it is critical to select the best-fitting model.

Table 2 Model results: closed log-linear models for the complete data
Fig. 2 Boxplots of Pearson residuals of the models fit by Rcapture for the complete data set

To determine whether CR is accurate under significant undercounts or with small catches, we conducted a secondary CR analysis after randomly removing 33% of the cases in the ED, PD, and AMR data sources and completely eliminating the medical examiner and media data sources, which reduced the number of unique cases to < 30 gun events. The CR estimate was still ~ 49 cases, comparable to the estimate from five sources with double the observed cases. This result supports CR as the gold standard for surveillance systems in identifying gun-related events.
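The logic of this robustness check can be sketched on simulated data. The example below is an assumption-laden illustration, not a reproduction of the study: it simulates three independent sources drawn from a hypothetical population of 2,000 cases, removes 33% of each source's records at random, and re-estimates the total with a simple independent-sources estimator (the likelihood equation for model Mt, solved by bisection) rather than the “Mth Chao” model used in the study.

```python
import random


def mt_estimate(source_sets):
    """Estimate total cases assuming independent sources (model Mt).

    Solves prod_i(1 - n_i / N) = 1 - r / N for N by bisection, where n_i is
    the size of source i and r is the number of distinct cases observed.
    """
    sizes = [len(s) for s in source_sets]
    r = len(set().union(*source_sets))
    if sum(sizes) <= r:
        raise ValueError("no overlap between sources; estimate is undefined")

    def f(n_total):
        prod = 1.0
        for n_i in sizes:
            prod *= 1.0 - n_i / n_total
        return prod - (1.0 - r / n_total)

    lo, hi = float(r), 2.0 * r
    while f(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):      # bisection
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def thin(source, fraction, rng):
    """Randomly remove the given fraction of a source's records."""
    keep = round(len(source) * (1 - fraction))
    return set(rng.sample(sorted(source), keep))


rng = random.Random(2013)     # fixed seed so the sketch is repeatable
true_total = 2000             # hypothetical population size
capture_probs = (0.5, 0.4, 0.3)
sources = [{case for case in range(true_total) if rng.random() < p}
           for p in capture_probs]
thinned = [thin(s, 0.33, rng) for s in sources]

full_estimate = mt_estimate(sources)
thinned_estimate = mt_estimate(thinned)
print(len(set().union(*sources)), round(full_estimate))
print(len(set().union(*thinned)), round(thinned_estimate))
```

In this simulation the number of distinct observed cases drops after thinning while the estimate stays near the true total, although with more sampling variability because less overlap is available to inform it.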

Discussion

By matching cases from five distinct sources, we have generated a more complete picture of firearm-related injury incidence than any single source provides. This allowed us to estimate the number of additional uncounted firearm-related injuries that occurred during our study period by operationalizing the overlap using CR instead of simply matching, de-duplicating, and adding the data sources together. De-duplicating and adding the sources yields 46 cases, whereas the CR estimate of 49.7 implies an additional 3.7 cases that no source captured.

We have demonstrated that each data source picked up different populations with varying demographic profiles, meaning that single-source studies miss part of the population. Most notably, the ED picked up cases invisible to the police and medical examiner. These findings reduce confidence in single-source records for public health surveillance purposes. In our study, the most complete single source was the records of the NHPD, which recorded 43 incidents. However, similar to Kellermann et al., who found that 9% of cases lacked corresponding police reports, 6 out of our 49 cases (12%) were not included in police records (Kellermann et al., 1996). There are a variety of possible explanations for this. For several incidents that did not have police records, other sources mentioned that the police were on the scene of the gun-related injury incident. This could mean that the police did know about some of these incidents but, for one reason or another, did not generate a police report. In the case of hospital records, it is also possible that some cases were not known to the police because hospital providers neglected to notify law enforcement of minor cases when the department was operating at maximum capacity. While both ED providers and police officers are required to report gun-related injuries, in cities with high rates of crime, injury, and death, the time required for reporting may have been a limiting factor.

The main drawback of the ME data is that they are limited to fatal cases. The media sources seemed mostly to reflect police reports. In fact, most media sources directly attributed their information to the NHPD. Regardless, media sources provided important details about the gun-related incident, such as its context and the relationship between victim and perpetrator.

Only one fatal gun-related wound was present in the ED data, whereas every other data set recorded eight or nine fatal firearm injuries. This occurred even though media, AMR, and ME records mention these patients being transported to the ED for treatment before they died. These patients may have died after being transferred from the ED to the operating room.

Although some cases are presumed to be missed by every source, CR can account for them. The benefit of CR is that the number of missing cases can be estimated by operationalizing the overlap of the cases that do exist. Missing cases therefore do not result in an undercount, because the estimate is derived from the overlap of multiple datasets.

This preliminary study demonstrates that capture–recapture provides accurate estimates of gun-related injury incidence. This conclusion is supported by our experiment in which we deleted 33% of the cases and two of the data sources and yet obtained the same results as the full model. While CR is a tried and tested method for ascertaining injuries resulting from gun-related violence in Europe, this pilot demonstrates that the necessary data exist at the city or state level to produce accurate estimates in the USA. Furthermore, we have demonstrated that CR works equally well with fewer data sources and significantly more missing data.

In summary, our preliminary study demonstrates a surveillance method that more accurately estimates the incidence of gun-related injuries using multiple sources and the CR method. The next step is to apply CR to estimate the incidence of gun-related injury at the state level using statewide data sources. Capture–recapture is superior to existing methods for surveillance of invisible or difficult-to-count populations. Interventions aimed at reducing the public health burden of gun-related injuries should be based on surveillance using multiple sources and CR methods to maximize the accuracy of incidence estimates.

Study limitations

Surveillance case definitions should be expanded to include cases where guns are used to coerce victims without being discharged. Mental health injuries, e.g., post-traumatic stress, can be devastating and yet evade enumeration, as among victims of intimate partner violence, felonious assaults, armed robberies, carjackings, etc. Without a physical injury, the victim will remain invisible to ED, AMR, and ME records but should be included in surveillance activities and may be captured by police and media records.

Because this was a small preliminary study, we did not stratify by race, ethnicity, gender, age, or type of gun. Larger samples would allow independent CR estimates per stratum, which may inform public health advocates about who is more at risk and which types of gun result in more injuries or deaths.

Healthcare implications

Without proper surveillance, etiologies remain hidden and policy may be misguided. Capture–recapture and log-linear models are excellent surveillance techniques to inform public health prevention efforts.