Introduction

Each year, 5 million people die from injuries worldwide [1]. Low- and middle-income countries (LMICs) bear a disproportionate share of the burden of trauma: more than 90% of injury deaths occur in these countries [2]. India faces a high injury burden, with injuries causing around 11% of the country’s total deaths [3]. The World Health Organization (WHO) acknowledges that, to prevent injury and improve care, evidence-based strategies should be specific to each country and environment. This requires comprehensive injury data covering the many aspects of trauma patient care [4].

Trauma registries document the presentation, care, and outcomes of injured patients [5]. They have the potential to drive change at many levels, from pre-hospital care, to hospital-specific trauma care quality improvement, to government legislation and resource allocation [5]. The valid use of a trauma registry relies on the quality of its data [6]. Missing observations can bias the analysis and interpretation of information. However, the overall completeness of variables in trauma registries has not been well described, and no standardized benchmark for acceptable levels of missing data exists [7,8,9]. If data are missing, the validity of research can become uncertain, especially as most manuscripts based on trauma registry data do not appropriately report and manage incomplete observations [10, 11]. Hence, good-quality data recording in registries is critical.

The collection and recording of first in-hospital physiological variables [heart rate (HR), systolic blood pressure (SBP), respiratory rate (RR), and Glasgow Coma Scale (GCS)] are particularly important for guiding trauma resuscitation and assessing outcomes. At least one of these variables is a component of almost all physiological scores developed to adjust for injury severity in trauma care benchmarking and quality improvement (e.g. Revised Trauma Score, Kampala Trauma Score) [12, 13]. Yet physiological observations are at high risk of being missing in trauma registries [14]. Furthermore, death in hospital has been found to be an independent predictor of missing physiological variables in trauma registries, implying that more severely injured patients have less complete data [14]. This has substantial implications for studies assessing mortality as an outcome measure, as severely injured patients (those at highest risk of death) may be excluded from analysis.

The aim of this study was to assess the prevalence of missing data in a new multihospital Indian trauma registry, the Australia-India Trauma Systems Collaboration Trauma Registry. The study examined whether any variables, especially hospital mortality, were associated with missing first in-hospital physiological data. By identifying predictors of these missing data points, this study aimed to inform interventions to improve data completeness in the registry.

Methodology

Setting

The Australia-India Trauma Systems Collaboration (AITSC) is a partnership between Australia and India, led by the National Trauma Research Institute (NTRI), a department of Monash University and Alfred Health, and the Jai Prakash Narayan Apex Trauma Center, All India Institute of Medical Sciences (AIIMS), Delhi [15]. The AITSC was funded by the Indian Government (through the Department of Science and Technology) and the Australian Government (through the Department of Industry, Innovation, and Science). Various projects have been underway across four Indian trauma hospitals to develop India’s trauma systems, spanning pre-hospital to post-discharge care [16, 17]. The development of a trauma registry, established in 2016, underpinned and informed the projects studying clinical care systems [15]. This study of registry quality was undertaken before the other interventions commenced, notably the AITSC pre-hospital notification study [15].

The AITSC trauma registry was developed to collect trauma data and to lay the foundation for trauma registries in India. In its first year, the AITSC registry involved four major Indian trauma hospitals: the Jai Prakash Narayan Apex Trauma Center (JPN), All India Institute of Medical Sciences (AIIMS), New Delhi; the Lokmanya Tilak Municipal General (LTMG) Hospital, Mumbai; the Sheth Vadilal Sarabhai (VS) General Hospital, Ahmadabad; and the Guru Teg Bahadur (GTB) Hospital, New Delhi. Data were collected on all trauma patients who presented with a potentially life-threatening or limb-threatening injury. The hospital of presentation was reported in a non-identifiable format for this study.

At each study site, two trained data collectors, certified to code injury severity using the Abbreviated Injury Scale, recorded patient data. They did so either through direct observation of healthcare staff and patients or by extraction from paper medical records (e.g. for patients arriving outside shift hours), as per the AITSC Trauma Registry Data Dictionary (version 1.02, March 2016). Data collectors worked shifts from 9 am to 7 pm, Monday to Saturday. Data were recorded on prepared data sheets, using codes to identify observations/values. When the data collectors were unable to find or observe the data, or when data were inadequately described, they coded them as such.

Data cleaning involved project officers retrospectively reviewing the data, correcting any obvious mistakes (e.g. inconsistency between admission and discharge dates), and following up blank observations with data collectors to the best of their ability.

Patient selection and data extraction

Data for all adults (age ≥ 18 years) with a potentially life-threatening or limb-threatening injury entered into the AITSC trauma registry from inception (19 April 2016) until 30 April 2017 were extracted. Variables (59 in total) were grouped as data on hospital arrival, demographics, injury occurrence, emergency department (ED) measures, initial investigations (Ix) and procedures, and hospital stay and outcomes. Pre-hospital variables were excluded from this study as the AITSC pre-hospital notification study had not commenced.

Statistical methods

Completeness levels

The level of completeness (a proportion, expressed as a percentage) was quantified for each variable. ‘Missing data’ included ‘non-valid’ missing data (variables left blank) and ‘valid’ missing data (variables coded as ‘missing’ or ‘unknown’). Free-text variables (e.g. place of residence) were considered missing if left blank or described as ‘unknown’ or ‘not recorded’.
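As a minimal illustration of the completeness definition above, the Python sketch below treats a value as missing if it is blank (‘non-valid’ missing) or matches a missing code (‘valid’ missing). The code strings and function names are assumptions for illustration; the actual codes in the AITSC Trauma Registry Data Dictionary are not reproduced here.

```python
# Assumed codes for illustration only; not taken from the AITSC data dictionary.
CODED_MISSING = {"missing", "unknown"}           # 'valid' missing for coded variables
FREE_TEXT_MISSING = {"unknown", "not recorded"}  # missing markers for free-text variables


def is_missing(value, free_text=False):
    """Return True if a recorded value counts as missing."""
    if value is None:
        return True  # left blank: 'non-valid' missing
    text = str(value).strip().lower()
    if text == "":
        return True  # whitespace-only entries are also treated as blank
    markers = FREE_TEXT_MISSING if free_text else CODED_MISSING
    return text in markers


def completeness_pct(values, free_text=False):
    """Percentage of observations for one variable that are not missing."""
    if not values:
        raise ValueError("no observations supplied")
    present = sum(1 for v in values if not is_missing(v, free_text))
    return 100.0 * present / len(values)


# Example: one blank and one 'unknown' out of four observations
print(completeness_pct([120, None, "unknown", 95]))  # 50.0
```

Table 1 reports the complement of this figure (percentage missing) for each variable.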

The presence of observations for certain variables was conditional on specific interventions. In such cases, the proportion of missing observations was quantified only among cases in which the event occurred; e.g. the proportion of missing data on the time of a CT scan was based on cases in which the scan was recorded as having occurred.
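The conditional denominator can be sketched as follows. The record layout and the field names `ct_done` and `ct_time` are hypothetical; the point is only that records without the event are excluded before the missing proportion is computed.

```python
def conditional_missing_pct(records, event_field, value_field):
    """Percentage of records missing `value_field`, among records where the event occurred.

    records: list of dicts; field names are hypothetical.
    Returns None when no record has the event (denominator of zero).
    """
    eligible = [r for r in records if r.get(event_field) is True]
    if not eligible:
        return None
    missing = sum(1 for r in eligible if r.get(value_field) in (None, ""))
    return 100.0 * missing / len(eligible)


records = [
    {"ct_done": True, "ct_time": "10:15"},
    {"ct_done": True, "ct_time": None},   # scan done, time not recorded
    {"ct_done": False, "ct_time": None},  # no scan: excluded from the denominator
]
print(conditional_missing_pct(records, "ct_done", "ct_time"))  # 50.0
```

Counting the third record as missing would inflate the figure to 66.7%, even though no CT time could have been recorded for that patient.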

Association between in-hospital mortality and other variables with missing physiological data

Descriptive analyses of the variables were stratified by patient mortality. Variables on hospital interventions (chest X-ray, CT scan, neurosurgical consult, blood transfusion, mechanical ventilation, operations) were converted into binary variables indicating whether or not they occurred. Arrival date was converted into day of the week. Arrival time was classified as binary ‘in-hours’ (07:00–18:00) or ‘out of hours’ (18:00–07:00), as in previous studies [18]. Medians and interquartile ranges were determined for days on mechanical ventilation, length of stay (LOS) in ICU, and LOS in hospital. Free-text variables and conditional variables were excluded from this and subsequent analyses.
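The two derived time variables above could be computed as in this sketch. The text does not say which category the boundary times belong to, so the code assumes 07:00 is in-hours and 18:00 is out of hours; the function names are illustrative.

```python
from datetime import datetime, time

IN_HOURS_START = time(7, 0)
IN_HOURS_END = time(18, 0)
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def arrival_period(arrival: datetime) -> str:
    """Classify arrival as 'in-hours' (07:00 up to 18:00) or 'out of hours'.

    Assumption: 18:00 itself counts as out of hours and 07:00 as in-hours.
    """
    t = arrival.time()
    return "in-hours" if IN_HOURS_START <= t < IN_HOURS_END else "out of hours"


def arrival_day(arrival: datetime) -> str:
    """Day of week, independent of the system locale."""
    return DAY_NAMES[arrival.weekday()]


a = datetime(2016, 4, 19, 18, 0)  # illustrative: registry inception date at 18:00
print(arrival_day(a), arrival_period(a))  # Tuesday out of hours
```

Using `weekday()` with a fixed tuple avoids locale-dependent output from `strftime("%A")`.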

To represent missing first in-hospital (i.e. ED) physiological data, a new binary variable, ‘miss_phys’, was created: it took the value 1 if any of SBP, HR, RR, or GCS was missing for a given case, and 0 if none of these observations was missing. Descriptive and univariable logistic regression analyses were conducted to assess whether in-hospital mortality or other factors were associated with missing first-recorded physiological data. Results were reported as unadjusted odds ratios (95% confidence intervals).
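A minimal sketch of the outcome variable and an unadjusted odds ratio follows. For a single binary predictor, the univariable logistic regression estimate exp(β) equals the 2×2 cross-product ratio, and its Wald interval equals the log-odds-ratio (Woolf) interval, so the 2×2 form is used here. Field names and counts are hypothetical; categorical predictors with more than two levels, or continuous predictors, would instead need a fitted model (e.g. Stata’s `logistic` command).

```python
import math

PHYS_FIELDS = ("sbp", "hr", "rr", "gcs")  # hypothetical field names


def miss_phys(record):
    """1 if any first in-hospital SBP, HR, RR or GCS is missing (None), else 0."""
    return int(any(record.get(f) is None for f in PHYS_FIELDS))


def unadjusted_or(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for one binary predictor.

    a: predictor present, miss_phys = 1    b: predictor present, miss_phys = 0
    c: predictor absent,  miss_phys = 1    d: predictor absent,  miss_phys = 0
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Wald interval undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


print(miss_phys({"sbp": 118, "hr": 96, "rr": None, "gcs": 15}))  # 1
# Made-up counts; the point estimate is (20 * 90) / (80 * 10) = 2.25 up to rounding
print(unadjusted_or(20, 80, 10, 90))
```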

Multivariable logistic regression was used to investigate the independent association between mortality and missing physiological data, controlling for covariates, and to identify other predictors of missing physiological data. Variables that showed an association (p < 0.1) with ‘miss_phys’ in the univariable analysis (for categorical variables, in any sub-category) and that were missing less than 20% of data (as per the benchmark for trauma registry variables by Ringdal et al.) were included in the multivariable model [9]. The model was built using manual stepwise selection. Results were reported as adjusted odds ratios (95% confidence intervals). Statistical significance was defined as p < 0.05. Missing data on predictors and covariates were managed using list-wise deletion. Data analysis was conducted using Stata version 14.0 (StataCorp, College Station, Texas, USA).
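The two entry criteria for the multivariable model, and list-wise deletion, can be sketched as below. The variable names and p-values are made up for illustration, and the manual stepwise selection itself is not reproduced. List-wise deletion keeps only records complete on every model variable, which is why a final model can use fewer cases than the registry holds.

```python
def screen_candidates(univariable_p, missing_pct, p_threshold=0.1, max_missing=20.0):
    """Select candidate predictors for the multivariable model.

    univariable_p: dict mapping variable -> list of p-values (one per sub-category
                   for categorical variables, a single value otherwise).
    missing_pct:   dict mapping variable -> percentage of observations missing.
    A variable qualifies if any sub-category has p < p_threshold and it is
    missing less than max_missing percent of observations.
    """
    return sorted(
        var for var, pvals in univariable_p.items()
        if min(pvals) < p_threshold and missing_pct[var] < max_missing
    )


def complete_cases(records, variables):
    """List-wise deletion: keep records with no missing (None) value in `variables`."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Illustrative inputs only
p_values = {"died": [0.001], "sex": [0.45], "place": [0.6, 0.08], "triage_time": [0.01]}
pct_missing = {"died": 1.1, "sex": 0.0, "place": 2.0, "triage_time": 45.0}
print(screen_candidates(p_values, pct_missing))  # ['died', 'place']
```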

Results

The registry included 4466 adult patients in the specified period. The majority of patients (2587, 58%) were from Hospital 1, with 689 (15%) from Hospital 2, 473 (11%) from Hospital 3, and 717 (16%) from Hospital 4. Most patients survived (3835, 85.9%), while 582 (13.0%) died; survival status was unknown for 49 (1.1%) patients.

Of all patients, 2014 were recorded as arriving by ambulance; 583 of these (13.1% of all patients) arrived directly from the scene, while the others were inter-facility transfers. Road traffic incidents accounted for 2485 (55.6%) patient injuries.

Completeness of variables

Table 1 outlines the extent of missing observations in the AITSC trauma registry, which ranged from 0% to 67.4% across variables. Four variables had no missing observations (data collector, admission date, sex, and first vital sign date). Most variables (n = 51, 86.4%) were missing less than 20% of observations.

Table 1 Proportion and percentage of patients missing data for variables in the AITSC trauma registry, colour coded using a traffic light system based on % missing (green = low levels of missingness, yellow = moderate levels, red = high levels)

Tables 2 and 3 describe the key characteristics of patients stratified by hospital mortality status, the primary predictor of interest. They also describe each variable in the registry and its proportion of missing data.

Table 2 Demographics and injury event information for patients in the AITSC registry based on mortality status and composite characteristics (all patients in registry, including those with missing mortality status), including proportion of missing data for each variable
Table 3 Hospital arrival, processes of care, and outcomes for patients in the AITSC registry based on mortality status and composite characteristics (all patients in registry, including those with missing mortality status), including proportion of missing data for each variable

Predictors of missing physiological data

Univariable analysis

Data on at least one of first in-hospital SBP, HR, GCS, or RR were missing for 808 patients (18.1%). Individually, the missing observations were: HR, 72 (1.6%); SBP, 102 (2.3%); GCS, 168 (3.8%); and RR, 721 (16.1%). Table 4 outlines patient characteristics by missing physiological data status and the results of univariable logistic regression for each potential predictor of missing physiological data.

Table 4 Predictors of missing first physiological data: univariable logistic regression (conducted on variables missing < 20% of data for all patients). ‘No missing phys. data’ represents none of SBP/HR/GCS/RR missing, and ‘Missing phys. data’ represents any of SBP/HR/GCS/RR missing

Multivariable analysis

Table 5 presents the results of the multivariable logistic regression. The final model included 3382 cases (75.7% of all patients in the registry). The following variables were independently associated with missing first physiological data (p < 0.05): death, out-of-hours arrival (18:00–07:00), hospital of care, ‘other’ place of injury compared with home/residential institution, and blunt force and sharp force mechanisms of injury (i.e. injuries caused by blunt/sharp objects, people, or animals) compared with road traffic incidents.

Table 5 Predictors of missing first physiological data: adjusted results of stepwise multivariable logistic regression

Injuries of intentional (assault/homicide) intent and the performance of a chest X-ray were both associated with lower odds of missing first physiological data.

Discussion

This was the first study to examine the completeness of variables in the AITSC registry, in the context of establishing a new multicentre trauma registry in India. Rates of missing data were low compared with other trauma registries, a key success [8]. Hospital death was associated with missing physiological data, implying that more severely injured patients were more likely to have absent physiological observations in the registry. Other key predictors of missing these observations included out-of-hours arrival, hospital of care, and mechanism of injury. Respiratory rate was missing much more frequently than SBP, HR, or GCS (16.1% of observations, compared with less than 5% for each of the others).

The findings highlight the potential limitations of using a registry with missing data. Managing missing data is critical to ensuring the validity of a trauma registry for benchmarking and quality improvement, and its subsequent use to improve trauma care. Identifying predictors of missing observations can inform efforts to improve this registry’s data quality. This study also provides important lessons for the development and strengthening of registries in other LMICs. Despite the importance of such registries, few exist in LMICs; similarly, only 1% of trauma registry publications are derived from the least developed countries [19, 20].

The AITSC registry’s rates of missing data were comparable to, if not lower than, those reported from other registries. For example, in the established Oregon State Trauma Registry, the proportions of missing data for four reported variables were: intubation attempt, 9%; GCS, 17%; SBP, 22%; and RR, 17% [21]. In other trauma registries, missing data have ranged from 4.9% to 20.4% for GCS, 2.1% to 33.7% for RR, and 2.4% to 28.7% for SBP [8]. While there are no formal guidelines on acceptable levels of missing data for trauma registries, Ringdal et al. used > 80% completeness as the desired benchmark, and most variables in the AITSC registry achieved this level [9]. Variables with high rates of missing observations in the registry mostly concerned trauma arrival and the timing of investigations/procedures. These variables contribute to the appraisal of trauma care, allowing baseline assessments and subsequent evaluations of interventions (e.g. time taken, appropriateness), which can ultimately improve patient outcomes. As such, minimizing missing observations for these variables will maximize the impact of the AITSC registry on trauma quality improvement. Further, respiratory rate was missing in a substantial proportion of cases, so its collection should receive increased attention across sites. One key use of physiological data is in trauma scoring systems that estimate probability of survival and benchmark trauma care quality across registries [12]. Hence, scores that require respiratory rate will face limitations when using this registry.
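To illustrate why a missing respiratory rate limits score-based benchmarking, the sketch below computes the weighted Revised Trauma Score using its coefficients and coded-value bands as commonly published. Which score variants would be applied to the AITSC registry is not stated here, so this is illustrative only. The function deliberately returns None rather than guessing a value when any component is missing, which is how a complete-case analysis would lose the patient.

```python
from typing import Optional


def code_gcs(gcs: int) -> int:
    """Coded value (0-4) for Glasgow Coma Scale (3-15)."""
    if gcs >= 13:
        return 4
    if gcs >= 9:
        return 3
    if gcs >= 6:
        return 2
    if gcs >= 4:
        return 1
    return 0


def code_sbp(sbp: float) -> int:
    """Coded value (0-4) for systolic blood pressure in mmHg."""
    if sbp > 89:
        return 4
    if sbp >= 76:
        return 3
    if sbp >= 50:
        return 2
    if sbp >= 1:
        return 1
    return 0


def code_rr(rr: float) -> int:
    """Coded value (0-4) for respiratory rate in breaths/min."""
    if 10 <= rr <= 29:
        return 4
    if rr > 29:
        return 3
    if rr >= 6:
        return 2
    if rr >= 1:
        return 1
    return 0


def revised_trauma_score(gcs, sbp, rr) -> Optional[float]:
    """Weighted RTS; returns None if any component is missing."""
    if gcs is None or sbp is None or rr is None:
        return None
    return (0.9368 * code_gcs(gcs)
            + 0.7326 * code_sbp(sbp)
            + 0.2908 * code_rr(rr))


print(round(revised_trauma_score(14, 80, 8), 4))  # 6.5266
print(revised_trauma_score(15, 120, None))        # None: RR missing, no score
```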

The association between death in hospital and missing physiological variables has been reported previously and was confirmed in this study. This echoes findings from the Victorian State Trauma Registry, where mortality predicted missing observations for specific physiological variables (GCS, RR) [14]. The other predictors of missing and not missing physiological data have important implications for improving data collection in the AITSC registry (see Fig. 1). Hospital of presentation affected the completeness of such data; hence, data collection strategies could be shared between sites. Data recording was better when cases had law enforcement and/or legal implications (i.e. assault cases), and this rigour should be extended to all trauma cases. This may be achieved through better education of data collectors and clinicians, as some data points are rarely recorded by clinical staff and hence cannot be entered into the registry.

Fig. 1 Improving physiological data collection in the AITSC registry: translation of key study findings into specific processes that could increase physiological data completeness

This study also emphasizes the importance of strengthening after-hours data collection. Across all injured patients in mature trauma systems, studies have demonstrated comparable mortality for patients presenting on weeknights versus weekdays, and lower mortality on weekends versus weekdays [22, 23]. However, among the most critically ill trauma patients, the after-hours model of care has been reported to be associated with worse outcomes [18]. This highlights the importance of sustained data collection at all hours. In this registry, after-hours presentation, i.e. the times when data collectors were not prospectively recording information, was associated with missing physiological data. Barriers to after-hours data collection may include the expense and limited availability of after-hours data collectors (for concurrent data collection), as well as the lack of robust patient record-keeping, storage, and retrieval systems (for retrospective data collection). Thus, strengthening the quality of data for patients presenting after hours is a key challenge for this emerging registry.

While the literature on effective interventions for improving trauma data collection in LMICs is limited, the specific actions mentioned above could be complemented by general departmental and hospital processes. Regular meetings between data collectors and investigators at each site, and among data collectors across institutions, would help support data collectors and address specific concerns or problems. Frequent monitoring of data collection procedures and data quality audits, including checks of inter-rater agreement, would help detect key issues [24, 25]. Hospital-wide education on the importance of data recording would aid data extraction and emphasize that data quality is the responsibility of all stakeholders.

A key limitation of this study was that it focused solely on data completeness. Other domains of data quality (e.g. accuracy, case capture) were not assessed. These domains are critical and can also affect the extent of missing data. For example, if only cases with most data points recorded are entered in the registry, observations may not appear to be missing, but case capture is incomplete.

Data accuracy affected the assessment of completeness for conditional variables, i.e. ‘time’ variables that depend on a specific intervention occurring. For these variables, missing data were quantified using a denominator of cases coded as having had the event, on the premise that this coding was accurate. Similarly, this study assumed that data were missing because clinical information was not recorded, rather than because it was not measured (which would imply a failure to promptly diagnose patients). While it is unlikely that core physiological data were not measured, this cannot be verified.

Complete case analysis (list-wise deletion) was used for covariates in the analyses of predictors of missing physiological data. To minimize the limitations of complete case analysis, only variables missing less than 20% of observations were included in the predictor model (as benchmarked by Ringdal et al.) [9]. Furthermore, the key predictor examined (mortality) was almost complete; it was missing for only 49 (1.1%) patients.

Lastly, a high proportion of cases in the registry (58%) came from one hospital, introducing institutional bias when the registry is considered as a whole. However, this study’s purpose was to explore data quality and the potential biases introduced when using the complete AITSC registry. Sub-analyses for each hospital would provide more specific and relevant data to individual hospitals, enabling them to better focus efforts on improving their internal records and to identify areas for improvement in their specific contexts.

Future investigation into other aspects of data quality in the AITSC registry is critical. Data should be collected and recorded as thoroughly as possible, and data completeness should ideally be re-evaluated annually. However, missing data are ubiquitous and almost impossible to avoid, and consensus on acceptable levels of missing data in trauma registries is still needed. The absence of observations does not necessarily invalidate a dataset, provided that appropriate measures are taken to manage it. This can be achieved by understanding the associations and patterns of missing data and by using suitable statistical techniques such as multiple imputation, which has been shown to lead to more valid conclusions when applied to missing physiological variables in trauma registries [26].
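Multiple imputation analyses each of several imputed datasets separately and then combines the results. The sketch below shows only that final pooling step (Rubin’s rules) for a single parameter such as a log odds ratio; building the imputation model is not shown. The numbers are made up. In Stata 14, the version used in this study, the `mi impute` and `mi estimate` commands cover both steps.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates (e.g. log odds ratios), one per imputed dataset.
    within_variances: squared standard errors from the same m analyses.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least 2 imputations and matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(within_variances)       # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, total


# Made-up log odds ratios and variances from three imputed datasets
print(pool_rubin([0.5, 0.7, 0.6], [0.04, 0.04, 0.04]))
```

The between-imputation term is what complete case analysis ignores: it inflates the variance to reflect uncertainty about the missing values.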

Conclusion

This first assessment of data quality in a new trauma registry in India found that most variables had low rates of missing data, an important success. Hospital death was associated with missing observations for key first in-hospital physiological variables, as were out-of-hours arrival and hospital of presentation. The completeness of physiological data would likely be improved by employing data collectors 24 hours a day; implementing and encouraging rigorous data recording and collection by all staff (as already occurs in assault cases); and sharing recording processes between sites.