Introduction

Subdural hematoma (SDH) is the most common form of traumatic intracranial hemorrhage, resulting in over 60,000 hospitalizations annually [1]. Computed tomography (CT) is diagnostic for SDH, and the CT scan report typically contains key elements used in deciding the disposition and treatment plan for patients with SDH. The size and location of the hematoma have also been shown to be predictive of outcomes [2]. Midline shift (brain herniation) can lead to coma and death. The ability to extract these data from CT scans, or from the accompanying report, is crucial to patient care and research for patients with SDH.

Radiology reports are typically entered into the electronic health record (EHR) as free text. While these reports generally follow a consistent format and use similar terminology, the information is not captured as discrete data elements. As a result, the reports cannot easily be used where structured data are needed, such as in clinical decision support tools or observational studies. This presents a particular problem for SDH research, which has no verified severity of injury scale that includes an interpretation of neuroimaging data, in contrast to aneurysmal subarachnoid hemorrhage (e.g., Fisher grade), intracerebral hemorrhage (e.g., hematoma volume), or ischemic stroke (e.g., large vessel occlusion). Natural language processing (NLP) has been used in multiple prior instances to classify radiology reports [3,4,5,6]. NLP for data element extraction has been used successfully in a few specific clinical research scenarios (e.g., colonoscopy reports) [7, 8]; however, we are unaware of prior use in neuroimaging. A functional, accurate NLP algorithm could allow researchers to extract data from a large number of head CT reports more quickly and precisely than manual extraction, leading to advances in prognosis, predictors of complications, and evaluation of potential treatments that might improve patient outcomes.

The primary objective of this investigation was to develop an NLP algorithm to structure SDH characteristics from cranial CT scan reports with known SDH, and then to test the accuracy of this algorithm against a pair of trained physician abstractors reviewing the same CT scan reports.

Methods

Population, data collection, and coding

Radiographic data were collected from patients presenting to a single academic level 1 trauma center. The sample consisted of consecutive CT scan reports with isolated SDH retrieved from the electronic medical record. Reports were identified by searching patient records for discharge diagnoses consistent with intracranial hemorrhage and then narrowed to isolated SDH in a process that has been previously described [2]. Isolated SDH was defined as the presence of SDH with no other type of intracranial hemorrhage (such as subarachnoid hemorrhage, cerebral contusion, or epidural hematoma). However, patients could have more than one SDH present. Other types of intracranial hemorrhage were excluded in order to simplify algorithm development, as these other hemorrhages have different key characteristics. SDH count was determined by tallying all hemorrhages referenced in the radiology report. Hemorrhages that spanned more than one region but were confluent were counted as a single hematoma. All scans were interpreted and/or approved by attending radiologists.

After initial review, scans were coded by abstractors for four variables: total number of SDH, presence and degree of midline shift (in millimeters), thickness of largest SDH (in millimeters), and side of largest SDH. Variables were chosen as they make up the key components of several SDH risk stratification decision tools [2, 9]. Each scan was assessed twice and results were adjudicated by a senior investigator, an attending emergency medicine physician with substantial experience interpreting and adjudicating cranial CT scan reports in a research setting; the final result of this interpretation was considered the gold-standard value.

Model creation and system development

An algorithm was created to extract data from the written CT scan reports using a pattern matching approach. This method was chosen due to the lack of a mention-level gold-standard corpus. The NLP algorithm consists of a pipeline of components that first detect section, sentence, and word token boundaries, breaking the text down into identifiable paragraphs, sentences, and phrases. Subsequent components execute pattern matching searches for mentions of SDH (e.g., “subdural hematoma” or “extra-axial collection”), mentions of location (e.g., “right frontal,” “left occipital lobe”), and measure phrases (e.g., “6 mm”). An example of the location of data extracted within a typical CT scan report is provided in Box 1. Measure phrases and location information are associated with SDH mentions using a heuristic of co-occurrence within the same sentence. These components of the algorithm are then integrated using the Apache Unstructured Information Management Architecture (UIMA) environment (Apache Software Foundation, Wakefield, MA). The code of the algorithm was made publicly available after completion and can be downloaded for free, non-commercial use under the general public license at https://github.com/NUNLP/NeuroNLP.
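The sentence-level co-occurrence heuristic described above can be illustrated with a minimal Python sketch. This is not the published UIMA pipeline: the regular expressions, the function names (`split_sentences`, `to_mm`, `extract_sdh_features`), and the rule that a measurement in a sentence mentioning midline shift is taken as the shift are all assumptions made for illustration, and a real report would need considerably richer patterns.

```python
import re

# Hypothetical patterns; the published algorithm's actual rules may differ.
SDH_PATTERN = re.compile(
    r"\b(?:subdural (?:hematoma|hemorrhage)|extra-axial collection|SDH)\b",
    re.IGNORECASE,
)
SIDE_PATTERN = re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE)
MEASURE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)
SHIFT_PATTERN = re.compile(r"\bmidline shift\b", re.IGNORECASE)
NEGATION_PATTERN = re.compile(r"\b(?:no|without)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def to_mm(value, unit):
    """Convert a measure phrase to millimeters."""
    number = float(value)
    return number * 10 if unit.lower() == "cm" else number


def extract_sdh_features(report):
    """Associate measures and sides with SDH mentions in the same sentence."""
    features = {"thickness_mm": None, "side": None, "midline_shift_mm": None}
    for sentence in split_sentences(report):
        measures = [to_mm(v, u) for v, u in MEASURE_PATTERN.findall(sentence)]
        if SHIFT_PATTERN.search(sentence):
            # Simplification: any measure in a midline-shift sentence is the shift.
            if measures:
                features["midline_shift_mm"] = measures[0]
            elif NEGATION_PATTERN.search(sentence):
                features["midline_shift_mm"] = 0.0
        elif SDH_PATTERN.search(sentence) and measures:
            thickness = max(measures)
            current = features["thickness_mm"]
            if current is None or thickness > current:
                # Keep the largest hematoma and the side named alongside it.
                features["thickness_mm"] = thickness
                side = SIDE_PATTERN.search(sentence)
                features["side"] = side.group(1).lower() if side else None
    return features


example_report = (
    "There is a right frontal subdural hematoma measuring 12 mm in maximal "
    "thickness. A smaller left occipital subdural hematoma measures 0.4 cm. "
    "There is 6 mm of right-to-left midline shift."
)
print(extract_sdh_features(example_report))
```

The sketch deliberately omits SDH count, which the report text rarely states explicitly and which cannot be recovered reliably by same-sentence matching alone.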

Box 1. Example CT scan report

figure a

Bold indicates data used for thickness. Italics indicate data used for midline shift. Red indicates data used for SDH count. Underline indicates data used for side of SDH

Outcomes and data analysis

Performance of the algorithm was measured as accuracy relative to data abstracted by two emergency physicians. Accuracy was defined as the percentage of NLP-extracted values that matched the physician consensus gold standard. Statistical analysis was performed using R v3.4 (R Foundation for Statistical Computing, Vienna, Austria) with the irr and psych packages.
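As an illustration of these agreement measures, the following Python sketch computes accuracy as defined above, together with Cohen's kappa, which is reported alongside accuracy in Table 1. The study's own analysis was performed in R; the function names and the example values here are hypothetical and are not study data.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of extracted values that exactly match the gold standard."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohens_kappa(predicted, gold):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(gold)
    observed = accuracy(predicted, gold)
    predicted_counts = Counter(predicted)
    gold_counts = Counter(gold)
    categories = set(predicted_counts) | set(gold_counts)
    # Chance agreement from the marginal category frequencies of each rater.
    expected = sum(predicted_counts[c] * gold_counts[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical category
    return (observed - expected) / (1 - expected)


# Made-up example values for side of largest SDH.
gold_side = ["left", "left", "right", "right", "left"]
nlp_side = ["left", "right", "right", "right", "left"]
print(accuracy(nlp_side, gold_side))
print(cohens_kappa(nlp_side, gold_side))
```

Kappa is the more conservative of the two measures for categorical variables such as side, because it discounts the agreement that would be expected by chance given how often each category occurs.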

Results

A total of 612 CT scan results, each the first CT scan recorded in the EHR for unique patients, were extracted and used to create the training and test datasets. All scans in the corpus had all four of the key features described.

The NLP algorithm had 84–90% agreement with human abstractors for size of SDH, degree of midline shift, and side of largest SDH. The algorithm performed less well when determining SDH count. Algorithm accuracy and Cohen’s kappa comparing the NLP algorithm with the consensus of the two human abstractors (gold standard) are displayed in Table 1.

Table 1 Performance of the NLP algorithm structuring data from CT scan reports

Discussion

In this investigation, an NLP algorithm was derived and validated to identify and extract key data elements (thickness of SDH, amount of midline shift, side of largest SDH) from radiology reports with performance comparable to that of physicians. To our knowledge, this is the first described NLP algorithm to extract information from head CT reports, even though cranial CT is one of the most commonly used radiographic tests. Additionally, this is one of the first applications of NLP to extract data from radiology reports into a structured format (e.g., thickness of a subdural hematoma, which is more informative than a mention of its presence alone), rather than to classify whether a report contains a positive result.

The NLP algorithm had differing performance for each variable. Midline shift had the highest performance, likely because discussion of midline shift, including the amount of shift, is almost always isolated to a single sentence. Side of hematoma and thickness are also frequently described in close proximity to the primary mention of the hematoma, contributing to their similar accuracy. On the other hand, count is very rarely mentioned explicitly, which makes the process of counting the exact number of hematomas present more indirect, potentially explaining the algorithm’s relatively less accurate performance when extracting this variable.

The use of NLP techniques is steadily increasing in medicine, in both clinical and research applications. While the use of NLP on radiology reports has been studied extensively [3], and prior investigations have used NLP to classify cranial CT scan report outcomes [10], this is the first algorithm created to extract data from cranial CT scan reports. Outside of radiology, a previous study extracted data from colonoscopy reports with excellent accuracy [7], while another used NLP to extract structured data from mammography reports, obtaining an F score of 86% [11]. The accuracy reported in the mammography study is comparable to what is reported here; it is one of the only other data extraction reports published so far that included accuracy statistics. There have been several other applications of NLP to identify other types of radiography reports with positive findings, including pulmonary nodules, pulmonary emboli, and abdominal aortic aneurysms. The use of NLP to extract clinically relevant characteristics from head CT reports is an innovative and logical next step that may have important implications for the diagnosis and management of neurological conditions.

Using NLP to abstract data has several potentially important research and clinical uses. It may improve dataset creation for larger, more robust, and more generalizable observational studies by reducing the need for human abstractors and thereby allowing the use of much larger datasets. In fact, while some investment is needed to program the initial algorithm, the marginal cost of extracting data from additional records is practically nil. In addition to the benefits for research, accurate NLP technologies may improve real-time decision support by giving these tools access to free text fields rather than only structured entries, allowing them to extract richer, more robust data from the medical record [12].

This investigation has several important limitations. Data were gathered from a single center with a single style of formatting. Additionally, reports were created and finalized by a single attending group of emergency radiologists, and the performance of the algorithm was not compared by authoring radiologist. Hospital, physician group, and regional variations may limit the generalizability of this algorithm when applied to other reports, although the large size of the cohort makes generalizability more likely. Despite the excellent agreement between data extracted by the NLP algorithm and the gold standard, other NLP techniques might further improve accuracy. For example, more advanced information extraction algorithms could be applied given a mention-level gold-standard corpus. There are inherent limitations in human abstraction accuracy, so the adjudicated gold-standard records may not be 100% correct. However, multiple reviewers abstracted each record with a senior researcher adjudicating, so this error is likely minimized in this study. Finally, we evaluated the NLP algorithm against the CT scan report, which is taken as a faithful interpretation of the images; an error in the interpretation could lead to an abstraction that misrepresents what the CT actually shows, even if it accurately represents the report. Another group has recently reported acceptable accuracy of automated recognition of abnormalities on chest radiographs directly from a large repository of images, which is another potential avenue of research for SDH and brain injury [13]. Automated recognition typically requires thousands of images, more than are available in any known dataset of patients with SDH.

Future investigations should attempt to improve the accuracy of the NLP algorithms presented here. Substituting a program that uses named-entity recognition using the UMLS dictionary, such as cTAKES, will likely substantially improve system accuracy [14]. Additionally, adding a component to classify hematoma type in addition to extracting data would eliminate the need to manually determine the presence of an isolated subdural hematoma, making it a helpful step toward creating a unified NLP algorithm to interpret cranial CT scan reports.

Conclusion

An NLP algorithm can successfully abstract the side of SDH, thickness of SDH, and degree of midline shift from head CT reports of patients with SDH, with excellent accuracy in a test cohort. The algorithm, which is freely available, may accelerate research and patient care for SDH, the most common form of traumatic intracranial hemorrhage.