FormalPara Key Points

Missing age in the structured field of post-marketing individual case safety reports (ICSRs) presents a significant challenge for case assessment during safety surveillance.

The rule-based algorithm evaluated in the current study achieved high performance in extracting patients’ age in ICSRs in the FDA Adverse Event Reporting System (FAERS) database and resultingly identified patient age for an additional one million ICSRs between 2002 and 2018.

This tool may be beneficial for implementation in other databases where age may not be consistently available in a structured field but otherwise reported in an unstructured narrative field.

1 Introduction

The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) is a database designed to support FDA’s post-marketing safety surveillance program for drug and therapeutic biological products [1]. The FAERS database contains spontaneous individual case safety reports (ICSRs) of adverse events (AEs) and medication errors submitted by healthcare professionals, consumers, and manufacturers.

FDA conducts ongoing surveillance of FAERS for all marketed products. Evaluation of ICSRs includes consideration for the product, event, and patient factors. One crucial patient factor is age, as it provides physiologic, pharmacologic, and epidemiologic context for surveillance. For example, pediatric patients are more susceptible to adverse drug reactions; robust pediatric data for therapies are often lacking, and many products are used off-label [2, 3].

Despite being an elemental demographic, age is frequently missing in an ICSR’s structured field [4,5,6]. However, the narrative of an ICSR allows reporters to provide information in free-form text format, which may include information not captured in structured fields. Currently, FDA reviewers manually review the narratives or use other customized workarounds (e.g., narrative text string searches) to determine if a patient’s age is reported in the narrative when missing from the structured field. Beyond ICSR evaluation, missing age information also has implications on accurate retrieval of relevant cases for review that are dependent on the structured age field and on signal detection algorithms that incorporate age information [7]. These implications are especially important for the identification of cases relevant to pediatric-focused postmarketing safety reviews, including those mandated by US legislation in 2002 [8, 9].

Natural language processing (NLP) has been widely used to facilitate drug safety activities with various data sources including electronic health records, internet-based data such as social media, published medical literature, and spontaneous reporting systems [10, 11]. Wunnava et al. [12] developed and evaluated rule-based and machine learning-based extraction methods using NLP to extract patient demographics, drug, and AE information from unstructured narratives of 60 ICSRs from the FAERS database. Their analyses suggest that the rule-based extraction method using raw text strings performed better than the supervised machine learning-based extraction methods or rules based on grammar or part-of-speech for demographic information in FAERS. Based on the demonstrated performance of the rule-based extraction method, a simple NLP algorithm described by Wunnava et al. to extract patient age, we expanded on their findings to validate this tool using a larger gold standard dataset of ICSRs in FAERS. Furthermore, we evaluated the extent to which NLP can improve the identification of patient age from ICSRs in the FAERS database from 2002 to 2018.

2 Methods

2.1 Data Source

The FAERS database contains more than 21 million ICSRs from 1968 to 2020. Manufacturers submit postmarketing ICSRs as expedited (within 15 days) if the ICSRs contain an AE that is not described in the product labeling and led to a serious outcome; otherwise, the ICSRs are submitted as non-expedited [13]. ICSRs that are voluntarily submitted to the FDA by healthcare professionals or consumers through the MedWatch program are referred to as direct reports [14].

The FAERS structure adheres to the international safety reporting guidance issued by the International Conference on Harmonisation (ICH) [15]. ICSRs contain many data elements for patient characteristics, reaction or event, product, and reporter; structured data are encouraged for electronic submissions [16]. Some data elements, such as case narratives, are unstructured fields in the form of free-text; additionally, submission of ICSRs allows for attachments such as medical records, images, and published literature. FDA guidance on data element transmission of ICSRs allows senders to use different ways of including the same data without being redundant to cope with differing information contents (e.g., age information can be sent as date of birth and date of reaction or event, age at the time of reaction or event, or patient age group according to the available information) [16]. The structured age field is populated by the age provided by the reporter or if missing, a calculated age is populated in FAERS if both the patient’s date of birth and event date are available.

2.2 Descriptive Analyses

We identified all ICSRs initially received by the FDA from January 1, 2002, to December 31, 2018. The latest ICSR version was used as the representative report if follow-up information was received from the same reporter. We calculated the overall and annual proportion of ICSRs missing an age in the structured field by report type (i.e., expedited, non-expedited, or direct). ICSRs were then stratified by those with and without an age in the structured field for comparison of the following report characteristics: report type, reporter type (e.g., healthcare provider, consumer), reporter country, and reported outcomes. Serious outcomes are defined by US regulations and include one or more of the following: death, hospitalization, life-threatening, disability, congenital anomaly, and other serious outcomes [13].

2.3 Sample Selection and Gold Standard Ascertainment

A random sample of 1500 ICSRs from all ICSRs received during the study period was selected for reviewers to manually identify patients’ age from the ICSR narratives. The size of the random sample was first determined by the considerations of statistical power calculation and then slightly increased given the reasonably easy process of manual review. Assuming the positive predictive value (PPV) is 0.90, a sample size of 1500 provides more than 90% power to rule out a lower bound of 95% confidence interval (CI) of PPV lower than 85%.

The gold standard for the NLP validation process was the manually-extracted patient’s age (in years) at the time of the first AE (if multiple AEs were reported) or the patient’s age at the time of reporting (if the age at event onset was not reported). If the patient’s age in the narrative was reported in days or months, reviewers converted the age to years (values were rounded to the tenth digit to match the rounding performed by the NLP algorithm). Patient’s age reported as an age range, approximate age, or not reported in the narrative were assigned as null age.

Two reviewers independently reviewed each ICSR narrative to identify age. If there was a disagreement, a third reviewer adjudicated. Some ICSRs in the dataset may contain age in the structured field, but reviewers were blinded to this value. Age from the structured field was not taken into consideration during the determination of the gold standard or validation phase of the study.

2.4 NLP Algorithm

The NLP algorithm uses rule-based text mining to extract age from the unstructured narrative field of the ICSRs and outputs the age in years. The algorithm searches the text in each ICSR’s narrative field and extracts the first instance of the numerical value preceding variations of the text strings reflecting “years” or “years old.” If there are no year terms in the text, then the algorithm extracts the first instance of the numerical value preceding variations of the text strings for “months” or “months old” and converts the value to age in years (rounded to the nearest tenth digit). If no terms are extracted for age, then the algorithm outputs null age. The regular expression code used to implement the algorithm is provided in Supplementary Materials Table 1.

2.5 NLP Validation

To evaluate the performance of the NLP tool, we compared the values from the NLP output with the gold standard for an exact match (Table 1). The two primary metrics used to evaluate information retrieval are PPV (precision) and sensitivity (recall). Sensitivity is the probability that a relevant value is retrieved by the tool. PPV is the probability that a retrieved value is relevant (i.e., matches the gold standard). The two metrics are often combined into a single measure called “F-measure,” which allows researchers to weigh either PPV or sensitivity more heavily [17]. In our study, PPV and sensitivity were equally important, and F-measure was calculated as:

$$ F = \frac{{2 \times {\text{PPV}} \times {\text{sensitivity}}}}{{{\text{PPV}} + {\text{sensitivity}}}} \times 100 $$
Table 1 Outcome classification of NLP tool

Validity was also measured by specificity, the probability that a non-relevant value is not retrieved by the tool. The percentage of overall matching was calculated as:

$$ \frac{{{\text{True}}\;{\text{positive}} + {\text{true}}\;{\text{negative}}}}{{{\text{sum}}\;{\text{of}}\;{\text{all}}\;{\text{reports}}}} \times 100 $$

For each of these metrics, the 95% CI was calculated using the binomial “exact” method. In our post hoc analysis, we further assessed the performance of the NLP tool among the ICSRs with age missing and with age available in the structured field, respectively.

To analyze the NLP errors, we reviewed the ICSR narratives to categorize the reason for the mismatch.

2.6 NLP Application

We applied the NLP tool to all ICSRs received during the study period. The patient age for each ICSR was then determined using a combined approach considering the structured age field and NLP extracted age. If age was missing in the structured field, the NLP extracted age was considered the patient age. If the structured age field and NLP extracted age were null, then age was considered missing. Using this combined approach, we determined the proportion of ICSRs annually and overall missing an age. Additionally, ICSRs were tabulated by age group using the structured age field alone versus the combined approach (i.e., structured age field + NLP output). Percentage change between the groups was calculated as:

$$ \frac{{{\text{Number}}\;{\text{of}}\;{\text{reports}}\;{\text{after}}\;{\text{NLP}} - {\text{number}}\;{\text{of}}\;{\text{reports}}\;{\text{before}}\;{\text{NLP}}}}{{{\text{number}}\;{\text{of}}\;{\text{reports}}\;{\text{before}}\;{\text{NLP}}}} \times 100 $$

3 Results

3.1 Characterization of ICSRs

Of 10,300,594 ICSRs received in the study period, the overall percentage of reports with missing age in the structured field was 37.2%. The annual proportion of ICSRs missing age increased over time, from 21.9% in 2002 to 43.8% in 2018 (Fig. 1). Direct reports had the lowest overall proportion of ICSRs missing age (12%), followed by expedited and non-expedited ICSRs (31.5% and 46.6%, respectively). An increased trend in percent of ICSRs with missing age was observed for expedited and non-expedited ICSRs; however, no clear trend was observed for direct reports.

Fig. 1
figure 1

Percentage of FAERS ICSRs with missing age in the structured field by report type and overall, from 2002 to 2018. FAERS FDA Adverse Event Reporting System, ICSR individual case safety report

Characteristics of ICSRs with and without age in the structured field are described in Table 2. Approximately half of all ICSRs were reported by healthcare professionals, among which 68.1% provided an age in the structured field. Reports originating outside of the USA contained age more often than those originating in the USA (73.7% vs 58.8%). Of the serious outcomes reported, congenital anomalies had the lowest proportion (35.6%) and life-threatening had the highest proportion (87.2%) of ICSRs with age.

Table 2 Descriptive characteristics of ICSRs with and without age in the structured field from 2002 to 2018

3.2 Gold Standard Ascertainment

The distribution of ICSRs by year and report type for the random sample of 1500 ICSRs was similar between the study sample and all ICSRs received (Supplementary Materials Figs. 1, 2), which indicated the sample was representative of the total ICSRs. In the random sample, 38% (570/1500) of the ICSRs were missing age in the structured field. After manual review of the 1500 ICSR narratives, we identified a numerical age from the narratives as the gold standard for 868 ICSRs (57.9%). The reviewers determined null age to be the gold standard for the remaining 632 ICSRs (42.1%); note that for ICSRs without age identified in the narrative, age may be available in the structured field.

3.3 NLP Validation

Among the 1500 ICSRs in our random sample, the NLP tool extracted an age for 894 ICSRs: 849 were true positives and 45 were false positives. Among the 606 ICSRs without an age extracted from the narrative, 593 were true negatives and 13 were false negatives. The NLP tool achieved high performance with sensitivity of 98.5% (95% CI 97.4–99.2%), PPV of 94.9% (95% CI 93.3–96.3%), F-measure of 96.7% (95% CI 95.3–97.7%), and specificity of 93.0% (95% CI 90.7–94.8%). Overall, 96.1% (95% CI 95.0–97.0%) of the ICSRs had a match between the NLP output and the gold standard dataset.

Our qualitative error analysis identified several categories of false positive and negative errors. Table 3 provides the frequencies of error categories and examples. The absence of a year or month unit was the most frequent reason for false negative errors (30.7%). This occurred because reviewers were able to determine age from the narrative’s context despite the absence of terms analogous to “years” or “months.” Other causes of false negatives were typographical errors, missed extractions of numbers at the beginning of a paragraph, non-numerical ages, and age unit of weeks and days. Missed extractions at a paragraph’s start occurred because the tool required a space to be present before a numerical value. The most common cause of false positive errors was when NLP extracted a value that was a unit of time that was not age-related (60%). Often, these ICSRs described a length of time for the use of a medication or was related to information from the patient’s medical history. The second most common cause of false positive errors included non-patient-specific ages extracted by NLP; examples included literature reports that described age characteristics of patients in clinical trials, observational studies, or case series.

Table 3 Observed causes for false negative and false positive errors

Among the 1500 ICSRs in our random sample, there were 930 ICSRs with age available in the structured field and 570 ICSRs with age missing in the structured field (Supplementary Materials Tables 2 and 3). In the post hoc analysis, we evaluated the performance of the NLP tool among the subset of ICSRs with age missing and with age available in the structured field. The sensitivity and specificity of the NLP tool for the reports with age missing in the structured field were 98.5% and 93.3%, respectively, whereas for the reports with age in the structured field, sensitivity and specificity were 98.5% and 92.2%, respectively. However, the PPV and F-measure were lower for the reports with age missing in the structured field (82.3% and 89.7%, respectively), compared to those for the reports with age in the structured field (97.8% and 98.3%, respectively). Among the reports with age missing in the structured field, only 24% had age in the narratives. In contrast, 78% of the reports with age in the structured field had age in the narratives.

3.4 NLP Application

Among the 3.8 million ICSRs missing age in the structured field, the NLP tool identified age for an additional one million ICSRs. Resultingly, the overall number of ICSRs missing age in the study period decreased 10% (37.2–27%). Figure 2 provides the annual percentage of ICSRs missing age using the structured age field alone and the combined approach (i.e., structured age field + NLP extracted ages).

Fig. 2
figure 2

Percentage of FAERS ICSRs with missing age before and after NLP implementation. FAERS FDA Adverse Event Reporting System, ICSR individual case safety report, NLP natural language processing

Table 4 shows the number of ICSRs with age before and after application of the NLP tool, displayed by age groups. In the pediatric age group (≤ 16 years), the number of ICSRs with age in the structured field after NLP application increased by 59%. For the younger pediatric age groups (< 6 years), the number of ICSRs with age in the structured field after NLP application was substantially higher (more than doubled) compared to the number of ICSRs before NLP application. In contrast, for the older age groups (≥ 17 years), the increase in the number of ICSRs with age in the structured field was more moderate (increase by less than 20%).

Table 4 Number of ICSRs with patient age before and after NLP implementation from 2002 to 2018 across age groups (n = 10,300,594)

4 Discussion

As the overall number of FAERS ICSRs increased from 2002 to 2018, the percentage of ICSRs with missing age in the structured field doubled. This significant increase provided the impetus for exploring solutions to support extraction of age from the unstructured case narrative. Improving patient age ascertainment from ICSRs will aid in age-specific safety surveillance activities. We did not identify other published studies that focused on the extraction of age from narratives except the publication by Wunnava et al. This previous study explored different methods for the extraction of age from ICSR narratives and observed that a rule-based text string search performed better than those based on other methods studied; their text string search algorithm for age in 60 ICSRs achieved a sensitivity of 83%, PPV of 89%, and F-measure of 86% [12]. Our validation of the algorithm in a dataset of 1500 ICSRs, using the text string search for age (reported in months or years), achieved a sensitivity of 98.5%, PPV of 94.9%, and F-measure of 96.7% for all reports. We further compared the algorithm’s performance among reports with age missing/not missing in the structured field. The performance of the NLP tool depends on the prevalence of relevant age in narratives and is restricted if the prevalence is low.

While advances have been made in the use of NLP to extract biomedical information from various data sources, our analysis demonstrated that a straightforward text string-searching algorithm could extract a numerical age from an ICSR’s narrative free-text with high performance. This finding is significant because this rule-based extraction method is easy to implement. Unlike more sophisticated machine learning-based methods, the use of rule-based extraction does not require the need to train datasets. Furthermore, the text string search algorithm is interoperable across different databases. In addition to FAERS, this tool may be beneficial for implementation in other databases containing patient information where age may not be consistently available in a structured field but otherwise reported in an unstructured narrative or free-text field. The high performance of this tool supports its implementation in pharmacovigilance practices.

A new data field that considers both the structured age field and narrative-extracted age information can reduce the manual curation performed on ICSRs missing age in the structured field. For the application of NLP to ICSRs, we determined the patient age using a combined approach considering the structured age field first if a value was present, then applied the NLP extracted age only if a value was missing in the structured age field. Application of NLP to ICSRs from 2002 to 2018 using the combined approach decreased the number of ICSRs missing an age from 3.8 to 2.8 million (10% of the total number of ICSRs). We observed more than 50% increase in the number of ICSRs pertaining to the pediatric age group. Postmarketing safety information for pediatrics is especially important because of the limited number of pediatric patients in clinical trials and off-label usage [18]. Additionally, it is not possible to identify all risks of a product from premarketing clinical trials. Therefore, postmarketing safety surveillance furthers our understanding of a product’s safety in pediatric patients. However, pediatric ICSRs account for a very small percentage of the total ICSRs in the FAERS database (3% during the study period). In consideration of the relatively low number of pediatric ICSRs in FAERS, as well as the known limitations of spontaneous databases, such as underreporting, it is critical to increase the ability to accurately identify and retrieve additional pediatric ICSRs in the database. Further research is needed to determine the implications of using this derived age field for signal identification (e.g., incorporation into signaling algorithms).

We have two hypotheses for the observed larger percentage increase of ICSRs for pediatrics relative to adult age groups following NLP. First, there may be differential inclusion of age information by reporters in an ICSR’s narrative due to the perceived greater importance of providing age information for pediatrics relative to other age groups. Second, the disproportional change may also be attributed to false positive errors. For 30 of 45 (67%) observed false positive errors, NLP incorrectly extracted a value for age that was less than 17 years. This is not surprising given the most common false positive error was the extraction of a value associated with other time frames, such as treatment duration and event onset. These time frames are commonly described with small numbers (e.g., 1 year ago). The small sample of pediatric cases and false positives limits our ability to draw more definitive conclusions, but the performance of the tool in the pediatric ICSR subset may benefit from additional evaluation.

Our qualitative error analysis identified several areas for improvement. Notably, although the NLP algorithm extracts the first instance of an age value preceding terms related to “years” (if none detected, then “months”), the number of false positive errors introduced by this method was low (n = 3). The most common cause of false positive errors was the extraction of a numerical value preceding the terms related to “years” or “months” but unrelated to age. Modifications to the algorithm could be made to reduce the number of erroneous extractions for such values unrelated to age. To reduce false negative errors and increase sensitivity, potential improvements include modifying the algorithm to capture missed extractions due to age reported with variations in spacing, paragraph breaks, or other non-numerical characters, as non-numerical values, with a decimal, or in weeks or days. However, any modifications may introduce new errors and would require validation.

Despite the use of the NLP tool, we were unable to determine age for 27% of the ICSRs in the study period. Refinements to the tool may result in incremental improvements, but it is unlikely they would substantially reduce missingness given the already high sensitivity. Major improvements in missingness will need to come from improvements in the reported data. Notably, age was less frequently missing in ICSRs sent directly to the Agency via the MedWatch program. An ICSR may be missing age because the information may be unknown to the reporter or protected for privacy reasons [19].

While beyond the objective of the study to investigate reasons for the trend in missingness, we explored missing patterns in data fields relevant to age post hoc. Currently, if age is not provided in the structured age field, FAERS calculates age using the date of birth and event date when both are available. We noted that the availability of date of birth data was stable over time, but the annual proportion of cases with an event date has decreased. It is unclear why this occurs, but it may be due to the increase in reports associated with industry-sponsored programs [6]. The completeness of data fields from these types of programs is highly variable [20, 21]. Jokinen et al. noted that certain types of programs (e.g., patient assistance programs) have been associated with lower vigiGrade scores (a measure of data completeness that includes patient age) [21, 22]. Future work should explore the reasons for the data trends.

There are some limitations to our study. We likely underestimated the availability of age in FAERS as some ICSRs contain attachments that may include relevant age information (e.g., literature article). The NLP tool is currently unable to evaluate text contained in attachments, but optical character recognition technologies would make this possible [23]. Additionally, we did not deduplicate ICSRs in the study dataset; therefore, we are unable to account for the number of unique ICSRs when we applied the NLP tool to ICSRs in the study period. Although we performed the FAERS search to retrieve the latest version of an ICSR, duplicate ICSRs describing the same patient and adverse event may still occur as a result of submission of ICSRs by different manufacturers or reporters. We did not deduplicate reports in the study set of 1500 reports because we wanted to validate the NLP tool for a random sample of reports that is representative of the overall database during the study period (2002–2018). We think validating the tool in a representative sample with duplicates would reflect the reality that many researchers are not able to deduplicate those reports when conducting their research using FAERS. Finally, our gold standard discarded estimated ages, although an estimated age or age group would be more useful than no age information.

5 Conclusions

This study demonstrated the potential for a rule-based text mining NLP algorithm to extract patients’ age from the narratives of post-marketing ICSRs thereby increasing the number of ICSRs with age for analysis. The use of this tool would facilitate pharmacovigilance practice and research using the FAERS data. The tool demonstrated good overall performance; however, further improvements may be considered. Further research to expand the use of NLP to extract information beyond age from unstructured data fields will improve postmarketing surveillance.