Introduction

Maternally linked birth records datasets have emerged as a potentially powerful resource for investigating maternal and infant health in US populations [1]. These datasets consist of several consecutive years of vital birth records to which an algorithm has been applied to identify, for each woman, the set of records that represents her different births over the follow-up period [2]. These maternal sets are then analyzed as longitudinal data on pregnancies and births. Some maternally linked datasets include fetal death records.

Numerous studies have used maternally linked data to investigate perinatal outcomes [1, 3–16], maternal behaviors [7, 10, 11, 17, 18], and the quality of vital records birth data [19, 20]. In contrast, very little attention has been given to assessing linkage errors with respect to maternal sets and to understanding how such errors affect estimates derived from the maternally linked data. Progress in this area has been hampered by lack of a framework for conceptualizing the errors and lack of methods for quantifying them. The objectives of this study were to develop a framework for conceptualizing linkage errors in maternally linked datasets, to develop measures for quantifying the errors, and to demonstrate the application of the new measures to a maternally linked birth records dataset.

Error in assigning birth records to maternal sets was conceptualized as being analogous to misclassification in epidemiologic studies (see Table 1). This approach was chosen because the overall goal of this line of research is to develop quantitative techniques for incorporating adjustments for linkage errors into analysis of the linked records [21]. Two new measures, analogous to sensitivity and specificity, were developed to quantify the linkage errors. The measures were applied to a maternally linked birth records file, using a hospital birth log file as the gold standard to calculate true and false linkage rates. Varying degrees of random error were introduced into the maternal linkages and the behavior of the new measures under these known conditions was examined.

Table 1 Conceptual scheme for misclassification of maternal sets

Methods

Jaro’s [22] method of record linkage, as implemented in the AutoMatch software (version 4.3, MatchWare Technologies, Inc., Kennebunk, ME), was used to construct a maternally linked dataset from North Carolina resident in-state birth and fetal death records for 1988–1997. The linkage strategy is given in the Appendix. A typical internal validation was performed by assessing the logical consistency of selected variables across records within maternal sets. This maternally linked dataset was used as the baseline for assessing the new measures. Evaluating the quality of the maternal linkages in this file was not an objective of the present study; the results of the internal validation are given for purposes of describing the baseline data.

Table 2 Two measures of misclassification of maternal sets in maternally linked birth records

Five additional files were created by introducing increasing degrees of random linkage errors into the maternal sets of the original linked file. In each case, a specified proportion (1, 2, 5, 10, and 20 percent, respectively) of the records was randomly reassigned to different maternal sets, thus simulating errors in the maternal linkages. The new measures were then applied to these files, with the expectation that the measured error should increase with the increasing proportion of simulated errors.
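The reassignment step can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name, the dictionary layout, and the rule for choosing the destination set (uniformly at random among the other existing sets) are assumptions, since the text specifies only the proportion of records that was reassigned.

```python
import random


def inject_linkage_errors(assignments, proportion, seed=0):
    """Return a copy of `assignments` with simulated linkage errors.

    assignments: dict mapping record id -> maternal set id (hypothetical layout)
    proportion : fraction of records to move to a different maternal set
    """
    rng = random.Random(seed)
    set_ids = sorted(set(assignments.values()))
    if len(set_ids) < 2:
        raise ValueError("need at least two maternal sets to reassign records")

    record_ids = sorted(assignments)
    n_errors = round(proportion * len(record_ids))

    corrupted = dict(assignments)
    for rec in rng.sample(record_ids, n_errors):
        # Assumption: the destination is any existing set other than the
        # record's original one, chosen uniformly at random.
        choices = [s for s in set_ids if s != assignments[rec]]
        corrupted[rec] = rng.choice(choices)
    return corrupted


# Toy file: 100 records grouped into 50 maternal sets of two records each.
original = {f"r{i}": f"m{i // 2}" for i in range(100)}
# One corrupted file per error level described in the text.
simulated = {p: inject_linkage_errors(original, p / 100, seed=p)
             for p in (1, 2, 5, 10, 20)}
```

Because every selected record is forced into a set other than its original one, the number of changed assignments in each simulated file equals the requested proportion of records.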

The external gold standard file consisted of the birth log for 1988–1997 of one North Carolina hospital, organized into maternal sets by the mother’s hospital ID number.

For calculating linkage error rates, AutoMatch was used to identify records in the birth file that corresponded to records in the gold standard file, matching on mother’s and infant’s names, birth dates, etc. Gold standard-birth record pairs that met the matching criteria were considered as representing the same birth. To simplify the presentation, references to birth records in the following text include fetal death records.

Quantifying linkage errors

Errors in the composition of maternal sets were conceptualized in two dimensions: the true linkage proportion, analogous to sensitivity, which captures the degree to which all of a woman’s births were assigned to a single maternal set, as opposed to being divided among different sets; and the false linkage proportion, analogous to specificity, which captures the degree to which the assigned maternal sets combined births from different women.

The true linkage proportion was operationalized as the percent of maternal sets of size two or greater in the gold standard file that were completely, partially, or not-at-all represented as sets in the birth records file (see Table 2). Completely represented means that all of the births that comprised the gold standard set were assigned to the same maternal set in the birth records file (but the birth records set could include other births as well, i.e., from other gold standard sets). Partially represented means that at least two but not all of the births in the gold standard set were assigned to the same maternal set in the birth records file. Not-at-all represented means that no two births from the gold standard set were assigned to the same birth records set. In cases where there was not a birth record corresponding to the gold standard record, the gold standard record was considered assigned to a separate birth records set.

Fig. 1 Distribution of maternal sets in a maternally linked birth records file and a hospital birth log file (gold standard), by number of records in the set

The false linkage proportion was operationalized as the percent of maternal sets of size two or greater in the birth records file that were completely, partially, or not-at-all composed of births from one gold standard set only (see Table 2). Completely means that all of the births in the birth records set were from a single gold standard set (but the birth records set did not necessarily encompass all of the births from that gold standard set). Partially means that at least two but not all of the births in the birth records set were from the same gold standard set. Not-at-all means that no two births in the birth records set were from the same gold standard set. In cases where there was not a gold standard record corresponding to the birth record, the birth record was considered as representing a separate gold standard set.
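The definitions of the true and false linkage proportions share one rule: take a set of size two or greater from one file, look up the set each of its records belongs to in the other file, and examine the largest group of records that stay together. A minimal sketch of that rule follows. The function names and data layout are assumptions made for illustration, and records in the two files are assumed to have already been paired so that they share a common record id.

```python
from collections import Counter

CATEGORIES = ("completely", "partially", "not-at-all")


def categorize(member_ids, set_in_other_file):
    """Categorize one set (size >= 2) by how its records group in the other file.

    member_ids       : record ids forming one set in the file being assessed
    set_in_other_file: dict mapping record id -> set id in the other file;
                       an absent id means no corresponding record was found
    """
    if len(member_ids) < 2:
        raise ValueError("only sets of size two or greater are categorized")
    # A record with no counterpart is treated as its own separate set.
    labels = [set_in_other_file.get(r, ("unmatched", r)) for r in member_ids]
    largest_group = max(Counter(labels).values())
    if largest_group == len(member_ids):
        return "completely"
    if largest_group >= 2:
        return "partially"
    return "not-at-all"


def linkage_proportions(sets, set_in_other_file):
    """Percent of sets of size >= 2 falling in each category."""
    eligible = [m for m in sets.values() if len(m) >= 2]
    if not eligible:
        raise ValueError("no sets of size two or greater")
    counts = Counter(categorize(m, set_in_other_file) for m in eligible)
    return {c: 100.0 * counts[c] / len(eligible) for c in CATEGORIES}


# Toy data: three gold standard sets; record "g" has no birth record.
gold_sets = {"g1": ["a", "b", "c"], "g2": ["d", "e"], "g3": ["f", "g"]}
birth_set_of = {"a": "m1", "b": "m1", "c": "m2", "d": "m3", "e": "m3", "f": "m4"}
true_linkage = linkage_proportions(gold_sets, birth_set_of)

# False linkage proportion: the same rule with the roles of the files swapped.
birth_sets = {"m1": ["a", "b"], "m2": ["c"], "m3": ["d", "e"], "m4": ["f"]}
gold_set_of = {r: sid for sid, members in gold_sets.items() for r in members}
false_linkage = linkage_proportions(birth_sets, gold_set_of)
```

In the toy data, set g1 is partially represented (two of its three births share birth records set m1), g2 is completely represented, and g3 is not-at-all represented because its second birth has no corresponding birth record.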

The above categories are ordinal in the same direction for both the true and false linkage proportions: completely represents the smallest amount of misclassification and not-at-all represents the greatest amount of misclassification. In addition, an optimal category was defined for sets in which there was a one-to-one correspondence between the records in the birth set and the records in the gold standard set. Thus, optimal sets represent the absence of misclassification, and the optimal category is a subset of the completely category. An analogous designation for the opposite situation — total misclassification — could not be defined because any pairing of birth records sets with gold standard sets for this purpose would be arbitrary.
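The optimal category can be illustrated with a short sketch. It assumes, hypothetically, that corresponding records in the two files share a record id and that each file is held as a dictionary from set id to a list of record ids; neither detail comes from the study itself.

```python
def optimal_sets(reference_sets, comparison_sets):
    """Ids of reference sets (size >= 2) that match one comparison set exactly.

    A set is optimal when its records and the records of a set in the other
    file are in one-to-one correspondence, i.e. the memberships are identical.
    """
    comparison_members = {frozenset(m) for m in comparison_sets.values()}
    return [sid for sid, members in reference_sets.items()
            if len(members) >= 2 and frozenset(members) in comparison_members]


# Toy data: g1 is optimal; g2 is completely represented (both of its births
# fall in m2) but not optimal, because m2 also contains another birth.
gold = {"g1": ["a", "b"], "g2": ["c", "d"]}
birth = {"m1": ["a", "b"], "m2": ["c", "d", "e"]}
optimal_gold = optimal_sets(gold, birth)
```

The example shows why optimal is a subset of completely: both toy gold standard sets are completely represented, but only one has an exact counterpart.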

A variation of these measures calculated the percent of births included in sets variously categorized as above. These measures performed similarly to those using set as the unit of analysis. Only the latter are reported here.

Because the gold standard file corresponded to a small subset of the birth records file, calculation of the above measures was based on selected subsets of the birth and gold standard records. The base population for the true linkage proportion consisted of maternal sets in the gold standard file that included at least two births, along with all birth records that corresponded to those gold standard records. The base population for the false linkage proportion consisted of maternal sets in the birth records file that included at least two births, at least one of which corresponded to a gold standard record, along with all of the gold standard records that corresponded to those birth records; birth records sets that did not have at least one corresponding gold standard record (a majority of the birth records sets) were not included in the base population for the false linkage proportion.
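The selection of the two base populations can be sketched as below. The function names and the dictionary-of-lists layout are assumptions for illustration, and records are again assumed to carry a shared id once a gold standard record has been paired with a birth record.

```python
def true_linkage_base(gold_sets):
    """Gold standard sets that include at least two births."""
    return {sid: m for sid, m in gold_sets.items() if len(m) >= 2}


def false_linkage_base(birth_sets, gold_record_ids):
    """Birth records sets with at least two births, at least one of which
    corresponds to a gold standard record."""
    return {sid: m for sid, m in birth_sets.items()
            if len(m) >= 2 and any(r in gold_record_ids for r in m)}


# Toy data: only records "a", "b" and "c" appear in the gold standard file.
gold_sets = {"g1": ["a", "b"], "g2": ["c"]}
birth_sets = {
    "m1": ["a", "b"],   # kept: two births, both in the gold standard
    "m2": ["c", "x"],   # kept: two births, one in the gold standard
    "m3": ["y", "z"],   # dropped: no gold standard counterpart
    "m4": ["w"],        # dropped: fewer than two births
}
gold_ids = {r for members in gold_sets.values() for r in members}

true_base = true_linkage_base(gold_sets)
false_base = false_linkage_base(birth_sets, gold_ids)
```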

Results

There were 1,010,788 birth and 9,022 fetal death records in the birth file. (Due to a programming error, 10,601 (1.0%) birth records were excluded from this file.) From this, 234,235 maternal sets of two or more records were identified, 80% of which consisted of two records. The distribution of the sets by size (number of records) is shown in Fig. 1.

The results of the internal validation showed that the change in mother’s age between two consecutive linked records in the maternally linked file corresponded to the difference in time between the two events in over 99% of linked record pairs; the estimated date of the beginning of gestation for a later event began after the occurrence of the previous event in 97% of pairs; the dates of occurrence of the previous event as indicated on the record for the next event and on its own record matched in over 95% of pairs; and parity increased by one between 91% of consecutive birth records and by two or three between an additional 3% of consecutive birth records.

The gold standard file contained records for 21,875 births and 394 fetal deaths, including 3,447 maternal sets of two or more records. The distribution of the sets by size is very similar to that of the birth records (Fig. 1). In matching the gold standard and birth files, corresponding records in the birth file were identified for 95% of the records in the gold standard file.

The true and false linkage proportions for the original (i.e., without simulated errors) linkages are shown in Table 3. There were 3,447 gold standard maternal sets in the base population for the true linkage proportion, and 7,133 birth records maternal sets in the base population for the false linkage proportion. For the true linkage proportion, 87.8% of the gold standard sets were categorized as completely, whereas for the false linkage proportion, 36.1% of the birth records sets were categorized as completely. As the percentage of simulated errors increased, both linkage proportions shifted in the direction of greater misclassification (Fig. 2). With 20% of the birth records randomly reassigned to a different maternal set, 54.5% of the gold standard sets were categorized as completely (true linkage proportion) and 12.9% of the birth records sets were categorized as completely (false linkage proportion).

Table 3 Maternal linkage error rates before introduction of simulated errors

Fig. 2 True (upper graph) and false (lower graph) linkage proportions in a maternally linked birth records file with increasing levels of simulated linkage errors. For both proportions, the completely category represents the least degree of misclassification and the not-at-all category represents the greatest degree of misclassification

Discussion

Results of epidemiologic studies using maternally linked birth records reflect bias and imprecision introduced by errors in the linkages. To date, methods for assessing the impact of those errors on the validity and reliability of the results, and for interpreting or adjusting the results as appropriate, have not been developed. As a first step in that direction, this study developed and tested new measures for quantifying maternal linkage errors. The conceptual framework and quantitative measures were guided by the general epidemiologic approach to misclassification, i.e., determining the sensitivity and specificity of the operationalized measure with reference to a gold standard. However, sensitivity and specificity are not directly applicable to the maternal linkage context because sensitivity and specificity are derived by comparing alternate categorizations of individuals, whereas assessing misclassification in maternal linkages involves comparing alternate compositions of groups (i.e., maternal sets).

To develop measures of misclassification for this situation, misclassification was conceptualized as a quality of the maternal sets rather than a quality of individual records. It was further conceptualized as a continuous characteristic, although it was operationalized as a categorical variable with three levels. Specifically, the true linkage proportion was proposed for capturing the notion of true positives as usually represented by sensitivity, and the false linkage proportion was proposed for capturing the notion of true negatives as usually represented by specificity.

These two measures behaved as expected when increasing degrees of error were introduced into the linkages—the distributions of the sets shifted in the direction of greater misclassification (see Fig. 2). For the true linkage proportion, the change was nearly linear and consisted primarily of a shift out of the completely category and into the partially category. For the false linkage proportion, the rate of change decreased somewhat as the percent of errors increased, and consisted primarily of a shift out of the completely category and into the not-at-all category. These results support the conclusion that the true and false linkage proportions constitute valid measures of maternal linkage error, although further development and evaluation are needed.

In addition to serving as measures of misclassification with reference to a gold standard, the true and false linkage proportions can be used to compare different maternally linked files that are based on the same data but different linkage methods. For example, the measures can be used to compare files produced by different match specifications or different linkage software. This could provide useful information for developing final match specifications or for choosing among alternative software applications.

When suitable gold standard files are available, the true and false linkage proportions should be calculated and reported in studies analyzing the linked records, much as response rates are reported for surveys. Furthermore, reviewers and editors should expect investigators using maternally linked files to demonstrate the quality of their data by reporting the true and false linkage proportions, or other measures of linkage quality that may be developed. The current practice of reporting the percent of records that matched is meaningless as an indicator of linkage quality.

Adams et al. [2] conducted the only other published quantitative evaluation of maternal linkages. They reported the percent of sets in which the number of records in the set differed in each direction by one, two, three, or four or more from the expected number. Two such comparisons were made, one with expected numbers derived from obstetric history information on the records themselves, and the other with expected numbers obtained from interviews with a small proportion of the birth cohort mothers. The true and false linkage proportions proposed in this paper extend Adams et al.’s aggregate approach to one based on counts of maternal sets containing misclassified records. This yields additional information about the nature of the misclassification, and will facilitate the development of quantitative techniques for assessing the impact of misclassification on studies using maternally linked data.

Further development of the measures introduced in this study should examine their behavior with gold standard files of different sizes relative to the maternally linked file, as well as examining how the measures are influenced by missing data (i.e., a gold standard record that is missing a corresponding birth record, for calculating the true linkage proportion, or a birth record that is missing a corresponding gold standard record, for calculating the false linkage proportion). The gold standard file used in this study was small relative to the birth records file. This difference in size may have produced a large proportion of missing data, which in turn may explain the different patterns shown by the true and false linkage proportions as linkage errors increased. Moreover, each missing corresponding record was counted as an additional set. Future research should determine the minimum relative size of a gold standard file necessary to obtain meaningful assessments of the linked file, and the optimal method of handling missing data under various conditions.

Future research should also investigate operationalizing the linkage proportions as continuous variables, and how the distribution of maternal set size influences the true and false linkage proportions. Although the completely and not-at-all categories capture the minimum and maximum degrees of misclassification, respectively, the partially category could include a wide range of intermediate degrees of misclassification. However, this is constrained by the distribution of set size. Sets of size two, which will generally account for the majority of maternally linked sets, can be categorized as completely or not-at-all, but not as partially.
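The constraint imposed by set size can be checked by brute force. The sketch below enumerates every way the records of a set of a given size could be distributed among sets in the other file and collects the categories that result. The function name is hypothetical; the category rule is the one defined in the Methods (largest group equal to the set size, at least two, or one).

```python
from collections import Counter
from itertools import product


def reachable_categories(set_size):
    """Categories that a set of `set_size` records can fall into."""
    reachable = set()
    # Each record may land in any of `set_size` distinct sets in the other
    # file, which is enough to cover every possible grouping pattern.
    for labels in product(range(set_size), repeat=set_size):
        largest_group = max(Counter(labels).values())
        if largest_group == set_size:
            reachable.add("completely")
        elif largest_group >= 2:
            reachable.add("partially")
        else:
            reachable.add("not-at-all")
    return reachable


by_size = {n: reachable_categories(n) for n in (2, 3, 4)}
```

For a set of size two the largest group is either two (completely) or one (not-at-all), so the partially category is reachable only from size three upward.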

Although the comparison files are called “gold standards,” perfect comparison files will rarely be available. Future research is needed to identify the relevant characteristics of potential comparison files, to understand how variations in these characteristics affect the true and false linkage proportions, and to identify potential sources of comparison files. Possible sources include records from maternal and child health programs, such as WIC; medical records, such as the hospital birth log used in this study; survey data, such as that used by Adams et al. [2]; and a validated subset of records, as is commonly used for internal validation studies [21].

Finally, the terminology introduced in this paper is somewhat awkward. Improvements would aid in communicating results using the new measures.

The measures developed in this paper are relevant for epidemiologic studies beyond those using maternally linked data. In general, the measures can be used to quantify errors in assignment where the unit of analysis is the group, and values for group-level variables are obtained by aggregating the values of the individual units that have been assigned to the groups. This type of design is often found in studies of institution-related populations, especially schools [23], and in follow-up studies that combine exposure measures from different sources [24]. For maternally linked and similarly structured data, the measures constitute a scheme for conceptualizing the mis-assignment and a technique for measuring it. This is a necessary step towards the ultimate goal of developing methods for assessing misclassification bias in parameter estimates derived from the linked data and for addressing such bias through sensitivity analysis [25], adjustment [21], and other means.