Introduction

When birth records or other official identification documents reporting the age of an individual are unavailable in criminal, civil and asylum procedures, forensic age estimation can be deemed necessary by the authorities. The estimation usually has to contain a predicted age together with a measure of the uncertainty, and the probability that the examined person has reached a specific legally relevant age threshold. In most countries the age threshold lies between 14 years and 22 years, representing children and young adults [1]. Furthermore, in sports, age estimation is used to ensure fair play by checking whether athletes are participating in the correct age category [2].

Established methods for age estimation mainly use radiographs to evaluate teeth, carpal bones and long bones, which are still developing in children and young adults. The 2-D radiographic registrations have two major drawbacks. First, they imply an exposure to radiation without a clinical indication, resulting in deontological and ethical issues [3]. In some countries the use of ionizing radiation is prohibited in asylum and civil procedures [4]. Second, on plain radiographs, superposition can yield mistakes or impede allocating a developmental status to the anatomical structures of interest [5].

To counter these drawbacks, several research groups have been studying the use of MRI to register the developmental status of the considered anatomical site. Because the details necessary to study development might not be clear in routine clinical MRI, several dedicated MRI protocols have been developed. However, MRI is not yet routinely used in age estimation practice because it remains unclear which MRI approach is optimal.

The different MRI approaches have been reported in pilot studies and cross-sectional reference studies. Compared to reference studies of age estimation based on radiographs of developing teeth or bones, the MRI studies have two shortcomings: (1) they all included a relatively small study population, and (2) few external validation studies (with an independent test sample) on any MRI approach for age estimation have been conducted. As a result of these shortcomings, a first attempt to bring forensic age estimation based on MRI into practice resulted in large error rates [6].

To address the issue of small study populations, the MRI data could be pooled to increase age estimation performance. However, a review of the MRI studies is indispensable to determine whether pooling is appropriate. MR images are highly dependent on the technical parameters of the MRI approach; thus, merging incompatible data would lead to wrong conclusions. Unfortunately, a review cannot address the lack of external validation studies, but it can provide an overview of the internal validation statistics (within the study population).

We conducted this systematic review with the following objectives: (1) to synthesize the MRI data for forensic age estimation in living children and young adults and (2) to provide a comprehensive overview that can guide age estimation practice and future research. We examined the following research questions:

  • How is age estimation on MRI affected by population characteristics and MRI approach?

  • How does the development of different anatomical structures, as registered on MRI, relate to chronological age in living children and young adults?

  • How reproducible is developmental stage allocation based on MRI?

  • What is the performance of age estimation based on development of different anatomical structures as registered on MRI?

  • Which anatomical structures provide the best MRI information to render a point prediction of age?

  • Which anatomical structures provide the best MRI information to discern minors from adults?

Materials and methods

Protocol design

The review protocol was drafted according to the Cochrane Guidelines for review protocols [7] and was registered in PROSPERO (National Institute for Health Research, York, UK), an international prospective register of systematic reviews, with registration number CRD42017061043. This project was approved by the Ghent University Hospital Ethics Committee as part of an ongoing larger project. The reporting of the systematic review complies with the PRISMA (preferred reporting items for systematic reviews and meta-analyses) statement [8, 9].

Selection of studies

We included cross-sectional observational studies. When a pilot study was published, followed by a more recent study including a larger study population, only the final publication was included in the review. When the final publication had not yet appeared, results of the pilot publication were considered. Furthermore, cohort observational studies were included, but we extracted results of only one moment in time to avoid bias. Case reports and case series were also included because they might provide information on minimum or maximum age per developmental stage. We excluded review articles. We placed no restrictions on the country of publication, language or publication date.

We considered study populations including living children, adolescents and adults up to 30 years old. After the age of 30, age estimation is no longer based on development but rather on degenerative changes [10, 11]. Moreover, we excluded studies that only included deceased individuals because MRI is influenced by body temperature [12] and motion artifacts [13].

We included MRI of any field strength studying hard-tissue development related to age. Included studies had to refer to the staging technique used to assess development, and when measurements were made, they had to describe clearly how these were obtained. It was considered inappropriate to compare the age distributions within developmental stages based on MRI with those based on radiographs because it has been demonstrated that imaging-modality-specific reference data are required [14,15,16,17,18,19].

Chronological age served as the reference standard for age estimation performance. The included papers needed to provide any of these outcome measures: (1) descriptive statistics on the age distribution within the different developmental stages of the considered anatomical structures; (2) probabilities of attaining certain threshold ages, or diagnostic indices; and (3) statistics on the performance of the age estimation model.

Search methods

According to the described eligibility criteria, literature was searched in MEDLINE (via the PubMed interface), Embase (via the embase.com interface) and Web of Science. The search strings are reported in the supplementary material. Furthermore, we searched reference lists of included studies for additional suitable papers, and we searched papers citing the included studies, using Web of Science and Google Scholar. Finally, we searched gray literature by consulting the following study registers: the United States’ ClinicalTrials.gov, EU’s Clinical Trials Register, the United Kingdom’s ISRCTN registry and the German Clinical Trials Register (DRKS). All searches were conducted on Sept. 2, 2018.

Reviewing process and selection of studies

Every step of the review process was conducted independently by two reviewers. The first author (J.D.T.) was a reviewer throughout the whole process. Other authors (J.B., G.I.L.P., A.F.) acted as second reviewers. After a first selection of articles based on title and abstract, the reviewers compared their selections to achieve consensus. Of the retained abstracts, the full-text paper was checked independently for eligibility. Discrepancies among reviewers were identified at this stage and resolved by discussion to reach consensus. We kept a record of reasons for excluding studies at each step (either title and abstract, or full text). Reasons for exclusion were checked in the following order:

  • Pilot of other reference,

  • Wrong study design: review,

  • Wrong population: deceased individuals, insufficient data to differentiate within the group of 1–30 years of age,

  • Wrong intervention: MRI studying soft tissue, and

  • Wrong outcome measures: no data on age distribution or age estimation performance.

References were managed and duplicates removed with EndNote software. We used Covidence software (Veritas Health Innovation, Melbourne, Australia) for study selection. The process and the results of the literature search and study selection are presented in the PRISMA flowchart (Fig. 1).

Fig. 1 Flowchart shows the process of literature search and study selection

Data extraction and management

Study characteristics and outcome data were extracted by two reviewers independently. Regarding study characteristics, data were collected on study population, MRI approach, staging technique, statistical analysis, and observers. Regarding age estimation outcomes, data were collected on correlation between stages and age, age distributions within stages, reproducibility of staging, regression formulas, age estimation performance, and reasons for missing data.

Table 1 Population characteristics of eligible studies grouped per anatomical site and ordered per staging technique (see Table 6 of the supplementary material) [2, 15, 16, 19, 22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]

When multiple records were identified of the same study, they were collated so that the unit of interest in the review was the study, rather than each record.

Data analysis

We compared the results from data extraction to detect trends relevant to the research questions. To do so, we summarized age distributions within stages into graphs, combining data from multiple studies with similar approaches. Then, we checked whether meta-analysis of those data would be appropriate.

Quality assessment of studies

Two reviewers independently assessed risk of bias by using a dedicated tool based on the EPOC overview [20] and QUADAS-2 [21]. Questions were phrased in such a way that the preferred approach corresponded with answering “yes.” In case the answer was “no,” the reasons for high risk of bias were elaborated.

When information was missing from a paper, when only graphs were reported, or when clarification was needed, our reviewers contacted the authors by e-mail or telephone. When the authors did not provide additional data but graphs were reported in their paper, we extracted data from the graphs using calibration and the measuring tool in Photoshop CS2 (Adobe, San Jose, CA). When values were missing because images could not be assessed, we registered the reasons. Moreover, we evaluated whether missing values depended on age, imaging sequence or research group.

We checked methodological heterogeneity by comparing biological origin of participants and types of MR sequence. Statistical heterogeneity was taken into account by comparing the different types of statistical analysis that were used.

Results

The essential results are discussed in this section, while the supplementary material includes additional considerations, overview tables and graphs.

Selection of studies and data

Figure 1 displays the selection process, whose details are elaborated in the supplementary material.

Characteristics and quality of included studies

Results on age distribution were affected by the study characteristics displayed in Table 1 (and Tables 5 and 6 in the supplementary material) [2, 15, 16, 19, 22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]. In those tables, studies are grouped according to anatomical site from head to toe: skull, teeth, chest, upper limb, hip and lower limb. Note that only one study has been published that integrates information from several anatomical sites into one age estimate (multi-factorial age estimation, as opposed to single site age estimation) [34].

Table 1 displays the population characteristics. Most studies included European (Caucasian) populations; only a few studies included African, Asian and Latin American populations. Healthy volunteers or athletes were recruited prospectively, or patient records were searched retrospectively, excluding pathology. Only one study included patients with possible growth disorders, but that study’s focus was on the agreement between radiograph-based bone age and MRI-based bone age, rather than on chronological age [19]. Furthermore, the age range of the study populations varied widely, with some studies only including minors, while others included participants from birth to age 30.

The included scanning protocols used scanners with field strengths from 0.2 tesla (T) to 3 T (Table 5 of the supplementary material). The low field open scanners did not render the highly detailed images necessary for staging and substaging of both the epiphyseal and physeal development, but they allowed for assessing individual bone development of the hand/wrist [36, 39]. Conversely, to study developing teeth and clavicles, 3 T appears to be necessary [35, 41, 47,46,47,50, 73, 74].

T1-weighted sequences were most frequently used to study bone development, whereas T2-weighted sequences were most frequent for teeth. The voxel size of those sequences varied widely. Retrospective studies mostly lacked specifics on this, but some reported slice thicknesses ranging from 2 mm to 4 mm. In-plane resolution was never coarser than 1.0 × 1.0 mm², and high resolutions were reached at all anatomical sites, with the finest in-plane resolution being 0.188 × 0.188 mm² [45]. Unfortunately, the study with the highest resolution [45] did not report the acquisition time. Considering 6 min 30 s as the maximum acceptable acquisition time [75], only the protocols for the teeth and the iliac crest exceeded this threshold.

Regarding dental development, the first staging techniques were based on radiographs [76, 77]. However, because the cemento-enamel junction is indiscernible using the reported MRI sequences, these staging techniques were said to be inappropriate for MRI [73]. Consequently, an MRI-specific technique was reported (Table 6 of the supplementary material) [47, 73].

Regarding bone development, staging techniques were developed based on radiographs and CT. In contrast to the dental staging techniques, the criteria for staging bone development did not include tissues that are indiscernible on MRI. Therefore, the staging techniques could be applied to MRI without modification (Table 6 of the supplementary material). Moreover, they could be grouped when their stages overlapped. The most elaborate staging technique (Table 2) was developed by a German research group and combined stages [78], substages [79] and advanced substaging [80]. When applicable, other staging techniques were transposed to this staging technique to compare studies (Fig. 5 of the supplementary material).

Table 2 Descriptive criteria for developmental stages of long bones on magnetic resonance imaging

In a minority of included papers, regression was used to relate ordinal staging data to age. Most papers only reported descriptive statistics on age per stage in tables. Those statistics are summarized in Fig. 5 of the supplementary material and discussed further below. Furthermore, a few papers applied Bayes' rule to estimate age, an approach that has been stated to be more appropriate than linear regression [46, 47, 58, 72, 74]. Finally, machine learning was applied to estimate age in two papers, but no details on the statistical approach were reported [34, 44]. These two studies, together with four others, applied cross-validation [30, 34, 44, 46, 47, 74], while one study tested its results on a validation sample [31].
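To illustrate what applying Bayes' rule to staging data involves, the following is a minimal Python sketch for a single anatomical site. It is not the model used in the cited studies: it assumes 1-year age bins, a uniform prior over the age bins present in the reference sample, and P(stage | age) estimated directly from counts. The reference data and the function names (`stage_likelihoods`, `posterior_age`, `summarize`) are hypothetical.

```python
from collections import Counter, defaultdict


def stage_likelihoods(reference):
    """Estimate P(stage | age bin) from (age_in_years, stage) pairs, using 1-year bins."""
    per_age = defaultdict(Counter)
    for age, stage in reference:
        per_age[int(age)][stage] += 1
    return {
        age: {stage: n / sum(counts.values()) for stage, n in counts.items()}
        for age, counts in per_age.items()
    }


def posterior_age(stage, likelihoods):
    """P(age bin | stage) via Bayes' rule, assuming a uniform prior over observed age bins."""
    unnormalized = {age: lk.get(stage, 0.0) for age, lk in likelihoods.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError(f"stage {stage!r} does not occur in the reference sample")
    return {age: p / total for age, p in sorted(unnormalized.items())}


def summarize(posterior, threshold=18):
    """Point prediction (posterior mean at bin midpoints) and P(age >= threshold)."""
    mean_age = sum((age + 0.5) * p for age, p in posterior.items())
    p_adult = sum(p for age, p in posterior.items() if age >= threshold)
    return mean_age, p_adult


# Hypothetical reference sample: (chronological age, allocated stage)
reference = [(15.2, 2), (15.7, 2), (16.1, 2), (16.4, 3), (17.3, 3), (17.8, 3),
             (18.2, 3), (18.6, 4), (19.1, 4), (19.5, 4), (20.3, 4)]
posterior = posterior_age(3, stage_likelihoods(reference))
print(summarize(posterior))  # (17.5, 0.25)
```

Because the likelihood is conditioned on age, the number of participants in each age bin does not by itself shift the posterior; the uniform prior, however, is an explicit assumption that would need justification in practice.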

Risk-of-bias assessment

Bias was a major concern in almost all included studies (Table 7 of the supplementary material). Selection bias was caused by including elite football players, who might be advanced in their development [33], or by including patients in whom developmental disorders could not be ruled out [19, 23, 24, 29]. Furthermore, the small study samples resulted in an uneven distribution among age categories [2, 16, 19, 22, 25, 26, 29, 33, 35, 39,38,39,40,41,42,45, 51, 53, 54, 58,57,58,59,60,61,64, 72], or frequencies per age were not reported [23]. Retrospective studies did not report the biological origin of the population, while some prospective studies included different ethnic groups [2, 35, 43], or only a few individuals of another ethnic group [51]. Moreover, few studies reported the socioeconomic status of their study participants. Other sources of bias are elaborated in the supplementary material.

Because of the highly biased nature of most studies, we decided not to conduct meta-analysis on the age distributions per stage. Moreover, it remains unclear whether data from an anatomical site can safely be pooled when the MRI sequences are not identical. To date, only one study compared scanning protocols in the same individuals, but that study’s sample was too small to draw strong conclusions [15].

Quantitative synthesis

Statistics were extracted from boxplots for the following references: [2, 58, 67, 72]. Moreover, the following authors provided additional data: Jopp [42], Auf der Mauer [45], De Tobel [46, 47, 73, 74], Tscholl [43] and Urschler [19, 44].

To provide a clear overview, statistics on age distributions per stage were displayed in boxplots (Figs. 4 and 5 of the supplementary material). Note that some boxplots (in early stages) fall entirely below the 18-year threshold, while others (in late stages) lie entirely above the threshold. Cut-off stages for these absolute statements regarding childhood and adulthood are summarized in Table 3.

Table 3 Absolute statements regarding the age threshold of 18 years
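Assuming that a reference study reports the minimum and maximum age per stage, the cut-off stages behind such absolute statements can be derived as in the Python sketch below: the highest stage up to which every stage has a maximum age below 18, and the lowest stage from which every stage has a minimum age of at least 18. The function name and the age ranges are hypothetical.

```python
def cutoff_stages(stage_ranges, threshold=18.0):
    """Return (last stage with only minors, first stage with only adults).

    stage_ranges maps each stage to (minimum age, maximum age) in a reference study.
    Either value is None when no such stage exists.
    """
    stages = sorted(stage_ranges)
    last_minor_stage = None
    for stage in stages:  # walk upward while every maximum age stays below the threshold
        if stage_ranges[stage][1] < threshold:
            last_minor_stage = stage
        else:
            break
    first_adult_stage = None
    for stage in reversed(stages):  # walk downward while every minimum age reaches the threshold
        if stage_ranges[stage][0] >= threshold:
            first_adult_stage = stage
        else:
            break
    return last_minor_stage, first_adult_stage


# Hypothetical minimum/maximum ages (years) per stage
example = {1: (10.2, 15.9), 2: (12.5, 17.4), 3: (14.8, 21.3), 4: (17.1, 25.0), 5: (18.4, 29.8)}
print(cutoff_stages(example))  # (2, 5)
```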

To quantify reproducibility, different statistics were used, with a majority of studies reporting reproducibility statistics >0.80 (Table 8 of the supplementary material). However, several studies on clavicular development indicated that staging was less reproducible than at other anatomical sites [35, 71, 74]. Furthermore, for all anatomical sites except the ankle, at least one study reported values considerably lower than 0.80 [19, 35, 40, 41, 53, 67, 74]. No relation between those lower values and MR sequence or staging technique was apparent.
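Since the included studies used different reproducibility statistics, the sketch below shows only one common option for ordinal stages, a linearly weighted Cohen's kappa, computed in plain Python. Which statistic each study actually reported is listed in Table 8 of the supplementary material; the function name and the example stage allocations are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two sets of ordinal stage allocations."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Joint proportions of (rater A stage, rater B stage)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight grows linearly with the distance between stages
    weight = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed_dis = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single stage")
    return 1 - observed_dis / expected_dis


# Hypothetical allocations by two observers for four participants
print(weighted_kappa([1, 2, 3, 3], [1, 2, 2, 3], categories=[1, 2, 3]))  # 5/7, about 0.714
```

Linear weights penalize a one-stage disagreement less than a two-stage disagreement, which suits ordinal staging; other studies may have used unweighted or quadratically weighted kappa, or intraclass correlation, which give different values for the same data.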

Regression formulas were reported in several studies [25, 30, 36, 60, 70]. Corresponding coefficients of determination ranged from 0.40 [25] to 0.85 [36]. When statistical models were applied to estimate age, two aspects were considered to quantify age estimation performance: (1) the point prediction of age with its uncertainty, and (2) the ability to discern minors from adults.

The first aspect is reflected by the mean absolute error and the root mean squared error. Only a few studies reported the mean absolute error. For females, it reached 2.0 years studying third molars [47]. For males, it reached 1.7 years studying third molars [47], 0.9 years studying the left hand/wrist [44] and 1.1 years combining third molars, both clavicles and the left hand/wrist [34]. Without distinguishing between genders, it reached 2.0 years studying both clavicles [74] and 1.8 years studying the left wrist [46]. Moreover, three studies reported the root mean squared error, which gives more weight to large differences between chronological and estimated age: for third molars, it was 2.4 years in females and 2.1 years in males [47], whereas it was 2.6 years for both clavicles [74] and 2.2 years for the left wrist [46]; the latter two values were not gender-specific.
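The difference between the two error measures can be shown with a short Python sketch on hypothetical ages: both sets of estimates below have the same mean absolute error, but the set containing one large error has a larger root mean squared error.

```python
import math


def mean_absolute_error(chronological, estimated):
    """Average of |chronological age - estimated age|, in years."""
    return sum(abs(c - e) for c, e in zip(chronological, estimated)) / len(chronological)


def root_mean_squared_error(chronological, estimated):
    """Square root of the mean squared difference; large errors weigh more heavily."""
    squared = sum((c - e) ** 2 for c, e in zip(chronological, estimated))
    return math.sqrt(squared / len(chronological))


# Hypothetical chronological ages and two sets of estimates
chronological = [16.0, 17.0, 18.0, 19.0]
estimates_even = [17.0, 18.0, 19.0, 20.0]     # every estimate off by 1 year
estimates_outlier = [16.0, 17.0, 18.0, 23.0]  # one estimate off by 4 years
for est in (estimates_even, estimates_outlier):
    print(mean_absolute_error(chronological, est), root_mean_squared_error(chronological, est))
# 1.0 1.0
# 1.0 2.0
```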

The second aspect is reflected by predictive probabilities to be younger/older than 18 and by diagnostic indices (Table 4) [15, 30, 34, 46, 58, 63, 71, 72, 73]. For diagnostic indices in the current review, reported statistics were recalculated so specificity would reflect the proportion of correctly classified minors, whereas sensitivity would reflect the proportion of correctly classified adults. Overall, the sensitivity was higher (ranging from 83% to 100%) than the specificity (ranging from 66% to 93%), whereas the reverse is desirable in forensic age estimation. Still, the reported predictive probabilities to be a minor were very low for the final stages of development, with values less than 1% for third molars and clavicles.
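The recoding used for Table 4 can be sketched as follows, treating adults as the positive class so that sensitivity is the proportion of correctly classified adults and specificity the proportion of correctly classified minors. This is a minimal illustration; the function name and the example ages and classifications are hypothetical.

```python
def diagnostic_indices(true_ages, classified_adult, threshold=18.0):
    """Return (sensitivity, specificity) with adults as the positive class."""
    tp = fn = tn = fp = 0
    for age, adult in zip(true_ages, classified_adult):
        if age >= threshold:
            if adult:
                tp += 1  # adult correctly classified as adult
            else:
                fn += 1  # adult classified as minor
        elif adult:
            fp += 1  # minor classified as adult
        else:
            tn += 1  # minor correctly classified as minor
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical chronological ages and model classifications
ages = [15.0, 16.0, 17.0, 17.5, 18.5, 19.0, 20.0, 21.0]
classified_adult = [False, False, True, False, True, True, True, True]
print(diagnostic_indices(ages, classified_adult))  # (1.0, 0.75)
```

In this convention, a false positive is a minor classified as an adult, the error forensic practice most wants to avoid; this is why high specificity is the desirable property.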

Table 4 Ability to discern minors from adults [15, 30, 34, 46, 58, 63, 71, 72, 73]

Discussion

Characteristics and quality of included studies

Included studies showed high risk of bias, mainly because of their study populations. Because a wide range of ages was studied, from birth to age 30, large reference populations are indispensable. It has been suggested that at least 10 participants per 1-year age category, per gender, should be included for each anatomical structure [1]. Moreover, the age range of the study population affects the lower and upper limits of the age distributions within developmental stages, as well as the mean age. This phenomenon is called “age mimicry” and has been a major issue in age estimation for decades [81]. Ideally, a reference study should include participants with an age range starting several years before the studied anatomical structure starts its development and ending several years after the structure has reached full maturity. For instance, an ideal reference study on third molar development might include participants between 6 and 28 years old [82]. Unfortunately, such ideally designed studies are scarce even for radiographs, although radiographic studies can easily be conducted retrospectively. It therefore seems self-evident that ideally designed MRI studies would be rarer still. Only for the sternal end of the clavicles did two studies encompass the entire development, with lower and upper age margins beyond the period of developmental change [50, 74]. For other structures, pooling the data of different studies might address this issue, but before this is done, it needs to be ascertained whether it is safe to pool data obtained with different MR sequences. After all, it has been demonstrated that age distributions within stages might differ among sequences for third molars [15] and for the left wrist (Fig. 2) [46]. In the latter study, applying the model derived from one MR sequence to assessments of the other sequence resulted in markedly worse age estimation performance [46]. Moreover, different sequences might lead to different staging techniques, impeding the pooling of data [53, 54]. On the other hand, different sequences might provide complementary information, allowing for a more nuanced age estimation [46, 55].
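Age mimicry can be illustrated with a small deterministic Python sketch. It assumes, hypothetically, that an individual is in stage 3 from age 15.0 up to age 23.0, and compares two evenly spread reference samples: one covering ages 12 to 30 and one ending at age 20. The stage itself is unchanged, yet the mean and maximum age observed in stage 3 differ between the samples. The function name and the stage model are illustrative assumptions.

```python
import statistics


def stage3_ages(age_min, age_max):
    """Mean, minimum and maximum age in stage 3 for an evenly spread sample.

    Hypothetical model: an individual is in stage 3 from age 15.0 up to 23.0.
    Ages are placed on a 0.01-year grid between age_min and age_max.
    """
    ages = [i / 100 for i in range(round(age_min * 100), round(age_max * 100))]
    in_stage3 = [a for a in ages if 15.0 <= a < 23.0]
    return statistics.mean(in_stage3), min(in_stage3), max(in_stage3)


wide = stage3_ages(12, 30)    # sample extends well beyond the end of stage 3
narrow = stage3_ages(12, 20)  # sample ends while stage 3 is still ongoing
print(wide)    # mean ≈ 18.995, range 15.0 to 22.99
print(narrow)  # mean ≈ 17.495, range 15.0 to 19.99
```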

Fig. 2 Coronal wrist MRI in a 17.9-year-old male. a T1-W spin-echo sequence shows partial bridging of the physeal plate. Stage 3b was allocated. The chemical shift artifact causes a widened appearance of the remaining physeal plate. b T1-W gradient echo volumetric interpolated breath-hold examination sequence shows more advanced bridging of the physeal plate. Stage 3c was allocated. Fat suppression avoids the chemical shift artifact, resulting in a tighter delineation of the physeal plate

Compared to age estimation studies using radiographs, MRI study populations were relatively small, which could be attributed to the MRI technique. Because developmental stages are based on details, such as bone bridging and apical closure of teeth, routine clinical MRI is mostly not suitable for age estimation. For instance, a thorax MRI is not suitable to study clavicular development, and neither is a maxillofacial MRI suitable to assess the apex of third molars. Only larger anatomical structures, such as knee and ankle bones, show sufficient detail on clinical MRI. This also explains why only those structures have been studied for age estimation in retrospective studies [16, 23, 25, 26, 28, 29, 31, 32, 53, 54, 59, 61,60,61,64, 70, 72]. Smaller structures require a dedicated scanning protocol, with a dedicated coil and sufficiently high in-plane resolution (Table 5 of the supplementary material), and thus a prospective study design. Nevertheless, such prospective studies have been conducted, and it should be investigated whether their data can safely be pooled to create a large reference study.

Ethnic differences among populations have been studied using radiographs. Conclusions vary: some authors claim that inter-individual variability within ethnic groups is larger than inter-ethnic variability [82,83,84,85,86], and others claim that socioeconomic status is a more relevant factor than ethnicity [87]. However, differences among ethnic groups have also been demonstrated [88, 89]. Presumably, trends in those studies also apply to MRI, but ethnic differences have only been studied for hand/wrist MRI [2, 43, 60, 67]. Moreover, these studies were only conducted in football players, who might be more advanced in their development than a general population of the same age [90,91,92,93]. After all, advanced development might be part of their talent, i.e. it might contribute to better performance in sports. Thus, they might be scouted at an earlier age and be more likely to move on to elite sports. The study by Sarkodie et al. [33] in 2018 was excluded from quantitative analysis because it only included elite football players. At the other end of the spectrum, skeletal development in gymnasts might be delayed, allowing more elasticity at a relatively older age [92, 93]. Different standards may need to be applied to athletes to take into account their possibly advanced or delayed skeletal age.

Staging techniques and statistical processing

MRI-specific staging techniques have been developed [47, 53, 55], but no comparative studies were conducted among staging techniques. Moreover, two studies on clavicle MRI have raised concerns about possible confusion between Stage 1 and Stages 4/5 [71, 74]. The authors advise others to discard clavicles in those stages for age estimation and to assess development of other structures instead.

Remarkably, only one study [34] has combined the information of three anatomical sites into one age estimate. Other groups have studied different anatomical structures in the same individuals but did not report how to combine them. Studies using radiographs have demonstrated that linear regression relies on statistical assumptions that do not hold for age estimation [94]. Neither should conditional independence among anatomical sites be assumed when combining them [95]. Otherwise, artificially narrow uncertainty intervals of the point prediction and artificially high probabilities of being a minor or an adult could cause the judicial evidence to appear stronger than it really is [81, 95, 96].

Quantitative synthesis

Bone development has been studied with MRI at most joints of the appendicular skeleton. The only site of the axial skeleton that has been studied is the spheno-occipital synchondrosis. Combined, these anatomical sites cover development from childhood to adulthood. By contrast, dental development has only been studied with MRI in molars, while in children up to age 14 the development of other permanent teeth is essential to estimate age [97, 98].

The graphs (Fig. 5 of the supplementary material) revealed some remarkable concerns about how stages relate to age. First, only a few anatomical sites and staging techniques have provided a steady increase of age with increasing stage, with all participants in the first stage well below the 18-year threshold and those in the final stage well above it in both genders. These were Dedouit staging of the distal femur and Vieth staging of the distal femur (Fig. 5k-n in supplementary material). De Tobel staging of the lower left third molars came close, but the minimum ages of the final stage were still close to 18 (Fig. 5d in supplementary material).

Second, the high maximum ages in Stage 1 of clavicular development and the low minimum ages in Stages 4 and 5 suggest that those stages might be confused (Fig. 5e in supplementary material), as was pointed out in the original studies [71, 73]. This hinders a logical increase of age with an increase in stages.

Third, although in wrist MRI Dvorak Stage 1 coincides with Schmeling Stage 2, Dvorak Stage 1 has never been reported above the age of 18, while Schmeling Stage 2 has been reported in one 18.6-year-old male (Fig. 5g-h in supplementary material) [58]. At the other end of the spectrum, in third molar MRI, De Tobel Stage 8 coincides with Demirjian Stage H. The former has not been reported below the age of 18, whereas the latter has, in males (Fig. 5c in supplementary material) [41, 48].

Fourth, the influence of the study populations’ age ranges is obvious. For instance, Fig. 5e (in supplementary material) demonstrates that the boxplots of the male participants in Vieth et al. [49] (2014) are situated at the upper ends of other studies’ boxplots for lower stages, while they are at the lower ends of other studies’ boxplots for higher stages. This can be explained by the narrow age range (5 years) of participants in Vieth et al. [49]. The same applies to Schmidt et al. [56] (2015) in Fig. 5h (in supplementary material). Fifth, the iliac crest does not seem useful for age estimation because the age ranges of all its stages overlap [66]. However, this study had a high risk of population bias, with the same narrow age range of participants as Vieth et al. [49] and Schmidt et al. [56].

Finally, the introduction of substaging was clearly an attempt to provide more accurate age estimation around the age of 18. This provided a more gradual increase of age with increasing stage than the main stages.

However, there is more to certain staging techniques than the graphs revealed. Some MRI-specific characteristics of skeletal structures have been studied, but their relevance to age estimation remains unclear. The threefold stratification sign was stated to be useful by Timme et al. [57], while De Tobel et al. [46] could not confirm its usefulness. Other signs, such as the metaphyseal stripe [23], the oreo sign and the crack sign [31], still need to be explored in future studies.

Furthermore, considering how stages relate to age, correlation coefficients and coefficients of determination need to be interpreted cautiously because they depend on the age distribution of the study population. Relatively high coefficients have been reported for single-site age estimation based on MRI. Still, they are expected to increase with multifactorial age estimation, as has been demonstrated for multifactorial age estimation based on radiographs and CT [99,100,101,102,103,104]. Although only one study on multifactorial age estimation based on MRI has been published [34], all researchers in this field prefer multifactorial over single-site age estimation [105]. However, no study has been published on how the MRI information of the different sites can be combined appropriately for age estimation. Štern et al. [34] in 2017 combined all four third molars, both clavicles and the left hand/wrist, but the statistical approach of their network remains to be elucidated. This combination of third molars, clavicles and hand/wrist complies with international recommendations but is only partly supported by the current results of the review. Table 3 suggests that in females, combining third molars, the left hand/wrist and the knee might render a more robust model for age estimation, whereas in males, combining third molars, the proximal humerus and the knee might be ideal. However, in practice, a uniform approach for both genders is desirable.

Another major concern regarding age estimation based on MRI is the low reproducibility of staging pointed out by some authors (Table 8 of the supplementary material). An obligatory quality control of centers that perform age estimation is still lacking, resulting in large discrepancies among results from different centers [106]. This already affects the current gold standard of age estimation, using radiographs, and its effect might be even larger using MRI, considering the complexity of interpreting different MR sequences. Therefore, developmental staging should be based on a consensus of experts who are experienced both in age estimation and in interpreting the imaging modality at hand.

To solve this problem, automated approaches have been developed to assess radiographs for age estimation [107, 108]. Because validation studies support their use, these approaches are applied in current age estimation practice [109]. Automated approaches have also been developed and optimized for MRI, but they still need to be validated [34, 44, 110,111,112,113]. Moreover, if the same automated approaches were used internationally, discrepancies among age estimations performed at different institutes would presumably be eliminated [114, 115].

Few MRI studies have developed models for age estimation and reported statistical measures of age estimation performance. Remarkably, the same applies to radiographic studies. Studying radiographs of third molars, Thevissen et al. [94] reported a mean absolute error of 1.1 years. Given that their study population included 2,513 participants, one might presume that larger MRI studies would reach a similar mean absolute error. Note that this value is almost equal to the one reached by the multifactorial MRI age estimation study by Štern et al. [34]. Therefore, the limiting effect of small study populations in MRI studies might be overcome by studying multiple anatomical sites with MRI. Furthermore, studies applying Bayes' rule to estimate the uncertainty of the point prediction are not hampered by age mimicry, and they avoid the false assumptions made when linear regression is applied [95]. Therefore, confidence intervals from those studies should be preferred over those obtained from age distribution tables or regression.

As with the mean absolute error, which is better (i.e., lower) for multifactorial age estimation, the proportion of correctly classified minors is better (i.e., higher) for multifactorial than for single-site age estimation. This has been demonstrated for MRI [34] as well as for radiographs [116].
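The two performance measures discussed above can be computed as in the following Python sketch. It assumes that a true minor counts as correctly classified when the point prediction falls below 18 years; studies may instead classify on a probability of being an adult, so this rule is an assumption. The ages are made-up illustration values.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between chronological and predicted age (years)."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def proportion_correct_minors(true_ages, predicted_ages, threshold=18.0):
    """Share of true minors (age < threshold) whose predicted age is also below the threshold."""
    minors = [(t, p) for t, p in zip(true_ages, predicted_ages) if t < threshold]
    if not minors:
        raise ValueError("the sample contains no minors")
    return sum(1 for _, p in minors if p < threshold) / len(minors)


# Made-up chronological and predicted ages (years) for six individuals
true_ages = [15.2, 16.8, 17.5, 18.4, 19.9, 21.3]
predicted = [16.0, 17.9, 18.6, 18.1, 19.0, 20.2]
print(f"MAE = {mean_absolute_error(true_ages, predicted):.2f} years")
print(f"correctly classified minors = {proportion_correct_minors(true_ages, predicted):.2f}")
```

In this toy sample, the 17.5-year-old predicted at 18.6 years is the one misclassified minor; the adult predicted below his true age does not affect the proportion of correctly classified minors.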

To combine the information of different anatomical sites for forensic age estimation, two approaches have been put forward. The first approach, called the minimum age principle, is based on descriptive statistics of the age distributions within stages, as reported in reference studies [105]. The combined age estimation is an interval (Fig. 3a). For the lower bound of the interval, the highest minimum age is retained because, for that anatomical site, no individuals younger than that age have been reported. For the upper bound of the interval, the lowest maximum age is retained because, for that anatomical site, no individuals older than that age have been reported.
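The minimum age principle reduces to taking the highest per-site minimum and the lowest per-site maximum. The Python sketch below assumes that each site contributes the (minimum, maximum) ages observed for its allocated stage in a reference study; the site ranges are hypothetical values, not those behind Fig. 3a. Raising an error when the ranges do not overlap is a design choice of this sketch, not part of the published principle.

```python
from typing import Dict, Tuple


def minimum_age_principle(site_ranges: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Combine per-site (minimum, maximum) reference ages into one age interval."""
    # Lower bound: the highest minimum age over all sites.
    lower = max(low for low, _ in site_ranges.values())
    # Upper bound: the lowest maximum age over all sites.
    upper = min(high for _, high in site_ranges.values())
    if lower > upper:
        raise ValueError("site ranges do not overlap; findings are inconsistent")
    return lower, upper


# Hypothetical reference ranges (years) for the stages allocated in one case
example = {
    "third molars": (16.9, 24.5),
    "clavicles": (18.2, 26.1),
    "wrist": (15.4, 19.7),
}
print(minimum_age_principle(example))  # -> (18.2, 19.7)
```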

Fig. 3

Male case example of two methods for multifactorial forensic age estimation in practice. a Minimum age principle. Three anatomical sites are assessed. For the third molars and the wrist, only one anatomical structure is considered. For the clavicles, both sides are assessed and, in case of different stages between left and right, the more advanced clavicle is selected. The boxplots show the age distribution for the allocated stage per anatomical site, based on a reference study. The whiskers show the minimum and maximum ages, the box the first and third quartiles, and the central line the median. The combined age estimation is an interval: (1) the highest minimum age is retained because for that anatomical site no individuals younger than that age have been reported, and (2) the lowest maximum age is retained because for that anatomical site no individuals older than that age have been reported. In this male example, the interval was [18.60;19.88]. b Continuation ratio model with Bayesian correction for violation of the conditional independence assumption. Three anatomical sites are assessed. For the third molars, all four third molars are taken into account; for the wrist, the distal radius and ulna; and for the clavicles, both clavicles. Thus, the curve for each anatomical site already combines the information of the different anatomical structures within that site. The curves show the posterior densities of age given the stages allocated to all anatomical structures of each anatomical site, based on a reference study. The combined age estimation is defined by the combined curve, providing the following statistics: point prediction, 95% prediction interval, and the probability of being an adult. In this male example these statistics were 19.03 years, [16.57;22.00], and 0.709, respectively.

The second approach is also based on the age distributions within stages, albeit incorporated into a statistical model [95]. Posterior density curves of age are obtained using a continuation ratio model with a Bayesian correction for violation of the conditional independence assumption. The combined age estimation is defined by the combined curve, which provides the following statistics: a point prediction, a 95% prediction interval, and the probability of being an adult (Fig. 3b).
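The Python sketch below illustrates the general idea on an age grid: a forward continuation ratio model gives the probability of each observed stage at each age, and the sites are combined with Bayes' rule. It is deliberately simplified and is not the method of [95]. It multiplies the per-site likelihoods under the naive assumption of conditional independence and does not implement the Bayesian correction for its violation. The per-site parameters, the uniform prior over 14 to 26 years, the use of the posterior mean as point prediction and the equal-tailed interval are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stage_probability(stage, age, cut_ages, slope):
    """Forward continuation ratio model (logit link):
    P(stage = j | stage >= j, age) = sigmoid(slope * (cut_ages[j-1] - age)).
    Stages run from 1 to len(cut_ages) + 1; the last stage is absorbing."""
    prob = 1.0
    for j in range(1, stage):
        prob *= 1.0 - sigmoid(slope * (cut_ages[j - 1] - age))  # progressed past stage j
    if stage <= len(cut_ages):
        prob *= sigmoid(slope * (cut_ages[stage - 1] - age))  # stopped at this stage
    return prob


def combine_sites(observed, models, age_min=14.0, age_max=26.0, step=0.05, adult_age=18.0):
    """Posterior of age on a grid under a uniform prior and NAIVE conditional independence."""
    n = int(round((age_max - age_min) / step)) + 1
    ages = [age_min + i * step for i in range(n)]
    weights = []
    for a in ages:
        w = 1.0
        for site, stage in observed.items():
            cut_ages, slope = models[site]
            w *= stage_probability(stage, a, cut_ages, slope)
        weights.append(w)
    total = sum(weights)
    posterior = [w / total for w in weights]
    mean = sum(a * p for a, p in zip(ages, posterior))  # point prediction (posterior mean)
    cdf, lower, upper = 0.0, None, None
    for a, p in zip(ages, posterior):  # equal-tailed 95% interval from the grid CDF
        cdf += p
        if lower is None and cdf >= 0.025:
            lower = a
        if upper is None and cdf >= 0.975:
            upper = a
    p_adult = sum(p for a, p in zip(ages, posterior) if a >= adult_age)
    return mean, (lower, upper), p_adult


# Hypothetical per-site models: (ages of 50% stage transitions, slope per year)
models = {
    "third molars": ([12.0, 14.5, 17.0, 19.5], 0.8),
    "clavicles": ([16.0, 18.0, 20.0, 23.0], 0.9),
    "wrist": ([13.0, 15.0, 16.5, 17.5], 1.0),
}
observed = {"third molars": 4, "clavicles": 3, "wrist": 5}
mean, (lower, upper), p_adult = combine_sites(observed, models)
print(f"point prediction {mean:.2f} y, 95% interval [{lower:.2f}; {upper:.2f}], P(adult) = {p_adult:.3f}")
```

If the sites are in fact positively dependent, the naive product in this sketch overstates the certainty of the combined estimate; this is precisely the problem that a correction for violation of conditional independence is meant to address.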

There is no legislation on which approach should be applied or which statistics should be reported. Moreover, the magistrate who decides a case is free to interpret the findings. For instance, when the age estimation interval of the first approach is close to the threshold of 18 years but does not contain it (Fig. 3a), the magistrate might decide to grant the benefit of the doubt and consider the individual a minor. Similarly, when the second approach yields a probability of being an adult of about 0.71 (Fig. 3b), the magistrate decides whether this is sufficient to consider the individual an adult. Therefore, it is up to the forensic expert who conducts the age estimation (e.g., radiologist, odontologist) to be transparent and clear in the report, and to justify and nuance the findings as much as possible. Moreover, to minimize the effect of interobserver variability, at least two experts should reach a consensus about the age estimation.

Strengths and weaknesses

This systematic review provides a comprehensive overview of the available literature on age estimation based on MRI. It puts the studies into perspective, allowing medical professionals to decide which approach seems most valuable in their casework, and allowing judicial advisors to interpret the evidential value of age estimation results. In accordance with the PRISMA guidelines, all steps of the review were conducted independently by two reviewers to avoid errors in the reported data.

However, this review also has two limitations. First, the search string did not include terms related to “development”; only “age estimation” and its variants were used. Therefore, studies on development may have been missed, which might, in turn, have highlighted other MRI-specific signs of interest to age estimation. On the other hand, the studies on development that were encountered (without a focus on age estimation) were excluded from quantitative analysis because their data were not reported in sufficient detail. Second, pooling of the data was considered inappropriate because of discrepancies between the MRI approaches and the staging techniques. New studies are necessary to compare the age distributions within stages obtained with different MRI approaches in the same population.

Future prospects

The use of MRI for forensic age estimation has been studied intensively since 2007 because of its major advantage of avoiding ionizing radiation. In its most recent Practical Guide on Age Assessment [114], the European Asylum Support Office stated that “radiation-free methods should be applied first, and only as a last resort can other methods involving radiation be considered.” Nevertheless, in the European Commission’s Science for Policy Report by the Joint Research Centre [117], the authors stated that “more studies should be conducted with MRI instead of CT in order to increase the available knowledge base.” Consequently, despite the large number of studies discussed in this systematic review, MRI has not yet found its way into age estimation practice. Thus, the considerations from this review should be taken into account when designing future studies and when introducing MRI into age estimation practice. In particular, the following recommendations can be made:

  • Larger reference populations are desirable. Because the prospective nature of the studies impedes a fast expansion of reference data, it would be more efficient to combine the data of different research groups. However, because small differences exist between MRI approaches and between populations, comparative studies are needed to check whether the data can be pooled safely.

  • Multifactorial age estimation seems to improve age estimation performance, as has been demonstrated using an automated age estimation method. Because most MRI data are based on staging of development, studies are needed in which that staging information is combined using an appropriate statistical approach.

  • Several research groups have collected MRI data at different anatomical sites, in the same individual, on the same day. Those groups could try to combine that information to create age-estimation models, taking into account possible conditional dependence between the sites.

  • It remains unclear whether data from different anatomical sites can be combined safely to create age-estimation models when those data were not collected in the same individuals. This could be studied as soon as results from studies complying with the previous recommendation are available.

Because the intervention of interest was MRI, the initial search yielded many studies on brain development and degeneration. However, in the literature on age estimation in children, adolescents and young adults, the developing brain is generally not considered. After all, structural changes in the brain are mostly studied in older patients, in whom age-related (or disease-related) degeneration occurs. Nevertheless, changes in the developing brain might be useful for age estimation in younger individuals. Another strength of MRI is that it allows dynamic processes in the body to be studied, such as diffusion in the brain or blood flow in the heart [118,119,120,121,122,123].

Because inter-individual variation remains a challenge in age estimation, adding soft-tissue information might allow a more nuanced age estimation than one based solely on hard-tissue information. Moreover, studying functional and anatomical age-related changes in a research context is justifiable because MRI involves no ionizing radiation. MRI even enables longitudinal evaluation of these changes over the years in an ethically justifiable way. However, the link between hard- and soft-tissue development remains unexplored.

Conclusion

Single-site age estimation using MRI has been studied extensively, yielding several reference studies, all of which included a relatively small study sample. Although a review could, in principle, overcome small study samples and disparities in their age distributions by pooling the data, pooling was not appropriate here because of the wide variety in study characteristics. Furthermore, the current review highlighted that age estimation performance was better for multifactorial than for single-site age estimation. As a next step in the field, more multifactorial age estimation studies with MRI are warranted, because MRI avoids ionizing radiation and, consequently, allows multiple anatomical sites to be studied. The results of the current review can guide those multifactorial age estimation studies. Moreover, this review can help medical professionals decide on the preferred approach for specific cases, and it can help judicial professionals interpret the evidential value of age estimation results.