Introduction

In recent years, a growing branch of forensic anthropology has concerned the study of living individuals and, in particular, the problem of estimating age in living young adults for legal purposes [13]. Verified documentation of the date of birth is the only way of determining the exact age of an individual. However, for subjects who do not possess proper identification documents, it is of the utmost importance to establish whether they should be treated as juveniles or adults [16]. This is so not only in criminal prosecutions but also in civil hearings, including determination of refugee status. Although age thresholds vary from country to country, in most countries the age of relevance to criminal liability ranges between 14 and 18 years [7–9].

To estimate the age of an individual for whom proper documentation is not available, forensic practitioners must rely on population data concerning growth and development [10–13]. These data enable the creation of a biological profile of the developmental status of an individual of unknown age, thus leading to an estimate of age. This profile is based on assessment of skeletal and dental anatomical growth markers [9, 14–18]. The search for the best anatomical growth markers which provide the most reliable age assessments has been ongoing for decades [1].

It must be stressed that biological variations in human development mean that any assessment of age based on analysis of anatomical growth markers is only an estimate and will, thus, contain a degree of error. The size of that error depends to a great extent on the anatomical markers used, the statistical methods applied to construct an age estimation table or formula based on those markers, and also on the developmental characteristics of the population assessed [19–21].

Forensic practitioners have a variety of age estimation methods available to them [1]. These include physical examination by a suitably qualified medical practitioner; assessment of the degree of dental development, which should include inspection of the oral cavity and analysis of panoramic radiographs; and assessment of skeletal maturity by radiological analysis of the left hand [22]. In addition, if it is uncertain whether the subject has reached the age of 21, conventional radiological, CT (computed tomography) or ultrasound examination of the clavicles should also be performed to expand the analytical spectrum. Depending on the question to be answered, these methods may be more or less reliable [13].

Current research focuses on the development of multi-factorial methods for age estimation, in which a combination of methods (or anatomical growth markers) can provide the most accurate age estimate with the smallest possible error [1]. The difficult task for forensic researchers assessing age for the purposes of the legal system is still to convey accurately the degree of error inherent in whatever method is used and to clarify the likelihood that an individual below 18 years of age will be wrongly classified as an adult [1, 2, 8]. New techniques have been developed for older teenage individuals, which can provide age estimates with substantially narrower age ranges at the 95 % confidence level [23–26].

From the mid-teens to the early twenties, there are only a few reliable predictors of age of individuals, and dentition is one of them [27, 28]. In childhood, observation of dentition status results in highly accurate age assessment. However, this accuracy decreases as a person's dental development is completed. As a person grows beyond these years, developmental variability increases as the effects of the environment begin to outweigh those of genetics, thus making age estimations far more precise in early childhood and relatively imprecise during adolescence [29]. At around the age of 14 years, most teeth are fully developed, and age estimation becomes increasingly difficult. Generally, at this stage, after formation of the premolars and canines, the third molars are the only teeth still developing. Hence, although development of the third molar may be unpredictable and the tooth itself has a relatively high incidence of agenesis, the third molar has been the subject of great interest as a predictor of 18 years of age and status of majority [30–32].

Although the hand–wrist region and third molar development have received the greatest attention in the assessment of age [33–35], radiological analysis of the epiphyses of long bones can overcome the deficiencies of non-adult skeletal collections and allow researchers to obtain contemporary information on epiphyseal union. The knee represents an ideal anatomical area which can also provide new information and which can be used in combination with the above well-established techniques in order to maximise age estimation accuracy. In this study, the knee joint was selected for several reasons. It is an anatomical area which can easily be radiographed at low radiation doses, with no interposed anatomical structures; the knee can also be easily positioned for antero-posterior radiographs; and, last but not least, it yields information for three epiphyses at the same time, i.e. the distal femur, proximal tibia and proximal fibula.

There are many anthropological studies of skeletal maturation of the knee, based on dry bone and X-ray examinations, but they differ with regard to numerous variables: study population, gender, number of individuals, age range and number of bone fusion stages [36–41].

The most exhaustive anthropological study of the knee was that of McKern and Stewart, who examined the bodies of American soldiers who had died during the Korean War [36].

On the other hand, radiographic data on the development of the knee had been collected during a number of longitudinal studies. This has resulted in a number of reference standards which can be used in age estimation, and all are based on the assessment of radiographs [37, 42, 43].

The most exhaustive in terms of skeletal maturation was that of Pyle and Hoerr [37]. This atlas was developed from data collected during the Cleveland Study which ran in North America in 1926. The longitudinal study into child growth involved the collection of anthropometric data and radiographs from birth through to 21 years of age. In total, the data of 4,483 children formed the dataset which was also enhanced by radiographs of children collected in Boston by Dr. Harold C. Stuart [37]. The results were collected in an atlas published in 1955, consisting of X-rays of male and female subjects and including antero-posterior and lateral incidences of the right knee. The atlas method differs from that presented in the previous two approaches since it is not based upon a scoring system but is based on finding the best match between the radiographic image of the subject and the series of radiographs displayed in the atlas.

Although a wealth of knowledge is apparently provided, it is extremely difficult to find a consensus view on the definable stages of change between no mineralized bridge formation, non-union and complete ossification. Unfortunately, the authors do not agree on the definition of complete fusion; some take into account the persistence of an epiphyseal scar, some do not [37–41].

The aim of the present study was to develop a radiographic scoring system for the assessment of epiphyseal union at the knee joint in order to estimate the likelihood of having attained 18 years of age in an Italian sample.

Materials and methods

Sample

This is a retrospective cross-sectional study of the relationship between epiphyseal fusion at the knee joint and chronological age. A sample of knee radiographs from Macerata Hospital (Italy) was studied. All X-ray projections were obtained with the same X-ray device (technical data of the X-ray equipment and settings were as follows: FFD 110 cm, no grid, 110 mA, 7 mAs, 55 kV, CR Kodak® software, multifrequency processing). The radiographs belong to 215 healthy subjects (99 females and 116 males) aged between 14 and 24 years. Table 1 lists the age and gender distribution of each age category. About 55 % of the individuals had attained 18 years. Upper and lower age limits for the sample were derived from an extensive review of the literature and were refined during the data gathering process. All individuals had an antero-posterior radiograph of the knee available for assessment.

Table 1 Frequency distribution by gender and age cohort

Radiographs were excluded from the study if they showed fractures or dislocations involving the growth plate, medical fixatives (i.e. surgical implants or casts) near the diaphyseal–epiphyseal junction or if they had been obtained from subjects with a medical history of chronic disease which may significantly alter skeletal development. Radiographs of subjects with known endocrine, metabolic or nutritional disorders were also excluded from the study. Protocols to collect radiographs for human subjects were approved by the Ethics Committee for Research Involving Human Subjects of the University of Macerata (Italy), and the study was conducted in accordance with the ethical standards laid down by the Declaration of Helsinki. The World Medical Association developed the Declaration of Helsinki as a statement of ethical principles for medical research involving human subjects, including research on identifiable human material and data. The chronological age of each individual was calculated as the difference between the date of the image and the date of birth.
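The age calculation and the adult/minor label used later as the response variable can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and counting completed years from calendar dates is an assumption about how the date difference is turned into an age.

```python
from datetime import date


def completed_years(birth: date, image: date) -> int:
    """Completed years of age on the date of the radiograph (assumed convention)."""
    if image < birth:
        raise ValueError("image date precedes date of birth")
    years = image.year - birth.year
    # Subtract one year if the birthday has not yet been reached in the image year.
    if (image.month, image.day) < (birth.month, birth.day):
        years -= 1
    return years


def is_adult(birth: date, image: date, threshold: int = 18) -> int:
    """Return 1 if the subject had attained the threshold age at imaging, else 0."""
    return 1 if completed_years(birth, image) >= threshold else 0
```

Working with completed years avoids the rounding ambiguity that a division by an average year length would introduce exactly at the 18-year boundary.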

When necessary, digital images were submitted to contrast enhancement for easier interpretation. The following information was recorded for each subject: (1) date of birth and date of radiograph; (2) gender, as reported on the radiograph; (3) ancestry, reported by either self/family or visual evaluation by hospital staff; and (4) stage of fusion for each epiphysis. However, the observer was blinded to information on the subject's gender and age. Based on the combined use of these criteria, the observer was asked to assign a stage of union for each of the epiphyses of the distal femur, proximal tibia and proximal fibula.

Radiographic staging

To assess the degree of ossification of the distal femur, proximal tibia and proximal fibula, classification into three stages for each bone was applied as follows: stage 1, epiphysis is not fused (Fig. 1); stage 2, epiphysis is fully ossified and epiphyseal scar is visible (Fig. 2); stage 3, epiphysis is fully ossified and epiphyseal scar is not visible (Fig. 3). Distal femoral, proximal tibial and fibular epiphyses were separately evaluated according to these stages.

Fig. 1 Stage 1: epiphysis is not fused

Fig. 2 Stage 2: epiphysis is fully ossified and epiphyseal scar is visible

Fig. 3 Stage 3: epiphysis is fully ossified and epiphyseal scar is not visible

Scores of 0, 1 and 2 were assigned to stages 1, 2 and 3, respectively. Lastly, the score related to the epiphyseal fusion at the knee joint (SKJ) was obtained by adding the three scores of the distal femur, proximal tibia and proximal fibula. In cases of uncertainty during an observation, the lower stage was retained, according to in dubio pro reo, i.e. the principle of benefit of the doubt used in criminal proceedings.
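The scoring rule described above can be written out as a short sketch. The function and variable names are illustrative only; the stage-to-score mapping and the lower-stage rule for uncertain readings are taken from the text.

```python
# Stage (1, 2, 3) to score (0, 1, 2), as defined in the text.
STAGE_TO_SCORE = {1: 0, 2: 1, 3: 2}


def pro_reo_stage(candidate_a: int, candidate_b: int) -> int:
    """When an observation is uncertain between two stages, retain the lower one."""
    return min(candidate_a, candidate_b)


def skj(femur_stage: int, tibia_stage: int, fibula_stage: int) -> int:
    """Sum of the three epiphyseal scores; ranges from 0 to 6."""
    return sum(STAGE_TO_SCORE[s] for s in (femur_stage, tibia_stage, fibula_stage))
```

For example, a knee with the distal femur at stage 2, the proximal tibia at stage 3 and a fibula read as "stage 2 or 3" would be scored as `skj(2, 3, pro_reo_stage(2, 3))`, giving an SKJ of 4.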

Several subjects in this sample had radiographs of both knees available for assessment. However, in this study, knee laterality was not taken into account and only the right knee joint was used for the analysis. Previous studies indicate that epiphyseal union usually occurs at the same time on both sides [36, 44–46]. Auerbach and Ruff [47], in their study of limb bone bilateral asymmetry in modern humans, showed that all asymmetries are more pronounced in the upper limb, favouring the right side. Conversely, most lower limb dimensions, especially those around the knee, show much less asymmetry [47]. This information is extremely useful if only the epiphyses of one side are available, as may occur in some forensic investigations.

Statistical analysis

Subjects' radiographs were analysed by the same author (RC). In order to evaluate intra-observer reliability, the observer made repeated measures of 30 images at an interval of 2 weeks. Inter-observer error was assessed based on a random sub-sample of radiographs (30 images), which were assessed by three investigators, one odontologist and two physicians with many years of experience in the field of radiology, anatomy and forensic research. Cohen's kappa non-parametric test was used to evaluate intra- and inter-observer agreement.
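For readers unfamiliar with the statistic, an unweighted Cohen's kappa for two sets of stage ratings can be sketched as below. This is a plain illustration of the textbook formula, not the code used in the study (which was run in R); it handles one pair of rating series at a time, and how the three investigators were paired is not stated in the text. The function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("rating sequences must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa corrects the raw agreement rate for the agreement that two raters would reach by chance alone, which matters here because only three stages are available.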

With individual age as a dichotomous response variable (E = 1 if an individual is at least 18 years of age, E = 0 otherwise) and with gender and score of epiphyseal union at the knee joint (SKJ) as predictor variables, a generalised linear model with a logit link function was developed to predict whether an individual is older (E = 1) or younger (E = 0) than 18 years of age.
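Written out, a model of this kind takes the following general form. The coefficient symbols β0, β1 and β2 are notation introduced here for illustration; they do not appear in the original text, and no fitted values are implied.

$$ \log \frac{p}{1 - p} = \beta_0 + \beta_1\,\text{SKJ} + \beta_2\,\text{gender}, \qquad p = P(E = 1) $$

Here p is the probability that an individual is at least 18 years of age, SKJ takes values from 0 to 6, and gender is a dichotomous indicator.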

Whichever of the two variables, gender and SKJ, proved significant was then used to address the medico-legal question of whether an individual is older or younger than 18 years of age. The predictive accuracy of the model was assessed by means of the receiver operating characteristic (ROC) curve.

ROC curves provide three important ways of examining the efficacy of a test [48, 49]. First, they can evaluate the overall ability of an indicator variable to make correct classifications, as in this study, of whether an individual is older than 18 (adult age) or not. Their shape shows the discriminatory ability of the indicator variable examined. The area under the curve (AUC) is widely recognised as the measure of the discriminatory power of a diagnostic test. The maximum value for the AUC is 1.0, thus indicating a (theoretically) perfect test (i.e. 100 % sensitive and 100 % specific). An AUC value of 0.5 indicates no discriminative value (i.e. 50 % sensitive and 50 % specific) and is represented by a straight diagonal line extending from the lower left corner to the upper right of the graph. That is, the closer the apex of the curve to the upper left corner, the greater the discriminatory ability of the test (i.e. the true-positive rate is high and the false-positive [1–specificity] rate is low). This is measured quantitatively by the AUC, so that a value of >0.96 indicates excellent discriminatory ability. Ninety-five percent confidence intervals for accuracy were calculated with a non-parametric method [48, 49].

The second function of ROC curves is to allow direct comparison of the abilities of different indicator variables to make correct classifications through a common metric, the AUC. Lastly, ROC curves facilitate the selection of the threshold value of the indicator variable which best balances sensitivity and specificity. For each indicator variable, the curve shows the trade-off between sensitivity and specificity for any potential threshold [48, 49].
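The construction of an empirical ROC curve for an ordinal indicator such as the SKJ, and the trapezoidal estimate of its AUC, can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names are hypothetical, the rule "score at or above the cutoff means adult" is assumed, and the study itself used R rather than this code.

```python
def roc_points(scores, is_adult):
    """(false-positive rate, true-positive rate) for the rule 'score >= cutoff',
    evaluated at every observed score, from the strictest cutoff to the loosest."""
    positives = sum(1 for e in is_adult if e)
    negatives = len(is_adult) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both adults and non-adults are required")
    points = [(0.0, 0.0)]  # cutoff above every score: nobody classified as adult
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, e in zip(scores, is_adult) if e and s >= cutoff)
        fp = sum(1 for s, e in zip(scores, is_adult) if not e and s >= cutoff)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Each point on the returned curve corresponds to one candidate cutoff, so the same list can be used both to compute the AUC and to inspect the sensitivity and specificity trade-off at a given score.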

The test was performed to identify a threshold (cutoff) among the seven scores of epiphyseal union at the knee joint which could be used to assign an individual to the population of those younger (T = 0) or older (T = 1) than 18. Sensitivity p1 (i.e. the proportion of individuals who had attained 18 years of age for whom T = 1) was evaluated, as were specificity p2 (i.e. the proportion of individuals younger than 18 for whom T = 0) and correct classification (test accuracy, i.e. the proportion of all correct results). Lastly, the ability of the SKJ to discriminate between individuals who have or have not attained 18 years of age was summarised by the positive predictive value (PPV), or post-test probability of being 18 years of age or more (i.e. the proportion of individuals with T = 1 for whom event E = 1 holds). According to Bayes' theorem, PPV may be written as follows:

$$ \text{PPV} = \frac{p_1 p_0}{p_1 p_0 + \left(1 - p_2\right)\left(1 - p_0\right)} $$

where PPV is the positive predictive value and p0 is the probability that an individual between 14 and 24 years of age (our target population) is 18 or older. This probability was evaluated with data from the Italian Institute of Statistics [50] and was p0 = 0.66. Statistical analyses were performed with the R statistical program [51]. The significance threshold was set at 5 %.
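As a worked check of the Bayes formula, the sketch below evaluates the PPV from a sensitivity, a specificity and the prior probability. The function name is hypothetical; the input values are the sensitivity and specificity reported in the Results for boys at score 3, together with the prior of 0.66.

```python
def positive_predictive_value(sensitivity, specificity, prior):
    """Post-test probability of being 18 or older given a positive test (Bayes' theorem)."""
    for value in (sensitivity, specificity, prior):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Boys, cutoff at score 3: Se = 93.33 %, Sp = 89.29 %, prior = 0.66.
ppv_boys = positive_predictive_value(0.9333, 0.8929, 0.66)
print(f"{100 * ppv_boys:.2f} %")  # 94.42 %
```

The result agrees with the PPV of 94.42 % reported for boys, which suggests that the published figure was obtained from the rounded sensitivity and specificity with this prior.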

Results

Age distribution gradually increases with each SKJ category for both girls and boys (Fig. 4). The mean age (±standard error) for girls and boys in each score category varies between genders (Table 2), but the differences were not significant (p > 0.11). The mean age provides an indication of the typical age at which each stage of union occurs.

Fig. 4 Age distribution of boys (left) and girls (right) by scores

Table 2 Age in relation to scores of epiphyseal union at knee joint for girls and boys

Regarding the repeatability and reproducibility of the method, κ statistics did not reveal significant intra-observer effects (κ = 0.974 with 95 % bootstrapped confidence interval, 0.934–0.988), indicating substantial homogeneity of evaluation between two observations of the same knee radiographs. Cohen's kappa test demonstrated no statistically significant inter-observer differences in the age estimation performed by the three examiners: good inter-observer agreement for the distal femoral epiphysis (κ = 0.89), for the proximal tibial epiphysis (κ = 0.81) and for the proximal fibula (κ = 0.80) was observed.

In order to study the extent to which the age of 18 years or more of an individual is related to SKJ and the dichotomous variable gender (1 for boys, 0 for girls), probability p, that an individual is at least 18 years of age, was modelled by a linear logistic model. To examine the effect of including or excluding one of the two factors in the model, the difference in deviance between two nested models was examined (Table 3). The change in deviance on adding the variable gender to a model which includes a constant term alone (null model) was not significant (p = 0.313). Instead, when the scores were added to the null model, the deviance was reduced by a significant amount (p < 0.001). In summary, the probability that an individual is aged 18 years or more depends on the score of epiphyseal union at the knee joint but does not significantly depend on gender.

Table 3 Deviance on fitting considered linear models to data

Both estimated probability with the logistic model and frequency distribution by the scores of individuals who had attained 18 years are listed in Table 4, which shows that SKJ values 0 and 1 cannot be used to test adult age. In fact, we found that a very small percentage of individuals in the first two scores were adults. Consequently, taking into account only the SKJ values, five tests were performed to discriminate between individuals who were or were not aged 18 years or more, considering the scores from 2 to 6 as cutoffs. Results are listed in Table 5.

Table 4 Conditional probability, p, and relative frequency, fr, that an individual with a given score has attained 18 years (expressed as percentage)
Table 5 Percentages of sensitivity, specificity and accuracy (95 % confidence interval) of test of adult age when scores of epiphyseal union at knee joint are used to discriminate between individuals who are or are not at age 18 years or more

The predictive accuracy of these tests and their discrimination capacity were also assessed by determining the ROC curves by classification matrices for different levels of predicted probability that an individual is of age. The resulting ROC curves for boys and girls are shown in Fig. 5. The values for the area under the ROC (±standard error) were 0.961 (±0.018) and 0.915 (±0.023) for boys and girls, respectively.

Fig. 5 Receiver operating characteristic curves for “18 years of age or older” status for boys (continuous line) and girls (dashed line)

For boys, the highest value of accuracy, i.e. the proportion of all tests which give a correct result, was obtained with score 3 (Acc = 91.38 %) with high sensitivity (Se = 93.33 %) and specificity (Sp = 89.29 %). The estimated positive predictive value PPV was 94.42 %, with a 95 % confidence interval, CI (%) = (88.78; 97.31).
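The three performance measures can be reproduced from a 2 × 2 classification table, as sketched below. The cell counts are not given in the text: they are an inference, namely the only split of the 116 boys found here to be consistent with all three reported percentages (56 of 60 adults and 50 of 56 minors correctly classified), and should be read as an assumption. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from the cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),               # adults correctly classified
        "specificity": tn / (tn + fp),               # minors correctly classified
        "accuracy": (tp + tn) / (tp + fn + tn + fp)  # all correct results
    }


# Inferred (not reported) counts for boys with the cutoff at score 3.
boys = diagnostic_metrics(tp=56, fn=4, tn=50, fp=6)
```

With these counts the function returns 93.33 %, 89.29 % and 91.38 % after rounding, matching the reported sensitivity, specificity and accuracy.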

For girls, the highest value of accuracy was obtained with both scores 3 and 4 (Acc = 85.86 %). However, when score 4 was chosen as cutoff, the ROC curve for girls appeared closest to the point (0, 1) of the graph which represents the ideal point at which the true-positive fraction is 1.0 (sensitivity = 1.0) and the false-positive fraction is 0 (specificity = 1.0).
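One common way to quantify closeness to the ideal point (0, 1) is the Euclidean distance in ROC space; whether this exact criterion was used in the study is an assumption, and the symbol d is introduced here for illustration. With Se denoting sensitivity and Sp specificity:

$$ d = \sqrt{\left(1 - \text{Se}\right)^2 + \left(1 - \text{Sp}\right)^2} $$

As a worked example with the values reported for boys at score 3 (Se = 0.9333, Sp = 0.8929), d = \sqrt{0.0667^2 + 0.1071^2} ≈ 0.126. The cutoff with the smallest d lies closest to the upper left corner of the graph.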

In addition, for forensic purposes, it is important that the test shows a low proportion of individuals younger than 18 whose test is positive (high specificity level). It therefore seemed appropriate to focus on the possibility of a false positive rather than on that of a false negative. The choice of score 4 to characterise adult age for girls yielded a specificity higher than that obtained when score 3 was chosen (Table 5). The PPV was 94.27 %, with a 95 % confidence interval, CI (%) = (86.56; 97.68).

Discussion

Many papers have been published concerning age estimation in young adults and many different methods have been developed, although some of them have been considered inaccurate [1, 52]. The choice of a particular method depends on the specific conditions of each case and mainly on the accuracy required [1]. In adolescence (14–21 years), third molars are the only teeth undergoing maturation, and they are especially attractive as a study subject because the degree of mineralization can easily be ascertained with non-invasive methods such as radiology [7–10]. However, several biological, ethnic and dietary factors may substantially influence third molar growth [10]. In addition, as third molars are sometimes missing, for reasons of agenesis or extraction, estimating the age of majority based on these teeth is not always possible [52].

Previous studies also examined the medial extremity of the clavicle, which has been exhaustively studied by plain X-rays, multi-slice computed tomography and ultrasound [53–58], and the wrist, which has also been closely studied with the Greulich and Pyle radiographic atlas [60]. For this reason, and in order to offer a valuable alternative method for assessing 18 years of age, a different anatomical region was analysed in this study.

Although the skeletal maturation of the knee has already been studied anthropologically on dry bone [36], few radiographic studies have evaluated its potential forensic interest in the bone maturation process [37, 38]. O'Connor et al. [38] investigated the relationship between stage of epiphyseal union at the knee joint and chronological age in a modern Irish population. They used a novel radiographic method which subdivides the continuum of development into five specific stages of union. Regarding the application of an imaging modality with no radiation risk, one MRI study of the knee concerned only the proximal tibial epiphysis [61]. Dedouit et al. [62] developed an original MRI staging system for growth plate maturation and epiphyseal fusion of the knee and evaluated its reliability and validity for age assessment of living individuals.

In this cross-sectional study, the radiographic staging system of growth plate maturation and epiphyseal fusion of the knee is analysed. The present results indicate that, in general, intra-observer variability is good for all bones (κ = 0.974, with 95 % bootstrapped confidence interval, 0.934–0.988). In previous studies [e.g. 38], intra-observer error was higher for the fibula than for the tibia and femur probably because, in some instances, a good view of the fibula may be influenced by the position of the knee, which may partially obscure the fibula. However, in these radiographs, systematic evaluation of image blackening, peripheral blackening, appropriate exposure settings and the visibility of image details and contrast increased the likelihood of good identification of all parameters.

The ROC curves indicate that radiographic study of the bone maturation of the knee joint is highly correlated with the age of majority of living girls and boys. In addition, in these final stages of ossification of the knee, no significant differences between genders were observed. This trend was found to be consistent in both girls and boys by O'Connor et al. [38], in which the correlation between age (in years) and stage of union of the femur, tibia and fibula in boys produced correlation coefficients of 0.868, 0.858 and 0.802 (p < 0.01), respectively. As regards girls, the correlation test examining the relationship between age (in years) and stage of fusion of the epiphyses of the same bones gave coefficients of 0.880, 0.877 and 0.910 (p < 0.01), respectively. However, due to the different classifications and staging used to characterise the growth plate, any comparison with previous radiological studies of the knee is unfortunately difficult.

As regards the absence of statistical differences between genders, it was noted that the femoral and tibial epiphyses in boys and girls exhibited similar rates of fusion. These results were not consistent with previous radiological studies in which the distal femoral epiphyseal fusion occurs between the ages of 14–18 years in girls and 16–19 years in boys [41, 63]. On the Pyle and Hoerr radiographic standards, for instance, there is definite fusion in the centre of the epiphyseal plate at 14.5 years in girls and 17 years in boys, and the growth plates are replaced by lines of fusion by 15.5 years in girls and 18 years in boys [37]. Concerning the proximal tibia, the standard for completion of fusion is 14.5 years in girls and 17 years in boys [37]. Other observations have rather later times for complete fusion, up to 17 years in girls and 19.5 years in boys. However, as with other atlases, the use of the Pyle and Hoerr [37] approach to assess chronological age raises a number of methodological issues. The data which formed the basis of the atlas were collected from children who were growing to adulthood in the first decades of the twentieth century and were described by the authors as being white children of high socioeconomic class. Additionally, the children were deliberately chosen for their good health and nutritional status [64]. This background, both socioeconomically and ethnically, is very different from that of children being age estimated for forensic purposes today. Maturational rate is known to depend on a large number of factors, not least of which are the nutritional intake and the health status of the individual. The applicability of an age estimation method which was developed on a population potentially so far removed from a modern population must therefore be tested robustly if it is to be used and accepted for forensic purposes.

Analysis of the ROC curve showed that, for girls, the choice of score 4 to characterise adult age yielded specificity higher than that obtained when score 3 was chosen. In fact, when score 4 was chosen as cutoff, the ROC curve showed its point closest to the point (0, 1) of the graph, which represents the ideal point at which the true-positive fraction is 1.0 (sensitivity = 1.0) and the false-positive fraction is 0 (specificity = 1.0). For forensic purposes, it is important that the test shows a low proportion of individuals younger than 18 whose test is positive (high specificity level). From the practical perspective applied to forensic sciences, the above results are of special importance when we consider the fact that, from both a criminal viewpoint and according to undocumented immigrant acceptance policies, the age limit to adopt urgent criminal or acceptance measures is 18 years.

Regarding the responsible use of X-rays in forensic age estimations, it is indispensable not only to use dose-lowering techniques but also to take the presumed age of the person being examined into greater account [59, 65, 66]. Established doses for X-ray examinations in forensic age estimations vary from less than 0.1 μSv (left hand–wrist X-ray) to more than 800 μSv (CT) per examination [65, 67]. However, the use of conventional X-rays is not appropriate for individuals under 18, and, like the collar bone CT, it should be recommended as a method only to answer the question of whether the subject is over 18/21 years [65, 68]. Article 3 of the European Council Directive 97/43/Euratom states that when exposing an individual to ionising radiation, the net benefit to the individual must outweigh the risks [4]. More recent data are also available that compare radiation exposure from medical X-rays with the hazards of everyday living [69, 70]. Based on these studies, the resulting risk from using X-rays in age estimation procedures (with the exception of CT scan on sternoclavicular joints) is very low in comparison to other life risks [65]. However, the need for the moment is to minimise all radiation as far as possible and to evaluate risks from radiation when performing X-ray examinations in age estimation procedures [65].

Conclusions

The radiographic technique offers a valuable alternative as a non-invasive method of estimation of 18 years of age. In this study, very encouraging results were obtained when this technique was applied to analyse epiphyseal fusion at the knee joint. However, there are some problems in using a radiographic data set. The first involves the integrity of the X-rays and the operator's experience. Although radiographic positions and exposure variables are standardised, the biological variability of tissue thickness and bone density requires a certain level of experience to obtain an optimal film series. As hospitals move toward digital X-rays, these considerations may have less influence. The second problem involves observer experience in interpreting radiographic films. Because of the two-dimensional nature of radiographs, which causes overlaps from surrounding structures, an observer may incorrectly assess the stage of fusion, unless multiple radiographic views are available.

In view of the above difficulties, it is very important to standardise the methodology for assessment of epiphyseal union. Further studies with larger sample groups of both antero-posterior and medio-lateral incidences are therefore recommended in order to verify not only the accuracy of the method but also inter-observer agreement. The scoring technique must be clearly defined so that it can be applied by forensic researchers all over the world. As Cunha et al. [1] noted, sometimes the best methods are not those with the best published standard error but those which have been tested by many researchers on many different populations, which are suitable for a specific forensic scenario, practical, user-friendly, relatively quick and economical.