Introduction

Enlarged lymph nodes are commonly encountered in pediatric patients, especially in the head and neck region [1,2,3]. Studies have shown that children frequently experience lymph node enlargement at least once during childhood, most often as a response to viral or bacterial infection; such enlargement usually resolves spontaneously within 2 to 3 weeks, with or without antibiotic therapy [4,5,6]. Some studies have found that up to 90% of healthy children aged 4–8 years may have palpable lymph nodes. Normal lymph nodes in children typically measure no more than 1 cm [5,6,7,8,9], although they may reach 1.5 cm in the groin [2,3,4,5,6,7,8]. This benign enlargement can usually be diagnosed without specialized examinations [8].

Although enlarged lymph nodes in children are predominantly non-neoplastic, they often cause concern and prompt parents to consult physicians [7, 8]. Lymph node enlargement can also be abnormal, as in lymphadenopathy due to tuberculosis [10] or other inflammatory conditions, or in enlargement caused by malignancy, although malignancy is rare in children [6,7,8]. Despite this low incidence, timely and accurate diagnosis is crucial for prompt treatment. The management of pediatric patients with enlarged lymph nodes typically involves taking a medical history, performing a physical examination, and obtaining blood tests, chest X-rays, or ultrasound examinations to identify the cause of enlargement or to assess the likelihood of malignancy [5, 11, 12]. Patients whose lymph nodes do not improve after more than 4 to 6 weeks of antibiotic therapy [11], or who have features suspicious for malignancy such as prolonged fever, weight loss, or lymph nodes larger than 2 cm [3, 5], often undergo lymph node biopsy, which is considered the diagnostic gold standard. Fine-needle aspiration biopsy is generally not preferred in children because the small amount of tissue obtained [2] may lead to inaccurate histopathological interpretation [5]. Excisional biopsy is therefore commonly performed, and in children it often requires general anesthesia, which carries risks of anesthetic and surgical complications [12, 13].

Accurately predicting whether enlarged lymph nodes in children are abnormal, based on a careful medical history and physical examination, can guide treatment decisions and the selection of biopsy sites, reduce the risk of unnecessary anesthesia and surgery [14], and minimize surgical complications such as wound pain, infection, or wound dehiscence [12]. Such prediction may also alleviate parental anxiety when confirmatory biopsy is not needed and reduce unnecessary healthcare costs. This study therefore aimed to develop a clinical prediction model to predict histopathological results in pediatric patients with enlarged lymph nodes, providing guidance for treatment decisions and referral for further evaluation.

Materials and methods

Following institutional review board approval, this study adopted a differential diagnostic prediction approach and utilized a retrospective cross-sectional cohort design, drawing data from the medical records of pediatric patients aged < 15 years who underwent lymph node biopsy for the diagnosis of enlarged lymph nodes at Buddhachinaraj Hospital, Phitsanulok, from January 2012 to December 2022.

Pediatric patients aged 0–15 years who underwent lymph node biopsy for palpable peripheral lymphadenopathy (e.g., supraclavicular, cervical, or inguinal lymph nodes) and who had complete pathological results were included. Patients with enlarged lymph nodes in the abdominal or thoracic cavities, and those with masses in the neck or other body regions whose pathological results indicated diagnoses other than lymphadenopathy (masses mimicking lymphadenopathy), such as vascular malformations, branchial anomalies, pilomatrixoma, and thyroglossal duct cysts, were classified separately.

Definition

Once patient data and histopathological results were obtained, the patients were categorized into two groups based on histopathological findings.

Benign lymphadenopathy is a condition in which lymph nodes enlarge in response to various stimuli, such as infection, chemicals, or foreign substances, which can lead to tissue hypertrophy, congestion, and edema. Depending on the stimulus, the changes within the lymph nodes vary, giving rise to reactive lymphadenopathy, infectious lymphadenopathy, and lymphadenitis associated with clinical syndromes. Based on histopathological findings and the need for specific treatment, we classified benign lymphadenopathy into two subgroups.

Reactive hyperplasia is a subtype of benign lymphadenopathy characterized by lymph node enlargement resulting from various stimuli such as infections, chemicals, or other foreign agents. This type of lymph node enlargement is non-specific and cannot be attributed to any specific disease. Microscopic examination shows reactive lymphoid hyperplasia.

In this study, benign lymphadenopathy refers to abnormal benign lymphadenopathy other than reactive hyperplasia, in which responses to various stimuli lead to proliferation of multiple tissue components within the lymph node. Examples include infectious lymphadenitis (e.g., HIV-associated lymphadenopathy, infectious mononucleosis, or bacterial or mycobacterial infection) and lymphadenopathy associated with clinical syndromes (e.g., systemic lupus erythematosus, Kimura disease, Kikuchi–Fujimoto disease, and Castleman disease). These conditions may require specific treatment or clinical management and may not be distinguishable from malignancy.

Malignant lymphadenopathy refers to lymph node enlargement caused by aggressive malignant tumors, either originating within the lymph nodes (e.g., non-Hodgkin lymphoma and Hodgkin lymphoma) or spreading to the lymph nodes from other organs (metastasis).

Data collected from the patients included general information (sex, age, and comorbidities) and clinical data from the initial medical history and physical examination at the outpatient department. Lymphadenopathy-specific data included lymph node size (cm), the number of enlarged lymph nodes (solitary or multiple), location, duration of enlargement (days), associated symptoms or history (exposure to tuberculosis, fever, fatigue, generalized lymphadenopathy, and weight loss), physical examination findings such as palpable hepatosplenomegaly, and the characteristics of the palpated lymph nodes (firm/hard, fixed, tender, fluctuant/cystic, or rubbery). These data were analyzed to identify candidate factors for predictive modeling, using the following definitions:

Anemia/fatigue: Clinically diagnosed based on hemoglobin levels below the age-appropriate reference range and patient-reported fatigue persisting for > 2 weeks.

Duration of fever: Fever lasting longer than 7 days without resolution.

Weight loss: Unintentional weight loss of > 5% of body weight over the past 3 months.
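As a minimal sketch, the three symptom definitions above can be encoded as binary predictors. The function names are hypothetical, and because the text does not give the age-specific hemoglobin reference range, the lower limit is passed in as a parameter rather than assumed. Whether weight loss was unintentional still has to come from the history.

```python
def anemia_fatigue(hb_g_dl: float, hb_lower_limit_g_dl: float, fatigue_days: int) -> bool:
    """Hemoglobin below the age-appropriate lower limit AND fatigue lasting > 2 weeks.

    hb_lower_limit_g_dl must come from a local age-specific reference range;
    no value is assumed here.
    """
    return hb_g_dl < hb_lower_limit_g_dl and fatigue_days > 14


def prolonged_fever(fever_days: int) -> bool:
    """Fever lasting longer than 7 days without resolution."""
    return fever_days > 7


def weight_loss(weight_3_months_ago_kg: float, current_weight_kg: float) -> bool:
    """Loss of > 5% of body weight over the past 3 months.

    Only the magnitude is checked; whether the loss was unintentional
    must be established from the history.
    """
    loss_fraction = (weight_3_months_ago_kg - current_weight_kg) / weight_3_months_ago_kg
    return loss_fraction > 0.05


# Hypothetical patient: 20.0 kg three months ago, 18.8 kg now (a 6% loss).
print(weight_loss(20.0, 18.8))
```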

Lymph node characteristics

Firm/hard: lymph nodes that are noncompressible and maintain solid consistency upon palpation.

Fixed: lymph nodes that do not move on palpation because they adhere to the surrounding tissues, which can include the skin, muscles, or other structures.

Rubbery: lymph nodes that are elastic and somewhat compressible but firm.

Fluctuant/cystic: lymph nodes that exhibit wave-like motion when pressure is applied, indicating a fluid-filled consistency.

Size or cutoff size: lymph nodes larger than 1 cm in the neck or axillary regions and larger than 1.5 cm in the inguinal region are considered abnormal.
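The size criterion is a region-dependent threshold. In the sketch below, the region labels and function name are hypothetical; supraclavicular nodes are treated as neck nodes (an assumption, since the definition names only the neck, axillary, and inguinal regions), and any other region raises an error rather than guessing a cutoff.

```python
# Region-specific upper limits of normal (cm), from the size definition.
SIZE_CUTOFF_CM = {
    "cervical": 1.0,
    "supraclavicular": 1.0,  # assumption: treated as a neck region
    "axillary": 1.0,
    "inguinal": 1.5,
}


def is_enlarged(size_cm: float, region: str) -> bool:
    """Return True if the node exceeds the cutoff for its region."""
    try:
        cutoff = SIZE_CUTOFF_CM[region]
    except KeyError:
        raise ValueError(f"no size cutoff defined for region {region!r}") from None
    return size_cm > cutoff


print(is_enlarged(1.2, "cervical"), is_enlarged(1.2, "inguinal"))
```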

Statistical analysis

Statistical analysis was performed using Stata 16.1 (StataCorp LLC, College Station, Texas, USA). Categorical data are presented as frequencies and percentages (n, %). The distribution of numerical data was assessed using histograms, and numerical data are presented as mean and standard deviation (SD) or median and interquartile range (IQR), as appropriate. Non-parametric tests for trend were used to compare data among the three groups, with p < 0.05 considered statistically significant.
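Stata's nptrend command implements Cuzick's rank-based extension of the Wilcoxon rank-sum test for trend across ordered groups. Assuming that is the trend test used here, a stdlib-only sketch follows. The ordinal scores 1 < 2 < 3 for reactive hyperplasia, benign, and malignant, the function names, and the example data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def cuzick_trend(groups: Sequence[Sequence[float]], scores: Sequence[float]) -> Tuple[float, float]:
    """Cuzick's test for trend; groups[g] holds observations, scores[g] its ordinal score.

    Returns (z, two-sided p) from the normal approximation, with a tie correction.
    """
    values = [v for g in groups for v in g]
    labels = [s for g, s in zip(groups, scores) for _ in g]
    n = len(values)
    ranks = average_ranks(values)

    t = sum(score * rank for score, rank in zip(labels, ranks))
    sum_l = sum(labels)                          # sum of l_i * n_i over groups
    sum_l2 = sum(score * score for score in labels)  # sum of l_i^2 * n_i
    expected = (n + 1) / 2 * sum_l
    variance = (n + 1) / 12 * (n * sum_l2 - sum_l ** 2)

    # Ties shrink the rank variance by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in values:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    variance *= 1 - sum(c ** 3 - c for c in tie_sizes.values()) / (n ** 3 - n)

    z = (t - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return z, p


# Hypothetical node sizes (cm) for three ordered groups.
reactive = [1.2, 1.5, 1.8, 2.0]
benign = [1.8, 2.5, 3.0]
malignant = [3.0, 3.5, 4.2]
z, p = cuzick_trend([reactive, benign, malignant], [1, 2, 3])
print(f"z = {z:.2f}, p = {p:.4f}")
```

With two groups scored 0 and 1, the statistic reduces to the Wilcoxon rank-sum test, which is a convenient check of the variance formula.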

Multivariable analysis

The exploratory analysis included all candidate predictor variables in a multivariable risk regression. Variables significant at the 0.05 level were entered into a backward stepwise logistic regression analysis, and only factors that remained significant (p < 0.05) were retained in the final prediction model. Fitting a generalized linear model (GLM) with a binomially distributed outcome and an identity link function to estimate the risk difference returned an error. Therefore, with reactive hyperplasia as the reference group, a GLM assuming a Gaussian distribution was constructed to determine the predicted probabilities of benign or malignant outcomes. Results are presented as risk differences (RD), p values, and 95% confidence intervals (CI). The area under the receiver operating characteristic curve (AUROC) was calculated for each group using all 12 factors significant at the 0.05 level: lymph node size, location, and duration; associated findings (hepatomegaly, fever, fatigue, and bleeding); and characteristics of the palpated lymph nodes (firmness, fixation, tenderness, fluctuation, and rubbery consistency).
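Why a Gaussian GLM with an identity link returns risk differences can be seen in the simplest case of one binary predictor: the least-squares slope of a 0/1 outcome on a 0/1 predictor equals the crude risk difference. The sketch below is univariable only and uses hypothetical data and function names; the study's multivariable models were fitted in Stata, and the Wald interval here uses the binomial standard error rather than a model-based one.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def risk_difference(exposed: Sequence[int], unexposed: Sequence[int]) -> Tuple[float, Tuple[float, float]]:
    """Crude risk difference p1 - p0 for 0/1 outcomes, with a Wald 95% CI."""
    n1, n0 = len(exposed), len(unexposed)
    p1, p0 = sum(exposed) / n1, sum(unexposed) / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return rd, (rd - z * se, rd + z * se)


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope b of the least-squares line y = a + b*x.

    This is also the estimate of b in a Gaussian GLM with an identity link,
    i.e. a linear probability model when y is 0/1.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: 1 = malignant, 0 = reactive hyperplasia,
# split by whether the node was firm/hard on palpation.
firm = [1] * 12 + [0] * 8        # 12 of 20 malignant
not_firm = [1] * 4 + [0] * 36    # 4 of 40 malignant

rd, (lo, hi) = risk_difference(firm, not_firm)
x = [1] * len(firm) + [0] * len(not_firm)
slope = ols_slope(x, firm + not_firm)
print(f"RD = {rd:.2f} (95% CI {lo:.2f}, {hi:.2f}); OLS slope = {slope:.2f}")
```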

Calibration of the predictive model was defined as the concordance between predicted and observed probabilities. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the model. AUC values range from 0.5 to 1.0, where 0.5 indicates no discriminative ability and 1.0 indicates perfect discrimination. All p values are two-sided. Missing data were handled using grand mean substitution, so that all cases could be retained for model construction.
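The AUC can be computed without tracing the curve, because it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below, with hypothetical names and scores, also shows grand mean substitution for a numeric predictor with missing values. It is a minimal illustration, not the Stata procedure used in the study.

```python
from typing import List, Optional, Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney form of the AUC: share of (positive, negative) pairs
    in which the positive case scores higher, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def grand_mean_impute(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


# Hypothetical predicted probabilities of malignancy.
malignant_scores = [0.91, 0.75, 0.62]
reactive_scores = [0.10, 0.35, 0.62, 0.05]
print(auroc(malignant_scores, reactive_scores))
print(grand_mean_impute([1.5, None, 2.5]))
```

The pairwise loop is quadratic in the number of cases, which is negligible for a cohort of this size.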

Predictive model and validation test

The multivariable analysis identified 12 significant factors for predicting the likelihood of malignant and benign lymphadenopathy. These factors included lymph node size, location, and duration; associated symptoms; and lymph node characteristics such as firmness, fixation, tenderness, fluctuation, and rubbery consistency. The predictive model generated probabilities for each patient, which were used to classify the lymphadenopathy as reactive hyperplasia, benign, or malignant.

Based on the results of the multivariable analysis, each case was assigned two predicted probabilities relative to reactive hyperplasia: one for being benign and one for being malignant. A 50% cutoff was applied to each probability, and the final prediction was classified into one of three categories: malignant, benign, or reactive hyperplasia.

If the predicted probability of malignancy was greater than 50% and the probability of benign lymphadenopathy was less than 50%, the lymphadenopathy was classified as malignant.

If the predicted probability of malignancy was greater than 50% and the probability of benign lymphadenopathy was also greater than 50%, the classification remained malignant.

If the probability of malignancy was less than 50% and the probability of benign lymphadenopathy was greater than 50%, the lymphadenopathy was classified as benign. If both probabilities were less than 50%, the lymphadenopathy was classified as reactive hyperplasia.
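These rules can be condensed into a single function in which a malignant probability above the cutoff takes precedence. This is a minimal sketch with a hypothetical function name; the text does not say how a probability of exactly 50% was handled, so treating it as not exceeding the cutoff is an assumption.

```python
def classify(p_benign: float, p_malignant: float, cutoff: float = 0.5) -> str:
    """Three-way label from two probabilities, each estimated relative to
    reactive hyperplasia. A probability equal to the cutoff does not exceed it
    (assumption: the source does not specify this case)."""
    if p_malignant > cutoff:
        return "malignant"  # regardless of p_benign
    if p_benign > cutoff:
        return "benign"
    return "reactive hyperplasia"


for p_b, p_m in [(0.30, 0.70), (0.80, 0.60), (0.80, 0.20), (0.30, 0.20)]:
    print(p_b, p_m, classify(p_b, p_m))
```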

The predictive performance of the model was assessed using a 3 × 3 table comparing the actual biopsy results (reactive hyperplasia, benign, and malignant) with the model's predictions. This evaluation assessed the accuracy of the predictions and the model's ability to differentiate between the three conditions. In addition, overestimation (a predicted category more severe than the biopsy result) and underestimation (a predicted category less severe than the biopsy result) were analyzed to evaluate the clinical utility of the model.

Results

A total of 188 pediatric patients with lymphadenopathy underwent excisional lymph node biopsy at Buddhachinaraj Hospital from January 2012 to December 2022. These cases were categorized into three groups: reactive hyperplasia (91, 48.4%), benign lymphadenopathy other than reactive hyperplasia (70, 37.2%), and malignant lymphadenopathy (27, 14.4%) (Fig. 1).

Fig. 1 Flow of patients within the study

Histopathological diagnoses were obtained for all 188 patients (Table 1), most of them from excisional biopsies. Benign lesions accounted for 85.6% of cases in this study. The most common benign condition was reactive hyperplasia (48.4%), followed by caseating granulomatous inflammation (13.8%). Malignancy accounted for 14.4% of cases; the most common malignant diagnosis was Hodgkin lymphoma (4.2%), followed by T cell lymphoma (3.1%).

Table 1 Summary of benign and malignant histopathology results derived from the lymph node biopsies

When baseline demographics and clinical characteristics from the history and physical examination were compared (Table 2), there were no significant differences among the three groups in sex, age, comorbidities, number of palpable lymph nodes, history of tuberculosis exposure, or weight loss. However, statistically significant differences were observed in lymph node size, location, duration of swelling, hepatomegaly, fever, fatigue, generalized bleeding, and physical examination findings of firm/hard, fixed, tender, fluctuant, and rubbery consistency. These factors were further analyzed using multivariable regression analysis to identify predictors (Table 3).

Table 2 Clinical characteristics of lymphadenopathy in the reactive hyperplasia, benign, and malignant groups
Table 3 Multivariable analysis of risk factors for benign and malignant lymphadenopathy compared with reactive hyperplasia (reference group)

Factors differentiating benign and malignant lymphadenopathy from reactive hyperplasia (reference group) were determined (Table 3). With missing data of less than 5%, the predictors were lymph node size, location, duration of swelling, hepatomegaly, fever, fatigue, generalized bleeding, and physical examination findings of firm/hard, fixed, tender, fluctuant, and rubbery consistency. These factors were used to construct a multivariable multinomial logistic regression model to distinguish benign and malignant conditions from reactive hyperplasia.

The diagnostic prediction model based on all these predictors demonstrated good accuracy in predicting benign and malignant conditions. The accuracy for predicting benign lymphadenopathy was 92.2% (AUROC = 0.92; 95% CI 0.87, 0.96) with a sensitivity of 88% and specificity of 78% (Fig. 2), while for predicting malignant lymphadenopathy, it was 98.6% (AUROC = 0.98; 95% CI 0.94, 0.99) with a sensitivity of 91% and a specificity of 85% (Fig. 3).
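The text does not state the threshold at which these sensitivities and specificities were obtained. As a hedged sketch, the functions below compute both at a given probability threshold and select the observed threshold that maximizes Youden's J (sensitivity + specificity - 1), which is one common convention and an assumption here, not necessarily the study's choice. Names and scores are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(pos_scores: Sequence[float], neg_scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= threshold is called positive."""
    tp = sum(s >= threshold for s in pos_scores)
    tn = sum(s < threshold for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


def youden_threshold(pos_scores: Sequence[float],
                     neg_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Observed score maximizing Youden's J = sensitivity + specificity - 1.

    Returns (threshold, sensitivity, specificity); on ties the lowest threshold wins.
    """
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        se, sp = sens_spec(pos_scores, neg_scores, t)
        j = se + sp - 1
        if best is None or j > best[0]:
            best = (j, t, se, sp)
    return best[1], best[2], best[3]


# Hypothetical predicted probabilities of malignancy.
print(youden_threshold([0.92, 0.81, 0.66, 0.40], [0.08, 0.22, 0.45, 0.51]))
```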

Fig. 2 AUROC curve for predicting benign lymphadenopathy

Fig. 3 AUROC curve for predicting malignant lymphadenopathy

A model was developed to classify cases as reactive hyperplasia, benign, or malignant by considering the predicted probabilities of being benign or malignant. Its predictions were compared with the actual biopsy results (Tables 4 and 5).

Table 4 Estimation of lymphadenopathy between true biopsy status and predicted status by prediction model
Table 5 Estimation of prediction lymphadenopathy accuracy

The overall accuracy of the prediction model was 68.6%; that is, the model correctly classified 68.6% of all cases. Of the 91 reactive hyperplasia cases, 77 were correctly predicted as reactive hyperplasia (41.1% of all predictions). Underestimation among cases predicted as reactive hyperplasia (6.4%) reflects benign cases misclassified as reactive hyperplasia (12 cases). Overestimation is not applicable to this category because a prediction of reactive hyperplasia cannot overestimate the diagnosis.

Of the 70 cases of benign lymphadenopathy beyond reactive hyperplasia, 25 were correctly predicted as benign (13.3% of all predictions). No cases predicted as benign were underestimated; that is, no malignant lymphadenopathy was misclassified as benign. Overestimation (3.2%) occurred in 6 reactive hyperplasia cases that were predicted as benign.

Of the 27 malignant cases, all 27 were correctly predicted as malignant (14.3% of all predictions). Underestimation is not applicable because a prediction of malignancy cannot underestimate the diagnosis. Overestimation (21.8%) occurred in 41 cases (8 reactive hyperplasia and 33 benign) that were predicted as malignant.
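The 3 × 3 comparison can be reproduced from the counts stated in this section: 77, 25, and 27 correct predictions, plus 8 reactive hyperplasia and 33 benign cases predicted as malignant; the remaining off-diagonal counts (6 reactive hyperplasia cases predicted as benign and 12 benign cases predicted as reactive hyperplasia) follow from the group totals of 91 and 70. The sketch assumes that percentages are taken over all 188 cases and that severity is ordered reactive hyperplasia < benign < malignant. Under those assumptions it reproduces the 68.6% overall accuracy and the 21.8% overestimation of malignancy. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

CLASSES = ("reactive hyperplasia", "benign", "malignant")  # ordered by severity

# Rows: biopsy result; columns: model prediction (same order as CLASSES).
MATRIX = [
    [77, 6, 8],
    [12, 25, 33],
    [0, 0, 27],
]


def summarize(m: Sequence[Sequence[int]]) -> Tuple[float, List[float], List[float], List[float]]:
    """Overall accuracy, within-group accuracy per biopsy class, and, per
    predicted class, the share of all cases over- or underestimated."""
    k = len(m)
    total = sum(sum(row) for row in m)
    accuracy = sum(m[i][i] for i in range(k)) / total
    within = [m[i][i] / sum(m[i]) for i in range(k)]
    # Predicted class j is more severe than true class i when i < j.
    over = [sum(m[i][j] for i in range(j)) / total for j in range(k)]
    # Predicted class j is less severe than true class i when i > j.
    under = [sum(m[i][j] for i in range(j + 1, k)) / total for j in range(k)]
    return accuracy, within, over, under


accuracy, within, over, under = summarize(MATRIX)
print(f"overall accuracy: {accuracy:.1%}")
for name, w, o, u in zip(CLASSES, within, over, under):
    print(f"{name}: within-group {w:.1%}, over {o:.1%}, under {u:.1%}")
```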

Discussion

Enlarged lymph nodes are a common occurrence in pediatric patients, often leading to parental concern and frequent medical consultations [15]. The most common cause of lymph node enlargement in children is reactive hyperplasia, a subtype of benign lymphadenopathy that typically resolves without specific treatment or antimicrobial therapy. However, differentiating these benign conditions from other causes that require targeted treatment or indicate malignancy is of paramount importance.

In this study, a predictive model was developed based on clinical characteristics obtained through history taking and physical examination. Notably, these clinical factors can be assessed by general practitioners without specialized examinations. The model incorporated 12 factors similar to those identified by Grant et al. (2021) [3], who developed algorithms for managing pediatric patients with enlarged lymph nodes. However, we did not adopt their full set of factors to predict the likelihood of malignancy, because including all of them would add numerous variables, including blood parameters (such as C-reactive protein) that are not routinely tested in Thai community hospitals.

When the model's ability to differentiate abnormal benign lymphadenopathy and malignant cases from reactive hyperplasia was evaluated, its discriminatory power was greater for malignancy (AUROC = 98.6%) than for abnormal benign lymphadenopathy (AUROC = 92.2%). Nonetheless, the model performed well for both conditions. Moreover, based on the model's predictions, unnecessary lymph node biopsies could be reduced by nearly 45% in the reactive hyperplasia group, potentially reducing surgical complications, exposure to anesthesia, treatment costs, and school or work absenteeism.

The model showed high accuracy in predicting benign (92.2%) and malignant (98.6%) conditions. However, because the same dataset was used for both development and validation, these estimates may overstate the model's performance. The ROC curves and AUC values for benign and malignant predictions indicate strong discriminative ability. Nevertheless, the sensitivity and specificity values show that false positives and false negatives still occur. The sensitivity of 91% for malignant predictions supports the model's usefulness in identifying malignancies, whereas the sensitivity and specificity for benign predictions (88% and 78%, respectively) suggest that some benign cases may still be misclassified.

The model identified reactive hyperplasia with high accuracy, correctly predicting 77 of 91 cases (84.6% within the group). However, it misclassified 14 cases as benign or malignant, indicating that there is still room for improvement.

Accuracy for benign lymphadenopathy was relatively low: only 25 of 70 cases were correctly identified (35.7% within the group). Overestimation was substantial, with 33 benign cases misclassified as malignant. This suggests that the model is conservative in ruling out malignancy but less effective in correctly identifying benign lymphadenopathy.

The model is highly effective at identifying malignant cases, correctly predicting all 27 cases (100% accuracy within the group). However, there is a high rate of overestimation, with reactive hyperplasia and benign cases misclassified as malignant (41 cases). This reflects the model’s tendency to prioritize sensitivity (identifying all malignant cases) over specificity.

The model is simple to use in clinical practice because its predictive factors can be readily assessed through routine history taking and physical examination. This makes it potentially applicable across hospital settings and useful as a treatment guide. It may also support ongoing treatment decisions, facilitating the initiation of appropriate interventions while minimizing unnecessary medical procedures. However, caution is warranted when the model predicts reactive hyperplasia, because abnormal benign lymphadenopathy may also be present. Conversely, if malignancy is predicted, patients should be referred promptly to specialized care centers to minimize delays in appropriate treatment, including surgical intervention.

Study limitations

The predictive model overestimated malignancy in 21.8% of cases. This bias was intentional, to ensure the identification of potentially malignant cases, which is crucial in pediatric patients; however, it may lead to unnecessary biopsies. Another limitation is that the model was developed and validated on the same dataset, so further validation in an independent cohort is needed.

Conclusion

The model demonstrated high accuracy in identifying reactive hyperplasia and malignant lymphadenopathy, which is crucial in clinical practice to avoid missing a diagnosis of malignancy. However, the model was less accurate in identifying benign lymphadenopathy, and its substantial overestimation of malignancy could lead to overtreatment or unnecessary anxiety.

Despite these limitations, the application of this model can reduce unnecessary lymph node surgeries, thereby mitigating the risk of surgical complications, treatment costs, and school or work absenteeism. Clinical implementation should be carefully considered, taking into account individual patient circumstances to ensure optimal treatment decisions.

Future research should focus on validating the model in new patient data to assess its accuracy and precision. Training and validating the model on a more diverse dataset may improve its generalizability, and incorporating or refining clinical features may improve discrimination between benign and malignant cases. External validation in a separate dataset is needed to evaluate the model's performance in different clinical settings.