There is evidence that breast cancer (BC) survivors, especially those treated with surgery involving the axilla, face a lifetime risk of developing upper limb (UL) lymphedema.1,2 Surgical techniques that reduce damage to the axillary lymphatics, e.g., sentinel lymph node biopsy (SLNB) instead of axillary lymph node dissection (ALND), can reduce the risk of lymphedema.3,4,5 Unfortunately, SLNB might not be an option for some subsets of women with BC. As the number of BC cases increases, so does the number of women with lymphedema.6 To date, there is no known method to completely avoid or cure the condition, although early physical therapy interventions may help reduce symptoms.5,7,8

There is a growing number of techniques and instruments for lymphedema diagnosis, and no consensus has been reached so far on the best method to diagnose and monitor the condition.2 Most studies use measurements of upper limb circumference (ULC) as their primary technique for lymphedema diagnosis because of its practicality.6,9 However, reports differ in how the technique is applied (e.g., how many measuring points in the UL are evaluated and the threshold for the ULC difference between the ULs required to render a positive diagnosis). Some authors recognize the upper limb water displacement (ULWD) technique as a “gold standard” for lymphedema diagnosis. ULWD can directly measure volume differences between the UL ipsi- and contralateral to surgery.6,9,10,11,12 Unfortunately, ULWD has many limitations compared with ULC: it requires a much greater workload and consumes much more time. Importantly, ULWD should not be performed in women with active skin lesions, and, since ULWD measures the volume of the entire UL, the technique can miss localized cases of lymphedema.9,10,11 Finally, some authors advocate for the use of self-reported upper limb swelling sensation (ULSS) as a straightforward form of diagnosing lymphedema.3,13,14 ULSS addresses the patient’s own impression about her UL.15

Besides the controversy about the choice of a suitable technique to diagnose lymphedema, there is a dearth of studies tackling the natural history of the condition, since data on BC patients with continued and repeated reevaluations for lymphedema during long-term follow-up (FU) are currently not available in the literature. Lymphedema may be present in the first months following BC surgery and can subside and/or persist after weeks, months, or even years. Importantly, it remains unclear whether a patient who has early forms of lymphedema will develop long-term, persistent forms of the disease. If these patients were identifiable, appropriate interventions could be devised to prevent permanent sequelae.

In light of the current knowledge gaps about the identification of patients at risk of developing long-term forms of lymphedema, we designed the present longitudinal, long-term study to determine the incidence and prevalence of lymphedema over time (with repeated evaluations over a period ≥ 24 months) comparing ULWD, ULC, and ULSS techniques. Using ULWD as the reference standard for the diagnosis of lymphedema, we examined the performance of two different approaches to ULC, either single- or multiple-point, as well as that of ULSS. We also evaluated whether ULC and/or ULSS, performed shortly after surgery or later during follow-up, can predict long-term/persistent forms of lymphedema in women who underwent surgery for BC. Our results help trace a roadmap for practitioners interested in promoting early interventions for patients at high risk of later developing hard-to-treat forms of long-term, persistent lymphedema.

Methods

Patient Selection

Patients were first approached 1 day before surgery, after being admitted to the women’s hospital (CAISM/State University of Campinas, UNICAMP). At that moment, patients were apprised of the study objectives and specific procedures, and were invited to participate. Patients with primary, unilateral, nonmetastatic BC, who underwent axillary (either ALND or SLNB) and breast surgery without immediate breast reconstruction were included. Patients who had previously undergone surgical procedures in their UL or axilla, or who had orthopedic and/or neurologic UL ailments, or renal and/or cardiac insufficiency, were excluded. The sample size needed to accommodate the study’s objectives was calculated at 200 women at study start, considering a loss to FU of approximately 25% after 12 months, and 60% after 24 months (resulting in a final operational sample of at least 80 women with complete, 24-month follow-up encompassing all six planned evaluation rounds). Cancer stage, estrogen, progesterone, and HER2 receptor status, type of surgery, approach to the axilla (either SLNB or ALND), and histopathological status of the tumor were retrieved from medical records. Patients who had metastatic cancer diagnosed during FU, who underwent late breast reconstruction, or who missed one or more evaluations were discontinued without replacement. For statistical purposes, patients with a positive SLNB who were further treated with ALND were analyzed as ALND. Figure 1 depicts patient allocation, exclusions, losses and deaths during FU, and the incidence and prevalence of lymphedema. The study is restricted to female patients since approximately 98% of BC cases occur in women16 and achieving a representative population of male patients would require a timeframe incompatible with the study’s objectives.
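The attrition arithmetic behind the planned sample can be checked with a short sketch. It assumes that the stated losses (25% and 60%) are cumulative fractions of the initial 200 women, which is the reading consistent with the final figure of 80; the function name is hypothetical and this is not the authors' sample-size procedure.

```python
def expected_retained(n_start: int, cumulative_loss: float) -> int:
    """Participants expected to remain after a cumulative loss to follow-up.

    Assumption: `cumulative_loss` is a fraction of the initial sample,
    not of the sample remaining at the previous evaluation round.
    """
    if not 0.0 <= cumulative_loss <= 1.0:
        raise ValueError("cumulative_loss must be between 0 and 1")
    return round(n_start * (1.0 - cumulative_loss))


n_start = 200
retained_12_months = expected_retained(n_start, 0.25)
retained_24_months = expected_retained(n_start, 0.60)
print(retained_12_months, retained_24_months)
```

Under that assumption, the 24-month value matches the operational sample of 80 women stated above.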

Fig. 1

Flow chart depicting the study design, patient attendance to follow-up consultations, and incidence of lymphedema for each evaluated method. BC breast cancer, ULSS upper limb swelling sensation, ULC upper limb circumference, ULWD upper limb water displacement

Assessment

On the day before breast and axillary surgery, women who agreed to participate responded to a brief questionnaire addressing personal characteristics such as age, weight, and height. Surgeries were performed according to the hospital’s standard protocols for BC, and patients were discharged from hospital when appropriate. Then, 1, 3, 6, 12, and 24 months after surgery, patients were approached by the investigators for evaluations of lymphedema status. This study was conducted according to the standards of Resolution 466/12 of the Brazilian National Health Council and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. In addition, it was approved by the hospital ethics committee (1.693.660).

Evaluation of Lymphedema

Patients had their ULs, ipsilateral and contralateral to the surgical site, examined for lymphedema 1, 3, 6, 12, and 24 months after surgery. During all evaluation rounds, ULC and ULSS were applied. At the 24-month evaluation round, ULWD was performed in addition to ULC and ULSS.

ULC was performed with the volunteers in a sitting position, with both ULs flexed and supported at 45° on a table, keeping the forearms in maximum supination. Measurements were carried out after identifying the most prominent area of the olecranon, which was used as a reference for demarcating six points along the UL, according to the criteria established by Humble, 1995.17 The measurement points (MP) are depicted in Fig. 2A. For the duration of the study, we always used the same tape, which was positioned snugly or by compression above the marking of the prespecified points. We performed measurements in the UL ipsilateral and contralateral to the operated breast. Differences greater than or equal to 2 cm between the ULs were considered for the diagnosis of lymphedema.

Fig. 2

a Segmental evaluation of upper limb circumference (ULC), at measurement points as established by Humble (1995). ULC was considered positive if a ULC difference ≥ 2 cm between the upper limbs ipsi- and contralateral to surgery was detected at two contiguous measurement points (approach 1) or in at least one measurement point (approach 2); b equipment developed by the authors to carry out the upper limb water displacement method (ULWD). A positive diagnosis was made if there was a volume difference ≥ 200 mL between the upper limbs ipsi- and contralateral to surgery. h height, d diameter, cm centimeters

As for ULWD, we developed a device consisting of a rigid acrylic tube 86 cm in height and 20 cm in diameter, fixed on a square base. Close to its upper opening, 80 cm from its base, there is an opening for water exhaust (water exhaust tube). The height of the water exhaust tube was marked with a red stripe surrounding the tube externally and representing the water filling limit. The water exhaust tube is a rigid acrylic tube, 2 cm in diameter and 12.5 cm long, arranged diagonally, to direct the flow of water. The volumetric capacity of this equipment is approximately 24 L. The technical configurations of this experiment were developed by the study team itself (Fig. 2B). The experiment consisted of the equipment being filled with water at room temperature up to the red stripe. Next, the patient was seated in a chair, with ankles in a neutral position, knees and hips flexed at 90°, and spine supported on the backrest, and was asked to remove any adornment from her UL. Both ULs were evaluated, one limb at a time, each being immersed slowly up to the height of the axillary cavity; the exact positioning was adjusted by inclining the spine toward the side of the equipment. The UL remained submerged and immobile until the water flow ceased. The water that overflowed was collected in a graduated plastic beaker and was taken as the volume of the UL. This volume was weighed on a high-precision digital scale, with a weighing capacity of 10 kg, accurate to 1 g and with tare self-calibration (Electronic Kitchen Scale, model SF-400). The tare was reset after each assessment (discounting the weight of the collecting cup). For statistical purposes, differences of 200 mL or higher between the ULs rendered a positive lymphedema diagnosis.9 Considering that water density is 1 g/cm³ (gram per cubic centimeter), we assumed that 1 L of water is equivalent to 1 kg.8
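The conversion from weighed overflow to limb volume, and the 200 mL decision rule, can be sketched as follows. This is an illustrative sketch rather than the study's own code: the function names are hypothetical, and the difference is taken as ipsilateral minus contralateral, which is an assumption since the text only states a difference "between the ULs".

```python
WATER_DENSITY_G_PER_ML = 1.0   # stated in the text: 1 g/cm3, i.e., 1 L of water weighs 1 kg
ULWD_THRESHOLD_ML = 200.0      # stated in the text: difference of 200 mL or more is positive


def displaced_volume_ml(water_mass_g: float) -> float:
    """Convert the weighed overflow (grams, tare already discounted) to millilitres."""
    return water_mass_g / WATER_DENSITY_G_PER_ML


def ulwd_positive(ipsi_mass_g: float, contra_mass_g: float) -> bool:
    """Positive ULWD if the ipsilateral limb displaces at least 200 mL more water.

    Assumption: the difference is directional (ipsilateral minus contralateral).
    """
    difference_ml = displaced_volume_ml(ipsi_mass_g) - displaced_volume_ml(contra_mass_g)
    return difference_ml >= ULWD_THRESHOLD_ML


# Hypothetical readings from the scale, in grams
print(ulwd_positive(ipsi_mass_g=2450.0, contra_mass_g=2200.0))
print(ulwd_positive(ipsi_mass_g=2300.0, contra_mass_g=2200.0))
```

The masses in the usage lines are made-up values chosen only to show one case above and one below the threshold.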

ULSS consisted of a questionnaire addressing the patients’ self-perception of signs and symptoms of lymphedema. A “positive” diagnosis was made if the patient reported two or more of the following sensations, in relation to the UL ipsilateral to the surgery: (1) increased weight, (2) swelling, (3) tension (characterized as a sensation of skin and/or muscle stretching).19,20 At each of the FU interviews, the researcher questioned the patients specifically about each of these perceptions.
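The ULSS decision rule described above (two or more of three reported sensations) can be expressed as a short sketch. The symptom labels and function name are hypothetical identifiers introduced for illustration; only the three sensations and the "two or more" criterion come from the text.

```python
from typing import Iterable

# The three sensations listed in the text, with hypothetical labels
ULSS_SYMPTOMS = ("increased_weight", "swelling", "tension")


def ulss_positive(reported: Iterable[str]) -> bool:
    """Positive ULSS if the patient reports at least two of the three sensations
    in the upper limb ipsilateral to surgery."""
    reported_set = set(reported)
    unknown = reported_set - set(ULSS_SYMPTOMS)
    if unknown:
        raise ValueError(f"unknown symptom labels: {sorted(unknown)}")
    return len(reported_set) >= 2


print(ulss_positive(["swelling"]))
print(ulss_positive(["swelling", "tension"]))
```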

Statistical Analysis

All statistical calculations were performed using the R environment for statistical computing.21 The significance level was set at 5%. The gold standard for a positive diagnosis of lymphedema was a volume (ULWD) difference between the affected and contralateral UL ≥ 200 mL. We tested two different approaches to ULC: (1) two MPs with ≥ 2 cm, i.e., ULC is positive if a difference of 2 cm or more is detected at two contiguous MPs [the two MPs corresponding to one UL segment: arm (MP e and f), forearm (MP c and d), or hand (MP a and b)]; and (2) one MP with ≥ 2 cm, i.e., a difference of 2 cm or more is found in at least one MP. We used the χ2 test (or Fisher’s exact test where appropriate) to evaluate the relationship between key clinical and pathological features of the subjects according to the lymphedema status 24 months after surgery (lymphedema diagnosis as determined by ULWD) (Table 1). Next, we calculated the performance estimators for each of the two ULC approaches at diagnosing long-term lymphedema, using ULWD as the gold standard; we then generated a log-likelihood model for each approach and compared the models using the likelihood ratio test (Table 2). Because the likelihood ratio test showed a significant superiority of approach 1, we used this approach in all subsequent analyses. In Table 3, we calculated the probabilities of a patient developing long-term lymphedema (i.e., being diagnosed with lymphedema, 24 months after surgery, using ULWD) depending on ULC results 1, 3, 6, and 12 months after surgery. Figure 3 is a graphical depiction (Fagan’s nomogram) of the data presented in Table 3. Finally, we calculated the performance estimators for ULSS at diagnosing lymphedema, using as gold standard either ULWD (24 months) or ULC (approach 1) at assessment rounds 1, 3, 6, and 12 months after surgery (Table 4).
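To make the two ULC decision rules concrete, the sketch below classifies one patient under each approach. It rests on assumptions: the six measurement points are labeled a to f, the pairs are taken per limb segment as hand (a, b), forearm (c, d), and arm (e, f), and the difference is computed as ipsilateral minus contralateral. All names and circumference values are hypothetical.

```python
from typing import Dict

MEASUREMENT_POINTS = ("a", "b", "c", "d", "e", "f")
# Assumed segment pairs: hand (a, b), forearm (c, d), arm (e, f)
SEGMENT_PAIRS = {"hand": ("a", "b"), "forearm": ("c", "d"), "arm": ("e", "f")}
THRESHOLD_CM = 2.0


def point_differences(ipsi: Dict[str, float], contra: Dict[str, float]) -> Dict[str, float]:
    """Circumference difference (ipsilateral minus contralateral) at each point, in cm."""
    return {p: ipsi[p] - contra[p] for p in MEASUREMENT_POINTS}


def ulc_approach_1(ipsi: Dict[str, float], contra: Dict[str, float]) -> bool:
    """Approach 1: positive if both points of at least one segment differ by >= 2 cm."""
    diff = point_differences(ipsi, contra)
    return any(
        diff[p] >= THRESHOLD_CM and diff[q] >= THRESHOLD_CM
        for p, q in SEGMENT_PAIRS.values()
    )


def ulc_approach_2(ipsi: Dict[str, float], contra: Dict[str, float]) -> bool:
    """Approach 2: positive if at least one point differs by >= 2 cm."""
    diff = point_differences(ipsi, contra)
    return any(d >= THRESHOLD_CM for d in diff.values())


# Hypothetical circumferences in cm
contra = {"a": 19.0, "b": 16.0, "c": 22.0, "d": 25.0, "e": 28.0, "f": 30.0}
one_point = dict(contra, e=30.5)           # only point e exceeds the threshold
two_points = dict(contra, e=30.5, f=32.5)  # both arm points exceed the threshold

print(ulc_approach_1(one_point, contra), ulc_approach_2(one_point, contra))
print(ulc_approach_1(two_points, contra), ulc_approach_2(two_points, contra))
```

The first hypothetical patient is positive only under approach 2, which illustrates why approach 1 is the stricter criterion.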

Table 1 Key clinical features of the patients and their relationship with lymphedema (evaluated by water displacement method) 24 months after surgery
Table 2 Performance estimators and kappa agreement for two ULC approaches in the diagnosis of lymphedema 24 months after surgery, compared with ULWD (gold standard)
Table 3 Likelihood for developing long-term lymphedema as related to upper limb circumference differences measured at specific postsurgery time intervals
Fig. 3

Fagan’s nomogram depicting the probabilities of women developing long-term lymphedema based on upper limb circumference (two measurement points ≥ 2 cm) during follow-up. Prob probability, LR likelihood ratio. Blue lines refer to positive LR and pink lines to negative LR. Positive LR in 1 month after surgery = zero; therefore, the nomogram for that evaluation round was not produced

Table 4 Performance of ULSS for diagnosing lymphedema, using as gold standard either ULC (1, 3, 6, 12, and 24 months after surgery) or ULWD (24 months)

Results

In Table 1, we present the main clinical and surgical characteristics of women according to the presence of lymphedema 24 months after surgery, diagnosed using ULWD. Of the 85 women evaluated 24 months after surgery, lymphedema was diagnosed in 19 (22.4%). The majority (78.8%) of the patients were aged 55 years or over and were overweight (36.5%) or obese (35.3%). Regarding BC features, most women had stage I (27.1%) or II (42.4%) tumors, positive estrogen receptor (82.4%), or positive progesterone receptor (77.7%), were HER2-negative (60.0%), and had invasive ductal carcinomas (94.1%). The majority of the patients (75.3%) underwent conservative (quadrantectomy) surgery, and 52.9% of women had their axilla treated with SLNB. Only ALND was associated with a higher prevalence of lymphedema 24 months after surgery: 16 (84.2%) of 19 women with lymphedema underwent ALND. Among women without lymphedema, 42 (63.6%) of 66 underwent SLNB (p = 0.001).
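The prevalence of lymphedema within each axillary approach follows from the counts given above. The sketch below is only a consistency check, assuming every woman was analyzed as either ALND or SLNB, so that the two unreported cells (3 women with lymphedema after SLNB, 24 women without lymphedema after ALND) can be derived by subtraction. Variable and function names are hypothetical.

```python
# Counts stated in the text
with_lymphedema_total = 19
without_lymphedema_total = 66
alnd_with_lymphedema = 16
slnb_without_lymphedema = 42

# Cells derived by subtraction (assumption: each woman is either ALND or SLNB)
slnb_with_lymphedema = with_lymphedema_total - alnd_with_lymphedema
alnd_without_lymphedema = without_lymphedema_total - slnb_without_lymphedema


def prevalence(cases: int, non_cases: int) -> float:
    """Proportion of cases within a group."""
    return cases / (cases + non_cases)


alnd_prevalence = prevalence(alnd_with_lymphedema, alnd_without_lymphedema)
slnb_prevalence = prevalence(slnb_with_lymphedema, slnb_without_lymphedema)
overall_prevalence = prevalence(with_lymphedema_total, without_lymphedema_total)

print(f"ALND: {alnd_prevalence:.1%}")
print(f"SLNB: {slnb_prevalence:.1%}")
print(f"Overall: {overall_prevalence:.1%}")
```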

In Table 2, we compare the performance estimators (sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios, accuracy, and kappa) for the two ULC approaches (1: two MPs ≥ 2 cm; 2: one MP ≥ 2 cm) for the diagnosis of long-term lymphedema, considering ULWD as the gold standard. The highest agreement (kappa = 0.60, 95% CI = 0.34–0.81) and accuracy (88.2%) were observed using approach 1, with a sensitivity of 52.6% (40.6–64.7%) and specificity as high as 98.5% (91.3–100%). A higher positive likelihood ratio (PLR) was also observed using approach 1 instead of 2 (PLR 34.74 versus 8.34, respectively). When we compared approaches 1 and 2 by generating log-likelihood models for each approach, the log-likelihood for approach 1 was −24.29, whereas that for approach 2 was −28.84. The comparison of these log-likelihoods using the likelihood ratio test resulted in a χ2 of 9.08 with two degrees of freedom, denoting a clear superiority of model 1 (diagnosis of lymphedema if at least two adjacent MPs ≥ 2 cm) when compared with model 2 (diagnosis if at least one MP with ≥ 2 cm), p < 0.001.
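The performance estimators named above can be computed from a 2×2 table, as in the following sketch. The cell counts used here (10 true positives, 9 false negatives, 1 false positive, 65 true negatives) are not reported in the text; they are a reconstruction that reproduces the stated sensitivity, specificity, accuracy, PLR, and kappa for approach 1 among 85 women with 19 ULWD-positive cases, and should be treated as an assumption. The function name is hypothetical, and confidence intervals are not computed.

```python
import math
from typing import Dict


def diagnostic_performance(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Point estimates for a binary index test against a reference standard."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Likelihood ratios; guard against division by zero at perfect specificity
    plr = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    nlr = (1 - sensitivity) / specificity
    # Cohen's kappa: observed agreement corrected for chance agreement
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "plr": plr,
        "nlr": nlr,
        "kappa": kappa,
    }


# Reconstructed (assumed) counts for ULC approach 1 versus ULWD at 24 months
approach_1 = diagnostic_performance(tp=10, fp=1, fn=9, tn=65)
for name, value in approach_1.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the sketch yields values that round to the sensitivity (52.6%), specificity (98.5%), accuracy (88.2%), PLR (34.74), and kappa (0.60) quoted for approach 1.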

In Table 3, we show how well ULC performed at 1, 3, 6, and 12 months after surgery predicted lymphedema at 24 months after surgery, again using ULWD as the gold standard. Based on the results of Table 2, we restricted the analyses to approach 1. This table therefore presents the positive and negative likelihood ratios and the probability of a given woman developing long-term lymphedema (as diagnosed by ULWD 24 months after surgery) if she had a positive ULC at, respectively, 1, 3, 6, and 12 months after surgery. A positive ULC 1 month after surgery was associated with a probability of 0% of that woman having lymphedema 24 months after surgery. In contrast, a negative ULC 1 month after surgery indicated a probability of 77.1% of that woman not developing long-term lymphedema. Six months after surgery, however, a positive ULC was associated with a probability of 60% of that patient having lymphedema 24 months after surgery, and the probability of that particular woman not developing long-term lymphedema in the case of a negative ULC measurement was 80.0%. The best prediction of long-term lymphedema was thus obtained using ULC 6 months after surgery.
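The relation between a likelihood ratio and the resulting probability, which is what a Fagan's nomogram depicts graphically, can be sketched as below. The sketch assumes that the pre-test probability is the overall 24-month prevalence (19 of 85 women); whether the authors used exactly this value is not stated. The function name is hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


prevalence_24_months = 19 / 85  # assumed pre-test probability

# A positive likelihood ratio of zero yields a post-test probability of zero
print(post_test_probability(prevalence_24_months, 0.0))

# Positive likelihood ratio reported for approach 1 at 24 months (34.74)
print(round(post_test_probability(prevalence_24_months, 34.74), 3))
```

The first call mirrors the 1-month result, where a positive likelihood ratio of zero corresponds to a 0% probability of long-term lymphedema after a positive ULC.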

In Table 4, we describe the performance estimators for ULSS as a diagnostic tool for lymphedema. The objective measurement technique used to diagnose lymphedema at 1, 3, 6, and 12 months was ULC, using approach 1. For the 24-month evaluation, we present data for both ULWD and ULC (performed for all patients, at the same visit). The agreement (kappa value) between ULSS and either ULC or ULWD, for all measurement rounds, was poor. The best agreement (kappa = 0.33; 95% CI = 0.11–0.54) was obtained between ULSS and ULWD, 24 months after surgery. In general, when ULC was used as the objective measure of lymphedema, ULSS sensitivity seemed to increase from the beginning of FU to 12 and 24 months into the FU, probably reflecting a gradual increase in patients’ awareness of their UL condition during the first postoperative months. A similar trend was observed for the positive predictive value. On the other hand, ULSS specificity remained stable at 75–85% during the entire observation period, regardless of whether the gold standard was ULC (1, 3, 6, 12, and at least 24 months after surgery) or ULWD (24 months).

Discussion

In our prospective, long-term cohort study, we were able to (1) evaluate the presence of long-term (at least 24 months postoperatively) lymphedema in a relatively large cohort of BC patients using the gold standard for lymphedema detection, i.e., the ULWD technique; (2) test the performance of two different approaches to a straightforward technique that is easy to use in clinical practice (ULC), using ULWD as gold standard; (3) evaluate whether early (3, 6, 12 months postoperatively) detection of lymphedema using ULC is a good predictor of long-term, persistent lymphedema; and (4) examine how well the patients’ self-awareness of lymphedema (ULSS) is associated with the objective (either ULC or ULWD) detection of lymphedema.

In our study, 24 months after surgery, 19 (22.4%) women had persistent lymphedema, according to the gold-standard ULWD technique. A few authors also examined long-term lymphedema in women who underwent surgical approaches to the axilla. Recently, Armer et al., 2019,22 studying women with T0–4, N1–2 tumors, who had previously received neoadjuvant chemotherapy and were treated with ALND, reported a 60.3% prevalence of lymphedema 3 years after surgery. It is noteworthy that those authors used an indirect UL volume estimation method (they estimated the volume using several UL circumference measurements). In our study, 24 months after surgery, using direct measurement of UL volume (ULWD), we detected a prevalence of 40% lymphedema in women who underwent ALND, which was significantly higher than the 7% prevalence of lymphedema in women who underwent SLNB. Indeed, the much lower prevalence of lymphedema in our study can be attributed to 52.9% of the women having undergone SLNB, and to the presumably higher accuracy of ULWD compared with the indirect volume measurement method used by Armer et al.22

We detected a significant association between lymphedema and ALND. This was not unexpected, since several authors reported the same association previously.5,13,14 Our contribution in this respect lies in the long-term evaluation of lymphedema, using the gold-standard ULWD technique. Interestingly, in a secondary analysis (data not shown), observing lymphedema prevalence in relation to the approach to the axilla (either ALND or SLNB) over time (1, 3, 6, 12, and 24 months after surgery), we detected that the presence of lymphedema in the SLNB group 1 month after surgery was nearly 50% of that in the ALND group, declining over time to roughly one-fourth of that observed in the ALND group 24 months after surgery. In other words, this analysis reveals that lymphedema in ALND compared with SLNB is not only more prevalent, but also more enduring.

Most authors advocate for measuring the circumference of just one UL point to identify lymphedema,5,13,23 but it is not uncommon to see authors failing to report how many UL points were measured. We measured the circumference of both ULs at six points and, for analytical purposes, rendered a “positive” lymphedema diagnosis when a UL circumference difference ≥ 2 cm was found at either two MPs (approach 1) or one MP (approach 2). The best performance and agreement between ULC and ULWD (gold standard) were obtained using two MPs. Further, when comparing the two models using the likelihood ratio test, we again found that approach 1 is superior to approach 2, which suggests that adding UL circumference measurements might contribute to ULC accuracy. To our knowledge, our study is the first to prospectively compare ULWD and ULC.

It is encouraging that ULC is a good substitute for ULWD, since the latter technique has several limitations that preclude its application in many circumstances: (1) ULWD is contraindicated in cases of skin lesions and open wounds; (2) the equipment used is more expensive, heavy, and difficult to transport; (3) there is a need to change water, clean, and sanitize after each use; (4) the evaluation takes longer; and (5) the limb has its volume measured as a whole, rendering it impossible to determine the correct location of the lymphedema.9,10,11

In fact, there are several methods for lymphedema diagnosis, and to choose the best diagnostic tool amid the available options is a relatively difficult task. Multiple tools exist to assess the size of the limb, in addition to those studied by us, such as perometry (optoelectronic volumetry), bioimpedance, lymphoscintigraphy, lymphography with indocyanine green (ICG), dual-energy X-ray absorptiometry, computed tomography, and nuclear magnetic resonance, each of them having their strengths and weaknesses.1,2 One of the best studied tools is ICG lymphography, which has been shown to be an important addition to the arsenal of diagnostic tools aimed at evaluating the lymphatic system.24

Unfortunately, invasive methods require specialized infrastructure and personnel in order to be offered. The International Society of Lymphology,25 in a consensus published in 2020, determined that imaging methods should be applied preferentially, in addition to the volume measures of the limb. Our study was aimed at evaluating a clinical, straightforward, extremely low-cost approach to breast cancer survivors, aimed at triaging those who are at increased risk of developing long-term lymphedema and could therefore benefit from early interventions (e.g., physical therapy) capable of reducing the incidence and intensity of long-term lymphedema.

We believe that the most important analysis of our study is how well ULC (approach 1), if used during early (1, 3, 6, 12 months) FU, could predict long-term lymphedema. For that purpose, we used likelihood ratio analysis tools to ascertain the probability that a woman with a positive ULC early after surgery would later develop long-term, persistent lymphedema. According to our analysis, the best moment to perform ULC was 6 months after surgery, since at that moment we observed the best prediction of long-term lymphedema. Because ULC had worse predictive power for long-term lymphedema when performed 1 or 3 months after surgery, it seems plausible that early presentations of lymphedema may be restricted to short segments of the UL, and most likely will subside within the first 6 months after surgery.26 Persistent, widespread lymphedema seems to evolve over longer (> 6 months) timeframes, and is possibly linked to scarring of the lymphatics and/or blood vessels, whereas short-term lymphedema may just be a consequence of short-lived phenomena such as postoperative inflammation and transient lymphatic congestion.

Finally, we attempted to evaluate whether the patients’ self-assessment correlated well with the objective diagnosis of lymphedema. Clinically, the patient’s history, the reporting of symptoms, inspection and palpation, and determination of differences in volume between the limbs are relevant steps to diagnose lymphedema.6,25 A previous cross-sectional study evaluated the agreement between the objective diagnosis of lymphedema and patient-reported signs and symptoms in two ethnically distinct groups of patients: a group of white women and another of African-American women, obtaining kappa values of 0.11 and 0.06, respectively. Those authors concluded that the objective diagnosis of lymphedema did not appear to be related to the presence of signs and symptoms.27 Terada et al., 2020,13 suggested that, although there are no symptoms or unique criteria to consider the presence of lymphedema, sensory changes such as numbness, pain, and stiffness after axillary surgery can be confounding factors for the patient during the subjective investigation of lymphedema. Our findings are in alignment with those reported previously, since we obtained low kappa values for the comparisons of the subjective ULSS evaluation with either ULC (during FU or after 24 months) or ULWD (24 months after surgery). It is somewhat disappointing that ULSS presented such an underwhelming performance, since patient self-assessment of symptoms could in theory be a triaging tool for women at increased risk of developing long-term lymphedema.19

Women diagnosed with BC must be informed and aware of their treatment; therefore, obtaining written informed consent is mandatory before any surgical procedure, and even more important is the certainty that patients understand the procedures they are about to undergo. The development of a consent form that includes a self-assessment of the level of understanding and the quality of the information provided can thus be a first step towards reducing legal issues.28 Along the same lines, the American Society of Breast Surgeons (ASBrS) brought together an international and multidisciplinary panel of experts to recognize and raise awareness about lymphedema. Among several recommendations, the panel agrees that it is necessary to educate surgeons and patients about the risks of lymphedema, from the preoperative period to follow-up visits, and that this should be incorporated into survivorship care plans.29,30

In our opinion, our study has several strong points: it is a long-term, fully documented report of lymphedema incidence and prevalence in a relatively large, homogeneous cohort of BC patients; we compare direct measurement of UL volume with the more easily performed UL circumference technique, in the short, medium, and long terms; and we assess how reported symptoms correlate with objective evaluation of lymphedema. Our data allowed us to depict a clear panorama of how lymphedema evolves over time, and to devise the best strategy for the early identification of women at increased risk of developing permanent forms of the disease. Based on our own data, ULWD should be recommended over ULC. If ULWD is not readily available, we recommend performing ULC no earlier than 6 months after BC surgery. Our data suggest that optimal prediction of long-term lymphedema can be obtained with ULC, if ULC is performed 6 months after surgery. We are also convinced that clinical benefit can be maximized if ULC is performed as per our approach 1, i.e., measuring six points in both ULs, and considering as “positive” for lymphedema only women with a ULC difference between the ULs ≥ 2 cm in at least two contiguous MPs. In essence, our results may guide practitioners to easily discern which patients require preventative interventions to avoid irreparable sequelae. Our proposed triaging method can be applied to virtually all breast cancer survivors, considering as a premise that the ideal detection tools should be objective and reproducible, providing a standardized metric that supports treatment decisions.