Breast reconstruction has become an important part of the surgical care for breast cancer patients.1 Breast reconstruction with a satisfactory aesthetic outcome can have a positive effect on the psychological recovery of the patient following mastectomy.2,3 A standardized measure of aesthetic outcome after reconstruction would enable comparisons of breast reconstruction outcomes for clinical and research purposes. A previous review by Potter et al. failed to identify a well-accepted, standardized, and validated aesthetic assessment scale following postmastectomy breast reconstruction.4 Numerous criteria have been proposed for what constitutes the ideal professional aesthetic assessment scale for breast reconstruction.5–8 Munshi et al. noted that the ideal professional aesthetic assessment scale should be based on quantitative measures that are easy to understand and reproducible and that correlate well with the patient-reported outcome (PRO).6 Potter et al. proposed that the core outcomes for assessing aesthetics involve a multidisciplinary approach, and that in addition to meeting the minimal validity and reliability criteria applied to other measurement systems, there would be an additional PRO domain to capture the perspective of the patient.4

Because a large number of professional aesthetic assessment scales currently exist for breast reconstruction, and there is no single ideal measurement tool, choosing the optimal professional aesthetic scale is challenging. A professional aesthetic assessment scale was defined as a scale used by a healthcare professional to evaluate the aesthetic result.4 The goal of this paper, therefore, was to systematically review all existing aesthetic assessment scales used by healthcare professionals for breast reconstruction. In addition, we are the first group to evaluate all the professional aesthetic assessment scales using well-established quality criteria for measurement properties.

Methods

Search and Selection Process

A computerized bibliographic search was performed in AMED, CINAHL, Cochrane, EMBASE, MEDLINE, PsycINFO, and PubMed in February of 2013. Search terms were: (breast reconstruction OR mammoplasty) AND (aesthetic OR esthetic OR cosme*), and the search was limited to English articles published in 1990 or after. Duplicates were removed. Based on title and abstract, two independent reviewers included articles evaluating patients undergoing postmastectomy breast reconstruction, whether implant-based or autologous. The two independent reviewers were Saskia Maass (first author) and Toni Zhong (senior author). Excluded were articles with no primary data, expert opinions, letters to the editor, and conference reports. For the second selection, articles were evaluated based on their full text and were excluded when they did not contain a professional aesthetic assessment scale or contained only a patient-reported aesthetic assessment. Discrepancies between the reviewers were resolved by consensus. References of the articles were hand-searched to identify additional relevant papers.
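The Boolean logic of the search terms can be illustrated with a minimal Python sketch. It is only an approximation: each database applies its own query syntax and field restrictions, the English-language limit is not modeled, and the names `matches_search`, `PROCEDURE`, and `OUTCOME` are hypothetical and not taken from the study.

```python
import re

# First group of terms: (breast reconstruction OR mammoplasty)
PROCEDURE = re.compile(r"\b(breast\s+reconstruction|mammoplasty)\b", re.IGNORECASE)

# Second group: (aesthetic OR esthetic OR cosme*), where cosme* is a
# truncation wildcard matching cosmetic, cosmesis, and similar words.
OUTCOME = re.compile(r"\b(aesthetic|esthetic|cosme\w*)\b", re.IGNORECASE)


def matches_search(text: str, year: int) -> bool:
    """Return True if a title/abstract satisfies both term groups
    and the record was published in 1990 or later."""
    if year < 1990:
        return False
    return bool(PROCEDURE.search(text)) and bool(OUTCOME.search(text))


if __name__ == "__main__":
    print(matches_search("Cosmetic outcome after breast reconstruction", 2005))
    print(matches_search("Complications of mammoplasty", 2000))
```

A record has to satisfy both parenthesized groups, which is why a title that mentions mammoplasty without any aesthetic term is not retrieved.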

Data Extraction

Included papers were reviewed for the following data: (1) study population, (2) type of reconstruction, (3) profession of the observer, (4) method of evaluation, and (5) the characteristics of the assessment scale. If insufficient information about the professional aesthetic assessment scale was provided, then the cited references were evaluated for additional information.

Medical Outcomes Trust Criteria to Evaluate each Professional Aesthetic Assessment Scale

To determine the methodological quality of each aesthetic assessment tool, both reviewers evaluated each aesthetic assessment scale using the Medical Outcomes Trust (MOT) criteria developed by the Scientific Advisory Committee (SAC).9–15 All methodological information was obtained directly from papers that first described the scales as well as from the papers that subsequently used and evaluated the scales. Each professional aesthetic assessment scale was graded according to the seven MOT criteria. One point was assigned when the aesthetic assessment scale fulfilled the criterion, half a point was given when most of the criterion was met, and zero points were assigned when the scale did not meet the criterion. Six of the 7 criteria were based on the original MOT criteria and these included: (1) the underlying conceptual framework, (2) reliability, (3) validity, (4) responsiveness, (5) interpretability, and (6) burden for the professional and the patient. The conceptual framework criterion reflects the process of development of the scale. One point was assigned if the scale was clearly developed for patients undergoing postmastectomy breast reconstruction. Reliability refers to the degree to which scores reflect the underlying phenomenon. Both the intraclass correlation coefficient (ICC), which determines the degree of concordance between test and retest, and the internal consistency measured by Cronbach’s alpha are common statistics for reliability.16 For research purposes, a measure should achieve a reliability coefficient of at least 0.70.17 One point was assigned for a kappa higher than 0.40 or a coefficient above 0.70. Validity is the degree to which an instrument measures what it is purported to measure. An aesthetic assessment scale with a Spearman ρ > 0.70 was assigned 1 point.
Responsiveness is the ability of an instrument to distinguish clinically important changes from measurement error over time even if these changes are small, and this is measured by the responsiveness ratio (RR).10,18 When the RR was at least 1.96, this criterion was assigned a point. Interpretability is defined as the degree to which one can assign qualitative meaning to quantitative scores.9 One point was assigned if information was given on the means and standard deviation of the population. The burden criterion evaluates the overall time burden for the professional and the patient. One point was assigned when the burden was deemed to be low for both the professional and the patient. A seventh criterion assesses the relationship between the professional aesthetic assessment scale and PRO as advocated by Potter et al., and 1 point was assigned when the correlation was >0.71.4 Because all the aesthetic outcome scales were described in English only, and we evaluated only those scales intended for professional assessment, the two additional MOT criteria, “alternative modes of administration” and “cultural and language adaptations or translations,” were found not to be applicable in our review. Table 1 provides a summary of the seven modified MOT criteria used.
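The point assignment described above can be sketched in Python as follows. This is an illustrative reading of the rubric, not the reviewers' actual procedure: the function names and the example values are hypothetical, and the half point ("most of the criterion was met") is a reviewer judgment that the threshold helpers do not attempt to automate.

```python
from typing import Mapping, Optional

# The seven modified MOT criteria named in the text.
CRITERIA = (
    "conceptual_framework",
    "reliability",
    "validity",
    "responsiveness",
    "interpretability",
    "burden",
    "patient_assessment",
)

# Each criterion is worth 0, 0.5, or 1 point.
ALLOWED_POINTS = (0.0, 0.5, 1.0)


def reliability_point(kappa: Optional[float] = None,
                      coefficient: Optional[float] = None) -> float:
    """1 point for a kappa higher than 0.40 or a coefficient above 0.70."""
    if kappa is not None and kappa > 0.40:
        return 1.0
    if coefficient is not None and coefficient > 0.70:
        return 1.0
    return 0.0


def validity_point(spearman_rho: float) -> float:
    """1 point for a Spearman rho greater than 0.70."""
    return 1.0 if spearman_rho > 0.70 else 0.0


def responsiveness_point(responsiveness_ratio: float) -> float:
    """1 point when the responsiveness ratio is at least 1.96."""
    return 1.0 if responsiveness_ratio >= 1.96 else 0.0


def total_score(points: Mapping[str, float]) -> float:
    """Sum the points over the seven criteria; unlisted criteria count as 0."""
    for name, value in points.items():
        if name not in CRITERIA:
            raise ValueError(f"unknown criterion: {name}")
        if value not in ALLOWED_POINTS:
            raise ValueError(f"points must be 0, 0.5, or 1, got {value}")
    return sum(points.get(name, 0.0) for name in CRITERIA)


if __name__ == "__main__":
    # Hypothetical scale, not one of the scales reviewed in this paper.
    example = {
        "reliability": reliability_point(kappa=0.50),
        "validity": validity_point(0.65),
        "responsiveness": responsiveness_point(2.10),
        "burden": 0.5,  # judged by the reviewer, not computed
    }
    print(total_score(example), "out of", len(CRITERIA))
```

Keeping the quantitative thresholds in separate helpers makes it explicit which criteria rest on a reported statistic and which rest on reviewer judgment.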

Table 1 Simplified summary of the 7 modified Medical Outcomes Trust criteria for evaluating professional aesthetic assessment scales4,9,10,16–18,47

Results

Search Results

A total of 5,845 citations were generated from the database search; of these, 3,214 duplicates were excluded, leaving 2,631 articles. Based on the titles and abstracts, articles that did not evaluate patients undergoing postmastectomy breast reconstruction were excluded. Also excluded were records with no primary data, expert opinions, letters to the editor, and conference reports; these exclusions totaled 1,753 citations. The full text of 878 articles was reviewed, and 763 articles were excluded because they did not use a professional aesthetic assessment scale. An additional 5 articles were found after reviewing the full text and references. A total of 120 articles were included in the review. The search and selection process is summarized in Fig. 1.
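The counts reported in this paragraph can be checked with simple arithmetic. The Python sketch below uses only the figures stated in the text; the function and variable names are hypothetical.

```python
def selection_flow(identified: int, duplicates: int,
                   excluded_title_abstract: int, excluded_full_text: int,
                   added_from_references: int) -> dict:
    """Recompute the article counts at each stage of the selection process."""
    after_deduplication = identified - duplicates
    full_text_reviewed = after_deduplication - excluded_title_abstract
    included = full_text_reviewed - excluded_full_text + added_from_references
    return {
        "after_deduplication": after_deduplication,
        "full_text_reviewed": full_text_reviewed,
        "included": included,
    }


if __name__ == "__main__":
    flow = selection_flow(
        identified=5845,
        duplicates=3214,
        excluded_title_abstract=1753,
        excluded_full_text=763,
        added_from_references=5,
    )
    for stage, count in flow.items():
        print(f"{stage}: {count}")
```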

Fig. 1 Flow diagram of article selection process

Study Characteristics

(1) Study population: From the 120 articles selected, 95 described outcomes exclusively following breast reconstruction, 13 included only outcomes following breast-conserving therapy (BCT), and 12 contained both breast reconstruction and BCT.

(2) Type of reconstruction: Fifty-two articles included only patients following autologous breast reconstruction, 37 included implant-based reconstructions, 29 included both autologous and implant-based reconstruction, and 2 articles did not specify the reconstruction method.

(3) Observer: The aesthetic assessment was performed exclusively by plastic surgeons in 65 studies, whereas another 39 studies contained assessments by other medical professionals, such as nurses, residents, fellows, and other house-staff. Fifteen of the reviewed articles used a combination of professionals and nonprofessionals to perform the aesthetic assessments. The profession of the observer was unknown in 19 articles.

(4) Professional aesthetic assessment scale: For 67 articles, the aesthetic assessment was performed by means of photographs, 9 were based on clinical assessments, 8 were based on both clinical and photographic assessments, and 36 studies failed to state details. Table 2 presents an overview of the current professional aesthetic assessment scales used in each study, organized from most to least commonly used.19

Table 2 Summary of the different professional aesthetic assessment scales in the literature and their frequency of occurrence

(5) Specific characteristics of the professional aesthetic assessment scales: The number of properties per scale ranged from 1 to 12, with a median of 4 properties for all the reviewed articles. These specific properties included: shape, overall aesthetics, symmetry, volume, scars, inframammary fold, nipple-areola complex, contour, position, areola, color, consistency, ptosis, mobility, and rippling. Of all the articles that we reviewed, a total of 48 articles included a measure of patient satisfaction alongside the professional aesthetic assessment scale (Fig. 2).

Fig. 2 Specific properties addressed by the professional aesthetic assessment scales

MOT Criteria

Table 3 presents the summary of our evaluation of the 12 professional aesthetic assessment scales as prescribed by the MOT and the score that we assigned to each of the 7 modified MOT criteria. The four-point professional aesthetic assessment scale is the most commonly used method of aesthetic evaluation. The ten-point scoring scale fulfills over four of the seven criteria ascribed by the MOT. Below is a more detailed description of each of the 12 professional aesthetic assessment scales with respect to its adherence to the modified MOT criteria.

Table 3 Summary of the 12 professional aesthetic assessment scales and the score that we assigned to each of the 7 modified Medical Outcomes Trust criteria

Four-Point Scale

This scale has not been validated in the breast reconstruction population.20 The reliability of the four-point scale was modest, with an inter-rater agreement κ of 0.55.21 Its validity has not been proven, and the Spearman coefficient for the postoperative scoring has been found to be 0.57.22 The weighted kappa to calculate the intraobserver agreement was 0.70 according to Vrieling et al.21 The correlation with the patient’s assessment was analyzed by Schuster et al. and found to be good.23 A score of 3 out of 7 was assigned.

Five-Point Scale

The reliability measured with the inter-rater agreement was good, the validation of the scale was fair, and the questionnaire burden is low.24 The total score was 2 out of 7.

Garbay/Lowery Scale

The Garbay assessment scale has been analyzed in detail by Lowery and often is referred to as the Lowery scale.8,25 Lowery et al. assessed the reliability and found kappa values from 0.19 to 0.63.21 The intra-rater agreement kappa values were from 0.21 to 0.67 for the subscales. Carlson et al. found inter-rater agreement kappa values from 0.31 to 0.72.26 The total was 2 out of 7 points.

Three-Point Scale

The reliability, validity, responsiveness, and correlation with the patients’ assessment have not been tested. One point out of 7 was assigned for low questionnaire burden.

Baker Scale

The Baker scale was intended to assess capsular contractures for patients after augmentation mammoplasty.27 Spear and Baker et al. modified the capsular contracture scale in 1995 for patients who had implant-based breast reconstruction.28 The Spearman correlation coefficient between the professional’s and patient’s scores was 0.40, which was considered low.29,30 The scale was assigned 1 out of 7 points for low questionnaire burden.

Ten-Point Scale

Visser et al. showed an inter-rater agreement for the ten-point scale of 0.848.31 Validity tested using the Spearman coefficient ranged from 0.70 to 0.83. Veiga found an inter-rater agreement from 0.17 to 1.00; the intra-rater agreement ranged from 0.06 to 0.80.1 Five articles described a significant or close correlation between the patient aesthetic assessment scores and the evaluation by professionals.3,29,31–33 The total score was 4.5 out of 7.

Harris Scale

This professional aesthetic assessment scale also is referred to as the Rose or Harvard scale.34–36 It was developed to monitor the effects of radiotherapy on the aesthetic outcome in BCT patients, and not for postmastectomy breast reconstruction. The inter-rater agreement was 0.66 as found by Preuss et al.37 The total score was 2 out of 7 for low burden and good correlation with PRO.

Linear Numeric Analogue Score

Song et al. evaluated the 0–100 linear numeric analogue scale.38 The inter-rater agreement ranged from 0.23 to 0.38, the Cronbach α was 0.89, and the intra-rater agreement was 0.81.38,39 Salgarello et al. found a close correlation between the patients’ and professional assessment of the aesthetic outcome.40 The score was 4 out of 7.

Two-Point Scale

A two-point scale was used by Chawla et al. to score the aesthetic assessment either as good-excellent or fair-poor.41 The total score was 1 out of 7 points for low burden.

Six-Point Scale

A six-point scale scored 1 point for low burden of use.

Cohen Scale

Cohen et al. developed and statistically analyzed the Cohen Scale.7 The inter-rater agreement was κ 0.0–0.39, the Cronbach α was 0.92, and the intra-rater agreement was κ 0.25–0.66.21,42 There was a moderate correlation with the patient assessment; the Spearman coefficient was 0.36–0.53.22 The total score was 3 out of 7.

Seven-Point Scale

The inter-rater agreement of a seven-point scale with 6 subscales had a κ that ranged from 0.36 to 0.56.43 The total score was 1 out of 7.

Discussion

Our systematic review of 120 published articles identified 12 different aesthetic assessment scales used by professionals for breast reconstruction. The common deficiencies shared by all the existing professional aesthetic assessment scales include their limited responsiveness and interpretability. Both of these attributes are important requirements in a clinically useful measurement tool. In other words, for the aesthetic assessment to be clinically relevant, it needs to be responsive enough to detect possible changes in the breast reconstruction aesthetic outcome over time. Furthermore, the numerical grading from the professional aesthetic assessment scale should lend qualitative meaning and provide information on what change in score would be considered clinically meaningful. In addition, the lack of an existing criterion standard for a subjective phenomenon, such as aesthetic outcome, makes assessment of validity challenging. Of the 12 different professional aesthetic assessment scales that we evaluated, the ten-point professional aesthetic assessment scale was found to have the most rigorous measurement properties.9 Its strengths are the significant correlation with the patient aesthetic evaluation and the scale’s validity, demonstrated by a high Spearman coefficient of 0.70–0.83.3,29,31–33 The primary weaknesses associated with this scale are the wide ranges of inter-rater agreement (0.17–1.0) and intra-rater agreement (0.06–0.80).1

Ideal Professional Aesthetic Assessment Scale

It is important to have a single reliable and responsive professional aesthetic assessment scale to measure aesthetic outcomes following breast reconstruction that is validated in this population, and supported by PRO. The development of this ideal aesthetic assessment tool would enhance the comparability of breast reconstruction results across techniques, surgeons, and studies to aid with the selection of procedures that produce the best aesthetic results. The ideal aesthetic assessment scale or “gold standard” for the professional aesthetic evaluation after breast reconstruction should ideally adhere to all seven of the modified MOT criteria.9

1. Conceptual framework formation: The professional aesthetic assessment scale should be at least analyzed for patients undergoing breast reconstruction after mastectomy.

2 and 4. Reliability and responsiveness: Both the inter-rater and intra-rater agreement of the scale should be at least fair to good. Fortin et al. recommend a panel of three evaluators for the evaluation of the aesthetic outcome.44

3. Validity: The validity of the ideal professional aesthetic assessment scale should be analyzed and have good correlation for all criteria.

5. Interpretability: The quantitative value on the assessment scale should have qualitative meaning, and the developers of the scale should provide information about what change scores should be considered clinically meaningful.45

6. Burden: The scale should pose a low burden on both the patient and the professional.6

7. Patient assessment: The scale should have a good agreement with the patient assessment of the aesthetic outcome.7

Limitations

To improve inter-rater agreement, all assessments should ideally be performed by healthcare professionals with the same level of expertise. However, as demonstrated by our review, healthcare professional is a broadly used term, ranging from an unknown observer with unknown experience to an experienced plastic surgeon. This has been shown to lead to different results.46 Furthermore, in some studies the assessor was the operating surgeon, which could lead to significant bias. Another significant limitation of our review was that only 18 of the 120 articles that we reviewed actually provided methodological information on the aesthetic assessment scales that were used.

Conclusions

Of the 12 different professional aesthetic assessment scales, the ten-point professional aesthetic assessment scale was found to have the highest quality as evaluated by the modified version of the MOT criteria set by SAC.9 However, this scale has limited clinical usefulness due to its poor responsiveness to change, lack of interpretability, and wide range of intra- and inter-rater agreements.1 A “gold standard” professional aesthetic assessment scale needs to be developed to enhance the comparability of breast reconstruction results across techniques, surgeons, and studies, and to aid with the selection of procedures that produce the best aesthetic results from the perspectives of both the surgeon and the patient.