Background

Skin cancer represents a major public health burden in the U.S. with an estimated 9000 Americans being diagnosed each day [1, 2]. Keratinocyte carcinoma (KC, i.e., basal cell carcinoma, squamous cell carcinoma) comprises a vast majority of skin cancer diagnoses with over 5.4 million KC cases diagnosed in over 3.3 million Americans in 2012 [2]. While the incidence of KC is much greater than that of melanoma, the latter has a significantly higher mortality rate [3]. For the year 2022, the American Cancer Society estimated 197,700 new cases of melanoma (99,780 invasive and 97,920 in situ) and 7650 deaths from melanoma [3]. In addition to its risks of morbidity and mortality for patients, skin cancer places a considerable economic burden on society in terms of healthcare expenditures [4]. In 2011, Americans spent USD $8.1 billion on skin cancer treatments including ambulatory visits, procedures, inpatient stays, and prescribed medications [4]. Of this sum, USD $4.8 billion and USD $3.3 billion were attributed to KC and melanoma treatment, respectively. [4]

As the global incidence of skin cancer continues to increase [2, 5], the early detection, or secondary prevention, of skin cancer becomes important to efforts seeking to decrease patient morbidity and mortality, especially for those with melanoma. If melanoma is diagnosed early and treated in a timely manner, the 5-year survival rate is highly favorable, nearing 99% [6]. However, once a melanoma metastasizes beyond its site of origin and spreads to regional lymph nodes (stage III disease) or distant organs (stage IV disease), the rates decrease to 65% and 25%, respectively. [6]

While dermatologists are specially trained in skin cancer detection, many Americans lack timely access to a dermatologist for diagnostic skin cancer examinations and procedures. Known barriers to dermatology care include patient socioeconomic status, race/ethnicity, residence in a rural county, and insurance type [7, 8]. On the county level, dermatologists tend to be concentrated in urban counties with higher median incomes [7]. Only 1 in 10 dermatologists practice in rural counties, and 88% of rural counties have 0 dermatologists [7]. With respect to race/ethnicity, most counties with populations reflecting African American, Hispanic, and/or Native American ethnic majorities were also found to have 0 dermatologists. [7]

Many Americans are thus frequently left without convenient access to specialized dermatology care for important skin cancer-related services. Patients without access may suffer delays in diagnosis and treatment, increasing the risk of disease progression [9]. While virtual solutions such as teledermatology help improve access [10], patients still require in-person visits for diagnostic procedures [7]. In areas with limited access to dermatology care, primary care providers (PCPs), who work on the frontline of healthcare delivery, engage in skin cancer detection, diagnosis, and management.

However, many PCPs do not receive sufficient training in skin cancer detection during their post-graduate medical education [11]. Insufficient training may lead to numerous skin cancers being inadvertently overlooked or high volumes of benign skin lesions being needlessly excised [9]. While diagnosis of melanoma in its earlier stages is associated with decreased mortality, excision of benign lesions performed in the course of identifying melanoma not only bears financial consequences but also contributes to patient morbidity [9]. Undergoing a biopsy procedure and waiting for the pathology results may induce a sense of anxiety in patients, and the procedure itself will invariably result in a physical scar. [9]

Objectives

For the above reasons, numerous efforts have been devoted to the development of educational interventions and diagnostic aids (e.g., algorithms, mnemonics) that support sensitive yet specific skin cancer diagnosis by PCPs. To determine the effectiveness of these initiatives, we conducted a systematic review and quantitative synthesis of skin cancer educational interventions and diagnostic algorithms evaluated in PCP populations.

Our review comprised articles published between January 2000 and June 2021, and our PCP educational cohort included practicing physicians, trainee physicians, and advanced practice practitioners (APPs). We analyzed their skin cancer sensitivity and specificity outcomes through a meta-analysis. By comparing these key learning outcomes across multiple studies, this review will inform the future development of PCP-targeted programs seeking to adopt evidence-based approaches for skin cancer detection training.

Methods

This review was conducted in accordance with the PRISMA (preferred reporting items for systematic reviews and meta-analyses) guidelines. [12]

Eligibility Criteria

The inclusion and exclusion criteria for this review are listed in Online Resource 1. Studies deemed eligible for inclusion evaluated skin cancer detection training programs or diagnostic algorithms in PCPs. Eligible studies measured the effectiveness of the particular program/algorithm in terms of sensitivity and specificity for skin cancer diagnosis, including melanoma diagnosis. Studies that did not explicitly report these outcomes but provided sufficient data for calculations by the research team were included. All studies that involved technology deemed inaccessible to most PCPs or assessed computer-aided diagnosis were excluded.

For our population of interest, primary care physicians were defined as MDs or DOs practicing in family medicine, internal medicine, medicine/pediatrics, or obstetrics/gynecology. Studies that involved PCP trainees and/or APPs, such as nurse practitioners (NPs) or physician assistants (PAs), in the educational cohort were included. Studies that involved majority (> 50%) non-PCPs (e.g., dermatology physicians/trainees, medical students, laypeople) without segregation of data between PCP and non-PCP participants were excluded.

Data Sources

A medical research librarian (D.P.F.) searched MEDLINE (Ovid), Embase (Ovid), Web of Science (Clarivate), and the Cochrane Library (Wiley) for relevant articles published from January 1, 2000, to June 22, 2021. For each database, the librarian developed and tailored a search strategy in consultation with the research team and selected controlled vocabulary (MeSH and Emtree) and natural language terms for the concepts of melanoma, dermoscopy, and diagnostic algorithm. The search strategy implemented for each database is shown in Online Resource 2.

Searches were limited to the English language, but no other limiters or published search filters were used. Grey literature (e.g., conference proceedings, dissertations, reports, unpublished data) was included in addition to peer-reviewed articles. Previous review articles related to PCP-targeted training programs on skin cancer detection were excluded from data analysis but were closely examined by the team to identify relevant manuscripts not found during the search process. EndNote X9 (Clarivate) was used to deduplicate search results, and all unique records were identified and uploaded to Rayyan, a web-based software developed to help filter and manage search results. [13]

Study Selection

Two authors (T. T. and N. G.) independently reviewed all results generated from the search process for study eligibility, as depicted in the PRISMA flow diagram in Fig. 1. Titles and abstracts were screened using Rayyan [13]. For studies that passed the initial screening, full-text manuscripts were retrieved and independently assessed for eligibility. A third author (K. C. N.) provided the final decision in the event of disagreement.

Fig. 1 PRISMA flow diagram of the study selection process. Abbreviations: PRISMA, preferred reporting items for systematic reviews and meta-analyses; MEDLINE, Medical Literature Analysis and Retrieval System Online; PCP, primary care provider

Data Extraction

For articles deemed appropriate for inclusion, two authors (T. T. and N. G.) independently reviewed the full-text manuscripts plus any supplemental materials and independently extracted data. Data extracted included characterization of the educational cohort, educational intervention, and diagnostic algorithm (if applicable) as well as reported outcome measures. Studies that randomized participants to different educational exposures were treated as separate educational cohorts per their allocation.

The primary outcomes of interest were sensitivity (proportion of malignant skin lesions correctly diagnosed) and specificity (proportion of benign skin lesions correctly diagnosed). Related outcomes of interest included the total number of true positives (TP, number of malignant lesions correctly classified as malignant by the participant), false negatives (FN, number of malignant lesions incorrectly classified as benign), true negatives (TN, number of benign lesions correctly classified as benign), and false positives (FP, number of benign lesions incorrectly classified as malignant). If these values were not reported, the research team made reasonable efforts to calculate them using available published data. All calculations performed by the research team are shown in Online Resource 3. In some cases, values were extrapolated from graphical displays or obtained from correspondence with original investigators.
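The definitions above can be written out as a short Python sketch. The class name `ConfusionCounts` and the example counts are hypothetical and are not taken from any included study; they only illustrate how sensitivity and specificity follow from TP, FN, TN, and FP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Lesion-level classification counts for one educational cohort (hypothetical container)."""
    tp: int  # malignant lesions correctly classified as malignant
    fn: int  # malignant lesions incorrectly classified as benign
    tn: int  # benign lesions correctly classified as benign
    fp: int  # benign lesions incorrectly classified as malignant

    def sensitivity(self) -> float:
        """Proportion of malignant lesions correctly diagnosed: TP / (TP + FN)."""
        malignant = self.tp + self.fn
        if malignant == 0:
            raise ValueError("no malignant lesions were assessed")
        return self.tp / malignant

    def specificity(self) -> float:
        """Proportion of benign lesions correctly diagnosed: TN / (TN + FP)."""
        benign = self.tn + self.fp
        if benign == 0:
            raise ValueError("no benign lesions were assessed")
        return self.tn / benign


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=45, fn=5, tn=80, fp=20)
print(f"sensitivity = {example.sensitivity():.3f}")
print(f"specificity = {example.specificity():.3f}")
```

Keeping the four counts, rather than only the two percentages, is what allows outcomes to be pooled across cohorts later.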

Data Analysis

To compare outcomes across multiple educational interventions and diagnostic algorithms, a meta-analysis was performed by a biostatistician (R. L. B.). For the purposes of this study, dermoscopic and clinical interventions were considered separately. Dermoscopic interventions trained participants in the use of dermoscopy for skin cancer diagnosis, whereas clinical interventions taught skin cancer diagnosis using the naked eye.

Data was aggregated across interventions by diagnostic algorithm. Pooled sensitivity and specificity outcomes were estimated using a bivariate linear mixed model with known variances of random effects, and variance components were estimated using restricted maximum likelihood [14]. For studies that provided both pre- and post-intervention data, odds ratios (ORs) were calculated and visualized using forest plots. The post-intervention datasets used for the meta-analyses have been made available in Online Resource 4. Statistical analysis and figure production were performed using the statistical software R (Version 4.1.1).
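As an illustration of the pre- vs. post-intervention comparison, the following Python sketch computes an odds ratio with a Wald-type 95% confidence interval from two pairs of counts. It is a simplified stand-in, not the analysis performed for this review (which was done in R): the function name `odds_ratio`, the 0.5 continuity correction for zero cells, the treatment of pre- and post-intervention assessments as independent samples, and the example counts are all assumptions made for the sketch.

```python
import math


def odds_ratio(post_correct, post_incorrect, pre_correct, pre_incorrect, z=1.96):
    """Return (OR, lower, upper) comparing post- vs. pre-intervention odds of a correct diagnosis.

    For sensitivity, "correct"/"incorrect" are TP/FN; for specificity, TN/FP.
    Assumption: the two sets of assessments are treated as independent samples.
    """
    cells = [post_correct, post_incorrect, pre_correct, pre_incorrect]
    # Assumed continuity correction: add 0.5 to every cell if any cell is zero.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 70/100 malignant lesions correct before training, 90/100 after.
or_sens, lo_sens, hi_sens = odds_ratio(90, 10, 70, 30)
print(f"OR = {or_sens:.2f} (95% CI {lo_sens:.2f} to {hi_sens:.2f})")
```

An OR above 1 with a confidence interval that excludes 1 would indicate that the odds of a correct diagnosis were higher after the intervention than before it.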

Results

The literature search retrieved a total of 1699 records (MEDLINE, n = 403; Embase, n = 632; Web of Science, n = 631; and the Cochrane Library, n = 33). The team also identified 22 additional records for screening from reference lists [15, 16]. Following de-duplication, 1164 unique records were identified and screened for eligibility (Fig. 1). In the initial round of screening, 1124 records were excluded based on their titles/abstracts, leaving 40 manuscripts for full-text review. Another 19 articles were then excluded for reasons listed in the PRISMA flow diagram (Fig. 1), the most common being insufficient reporting of the outcomes of interest without means for calculations by the team. For articles with overlapping datasets, the most recently published dataset was favored for inclusion. Ultimately, 21 articles were selected for inclusion in the quantitative synthesis and meta-analysis [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. All the included articles were peer-reviewed publications. While grey literature was included in the search and selection process, none was eventually included in the quantitative synthesis or meta-analysis.

Study Designs

An overview of the 21 included articles is provided in Table 1 (dermoscopic interventions) and Table 2 (clinical interventions). The following study designs are represented in this review: 3 randomized controlled trials (RCTs) [18,19,20], 1 prospective cohort study [29], and 17 prospective cross-sectional studies. [17, 21,22,23,24,25,26,27,28, 30,31,32,33,34,35,36,37]

Table 1 Overview of the educational interventions included in this review that train PCPs in the use of dermoscopy for skin cancer diagnosis, listed by diagnostic algorithm (if applicable) and then by descending publication year
Table 2 Overview of the educational interventions included in this review that teach PCPs clinical or naked-eye methods for skin cancer diagnosis, listed by diagnostic algorithm (if applicable) and then by descending publication year

Study Populations

This review encompassed studies from 12 different countries: Australia [17, 20], Belgium [28, 34], Canada [24], Colombia [25], Ireland [31], Italy [18, 33], the Netherlands [19, 27], Serbia [26], Spain [18], Switzerland [32], the UK [35], and the USA [21,22,23, 29, 30, 36, 37]. Most articles in this review recruited practicing PCPs. Five included APPs, such as NPs [17, 22, 29, 30, 37] and PAs [22], with one of these comprising only NPs [37]. Three included PCP trainees [27, 28, 31] with two of these comprising only PCP trainees [27, 31]. Three also included minority (< 50%) non-PCPs (e.g., general surgeons, plastic surgeons) in the educational cohort. [26, 30, 36]

Educational Interventions

To improve diagnostic accuracy by PCPs, many interventions taught skin cancer detection with a dermoscopy training component [17,18,19,20,21,22,23,24,25,26,27]. In terms of delivery method, most interventions evaluated in this review provided live/synchronous instruction, with the exception of three that provided web-based/asynchronous instruction only [29, 30, 36]. Of these three, two utilized an online curriculum called INFORMED (INternet curriculum FOR Melanoma Early Detection) that was rigorously designed to improve the skin cancer diagnostic skills of PCPs but is no longer available. [29, 30]

Diagnostic Algorithms

For skin cancer diagnosis, diagnostic algorithms refer to mnemonic aids that may be used during the evaluation of concerning skin lesions. These algorithms tend to emphasize the most salient diagnostic features. Clinical algorithms identified in this review included the ABCD(E) algorithm [18, 26, 28, 29] and the ugly duckling sign [28, 29]. Since these algorithms are commonly taught in medical education, it is likely that most training programs included some mention of ABCD(E) and the ugly duckling sign even if not explicitly stated. Dermoscopic algorithms identified in this review included the 3-point checklist [17, 18], 7-point checklist [19], BLINCK algorithm [17], Menzies method [17, 20], and triage amalgamated dermoscopic algorithm (TADA) [21,22,23,24]. Each algorithm/method is described in further detail in Online Resource 5. In addition to dermoscopic patterns and features, BLINCK also considered findings from the patient’s history and clinical examination (e.g., whether the spot is different from others, the patient’s degree of concern). [17]

Sensitivity and Specificity

In total, this review encompassed over 58,610 assessments of skin lesions by about 1529 PCPs worldwide. Pooled sensitivity analyses relied on the total number of TP and FN across all participants for a given diagnostic algorithm, and pooled specificity analyses on the total number of TN and FP. Pooled analysis results for both the dermoscopic and clinical interventions can be found in Online Resource 6.
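The aggregation step described here, totaling counts across cohorts that used the same diagnostic algorithm, can be sketched in Python as follows. This naive pooling of raw counts only illustrates how the inputs are combined; the estimates reported in this review came from a bivariate linear mixed model, which this sketch does not implement. The function name `pool_by_algorithm`, the algorithm labels, and all counts are hypothetical.

```python
from collections import defaultdict


def pool_by_algorithm(cohorts):
    """Sum TP, FN, TN, and FP per algorithm, then compute crude pooled proportions."""
    totals = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for cohort in cohorts:
        running = totals[cohort["algorithm"]]
        for key in ("tp", "fn", "tn", "fp"):
            running[key] += cohort[key]
    pooled = {}
    for algorithm, t in totals.items():
        pooled[algorithm] = {
            "sensitivity": t["tp"] / (t["tp"] + t["fn"]),
            "specificity": t["tn"] / (t["tn"] + t["fp"]),
        }
    return pooled


# Hypothetical cohorts, for illustration only.
cohorts = [
    {"algorithm": "algorithm A", "tp": 40, "fn": 10, "tn": 70, "fp": 30},
    {"algorithm": "algorithm A", "tp": 50, "fn": 0, "tn": 90, "fp": 10},
    {"algorithm": "algorithm B", "tp": 30, "fn": 20, "tn": 45, "fp": 5},
]
pooled = pool_by_algorithm(cohorts)
for name, result in pooled.items():
    print(name, result)
```

Crude pooling of this kind weights each cohort by its number of assessments and ignores between-study variation, which is one reason a random-effects model is preferred for the formal estimates.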

Among the dermoscopic interventions, TADA was found to exhibit a high pooled sensitivity (91.7%) and high pooled specificity (81.5%) on over 10,800 total skin lesion assessments by 278 participants including NPs and PAs. In addition, the dermoscopy training programs developed by Bandic et al. [26] (121 skin lesion assessments by five participants including two general surgeons) and De Bedout et al. [25] (952 skin lesion assessments by 21 rural PCPs) had relatively high post-intervention sensitivity (89.1% and 83.8%, respectively) and specificity (92.0% and 77.9%, respectively).

Among the clinical interventions, those developed by Oliveria et al. [37], Orfaly et al. [30], and Harris et al. [36] all demonstrated sensitivity outcomes exceeding 90% (100%, 93.3%, and 91.0%, respectively). Of these three, the Orfaly et al. [30] and Harris et al. [36] interventions also exhibited high specificity (84.6% and 89.0%, respectively), while the Oliveria et al. [37] intervention exhibited a lower specificity (73.7%). Of note, the Orfaly et al. [30] intervention utilized the INFORMED curriculum.

Relevant outcomes of studies reporting both pre- and post-intervention data were visualized using forest plots. Figure 2 displays pre- vs. post-intervention data for dermoscopic interventions in terms of ORs. An OR greater than 1 (i.e., a positive log OR) suggests that outcomes favor the post-intervention condition, whereas an OR less than 1 (i.e., a negative log OR) suggests the contrary. Educational cohorts that used TADA demonstrated improvement in sensitivity for skin cancer on post-intervention assessments compared to baseline [21, 22, 24]. Among the TADA interventions, specificity for skin cancer either remained the same [24] or increased [21, 24]. The clinical interventions evaluated in this review also generally resulted in improvements in sensitivity for skin cancer without significant loss of specificity. Forest plots for the clinical interventions are displayed in Online Resource 7.

Fig. 2 Forest plot of pre- vs. post-intervention sensitivity and specificity outcomes for dermoscopic educational interventions. Abbreviations: TADA, triage amalgamated dermoscopic algorithm

Discussion

In regions with limited access to specialized dermatology care, PCPs engage in skin cancer detection, diagnosis, and management [38], but there is currently no standardized curriculum available to PCPs that teaches early skin cancer detection [16]. The potential negative consequences of insufficient training, whether overlooking skin cancers or needlessly excising benign lesions, highlight the need for evidence-based programs designed to improve skin cancer diagnosis by PCPs [9]. To determine the most effective strategy for skin cancer education in PCPs, we conducted a systematic review of educational interventions and diagnostic aids evaluated in cohorts comprising a majority (≥ 50%) of PCPs. We determined the effectiveness of a particular curriculum/algorithm using participants’ post-intervention sensitivity and specificity outcomes. To our knowledge, this is the first meta-analysis of sensitivity and specificity outcomes for PCP-targeted training programs on skin cancer detection.

In this review, we identified several interventions that resulted in relatively high (> 70%) sensitivity and specificity for skin cancer in PCPs [18, 21,22,23,24,25,26, 28, 30, 33, 34, 36, 37]. However, our analyses were complicated by heterogeneity in some educational cohorts. PCPs overall differed in terms of their career stage (e.g., trainee vs. attending physician), specialty (e.g., family medicine vs. internal medicine), years of experience evaluating skin lesions, and previous skin cancer detection or dermoscopy training [21]. Some educational cohorts also included non-PCPs [26, 30, 36]. Two interventions, in particular, included a number of dermatologists in their cohorts without segregation of data: the Orfaly et al. intervention [30] with ≤ 5 (≤ 12.5%) dermatologists and the Harris et al. intervention [36] with 8 (2.3%) dermatologists. These two interventions were included in data analysis owing to the relatively small number of dermatologists in each cohort, but the presence of the dermatologists’ data may have positively skewed educational outcomes.

Of the 21 interventions included in this review, 52.4% (11/21) provided dermoscopy training, or instruction in the use of a dermatoscope (a non-invasive visualization tool) for skin examinations. With appropriate training, the use of dermoscopy in evaluating suspicious skin lesions improves the ability of PCPs to accurately diagnose and appropriately triage patients [39, 40]. Ongoing efforts are seeking to develop consensus-based proficiency standards for dermoscopy that are specific to the practice needs of PCPs.

In evaluating the effectiveness of a particular curriculum/algorithm, it is important to consider both its sensitivity and its specificity for skin cancer given the clinical relevance of these measures. While auditing participants’ real-world clinical assessments of skin lesions would constitute best practice, educational outcomes are often evaluated using assessments containing clinical and/or dermoscopic images of skin lesions. In this review, only three (14.3%) studies used participants’ in-person evaluations of suspicious skin lesions in real-life clinical practice [18, 19, 26], and the remainder used sets of clinical and/or dermoscopic images. Ideally, image sets used in these assessments would undergo formal validation by a panel of experts who determine whether images are of appropriate quality for PCP learners and whether image sets are of similar difficulty. Otherwise, lesions later deemed “problematic” may complicate analyses, as was the case for one article [36]. Classification of benign and malignant diagnoses on these assessments should also be consistent, especially for suspicious or borderline lesions (e.g., squamous cell carcinoma in situ, keratoacanthoma, actinic keratosis, dysplastic nevus).

Study Limitations

Our meta-analysis was limited by the number of articles providing enough data to calculate TP, FN, TN, and FP across all participants. In this review, many of these values were estimated based on the reported number of participants, number of test items, and percentage of malignant and benign lesions diagnosed correctly. However, some participants may have submitted incomplete assessments with missing answers, so TP, FN, TN, and FP may be slightly overestimated for some studies. In addition, this review only included articles published in the English language and may have missed reports of educational interventions and/or diagnostic algorithms published in other languages. Relevant articles of interest may have also been inadvertently overlooked during the screening process.
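The estimation of counts described in this section can be illustrated with a short Python sketch. It assumes that every participant answered every test item, which is exactly the assumption that may overestimate counts when assessments are incomplete. The function name `estimate_counts` and the example inputs are hypothetical; the calculations actually performed for this review are those shown in Online Resource 3.

```python
def estimate_counts(n_participants, n_malignant_items, n_benign_items,
                    pct_malignant_correct, pct_benign_correct):
    """Estimate TP, FN, TN, and FP from summary data reported by a study.

    Assumption: each participant assessed every malignant and every benign item.
    """
    malignant_assessments = n_participants * n_malignant_items
    benign_assessments = n_participants * n_benign_items
    # Round to whole assessments because reported percentages are themselves rounded.
    tp = round(malignant_assessments * pct_malignant_correct / 100)
    tn = round(benign_assessments * pct_benign_correct / 100)
    return {
        "tp": tp,
        "fn": malignant_assessments - tp,
        "tn": tn,
        "fp": benign_assessments - tn,
    }


# Hypothetical study: 20 participants, 10 malignant and 15 benign test items,
# with 85% of malignant and 70% of benign lesions diagnosed correctly.
counts = estimate_counts(20, 10, 15, 85.0, 70.0)
print(counts)
```

If some participants skipped items, the true number of assessments would be smaller than the product of participants and items, and all four estimated counts would shrink accordingly.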

Conclusions

While PCPs play an important role in skin cancer detection in underserved areas, they may require additional training to accurately diagnose and appropriately manage skin cancer. To determine the effectiveness of different training programs, it is important to evaluate participants’ diagnostic performance in terms of sensitivity and specificity. However, this review identified relatively few PCP-targeted skin cancer educational interventions reporting these clinically relevant outcomes. To support further rigorous investigations of skin cancer detection education for PCPs, future studies should utilize validated instruments with a sufficient number of test items and segregate outcomes data between PCPs and non-PCPs. Among the training programs evaluated in this review, those that implemented TADA were found to demonstrate high sensitivity and specificity for skin cancer among PCP participants.