Key Points

The primary applications of automated image analysis in psoriasis involve detecting and outlining lesion borders, distinguishing psoriatic lesions from other skin conditions, objectively calculating area involvement and severity scores, and selecting treatments and predicting their response.

Currently, two commercial systems utilize total body photography, automated image segmentation, and semi-automated Psoriasis Area and Severity Index (PASI) calculation to enhance clinical patient care.

Key challenges for future successful AI implementation include the need for model validation and generalizability, efficient integration into clinical workflows, and the establishment of standardized imaging protocols.

1 Introduction

Artificial intelligence (AI) is a branch of computer science concerned with replicating human cognitive functions and analyzing large amounts of data [1]. As a field with primarily visual diagnoses and a large patient base, dermatology has recently seen some of the most rapid developments in medical AI applications, particularly in computer-guided image classification [2]. The development of convolutional neural networks (CNNs) for melanoma detection has been groundbreaking, as algorithms have shown great potential to improve human accuracy in distinguishing benign from malignant melanocytic lesions [3,4,5,6]. For example, in a study by Haenssle et al. comparing the diagnostic performance of 58 international dermatologists with that of a CNN in detecting melanoma on dermoscopic images, the algorithm outperformed most dermatologists [3]. In a real-world setting, combining human and artificial intelligence (augmented intelligence) by integrating CNN classification into clinical decision making has been shown to increase diagnostic sensitivity and specificity in the evaluation of melanocytic lesions [6]. Naturally, these advances in the early recognition of skin cancer have spurred research into CNN applications for other dermatoses, such as psoriasis.

Psoriasis is a common chronic immune-mediated inflammatory skin disorder that affects approximately 2–3% of the general population worldwide [7]. Onset can occur at any age and the disease is not yet curable [8]. Clinical presentation is variable and may include palmo-plantar, scalp, intertriginous, and nail involvement. Plaque psoriasis most commonly presents with sharply demarcated, silvery, erythrosquamous plaques on the extensor surfaces of the elbows and knees and the lumbosacral region (Fig. 1A). Other less common subtypes, such as erythrodermic (Fig. 1B) or guttate psoriasis (Fig. 1C), as well as genetically and phenotypically distinct pustular psoriasis (Fig. 1D), add to the diagnostic complexity [8, 9]. Since psoriasis is typically diagnosed visually and is easily photographed by healthcare providers and patients, the resulting image repositories lend themselves to analysis by AI [1]. In addition to cutaneous manifestations, patients are susceptible to multiple comorbidities, such as psoriatic arthritis and cardiometabolic syndrome, and most patients experience a decrease in quality of life, with an associated higher risk of developing depression [8, 10,11,12].

Fig. 1

Clinical examples of different psoriasis subtypes. A Plaque psoriasis. B Erythrodermic psoriasis. C Guttate psoriasis. D Pustular psoriasis. Clinical image courtesy of the University Hospital Basel

Objectively reporting the extent of skin involvement and treatment response remains a major challenge in routine practice and research trials [13,14,15]. In the absence of established biomarkers, a variety of clinical scoring tools are currently used, most commonly the Psoriasis Area and Severity Index (PASI), Body Surface Area (BSA), and Physician’s Global Assessment (PGA) [13]. Major weaknesses of these tools include low efficiency, low intra- and inter-rater reliability, and questionable accuracy [13,14,15]. Since no single assessment tool has been shown to be superior or to fulfil ideal validation criteria, combinations are often used, depending on the application [13]. As treatment decisions and regulatory drug approvals are largely based on such measures, accuracy and consistency are paramount and could be greatly improved through automated calculation. In addition, in most countries, reimbursement for expensive biological treatments depends on minimum score ratings, with BSA > 10% and PASI > 10 considered cut-offs for severe disease [16].

In recent years, immunological targeting of key pathogenetic cytokines with biological therapies has revolutionized the therapeutic management of severe psoriasis [8]. In addition to well-established treatments such as methotrexate, tumor necrosis factor alpha (TNFα) inhibitors (adalimumab, certolizumab pegol, etanercept, infliximab), interleukin (IL)-17 inhibitors (brodalumab, ixekizumab, secukinumab), IL-23 inhibitors (guselkumab, risankizumab, tildrakizumab), IL-12/23 inhibitors (ustekinumab), and Janus kinase (JAK) inhibitors such as tofacitinib are increasingly being prescribed [8]. As early systemic treatment with IL-12/23 or IL-23 inhibitors appears to be protective, reducing the risk of arthritis progression [17], and treatment with TNFα inhibitors is suggested to reduce the occurrence of cardiovascular events [11, 18], timely diagnosis and accurate severity assessment are increasingly critical. Beyond facilitating diagnosis and treatment monitoring, objective image-based AI support could also lead to a fairer distribution of resources and improve the quality of clinical trial data [16].

With such high hopes for AI to address unmet needs in the management and treatment of psoriasis and ultimately provide faster, cheaper, and more accurate results, important questions remain: Where are we now, and what might our near future really look like? What challenges do we still face, and how might they be overcome? After a basic introduction to the concept of image-based AI, this article provides an overview of current developments and their potential in psoriasis applications. Subsequently, we discuss the remaining hurdles to implementing AI for routine use and research purposes.

2 Overview of Image-Based Artificial Intelligence (AI)

Successful interpretation of computer-generated results and useful assistance for clinical decision making requires that dermatologists first acquire an understanding of the basic concepts of image-based AI.

In our review we focus on machine learning (ML), currently the most commonly used subset of AI for medical applications in psoriasis [1]. ML allows a computer program to extract data patterns and attributes in an automated learning process in order to complete a given task [19]. ML that uses deep neural networks (DNNs), which process data through layers of artificial neurons loosely modeled on biological neurons, enables complex predictions [1, 19]. Specifically, CNNs, a type of DNN architecture designed to process grid-structured input data such as images, have proven well suited for medical image classification tasks [20,21,22].

In simplified terms, a CNN consists of an input layer, multiple hidden layers, and an output layer (Fig. 2). The input layer receives the pixel values of the input image, which are passed on to a series of hidden convolution and pooling layers [21].

Fig. 2

Exemplary architecture of an image-based convolutional neural network (CNN). Feature extraction: The input layer receives the pixel values of the input image. In the convolution layer, filters scan the image section by section to detect features such as edges or shapes, like a magnifying glass that highlights important details. In the activation layer, a mathematical function introduces non-linearity so that complex patterns can be handled, like a light switch that lets strong signals pass and switches weak ones off. The pooling layer zooms out to the big picture, summarizing information and reducing data size to make further processing steps more efficient. Classification: In the fully connected layer, all previously detected features are combined to make a final classification, or diagnosis. This result is presented in the output layer with a probability score. Clinical image courtesy of the University Hospital Basel, used with patient permission. ReLU rectified linear unit

A convolution layer scans the input image with small grids of learnable parameters, called ‘kernels’, which act as filters: at each position, a linear ‘convolution’ operation combines the kernel with the underlying image patch to identify and extract features [22]. In simple terms, these filters can be thought of as a magnifying glass that scans small image sections to identify features such as edges or textures. As an analogy, one could imagine a resident dermatologist first examining a patient for clinical findings.
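The scanning step can be sketched in plain Python. This minimal example is an illustration rather than the implementation used by any cited model: it slides one 3×3 kernel over a tiny grayscale image with stride 1 and no padding. As in most deep learning libraries, it computes cross-correlation (the kernel is not flipped), which is what CNN ‘convolution’ layers typically do. The toy image and the hand-set vertical-edge kernel are assumptions; in a trained CNN, kernel values are learned.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like most CNN libraries, this computes cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently "under the magnifying glass".
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x5 image: dark (0) on the left, bright (1) on the right, i.e. a vertical edge.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-set vertical-edge detector (a trained CNN would learn such values).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(conv2d(image, kernel))  # [[3.0, 3.0, 0.0], [3.0, 3.0, 0.0]]
```

The large responses mark windows that straddle the dark-to-bright boundary, while the zeros correspond to uniformly bright pixels.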

Next, an activation layer applies a non-linear mathematical function (most commonly the rectified linear unit, or ReLU) to the previous output, introducing non-linearity that allows the network to perform more intricate tasks [22]. More simply, this layer decides which patterns are important: ReLU passes positive responses unchanged and sets negative ones to zero, highlighting significant features and ignoring irrelevant information. In our analogy, the new resident would then consult a senior dermatologist to determine which clinical findings on the skin are relevant.
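A minimal sketch of ReLU applied element-wise to a small, hypothetical feature map; in a real network, the same operation is applied to every output of a convolution layer.

```python
from typing import List


def relu(feature_map: List[List[float]]) -> List[List[float]]:
    """Apply the rectified linear unit, max(0, x), to every element."""
    return [[max(0.0, x) for x in row] for row in feature_map]


# Hypothetical convolution output with both positive and negative responses.
feature_map = [
    [3.0, -2.0, 0.5],
    [-1.5, 0.0, 4.0],
]
print(relu(feature_map))  # [[3.0, 0.0, 0.5], [0.0, 0.0, 4.0]]
```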

Pooling layers apply mathematical functions to reduce the dimensionality of the feature maps produced by convolutional layers (e.g., by selecting the maximum or average value within each small window) [23]. In other words, this layer zooms out to see the bigger picture, focusing on the most important features while reducing the size of the data to make the information easier to manage. In our analogy, the senior dermatologist would summarize the resident’s findings in a short report.
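The following sketch shows 2×2 pooling with stride 2, in both the max and average variants, on a hypothetical 4×4 feature map; the window size and stride are common defaults chosen here as assumptions.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, size: int = 2, stride: int = 2, mode: str = "max") -> Matrix:
    """Summarize each size x size window by its maximum or average value."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + u][j * stride + v]
                for u in range(size)
                for v in range(size)
            ]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 2],
]
print(pool2d(feature_map))              # [[6, 2], [2, 8]]
print(pool2d(feature_map, mode="avg"))  # [[3.5, 1.0], [1.0, 5.5]]
```

Each 2×2 window collapses to a single value, so the map shrinks from 4×4 to 2×2 while the strongest (or average) response in each region is preserved.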

The feature map resulting from the entire extraction process is then flattened into a one-dimensional vector and mapped by fully connected layers with learnable weights to the final network outputs, that is, the class probabilities of dermatological diagnoses [22]. The learnable kernels and weights of the model are then optimized in an automated training process with the goal of minimizing the difference (the loss) between the true image classification, or ground truth, and the output classifications calculated by the model [22]. In our analogy, this step can be imagined as a panel of expert dermatologists reviewing the senior physician’s summary report and integrating all available information to make a final diagnosis of the patient’s skin disorder.
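The classification and training steps can be illustrated with a minimal two-class sketch: a flattened feature map passes through one fully connected (dense) layer, softmax turns the outputs into probabilities, cross-entropy measures the difference from the ground truth, and one gradient-descent step nudges the weights to reduce that loss. The feature values, weights, and class names (“psoriasis” versus “other”) are hypothetical, and real networks also update the convolution kernels via backpropagation, which this sketch omits.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def flatten(feature_map: Matrix) -> Vector:
    """Turn a 2-D feature map into a 1-D vector, row by row."""
    return [x for row in feature_map for x in row]


def dense(x: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Fully connected layer: one weighted sum (logit) per class."""
    return [sum(w * xi for w, xi in zip(w_k, x)) + b_k for w_k, b_k in zip(weights, biases)]


def softmax(logits: Vector) -> Vector:
    """Convert logits to class probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Vector, true_class: int) -> float:
    """Loss is small when the ground-truth class receives a high probability."""
    return -math.log(probs[true_class])


def sgd_step(x: Vector, weights: Matrix, biases: Vector, true_class: int,
             lr: float = 0.01) -> Tuple[Matrix, Vector]:
    """One gradient-descent update of the dense layer for softmax + cross-entropy.

    For this loss, d(loss)/d(logit_k) = p_k - y_k, where y is the one-hot ground truth.
    """
    probs = softmax(dense(x, weights, biases))
    new_w, new_b = [], []
    for k, (w_k, b_k) in enumerate(zip(weights, biases)):
        grad = probs[k] - (1.0 if k == true_class else 0.0)
        new_w.append([w - lr * grad * xi for w, xi in zip(w_k, x)])
        new_b.append(b_k - lr * grad)
    return new_w, new_b


# Hypothetical 2x2 pooled feature map and weights for class 0 ("psoriasis") and class 1 ("other").
x = flatten([[6, 2], [2, 8]])
weights = [[0.1, 0.0, 0.0, 0.1], [0.0, 0.1, 0.1, 0.0]]
biases = [0.0, 0.0]

probs = softmax(dense(x, weights, biases))
loss_before = cross_entropy(probs, true_class=0)
weights, biases = sgd_step(x, weights, biases, true_class=0)
loss_after = cross_entropy(softmax(dense(x, weights, biases)), true_class=0)
print(probs, loss_before, loss_after)
```

Repeating such updates over many labelled images is, in essence, the automated training process described above.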

Model training can be performed in a supervised, semi-supervised, or unsupervised manner [21]. In supervised learning, which is most commonly used for image classification tasks, training inputs are labelled in advance with the correct output, allowing the model to reduce its classification error iteratively by trial and error [19, 21]. In unsupervised learning, unlabeled training data sets allow pattern discovery without human guidance in the form of a ground truth [19, 21]. As a combination of these two forms, semi-supervised learning, which uses a small labelled data set together with a larger unlabelled one, helps reduce the burden of labelling [19].
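Self-training with pseudo-labels is one common semi-supervised strategy; the sketch below is a simplified illustration, not the method of any cited study. A nearest-centroid classifier is fitted on a few labelled examples, assigns pseudo-labels to unlabelled examples it is confident about, and is refitted on the enlarged set. The single “redness” feature per image, the class names, and the confidence margin are hypothetical assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[float, str]  # (hypothetical redness score, class label)


def fit_centroids(samples: List[Sample]) -> Dict[str, float]:
    """Supervised step: the centroid of each class is its mean feature value."""
    labels = sorted({label for _, label in samples})
    return {lab: mean(x for x, l in samples if l == lab) for lab in labels}


def predict(centroids: Dict[str, float], x: float) -> Tuple[str, float]:
    """Nearest centroid, plus the distance margin to the runner-up as a crude confidence."""
    ranked = sorted(centroids, key=lambda lab: abs(x - centroids[lab]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(labelled: List[Sample], unlabelled: List[float],
               min_margin: float = 0.2, rounds: int = 3):
    """Repeatedly add confident pseudo-labels to the training set and refit."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        centroids = fit_centroids(labelled)
        confident, uncertain = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                uncertain.append(x)
        if not confident:
            break  # nothing new can be pseudo-labelled
        labelled.extend(confident)
        pool = uncertain
    return fit_centroids(labelled), pool


labelled = [(0.1, "healthy"), (0.2, "healthy"), (0.8, "psoriasis"), (0.9, "psoriasis")]
unlabelled = [0.15, 0.3, 0.7, 0.85, 0.5]
centroids, still_unlabelled = self_train(labelled, unlabelled)
print(centroids, still_unlabelled)
```

The ambiguous example at 0.5 is never pseudo-labelled, which illustrates why a confidence threshold matters: adding uncertain pseudo-labels could reinforce the model’s own errors.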

If the training data set is too small, the model may overfit, meaning that it fits the training data too closely and does not generalize well to unseen input [23]. To counteract this issue, image augmentation techniques such as flipping, color adjustment, cropping, rotation, translation, or noise injection can be applied to the training set to achieve more accurate model predictions [23]. Current image data augmentation techniques and their effects on model performance have recently been reviewed extensively [24]. For example, Krizhevsky et al. developed the AlexNet CNN architecture and trained it on the ImageNet dataset [24, 25]. The authors enlarged the training set by a factor of 2048 through random cropping and horizontal flipping of the original images, and an additional color (lighting) augmentation reduced the error rate of the model by over 1%; both measures helped to avoid overfitting [24, 25].
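Several of the augmentations listed above can be sketched on a toy grayscale image stored as nested lists with intensities in [0, 1]. This is an illustrative sketch; production pipelines typically rely on image libraries, and the transformation probabilities and parameters chosen here are assumptions. Rotations are restricted to multiples of 90° so that no interpolation is needed, and the example image is square so that the crop always fits after rotation.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, intensities in [0, 1]


def hflip(img: Image) -> Image:
    """Mirror left-right."""
    return [list(reversed(row)) for row in img]


def vflip(img: Image) -> Image:
    """Mirror top-bottom."""
    return [list(row) for row in reversed(img)]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def random_crop(img: Image, h: int, w: int, rng: random.Random) -> Image:
    """Cut out a randomly positioned h x w patch."""
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]


def adjust_brightness(img: Image, delta: float) -> Image:
    """Shift all intensities by delta, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add Gaussian noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def augment(img: Image, rng: random.Random, crop_h: int, crop_w: int) -> Image:
    """Apply a random combination of the transforms to one training image."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    for _ in range(rng.randint(0, 3)):
        img = rotate90(img)
    img = random_crop(img, crop_h, crop_w, rng)
    img = adjust_brightness(img, rng.uniform(-0.1, 0.1))
    return add_noise(img, sigma=0.02, rng=rng)


rng = random.Random(42)  # fixed seed for reproducibility
toy = [[(4 * r + c) / 15 for c in range(4)] for r in range(4)]
augmented = [augment(toy, rng, 3, 3) for _ in range(5)]  # five new variants of one image
print(len(augmented), len(augmented[0]), len(augmented[0][0]))  # 5 3 3
```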

3 Current Applications and Potential of Image-Based AI in Psoriasis

Main automated image analysis applications in psoriasis include detecting and outlining lesion borders, differentiating psoriatic lesions from other skin conditions, objectively calculating area involvement and severity scores, as well as selecting treatments and predicting their response.

3.1 Image Segmentation of Lesions

In addition to correctly identifying psoriasis on skin photographs, a critical step toward next-level tasks such as assessing disease severity is the automated detection and delineation of individual lesions. Because manual image segmentation is tedious for dermatologists, researchers have focused on developing automated image segmentation algorithms. The task is aided by the fact that psoriatic lesions are usually easy to distinguish from the surrounding unaffected skin. However, challenges arise from poor image quality, including insufficient illumination, blur, or artifacts such as camera reflections, as well as from the polymorphic appearance of lesions [26]. Previous algorithms often relied on feature engineering (e.g., a feature-based Bayesian framework), lacked accuracy, or failed to segment challenging input images correctly (e.g., a Markov random field combined with a support vector machine), limitations that have been partially overcome by CNNs [1, 26]. Dash et al. developed PsLSNet, a 29-layer deep U-net-based CNN (U-net being an architecture designed for image segmentation whose U-shaped structure captures image context while enabling precise localization), which automatically extracts spatial information and was validated on 5241 images from 1026 psoriasis patients, including more challenging images [26]. It achieved an accuracy of 94.8%, outperforming all previous approaches [26]. In addition, Amruthalingam et al. developed and trained two deep learning models (DLMs) based on a U-net architecture with a ResNet backbone (whose residual connections enable the training of very deep models with hundreds or thousands of layers) to anatomically map and segment hand eczema lesions with high accuracy [27]. According to the authors, this approach could also be applied to psoriasis, as both conditions can present very similarly with red, scaly patches and plaques on the dorsal and palmar aspects of the hands [27].
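Segmentation performance is often summarized as pixel accuracy; the sketch below compares it with the Dice coefficient, a widely used overlap measure, on hypothetical 4×4 masks. The exact metric definitions and evaluation protocol of Dash et al. are not reproduced here. Because correctly classified background pixels count toward accuracy, accuracy can remain high even when the lesion outline is imperfect.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = lesion pixel, 0 = background


def confusion(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    tp, fp, fn, tn = confusion(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)


def dice(pred: Mask, truth: Mask) -> float:
    """Dice = 2|A and B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


truth = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
print(pixel_accuracy(pred, truth), dice(pred, truth))  # 0.875 0.75
```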

At the histopathological level, CNNs are expected to provide future clinical support by automatically analyzing skin biopsy images. As a first step, a U-net-based CNN was applied by Pal et al. to successfully segment psoriasis skin biopsy images into epidermis, dermis, and non-tissue, which is a prerequisite for the development of more sophisticated models that can recognize characteristic pathological features of the disease within each skin layer [28]. Such forms of image segmentation are not only valuable at the microscopic level, but can also be applied to macroscopic images to evaluate the presence of lesions, as well as disease extent and severity, as outlined in the following sections.

3.2 Diagnosis and Subtype Classification

For proper treatment, psoriasis must first be correctly diagnosed. In clinical routine, diagnosis is usually based on an inspection of the entire skin surface, including scalp and nails, while taking into account the patient’s medical and family history. Significant advances have been made by several research groups in developing image-based AI algorithms trained on large datasets of annotated psoriasis images to extract quantitative image features and automatically detect and classify lesions [29,30,31,32,33].

Aggarwal [29] improved the performance of a CNN model discriminating five dermatological diseases (acne, atopic dermatitis, impetigo, psoriasis, and rosacea) by augmenting the input data with image transformations such as zooming, shearing, rotating, and horizontal and vertical flipping. Zhao et al. developed a two-stage CNN using 8021 images to discriminate nine different diagnoses based on clinical photographs; on a test set of 100 images, it diagnosed psoriasis with an accuracy 9 percentage points higher than that of 25 dermatologists (accuracy of CNN: 0.96, mean human accuracy: 0.87) [30]. Using Xiangya-Derm, the largest dermatology data set of the Chinese population with over 150,000 clinical images of 571 different skin diseases, Huang et al. developed a CNN to differentiate six common skin diseases, outperforming the accuracy of 31 dermatologists by 6.6% [31]. Several other CNNs have been developed to discriminate psoriasis from other dermatological diagnoses, with overall accuracy mostly comparable to or better than that of dermatologists [32, 33]. However, research on the real-world applicability of currently published algorithms is lacking, as is open-source training data.

Furthermore, image-based AI applications need not be limited to the analysis of macroscopic images. Dermoscopic images offer high-resolution visualization of the skin, revealing subtle details such as vascular or pigment patterns through magnification of epidermal and upper dermal layers, potentially enhancing diagnostic accuracy depending on the clinical task. However, acquiring and interpreting these images requires time, specialized equipment, and expertise. For CNN classification purposes, dermoscopic image data sets tend to be more standardized, improving model generalizability.

In contrast, macroscopic images are more accessible, faster to acquire, and provide a broader clinical overview of lesions, making them preferable for initial screenings. Based on macroscopic assessment, clinicians can determine whether additional dermoscopic examination is necessary. A combined approach, utilizing both macroscopic and dermoscopic images, can be advantageous in providing both context and detail.

For instance, differentiating between psoriasis and seborrheic dermatitis on the scalp can be challenging using macroscopic assessment alone. Dermoscopy can offer additional diagnostic clues, such as annular and hairpin blood vessels indicative of psoriasis, or unstructured white areas and atypical vessels suggestive of seborrheic dermatitis, aiding in more accurate diagnosis [34]. Yu et al. trained GoogLeNet, a 22-layer deep CNN pre-trained on the ImageNet dataset, to differentiate scalp psoriasis from seborrheic dermatitis using dermoscopic images [34]. The algorithm outperformed five dermatologists with varying levels of experience, with a 26.7 percentage point higher sensitivity and a 6.8 percentage point higher specificity (sensitivity: CNN 96.1%, dermatologists (mean) 69.4%; specificity: CNN 88.2%, dermatologists (mean) 81.4%) [34]. Furthermore, with assistance from the model, physicians without dermoscopy expertise achieved diagnostic performance similar to that of dermoscopy-proficient dermatologists (mean sensitivity 79.1%, mean specificity 81.9%) [34].
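Sensitivity and specificity follow directly from a confusion matrix, and differences between two percentages are clearest when reported in percentage points rather than as relative changes. The confusion-matrix counts below are hypothetical; only the percentages passed to the difference functions are the values reported above.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of diseased cases correctly identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-diseased cases correctly identified."""
    return tn / (tn + fp)


def points_difference(a_pct: float, b_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a_pct - b_pct, 1)


def relative_difference(a_pct: float, b_pct: float) -> float:
    """Relative difference of a_pct versus b_pct, in percent."""
    return round(100.0 * (a_pct - b_pct) / b_pct, 1)


# Hypothetical confusion-matrix counts, for illustration only.
print(sensitivity(tp=90, fn=10), specificity(tn=80, fp=20))  # 0.9 0.8
# Reported CNN versus mean dermatologist values (sensitivity, then specificity).
print(points_difference(96.1, 69.4), points_difference(88.2, 81.4))  # 26.7 6.8
print(relative_difference(96.1, 69.4))  # 38.5
```

The sensitivity gap of 26.7 percentage points corresponds to a relative increase of about 38.5%, which is why the unit matters when comparing reported results.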

This suggests that physicians without specialized training (e.g., in remote areas) and teledermatology services could benefit directly from additional AI expertise to optimize patient management, with referral to dermatologists when needed. The Telemedicine Working Group of the International Psoriasis Council recently concluded that managing psoriasis through teledermatology is feasible in most cases, with exceptions for special sites such as the genitals or scalp [35]. A previous study has demonstrated that online and in-office dermatologic follow-up for psoriasis result in comparable improvements in psoriasis severity and Dermatology Life Quality Index scores [36]. While diagnostic AI holds significant potential to enhance these services, further studies are needed to assess its implementation and effectiveness.

In terms of subtype classification, a CNN was used by Aijaz et al. to differentiate plaque, guttate, inverse, erythrodermic, and pustular psoriasis with high accuracy (84.2%) [37]. The training sets used included 80% of 172 images of normal skin and 301 images of psoriasis from the Dermnet dataset, while the remaining 20% were used for validation and testing [37]. Plaque and guttate psoriasis images were overrepresented in the dataset (plaque: n = 99, guttate: n = 96), followed by pustular (n = 48), erythrodermic (n = 33), and inverse psoriasis (n = 25) [37]. Regarding the classification performance for individual subtypes, the highest accuracy was achieved for inverse psoriasis (100%), followed by a sensitivity of 96.5% for normal skin (28/29), 87.2% for guttate (34/39), 85.2% for erythrodermic (23/27), 73.3% for pustular (22/30), and 70% for plaque psoriasis (28/40) [37].
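The per-class figures above can be recomputed from the reported counts as a simple arithmetic check. The sketch reproduces the percentages up to rounding (28/29 is 96.55%) and shows each subtype’s share of the 301 psoriasis images.

```python
# Correct / total test images per class, as reported above for Aijaz et al.
test_results = {
    "normal skin": (28, 29),
    "guttate": (34, 39),
    "erythrodermic": (23, 27),
    "pustular": (22, 30),
    "plaque": (28, 40),
}
# Psoriasis images per subtype in the full data set, as reported above.
subtype_counts = {"plaque": 99, "guttate": 96, "pustular": 48, "erythrodermic": 33, "inverse": 25}


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


for name, (correct, total) in test_results.items():
    print(f"{name}: {correct}/{total} = {percent(correct, total):.2f}%")

total_psoriasis = sum(subtype_counts.values())  # 301
for name, n in subtype_counts.items():
    print(f"{name}: {n} images ({percent(n, total_psoriasis):.1f}% of psoriasis images)")
```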

A major limitation of these reported results is the lack of external test sets with diverse patient populations in different clinical settings, which would provide more insight into the generalizability of algorithms and their potential for real-world clinical use. In addition to psoriasis subtypes, other differential diagnoses presenting with red, scaly plaques such as atopic dermatitis, tinea corporis, mycosis fungoides, pityriasis rosea, or cutaneous lupus erythematosus must be distinguished from psoriasis by AI. To make an accurate diagnosis, CNNs must be trained using large datasets containing these differential diagnoses to recognize subtle differences in appearance and distribution patterns. As dermatology training sets become larger and include more images of psoriasis subtypes, differential diagnoses, and diverse patient populations, future algorithms are expected to become more comprehensive. In addition to diagnostic applications, AI has great potential to facilitate the assessment of the extent and severity of psoriasis, as detailed in the following section.

3.3 Assessment of Disease Extent and Severity

Automated assessment of psoriasis disease extent and severity has the potential to significantly reduce physician workload while ensuring a high degree of standardization and reproducibility.

3.3.1 Clinical Scores

Dermatologists currently mainly use the PASI, BSA, or PGA systems to grade clinical severity of plaque psoriasis [2, 14].

PASI is most commonly used in research studies; it combines the intensity of erythema, induration, and desquamation with the extent of involvement in four anatomical regions into a score from 0 to 72 (maximum disease activity) [38]. It is often used as the reference instrument when validating new scores and usually correlates well with physician-based assessments, as measured by Spearman or Pearson correlation coefficients [13]. For example, Bozek and Reich evaluated the reliability of PASI, BSA, and PGA in the examination of nine patients by ten dermatologists, with each patient being assessed twice by the physicians [14]. Significant Pearson correlations were observed between all three scales, and no assessment instrument was significantly superior [14]. Major criticisms of the PASI include its complexity, extensive time requirements, high variability, low responsiveness in mild disease, and non-linear scale [13,14,15]. Because PASI grades area involvement on an ordinal scale from 0 to 6 (0: 0%, 1: 1–9%, 2: 10–29%, 3: 30–49%, 4: 50–69%, 5: 70–89%, 6: 90–100%), changes within a score interval are not adequately reflected [39]. To address these inaccuracies, the linearly increasing PrecisePASI score was developed, which uses the actual percentage of area involvement instead of imprecise area class intervals and thus more accurately reflects severity in lower BSA ranges [39].
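The standard PASI formula weights the four body regions (head 0.1, upper limbs 0.2, trunk 0.3, lower limbs 0.4), multiplies the sum of the erythema, induration, and desquamation grades (each 0 to 4) by the ordinal area score (0 to 6), and adds the regional results, so the maximum is 72. The sketch below implements this using the area intervals listed above; the example patient is hypothetical, and the published PrecisePASI formula is not reproduced.

```python
from typing import Dict

# Standard PASI region weights.
REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}


def area_score(percent_involved: float) -> int:
    """Ordinal PASI area score: 0 = 0%, 1 = <10%, 2 = 10-29%, ..., 6 = 90-100%."""
    if percent_involved <= 0:
        return 0
    for upper_bound, score in ((10, 1), (30, 2), (50, 3), (70, 4), (90, 5)):
        if percent_involved < upper_bound:
            return score
    return 6


def pasi(regions: Dict[str, Dict[str, float]]) -> float:
    """Sum over regions of weight x (erythema + induration + desquamation) x area score."""
    total = 0.0
    for name, r in regions.items():
        severity = r["erythema"] + r["induration"] + r["desquamation"]  # each graded 0-4
        total += REGION_WEIGHTS[name] * severity * area_score(r["area_pct"])
    return total


# Hypothetical patient.
patient = {
    "head": {"erythema": 0, "induration": 0, "desquamation": 0, "area_pct": 0},
    "upper_limbs": {"erythema": 1, "induration": 1, "desquamation": 1, "area_pct": 5},
    "trunk": {"erythema": 2, "induration": 2, "desquamation": 1, "area_pct": 12},
    "lower_limbs": {"erythema": 3, "induration": 2, "desquamation": 2, "area_pct": 35},
}
print(round(pasi(patient), 1))  # 12.0
print(area_score(10), area_score(29))  # 2 2
```

Because 10% and 29% involvement both map to an area score of 2, trunk involvement that grows from 10% to 29% leaves the PASI unchanged, which is the discontinuity PrecisePASI aims to remove.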

BSA calculation is often included in the assessment of psoriasis severity and can be estimated using the ‘rule of nines’ or the number of patient hand areas affected (with one hand representing approximately 1%) [13]. While computation is easily feasible in clinical routine and results in a linear measure, BSA is prone to overestimation and inter-rater reliability is variable [13].
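A minimal sketch of both BSA estimation approaches, assuming the common adult rule-of-nines percentages (head and neck 9%, each arm 9%, anterior and posterior trunk 18% each, each leg 18%, genitals 1%) and the approximation of 1% per affected hand area stated above. The region names and the example involvement fractions are hypothetical.

```python
from typing import Dict

# Common adult "rule of nines" (percent of total body surface area per region).
RULE_OF_NINES: Dict[str, float] = {
    "head_neck": 9, "left_arm": 9, "right_arm": 9,
    "anterior_trunk": 18, "posterior_trunk": 18,
    "left_leg": 18, "right_leg": 18, "genitals": 1,
}


def bsa_rule_of_nines(fraction_involved: Dict[str, float]) -> float:
    """BSA (%) = sum over regions of the region's share x fraction of that region affected."""
    for region, f in fraction_involved.items():
        if region not in RULE_OF_NINES or not 0.0 <= f <= 1.0:
            raise ValueError(f"invalid entry: {region}={f}")
    return sum(RULE_OF_NINES[r] * f for r, f in fraction_involved.items())


def bsa_hand_units(n_hands: float, percent_per_hand: float = 1.0) -> float:
    """BSA (%) from the number of patient hand areas affected (about 1% each)."""
    return n_hands * percent_per_hand


print(bsa_rule_of_nines({"anterior_trunk": 0.25, "left_leg": 0.1}))  # about 6.3
print(bsa_hand_units(7))  # 7.0
```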

PGA provides an ordinal 5- to 7-point rating ranging from ‘clear’ to ‘very severe psoriasis’, with good reliability independent of observer experience [13]. Bozek and Reich found that PGA had the highest inter-rater reliability compared with BSA and PASI (coefficients of variation [%]: PGA 29.3, PASI 36.9, BSA 57.1) [14]. It can be used statically to assess a single time point or dynamically for comparison with baseline. Disadvantages include remaining inter-rater variability and the lack of body surface area assessment [14]. Given these limitations, a more reproducible, standardized, and time-efficient estimation of disease severity is needed, which could be provided by image-based AI algorithms.
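The coefficient of variation (CV) expresses rater disagreement as the standard deviation divided by the mean, in percent, so lower values indicate more consistent scoring. The sketch assumes the sample standard deviation and uses hypothetical PASI ratings of one patient by ten raters; the exact computation used by Bozek and Reich may differ.

```python
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """CV (%) = sample standard deviation / mean x 100."""
    return 100.0 * stdev(scores) / mean(scores)


# Hypothetical PASI ratings of the same patient by ten raters.
pasi_ratings = [8, 10, 12, 9, 11, 14, 7, 10, 13, 6]
print(f"CV = {coefficient_of_variation(pasi_ratings):.1f}%")  # CV = 25.8%
```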

3.3.2 Automated Severity Scoring of Plaque Psoriasis

A prerequisite for automated severity scoring is an accurate image segmentation algorithm [1, 26,27,28]. With the advancement of ML methods, CNNs (e.g., U-net models) have already been developed that can estimate BSA at the level of a dermatologist [40]. However, the automated assessment of individual clinical PASI subscores from two-dimensional images is technically more challenging, especially for three-dimensional features such as induration. Schaap et al. achieved this by using a CNN structure that takes ordinal scales into account and training a separate network for each anatomical region (trunk, arms, and legs) and each PASI subscore category (erythema, induration, desquamation, and area), resulting in 12 CNNs [41]. The models performed similarly to dermatologists in scoring erythema, scaling, and induration, while outperforming physicians in image-based area scoring [41]. Okamoto et al. developed a single-shot PASI system (SS-PASI) that assesses a simplified psoriasis severity score from a single input image of the trunk, since photographs of this anatomical area are usually readily available, fairly standardized, and show a large skin surface [42]. The CNN’s scores were consistent with the SS-PASI scores of human raters (13 dermatologists, 9 medical students) on a test set of 10 images that were excluded from training [42]. However, since the authors’ training set contained only 670 psoriasis images, overfitting cannot be ruled out [19, 43].
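The text above does not specify how Schaap et al. made their CNNs account for ordinal scales. One common approach, shown here as an assumption rather than their method, is cumulative encoding: a 0 to 4 severity grade becomes four binary questions of the form “is the grade greater than k?”, so a network can be trained with one binary output per question, and confusing grade 1 with grade 4 is penalized more than confusing adjacent grades.

```python
from typing import List


def encode_ordinal(grade: int, num_grades: int = 5) -> List[int]:
    """Cumulative targets: element k is 1 if the grade is greater than k."""
    if not 0 <= grade < num_grades:
        raise ValueError("grade out of range")
    return [1 if grade > k else 0 for k in range(num_grades - 1)]


def decode_ordinal(probs: List[float], threshold: float = 0.5) -> int:
    """Predicted grade = number of 'greater than k' outputs above the threshold."""
    return sum(1 for p in probs if p > threshold)


def ordinal_distance(a: int, b: int, num_grades: int = 5) -> int:
    """Number of binary targets that differ; grows with the gap between grades."""
    return sum(x != y for x, y in zip(encode_ordinal(a, num_grades), encode_ordinal(b, num_grades)))


print(encode_ordinal(3))                     # [1, 1, 1, 0]
print(decode_ordinal([0.9, 0.8, 0.6, 0.2]))  # 3
```

For the 0 to 6 PASI area score, the same functions apply with `num_grades=7`.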

While these and further research applications have previously been reviewed by Liu et al. [1], we focus here on currently available clinical tools.

3.3.3 Commercially Available Systems For Semi-Automated Severity Scoring

The use of total body photography (TBP) lends itself to automated psoriasis severity calculations in routine practice. Currently, there are two commercially available systems that use standardized photo documentation, automated segmentation, and subsequent semi-automated computer-assisted PASI calculation for patient assessment and follow-up.

3.3.3.1 Automated Total Body Mapping

FotoFinder Systems GmbH (Bad Birnbach, Germany) uses Automated Total Body Mapping (ATBM®) to provide a standardized, two-dimensional overview of the skin surface: patients assume various anatomical positions in front of a dynamic mount with a cross-polarized, xenon-flash, high-resolution camera [44]. Using FotoFinder’s PASIscan® analysis software, the underlying psoriasis type can be selected, and automated lesion segmentation is performed to estimate PASI pre-score values, including the affected body surface area of the head, arms, trunk, and legs, as well as erythema, plaque thickness, and scaling [44]. The physician can then manually adjust these values for the final PASI calculation, which may be particularly necessary for areas covered by hair, such as the scalp, or by underwear. During follow-up, images can be viewed side by side for direct comparison, and improvement is automatically quantified as PASI 50, 75, 90, or 100 (indicating at least 50%, 75%, 90%, or 100% improvement from baseline) [44]. The accuracy and reproducibility of this algorithm were evaluated in a comparative observational study involving three trained physicians and 120 patients with plaque psoriasis, which showed a high level of human–AI agreement and superior repeatability of the AI assessment compared with physicians [45]. Given this promising precision and reproducibility, the system may be recommended for clinics with financial access to such technologies, or for research trials once further studies have been conducted. Limitations include the inability of some patients (especially the elderly) to assume the predefined positions for image acquisition, and the time and/or additional personnel required to capture the image series [45].
In addition, the lack of automated psoriasis subtype identification, together with body sites such as the genital area or hairy scalp that still require thorough clinical examination by a dermatologist, remains a main obstacle to fully automated score calculation.
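PASI response categories compare a follow-up score with baseline. The sketch below is a generic illustration of that arithmetic with hypothetical scores; it is not PASIscan® code, whose internal implementation is not described here.

```python
from typing import List


def percent_improvement(baseline: float, follow_up: float) -> float:
    """Relative PASI reduction from baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline PASI must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def pasi_response(baseline: float, follow_up: float) -> List[int]:
    """Return every PASI response threshold (50/75/90/100) that is reached."""
    improvement = percent_improvement(baseline, follow_up)
    return [t for t in (50, 75, 90, 100) if improvement >= t]


print(pasi_response(20.0, 4.0))  # [50, 75]  (80% improvement)
print(pasi_response(20.0, 0.0))  # [50, 75, 90, 100]
```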

3.3.3.2 3D Total Body Photography

In recent years, 3D TBP has become commercially available with the VECTRA® WB360 (Canfield Scientific, Parsippany, New Jersey, USA), which overcomes some of these limitations. This system uses images captured simultaneously by 92 cameras with the patient in a single anatomical position to create a digital avatar of the patient’s skin surface from two-dimensional images in macro-quality resolution, excluding plantar surfaces, mucous membranes, and areas covered by hair (Fig. 3). A psoriasis assessment tool has recently been developed for the software that automatically segments the 3D avatar and calculates the lesion coverage of each anatomical region (head and neck, arms, trunk, legs, and whole body) [46]. Physicians then manually score erythema, induration, and desquamation for each region, from which the whole-body PASI is calculated automatically. Potential benefits include a simpler, more time-efficient image acquisition process. However, this novel algorithm has not yet been validated in clinical trials. For melanoma screening, patients have already been shown to prefer the 3D TBP system to 2D TBP, mainly because of the faster, simpler imaging process [47]. Further real-world comparative studies are needed to determine patient and physician preferences for psoriasis applications and to demonstrate the true benefit of the Canfield algorithm in clinical use. Limitations of this system include its high acquisition cost and the significant space needed for setup, which restrict its clinical availability mainly to larger centers. Additionally, time and personnel resources are required for the manual scoring of erythema, induration, and desquamation in each region. As with Automated Total Body Mapping, automatic psoriasis subtype identification is not yet possible.
Furthermore, special areas such as the scalp or plantar surfaces are not imaged and must be examined separately, limiting the potential use in remote settings (Fig. 3).

Fig. 3

VECTRA® WB360 avatar of a psoriasis patient captured by 3D total body photography. Clinical image courtesy of the University Hospital Basel, used with patient permission

3.3.4 Automated Severity Scoring of Other Psoriasis Subtypes

While the above-mentioned algorithms focus mainly on severity analysis of plaque psoriasis, research has recently shifted towards other subtypes. Several well-established clinical scores have been developed to assess disease severity in psoriasis subtypes such as generalized pustular psoriasis (e.g., Generalized Pustular Psoriasis Area and Severity Index [GPPASI]), or for involvement of specific locations such as the nails (Nail Psoriasis Severity Index [NAPSI]) [48, 49]. Similar to plaque psoriasis assessments, calculation in a clinical setting can be tedious and time consuming, a task that could potentially be facilitated and standardized by the use of AI.

Folle et al. used a transformer DLM to automatically quantify NAPSI scores. This type of model uses self-attention mechanisms to weigh the importance of different parts of the input image. It achieved high agreement with human annotations (Pearson correlation coefficient of 0.90) [49]. Amruthalingam et al. used a DLM to quantify pustular psoriasis efflorescences as an objective measure of disease activity [50]. On a test set, agreement between the model's predictions and expert labelling was very high (intraclass correlation coefficients [ICC]: 0.97 for count and 0.93 for surface percentage) [50]. Reliability was further supported by application to an unstandardized test set with multiple pustular disorders (Spearman correlation [SC] coefficients compared with dermatologist evaluation: 0.66 for count and 0.80 for surface percentage) [50].
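The agreement statistics reported in these studies can be computed as sketched below. The sketch is written in pure Python, and the example counts are hypothetical. The exact ICC form used in [50] is not stated here, so the two-way random-effects, absolute-agreement, single-measure form ICC(2,1) of Shrout and Fleiss is assumed for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single rater (k = 2)."""
    n, k = len(x), 2
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]   # per-image means
    col_means = [sum(x) / n, sum(y) / n]              # per-rater means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


if __name__ == "__main__":
    expert = [3, 7, 12, 20, 5, 0, 9]   # hypothetical pustule counts per image
    model = [4, 6, 12, 18, 6, 1, 10]
    print("Pearson ", round(pearson(expert, model), 3))
    print("Spearman", round(spearman(expert, model), 3))
    print("ICC(2,1)", round(icc_2_1(expert, model), 3))
```

The Pearson coefficient ignores a systematic offset, such as a model that always counts five extra pustules. ICC(2,1) penalizes such an offset, which is one reason ICC is commonly used to quantify agreement. Spearman correlation only checks whether the model ranks images in the same order as the expert.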

While an automated severity score of plaque psoriasis would certainly meet the most common demand, we believe that it is important to continue a parallel investigation of AI applications in these rarer subtypes. If the accuracy and reliability of such algorithms continue to improve and even surpass human performance in future studies, we predict that semi- to fully automated severity scoring will soon serve as the gold standard in centers where respective technologies are available and for clinical trial assessments. By offering the advantages of consistency, objectivity, efficiency, precision, and scalability, AI could potentially overcome the limitations of current clinical assessment scores.

3.4 Treatment Selection and Response

Predicting treatment response and personalizing drug selection have great potential to improve the quality of life of psoriasis patients and optimize long-term outcomes. Currently, clinical treatment strategy is based on disease severity, subtype, and location, the presence of psoriatic arthritis and other comorbidities, as well as patient preference and satisfaction [8].

Several AI applications have been developed to identify potential biomarkers and predict individual short- and long-term responses to biologics [1, 51]. For example, levels of systemic inflammatory proteins were measured before and four weeks after initiation of systemic treatment with tofacitinib or etanercept. These measurements were used to develop an ML model that accurately predicted long-term response [52]. In another study, unsupervised cluster analysis was used to categorize psoriasis patients into three subgroups based on their lesional and non-lesional skin transcriptomes. An ML algorithm was then applied to predict the treatment effects of methotrexate and various biologics [53].
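The cluster-analysis step can be illustrated with k-means, one common unsupervised method. This is an assumption: the study in [53] is not necessarily based on k-means. The synthetic feature matrix merely stands in for per-patient expression profiles (rows are patients, columns are genes). The minimal NumPy sketch below uses a deterministic farthest-point initialization and assigns each patient to one of three subgroups. Predicting treatment response from these subgroups would be a separate, supervised step that is not shown.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k=3, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    # Start from one random patient, then repeatedly add the patient that is
    # farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical data: 60 patients x 5 expression features, three latent groups.
    centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
    X = np.vstack([c + rng.normal(scale=0.5, size=(20, 5)) for c in centers])
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each feature
    labels, _ = kmeans(X, k=3)
    print(np.bincount(labels))  # patients per subgroup
```

Farthest-point initialization keeps the example deterministic, but it is sensitive to outliers. Library implementations typically use k-means++ with several restarts instead.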

Because AI can analyze extensive datasets, including patient records, clinical photographs, and molecular characteristics, personalized treatment plans may soon become reality as new patterns continue to be discovered. ML approaches have already been used to identify which patients with psoriatic arthritis would benefit from a higher starting dose of secukinumab [54]. We anticipate that image-based AI will also play a central role in the development of automated treatment decision algorithms for psoriasis patients. By integrating imaging data with clinical and genetic information, AI models could identify optimal treatment regimens tailored to individual patient characteristics, improving therapeutic efficacy and reducing potential side effects. Features such as the clinical phenotype, lesion distribution, and severity could be extracted from photographs using CNNs and serve as input for such treatment recommendation models. Once further research has been conducted, other potentially influential variables for treatment success could also be considered to optimize treatment choice. These include patient age, gender, ethnicity, comorbidities, co-medication, previous treatments, and molecular profiles.

4 Remaining Challenges

While the integration of image-based AI into routine psoriasis management and clinical trials holds great potential, many hurdles must still be overcome.

First, the achievable sensitivity and specificity of ML algorithms depend heavily on the quantity and quality of the input data [55]. Electronic medical records and routine photographic documentation of dermatological diseases are now widespread. The resulting rapid growth in available training data has greatly enhanced the ability of ML algorithms to learn and perform complex tasks [2]. In terms of quality, however, this progress is hindered by the current lack of standardized conditions. Images are captured by physicians and patients in various settings with different photographic devices, lighting, backgrounds, color calibration, and angles [2, 56]. The routine use of standardized total body photography systems such as the VECTRA® WB360 (Canfield Scientific) or the ATBM system (FotoFinder) partially addresses this problem. However, these systems are expensive, require additional staff and space, and are therefore often available only in specialized centers. Complex algorithms such as CNNs require extensive datasets to achieve generalizable outcomes, so training and validation image sets are often assembled from heterogeneous sources. This heterogeneity makes it more difficult for algorithms to distinguish between real (disease-related) and artificial (acquisition-related) discrepancies. In addition, currently available training sets lack images of healthy individuals. Such images would allow algorithms to distinguish lesions from intact skin without introducing biases from other features such as anatomical localization. Despite the many algorithms and methods published to date, there is no accepted psoriasis-specific open-source dataset that can be used to compare their performance.

Training a model without appropriate input data can result in incorrect diagnostic classification and severity scoring. The diversity of the training data is also critical for developing a generalizable algorithm. However, patients with skin of color, elderly patients, children, and women are often underrepresented in training image repositories. This can lead to erroneous results when models are applied to these patient populations [57,58,59]. For example, in children, psoriasis affects the face and flexures more commonly than in adults, and plaques are often smaller and thinner, which can lead to misclassification of the diagnosis [60]. Psoriasis also manifests differently across ethnicities and populations. In skin of color (Fitzpatrick skin types IV–VI), for example, erythema may be less apparent and appear violaceous or hyperpigmented. A model trained only on lighter skin types (Fitzpatrick I–III) may therefore underscore severity or segment images incorrectly [61]. More generally, the Fitzpatrick scale is widely criticized for its subjectivity and for having been developed using only White patients. Its reliance on terms such as 'burn' or 'tan' inadequately describes the effects of UV radiation on darker skin tones. This has prompted calls for more objective measures, such as spectrophotometric assessments, when labelling image sets [62]. AI models trained only on images from one population are therefore at risk of bias and inaccuracy when applied to other populations. In addition, algorithm performance in the presence of postinflammatory hypo- or hyperpigmentation after successful treatment should be assessed in clinical application. Residual discolorations may affect results if they were not considered during training.
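One practical safeguard against these biases is to report performance separately for each patient subgroup rather than only in aggregate. The sketch below computes sensitivity and specificity per subgroup. The function name `per_group_metrics`, the skin-type group labels, and the toy predictions are all hypothetical illustrations.

```python
from collections import defaultdict


def per_group_metrics(y_true, y_pred, groups):
    """Sensitivity and specificity per subgroup (labels: 1 = psoriasis, 0 = not)."""
    if not len(y_true) == len(y_pred) == len(groups):
        raise ValueError("inputs must have equal length")
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t and p:
            c["tp"] += 1
        elif t:
            c["fn"] += 1
        elif p:
            c["fp"] += 1
        else:
            c["tn"] += 1
    results = {}
    for g, c in counts.items():
        pos, neg = c["tp"] + c["fn"], c["tn"] + c["fp"]
        results[g] = {
            "n": pos + neg,
            "sensitivity": c["tp"] / pos if pos else None,  # None: no positives in group
            "specificity": c["tn"] / neg if neg else None,  # None: no negatives in group
        }
    return results


if __name__ == "__main__":
    # Hypothetical image-level labels and predictions for two skin-type groups.
    y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
    groups = ["FST I-III"] * 6 + ["FST IV-VI"] * 6
    for group, m in per_group_metrics(y_true, y_pred, groups).items():
        print(group, m)
```

In the toy data, the pooled sensitivity is 0.75 (6 of 8 lesions detected). This aggregate figure hides a large gap between the groups: every lesion is detected in the lighter-skin group, but only half are detected in the darker-skin group.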

In order to interpret outputs, physicians need to understand the capability and limitations of AI models, which is especially critical for applications involving treatment decisions. In addition, especially for neural network-based models that often make ‘black box’ decisions, the lack of explainability can be detrimental to medical applications, as physicians need transparency to trust and integrate AI assessments into their clinical decisions. For medical image analysis tasks, several interpretability methods have been developed and recently reviewed, including attribution maps that highlight the important regions of an input image, language descriptions that provide written justifications, or internal network representations that depict different features learned by filters in the CNN [63]. A truly comprehensive algorithm, which has not been developed to date, needs to be transparent to clinicians and validated in a broad real-world setting to ensure applicability across all skin types, ages, genders, and clinical phenotypes.
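Occlusion sensitivity is one of the simplest ways to produce an attribution map. Each part of the image is masked in turn, and the resulting drop in the model's output is assigned to the masked pixels. The NumPy sketch below assumes a generic `score_fn` that returns a scalar, such as a psoriasis probability. The toy scorer is a hypothetical stand-in for a trained CNN. This is not necessarily one of the specific methods covered in [63].

```python
import numpy as np


def occlusion_map(image, score_fn, patch=4, stride=4, baseline=0.0):
    """Average drop in score_fn when each pixel is covered by a baseline-valued patch."""
    h, w = image.shape[:2]
    base_score = score_fn(image)
    heat = np.zeros((h, w))
    hits = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = baseline  # mask one patch (all channels)
            drop = base_score - score_fn(occluded)
            heat[y:y + patch, x:x + patch] += drop
            hits[y:y + patch, x:x + patch] += 1
    # Average over overlapping patches; pixels never covered stay at 0.
    return heat / np.maximum(hits, 1)


def toy_score(img):
    """Hypothetical stand-in for a CNN: total 'lesion signal' in the image."""
    return float(img.sum())


if __name__ == "__main__":
    image = np.zeros((16, 16))
    image[4:8, 8:12] = 1.0  # synthetic 'lesion'
    heat = occlusion_map(image, toy_score)
    print(np.unravel_index(heat.argmax(), heat.shape))  # a pixel inside the lesion
```

With a real CNN, the patch size and baseline value (black, mean color, or blurred skin) strongly influence the resulting map. These choices should therefore be reported alongside any attribution figure.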

Many of the image-based AI algorithms developed to date have not yet been studied in a clinical setting, so their real-world accuracy and utility remain uncertain. Adequately powered clinical trials and validation studies are needed to evaluate their true performance in clinical practice. Psoriasis lesions vary considerably in size, appearance, and anatomical location, and AI models must prove that they can handle this variability and complexity in real-world applications. Furthermore, it remains to be determined how real-world image transformations affect the consistency of model outputs. Such transformations include slight patient movement or changes in lighting. For melanoma risk scoring of digital dermoscopic images using CNNs, slight user-induced image changes have already been shown to significantly alter classification results during repeated imaging [64]. Robustness evaluation of psoriasis AI models should therefore be incorporated into future clinical trial design.
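A simple robustness check repeats the prediction on slightly altered copies of the same image and flags any change in the predicted class. The sketch below assumes a scalar `score_fn` and a decision threshold of 0.5, both hypothetical. The perturbations only approximate real-world variation in lighting and patient position: brightness is changed by ±5%, and images are shifted by one pixel using `np.roll`, which wraps pixels around the border.

```python
import numpy as np


def perturbations(image):
    """Yield (name, image) pairs with small alterations of an image scaled to [0, 1]."""
    yield "brighter_5pct", np.clip(image * 1.05, 0.0, 1.0)
    yield "darker_5pct", np.clip(image * 0.95, 0.0, 1.0)
    yield "shift_right_1px", np.roll(image, 1, axis=1)  # wraps at the border
    yield "shift_down_1px", np.roll(image, 1, axis=0)


def robustness_report(image, score_fn, threshold=0.5):
    """Score change and class flip for each perturbation relative to the original."""
    base = score_fn(image)
    base_label = base >= threshold
    report = {}
    for name, altered in perturbations(image):
        s = score_fn(altered)
        report[name] = {
            "score": float(s),
            "delta": float(s - base),
            "flipped": bool((s >= threshold) != base_label),
        }
    return report


def toy_score(img):
    """Hypothetical stand-in for a trained model: mean intensity."""
    return float(img.mean())


if __name__ == "__main__":
    image = np.full((8, 8), 0.49)  # scores just below the threshold
    for name, r in robustness_report(image, toy_score).items():
        print(name, round(r["score"], 4), "flipped" if r["flipped"] else "stable")
```

In the example, the image scores just below the threshold and flips class after a 5% brightness increase. This shows why predictions close to the decision boundary deserve particular scrutiny in repeated-imaging studies.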

Finally, patient and physician acceptance of new technologies must be considered for successful implementation. It is critical to seamlessly integrate image-based AI applications into the clinical workflow without adding complexity to the patient care process, which could negatively impact perceptions. Compliance with regulatory standards and ethical considerations regarding patient privacy, patient consent, and image data protection must be ensured for the responsible use of image-based AI in healthcare. Overcoming these challenges and optimizing clinical workflows will require close collaboration between deep learning engineers, physicians, and researchers. We believe that interdisciplinary communication is essential to the development and implementation of accurate, robust, reliable, and ethical algorithms with maximum clinical utility.

As a future outlook, CNNs may soon be replaced by newer state-of-the-art architectures for medical image classification tasks. Vision Transformer (ViT) algorithms have already shown promising results compared with CNNs. The cited studies also report a simplified training process with much smaller datasets [49, 65].
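The self-attention mechanism behind transformer models can be sketched in a few lines. First, the image is split into patches, and each patch is linearly embedded as a token. Each token is then updated as a weighted average of all tokens, using the weights softmax(Q K^T / sqrt(d)). This single-head NumPy sketch uses random, untrained weights. It omits the position embeddings, class token, multiple heads, and MLP layers of a full ViT, and it is not the model used in [49] or [65].

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into flattened, non-overlapping patch tokens."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (patch rows, patch cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (N, d) token matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (N, N), rows sum to 1
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))       # stand-in for a clinical photograph
    tokens = patchify(image, 8)           # 16 tokens of length 8 * 8 * 3 = 192
    d = 64
    w_embed = rng.normal(scale=0.05, size=(192, d))  # untrained patch embedding
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    out, attn = self_attention(tokens @ w_embed, w_q, w_k, w_v)
    print(out.shape, attn.shape)  # (16, 64) (16, 16)
```

Each row of the attention matrix shows how strongly one image patch draws on every other patch. This global view, rather than only local neighborhoods, is what distinguishes transformers from convolutional filters.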

5 Conclusions

Dermatology is undergoing a paradigm shift with the rapid development of image-based AI. In psoriasis, this technology has great potential to facilitate diagnosis, standardize and streamline management, and optimize treatment. Despite the promising outlook, many challenges remain. These include validation of current models, integration into clinical workflows, the current lack of diversity in training data, and the need for standardized imaging protocols. Nevertheless, given the current pace of technological development, a revolution in the field has already begun. This is exemplified by the commercial availability of two semi-automated PASI calculators based on total body photography. Previous efforts have used AI to identify potential biomarkers and predict treatment response to biologics. Building on this work, augmented intelligence is anticipated to soon become an integral part of treatment and disease management. We expect a new diagnostic era in the care of psoriasis patients in the coming years, driven by the unprecedented capabilities of AI. As research and innovation in this area continue, patient outcomes are expected to improve substantially while the burden on healthcare systems is reduced.