1 Introduction

Malignant melanoma (MM) is one of the most life-threatening skin cancers. Although it is the least common of all skin cancers, its incidence rates have risen faster than those of any other common cancer during the last 30 years. Recent statistics show that more than 1800 people in the UK die of this disease every year and that its incidence rates have quadrupled since the 1970s [5]. Fortunately, MM can be treated successfully if it is detected and excised at an early stage.

There has been increasing interest in recent years in the early diagnosis of malignant melanoma using computer-assisted techniques [3, 7, 20, 24, 28]. Most computer-assisted diagnosis systems are based on the ABCD features of malignant melanoma, i.e. asymmetry of lesion shape [10–12, 25, 26, 32], border irregularity [1, 9, 22, 26, 27], colour variegation [2, 6, 8] and large diameter (typically over 6 mm). Although the ABCD features are indicative, their discriminating capability is far from conclusive [31]. This may be because they are primarily 2D features, which are prone to environmental effects and cannot fully describe a lesion's distinctive 3D characteristics. New features that provide additional information therefore need to be fused with them for an improved diagnosis.

Previous research [14–18, 34] has found useful MM features through analysis of 3D surface textures, in the form of surface normals in the tilt and slant directions, the so-called skin tilt pattern and skin slant pattern. These features have demonstrated better classification results than traditional 2D texture-based features [17]. However, whether combining the 3D features with the classic 2D ABCD features can improve on existing diagnosis based purely on ABCD features has not been studied before. This motivates the present multivariate study of combinations of the 3D surface texture features with the classic 2D ABCD features. The study first assesses the discriminating capability of each individual feature and then uses a forward selection scheme to select the best feature subset. It is envisaged that the 3D skin surface texture features (3D surface normal features), which are related to a lesion's inherent topographic information, will be complementary to the 2D ABCD features and that their fusion will improve existing computer-assisted diagnosis of MM based on the ABCD features. In addition to the previously proposed 3D features, namely the overall skin tilt/slant pattern disruptions, this work proposes two new 3D features, the so-called most disrupted tilt/slant patterns. Both the previous and the proposed 3D features are used in this feature combination study. A comprehensive feature enhancement scheme, consisting of a preprocessing Gaussian filter and a postprocessing feature-preserving anisotropic filter, is also proposed.

2 Methods

2.1 Photometric stereo and 3D surface texture

The 3D skin texture was acquired with a six-light photometric stereo device, the theory of which is briefly explained here. For an ideal Lambertian surface, the image irradiance equation can be expressed as [19]:

$${i(m,n) = \rho (m,n) \cdot \frac{ - p(m,n)\cos \alpha \sin \beta - q(m,n)\sin \alpha \sin \beta + \cos \beta }{{\sqrt {p^{2} (m,n) + q^{2} (m,n) + 1} }}}$$
(1)

where α and β are the tilt and slant angles of the illuminant, respectively; the partial derivatives p = ∂z/∂x and q = ∂z/∂y are the x-axis (indexed by m) and y-axis (indexed by n) components of the surface gradient at image position (m, n); and ρ is the surface reflectance (albedo). At least three images, each acquired under a different illuminant, are required to solve for the three unknowns p, q and ρ in Eq. 1. Because three extra images are acquired under another three illuminants, this redundant information can be used to detect pixels affected by specular highlights or shadows and to exclude them from the computation. As a result, the recovered surface normals and reflectance images are free from these environmental effects.
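The per-pixel solve described above can be sketched as follows. This is a minimal NumPy illustration, not the Skin Analyser's implementation: the light layout (six LEDs 60° apart in tilt at a common 45° slant) and the outlier rule (discard the brightest and darkest readings as likely specular and shadowed) are assumptions chosen for illustration, and the function names are hypothetical. Each reading is modelled as i = ρ s·n, with the illuminant vector s built from the tilt α and slant β as in Eq. 1 and n = (−p, −q, 1)/√(p² + q² + 1).

```python
import numpy as np


def light_direction(tilt, slant):
    """Unit illuminant vector s for tilt alpha and slant beta (radians), as in Eq. 1."""
    return np.array([np.cos(tilt) * np.sin(slant),
                     np.sin(tilt) * np.sin(slant),
                     np.cos(slant)])


def recover_pixel(intensities, lights, n_reject=1):
    """Estimate (p, q, rho) at one pixel from k >= 3 + 2 * n_reject readings.

    Assumed outlier rule: drop the n_reject brightest readings (possible
    specularities) and the n_reject darkest (possible shadows), then solve
    i = L g by least squares, where g = rho * n.
    """
    intensities = np.asarray(intensities, dtype=float)
    order = np.argsort(intensities)
    keep = order[n_reject:len(order) - n_reject]
    g, *_ = np.linalg.lstsq(lights[keep], intensities[keep], rcond=None)
    rho = np.linalg.norm(g)              # albedo = length of g
    n = g / rho                          # unit surface normal
    p, q = -n[0] / n[2], -n[1] / n[2]    # invert n = (-p, -q, 1) / norm
    return p, q, rho


# Hypothetical layout: six LEDs 60 degrees apart in tilt, all at 45 degrees slant.
LIGHTS = np.stack([light_direction(np.deg2rad(60 * k), np.deg2rad(45))
                   for k in range(6)])

# Usage: synthesise one pixel (p = 0.2, q = -0.1, rho = 0.8) and add a specular spike.
n_true = np.array([-0.2, 0.1, 1.0]) / np.sqrt(0.2 ** 2 + 0.1 ** 2 + 1)
readings = 0.8 * LIGHTS @ n_true
readings[2] += 5.0
p_est, q_est, rho_est = recover_pixel(readings, LIGHTS)
```

With six readings and one rejection at each end, four readings remain, which still over-determines the three unknowns; the spiked reading is the brightest and is discarded, so the estimate is unaffected.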

Figure 1 depicts the six-light photometric stereo device known as the Skin Analyser [33, 34], which is used as the data acquisition system. Its schematic configuration is illustrated in Fig. 1 (left), and the developed device is shown in Fig. 1 (right). In clinical use, the device is placed with its axis perpendicular to the skin surface, and a camera takes six images, each under a different LED illuminant. The entire acquisition takes less than 1 s, which satisfies the requirement of photometric stereo that the scene remain static during acquisition. All of the following experiments were carried out using this device. Figure 2 shows the six-light photometric stereo system in operation (left), a sample lesion image (middle) and its recovered surface normals (right).

Fig. 1

Left: schematic (to scale); Right: the developed hand-held colour photometric stereo device known as the ‘Skin Analyser’. Only two of the six LED light sources are shown in either picture

Fig. 2

Left: six-light photometric stereo imaging system in operation; Middle: one of the six skin lesion images taken by the Skin Analyser; Right: recovered surface normal map of the lesion

2.2 3D skin surface texture features

2.2.1 Skin tilt pattern

It has been observed that MM tends to disrupt the skin's naturally formed, regularly shaped surface patterns by forming new irregularities or disruptions [23, 30]. As an example, Fig. 3 shows an MM's 2D image and its 3D reconstruction from the surface normals acquired by the six-light photometric stereo device; the 3D skin surface disruptions can be clearly seen in the 3D image. To estimate these disruptions, a reference skin model is needed whose surface normals can be compared with the actual surface normals of the lesion.

Fig. 3

Left: a malignant melanoma's 2D colour image; Right: its reconstructed 3D profile using the surface normals acquired by our six-light photometric stereo device

A natural choice is a 2D Gaussian function, which allows the best (closest) model to be selected adaptively according to the surface characteristics of each lesion. The reasons are as follows. Firstly, the Gaussian is one of the most common distributions in nature and underlies many parametric statistical hypothesis tests and analyses; it is envisaged that abnormalities such as MMs are likely to exhibit large deviations from such normal statistics. Secondly, as shown in Fig. 4 (left), its flexibility allows it to approximate a wide range of 3D topographies: topographies with sharp protrusions (small variances and large amplitude), near-flat topographies (large variances and low amplitude), and hemispherical topographies (where only the central part of the Gaussian envelope is used). Thirdly, an isotropic 2D Gaussian has circularly symmetric contours, which allows an asymmetry analysis of the 3D data. Because of the abnormal proliferation of melanocytic cells, many MMs are expected to have asymmetrical, irregular shapes, and the symmetrical contours of the Gaussian make these abnormalities detectable. Fourthly, the surface gradient of a Gaussian envelope changes smoothly from pixel to pixel, so neighbouring surface normal patterns are highly similar, which is useful for simulating regular skin patterns.

Fig. 4

Left: cross sections of 2D Gaussian envelopes range from sharply curved (small variances) to nearly flat (large variances). Right: the disruptions in 3D surface normals are estimated as the difference in direction δ between a lesion's acquired surface normal \(\vec{N}\) and the corresponding surface normal \(\vec{N}_{s}\) simulated by a best-fit 3D skin model generated by a Gaussian envelope. The green curve indicates the lesion topography, the overlapping blue curve a possible best-fit Gaussian envelope, and the irregular red curve the surface disruptions (colour figure online)

Let \((m_{c}^{*} ,n_{c}^{*} )\) be the centre of a 2D Gaussian envelope; projecting the Gaussian envelope onto the image plane gives an isotropic distribution of tilt directions centred at \((m_{c}^{*} ,n_{c}^{*} )\). Because the reference skin model should be the surface description closest to the lesion, the centre is chosen as the one whose Gaussian envelope best fits the lesion's tilt directions acquired by the Skin Analyser:

$${\left( {m_{c}^{*} ,n_{c}^{*} } \right) = \arg \hbox{min} \sum\limits_{{\left\{ {(m,n) \in S_{l} } \right\}}} {\left\| {\varphi (m,n) - \varphi^{*} (m,n)} \right\|} }$$
(2)

where \((m_{c}^{*} ,n_{c}^{*} )\) is the estimated centre of the Gaussian distribution obtained by least-squares estimation, \(S_{\text{l}}\) denotes the lesion region, ||.|| denotes the Euclidean distance, and φ and \(\varphi^{*}\) are the acquired and the estimated tilt directions (patterns), respectively. The asterisk “*” denotes an estimated variable. Once the distribution centre has been estimated, the differences in tilt direction are used to estimate the skin tilt pattern disruptions. The overall disruption in skin tilt pattern (OT) is defined as the average difference between the skin tilt pattern \(\varphi_{\min}\) of the best-fit Gaussian function and the acquired skin tilt pattern φ:

$${{\text{OT}} = \frac{{\sum\limits_{{(m,n) \in S_{\text{l}} }} {\left\| {\varphi_{\hbox{min} } (m,n) - \varphi (m,n)} \right\|} }}{{S_{\text{l}} }}}$$
(3)

where \(S_{\text{l}}\) is the number of pixels within the lesion. Another feature, the most disrupted tilt (MT), is estimated as

$${\text{MT}} = \max_{(m,n) \in S_{\text{l}}} \left\| {\varphi_{\hbox{min} } (m,n) - \varphi (m,n)} \right\|$$
(4)
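The tilt-based steps (Eqs. 2–4) can be sketched in NumPy as below. Several choices are assumptions rather than the authors' stated method: the tilt is taken as the azimuth atan2(−q, −p) of the normal (−p, −q, 1); tilt differences are wrapped to [0, π] as the angular counterpart of ||.||; and Eq. 2 is solved by exhaustive search over candidate centres minimising the squared error, a simple stand-in for the least-squares estimation mentioned in the text. All function names are hypothetical.

```python
import numpy as np


def angular_diff(a, b):
    """|a - b| for angles, wrapped to [0, pi] so that -pi and pi coincide."""
    return np.abs((a - b + np.pi) % (2 * np.pi) - np.pi)


def gaussian_tilt(shape, mc, nc):
    """Tilt pattern phi* of an isotropic Gaussian centred at (mc, nc).

    With tilt = atan2(-q, -p), the normals of an isotropic Gaussian point
    radially away from the centre, independent of amplitude and spread.
    """
    m, n = np.indices(shape)
    return np.arctan2(n - nc, m - mc)


def fit_tilt_centre(phi, lesion_mask, candidates):
    """Eq. 2 by exhaustive search: the candidate centre with least squared tilt error."""
    errors = [np.sum(angular_diff(phi, gaussian_tilt(phi.shape, mc, nc))[lesion_mask] ** 2)
              for mc, nc in candidates]
    return candidates[int(np.argmin(errors))]


def disruption_features(acquired, model, lesion_mask, wrap=True):
    """Overall (mean, Eq. 3) and most (max, Eq. 4) disruption within the lesion.

    Pass wrap=False for slant patterns (Eqs. 7-8), whose angles do not wrap.
    """
    d = angular_diff(acquired, model) if wrap else np.abs(acquired - model)
    d = d[lesion_mask]
    return d.mean(), d.max()
```

The same `disruption_features` helper covers OS and MS with `wrap=False`, since slant angles lie in [0, π/2]. Pixels where the gradient vanishes (such as the apex of a Gaussian) have an undefined tilt, so excluding them from the mask is a reasonable precaution.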

2.2.2 Skin slant pattern

So far, all computations for finding the centre of the Gaussian distribution have been limited to the xy plane and the tilt direction. Because the tilt directions of an isotropic Gaussian radiate from its centre regardless of its amplitude and variance, they cannot determine the exact topography of the Gaussian, which depends on both parameters. The slant directions must therefore also be used, and the best-fit Gaussian topography is estimated as

$${(A^{*},\sigma^{*}) = \arg \min_{(A,\sigma)} \sum\limits_{(m,n) \in S_{\text{l}}} \left\| {\theta (m,n) - \theta^{*} (m,n)} \right\|}$$
(5)

where θ denotes the acquired lesion’s skin slant pattern and \(\theta^{*}\) denotes the estimated skin slant pattern. Both patterns can be represented in terms of surface gradients as

$${\theta = \cos^{ - 1} \left( {\frac{1}{{\sqrt {p^{2} + q^{2} + 1} }}} \right)}\;\;{\text{and}}\;\;{\theta^{*} = \cos^{ - 1} \left( {\frac{1}{{\sqrt {\left( {Z_{x}^{*} } \right)^{2} + \left( {Z_{y}^{*} } \right)^{2} + 1} }}} \right)}$$
(6)

where (p, q) are the lesion's surface gradients acquired by the Skin Analyser, \((Z_{x}^{*} ,Z_{y}^{*} )\) are the estimated surface gradients of the Gaussian function along the x- and y-axes, \(S_{\text{l}}\) denotes the lesion region, ||.|| denotes the Euclidean distance, and \(A^{*}\) and \(\sigma^{*}\) are the estimated amplitude and spread (standard deviation) of the Gaussian function.

The parameters of the best-fit Gaussian function were estimated with a nonlinear optimisation method, the Levenberg–Marquardt (LM) method, a standard and efficient routine for solving nonlinear least-squares problems. The LM method is available in numerical software such as MATLAB®, where it is one of the algorithm options of least-squares curve-fitting functions such as lsqcurvefit. A good initialisation of the parameters speeds up the estimation. It can be obtained by first evaluating several parameter values within their plausible ranges and using those with the lowest estimation error as the initial guess.
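A sketch of this two-stage fit (coarse grid initialisation followed by LM) is shown below. Here `scipy.optimize.least_squares` with `method="lm"`, which calls MINPACK's Levenberg–Marquardt routine, plays the role that lsqcurvefit plays in MATLAB. The sketch assumes the centre from Eq. 2 is already known, models the skin as Z = A exp(−r²/(2σ²)), and minimises the sum of squared residuals θ − θ* (a least-squares reading of Eq. 5). The grids and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def gaussian_slant(params, m, n, mc, nc):
    """Slant theta* of the normals of Z = A exp(-r^2 / (2 sigma^2)), as in Eq. 6."""
    A, sigma = params
    dm, dn = m - mc, n - nc
    Z = A * np.exp(-(dm ** 2 + dn ** 2) / (2 * sigma ** 2))
    Zx, Zy = -dm / sigma ** 2 * Z, -dn / sigma ** 2 * Z
    # arccos(1 / sqrt(g^2 + 1)) equals arctan(g) for g >= 0 and is better conditioned
    return np.arctan(np.hypot(Zx, Zy))


def fit_slant_gaussian(theta, lesion_mask, centre, A_grid, sigma_grid):
    """Estimate (A*, sigma*) from the acquired slant pattern theta within the lesion."""
    m, n = np.indices(theta.shape)
    m, n, t = m[lesion_mask], n[lesion_mask], theta[lesion_mask]
    mc, nc = centre

    def residuals(x):
        return gaussian_slant(x, m, n, mc, nc) - t

    # coarse search over candidate values for the initial guess, as described in the text
    x0 = min(((a, s) for a in A_grid for s in sigma_grid),
             key=lambda x: np.sum(residuals(x) ** 2))
    sol = least_squares(residuals, x0, method="lm")
    A, sigma = sol.x
    return A, abs(sigma)  # sigma enters only as sigma^2, so its sign is arbitrary
```

The grid step only needs to land in the right basin; LM then refines both parameters jointly.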

Once the parameters \(A^{*}\) and \(\sigma^{*}\) of the resultant Gaussian topography (or portions of it) have been estimated, the differences in the slant direction are used to estimate the skin slant pattern disruptions. The overall disruption in skin slant pattern (OS) is defined as the average difference between the skin slant pattern \(\theta_{\min}\) of the best-fit Gaussian function and the acquired skin slant pattern θ:

$${{\text{OS}} = \frac{{\sum\limits_{{(m,n) \in S_{\text{l}} }} {\left\| {\theta_{\hbox{min} } (m,n) - \theta (m,n)} \right\|} }}{{S_{\text{l}} }}}$$
(7)

where \(S_{\text{l}}\) is the number of pixels within the lesion. Another feature, the most disrupted slant (MS), is estimated as

$${\text{MS}} = \max_{(m,n) \in S_{\text{l}}} \left\| {\theta_{\hbox{min} } (m,n) - \theta (m,n)} \right\|$$
(8)

To illustrate the skin disruption estimation process, Fig. 4 (right) shows that the acquired topography of the lesion (green curve) can be approximated by a 2D Gaussian function (blue curve), i.e. the 3D skin model that best fits the acquired surface normals. In this way, the actual disruptions in the 3D surface normals (irregular red curve) can be estimated by subtracting the surface normals simulated by the best-fit skin model from the acquired ones, without the influence of the underlying non-flat topography.

As an example from a real lesion, Fig. 5 shows the skin slant patterns along a sample row of a non-flat, dome-shaped lesion, the simulated best-fit skin slant patterns generated by a 2D Gaussian function, and the resulting skin slant pattern disruptions on top of the lesion, estimated as the Euclidean differences between the acquired and the simulated skin slant patterns. The surrounding skin is not needed, because the skin slant pattern disruptions are found by comparison with the best-fit slant pattern model. Being smooth and symmetrical while fitting closely to a lesion's acquired skin slant patterns, the simulated best-fit patterns can detect subtle variations in skin slant patterns without the influence of non-flat surface topography.

Fig. 5

Left: skin slant patterns along a sample row of an elevated lesion of nodular shape; Middle: the corresponding skin slant patterns generated by a best-fit 2D Gaussian function for the lesion region; Right: the skin slant pattern disruptions, estimated as the differences between the Left and Middle panels for the lesion region

2.2.3 Feature enhancement

In view of noise effects, an enhancement scheme consisting of a preprocessing Gaussian filter and a postprocessing anisotropic nonlinear diffusion is employed to enhance both the tilt and the slant pattern features. In the preprocessing step, Gaussian smoothing suppresses very high-frequency noise at the expense of a slight loss of 3D skin surface texture information. Although there is a trade-off between reducing high-frequency noise and preserving high-frequency 3D skin texture, it is envisaged that, with a properly chosen smoothing scale σ, the 3D skin texture can be enhanced without losing much useful information. Here, the Gaussian smoothing function is applied directly and separately to the three channels of the surface normals, \((n_{x}, n_{y}, n_{z})\), i.e.

$$\begin{aligned} n_{x}^{*} = n_{x} * G(u,v) \hfill \\ n_{y}^{*} = n_{y} * G(u,v) \hfill \\ n_{z}^{*} = n_{z} * G(u,v) \hfill \\ \end{aligned}$$
(9)

where * denotes the convolution operator and \(G(u,v,(x_{c} ,y_{c} ),\sigma )\) is a 2D Gaussian function, which has the following form,

$$G(u,v) = e^{{ - \left( {\frac{{(u - x_{c} )^{2} + (v - y_{c} )^{2} }}{{2\sigma^{2} }}} \right)}}$$
(10)

where \((x_{c}, y_{c})\) is the centre of the Gaussian window function and σ is the standard deviation that controls the strength of the smoothing. The Gaussian window should be small enough to reduce noise locally yet large enough to represent a smooth Gaussian envelope. To reduce local skin surface noise while preserving the local 3D surface texture features, a small σ is important; in our experiments, σ = 1. The window extends 2σ on each side of its centre, giving a size of 4σ + 1 = 5, which covers about 95 % of the 1D Gaussian profile along each axis.
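The preprocessing filter of Eqs. 9 and 10 can be sketched with SciPy as follows. `scipy.ndimage.gaussian_filter` uses a kernel radius of int(truncate·σ + 0.5), so `truncate=2.0` with σ = 1 reproduces the 5 × 5 window described above, and SciPy normalises the kernel weights to sum to 1. The optional renormalisation of the smoothed normals to unit length is an assumption, not something stated in the text; the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def smooth_normals(normals, sigma=1.0, renormalise=False):
    """Smooth each channel of an (H, W, 3) surface-normal map separately (Eq. 9).

    truncate=2.0 gives a kernel radius of int(2 * sigma + 0.5) pixels, i.e. a
    5 x 5 window for sigma = 1.
    """
    out = np.stack([gaussian_filter(normals[..., c], sigma=sigma, truncate=2.0)
                    for c in range(3)], axis=-1)
    if renormalise:  # optional extra step (assumption): restore unit-length normals
        out /= np.linalg.norm(out, axis=-1, keepdims=True)
    return out
```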

In the postprocessing step, anisotropic nonlinear diffusion is applied to reduce the noise effects on the skin tilt/slant pattern disruptions [16]. Anisotropic diffusion is an iterative technique that can detect and enhance a local surface's prominent features in both homogeneous and inhomogeneous texture regions [35]. This property is important because many benign lesions, which are covered by a fine network of skin patterns, are likely to have a homogeneous surface texture, whereas many MMs, which erode and disrupt skin patterns and form new lines of varying directions, are likely to have an inhomogeneous surface texture.

This approach adaptively chooses the smoothing strength so that the interiors of regions become smooth while the edges between regions are preserved. The degree of smoothing is often determined by a non-negative decreasing function, such as a sigmoid function, in which a threshold is used to judge whether a local feature is signal or noise. Here, a diffusion equation is used to smooth out the noise in the local skin tilt/slant pattern disruptions over successive iterations:

$$(X_{\Delta } )_{t} = {\text{div}}({\mathbf{D}} \cdot \nabla X_{\Delta } ) = {\text{div}}\left( {\left[ {\begin{array}{*{20}c} {D_{11} } & {D_{12} } \\ {D_{21} } & {D_{22} } \\ \end{array} } \right] \cdot \left[ {\begin{array}{*{20}c} {(X_{\Delta } )_{x} } \\ {(X_{\Delta } )_{y} } \\ \end{array} } \right]} \right)$$
(11)

where the subscripts t, x and y denote partial derivatives with respect to time (iteration) and the x- and y-axes, respectively, \(X_{\Delta }\) denotes the skin tilt/slant pattern feature, and D is the diffusion tensor, which controls the smoothing strength and is defined as a function of the structure tensor, i.e.

$${\mathbf{D}} = \left[ {\begin{array}{*{20}c} {{\mathbf{v}}_{1} } & {{\mathbf{v}}_{2} } \\ \end{array} } \right] \cdot \left[ {\begin{array}{*{20}c} \chi & 0 \\ 0 & 1 \\ \end{array} } \right] \cdot \left[ {\begin{array}{*{20}c} {{\mathbf{v}}_{1} } & {{\mathbf{v}}_{2} } \\ \end{array} } \right]^{T} = \left[ {\begin{array}{*{20}c} {D_{11} } & {D_{12} } \\ {D_{21} } & {D_{22} } \\ \end{array} } \right]$$
(12)

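where \(\mathbf{v}_{1}\) is the principal direction vector of the local signal variation (the dominant eigenvector of the structure tensor), \(\mathbf{v}_{2}\) is the orthogonal unit vector, and \(\chi \in [0, 1]\) is the smoothing strength in the direction of \(\mathbf{v}_{1}\), defined by the exponential function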

$$\chi = e^{{ - \frac{|\nabla X_{\varDelta}|}{K}}}$$
(13)

Here, the smoothing strength is chosen adaptively according to the magnitude of the local signal variation, \(\left| \nabla X_{\varDelta} \right|\). The parameter K is a threshold for judging whether the local structure is a feature or noise; in our experiments, K was chosen empirically as 0.4. For \(\left| \nabla X_{\varDelta} \right| \ll K\), the local structure is deemed to be noise and χ ≈ 1, so strong smoothing is applied, whereas for \(\left| \nabla X_{\varDelta} \right| \gg K\), the local structure is regarded as a feature and χ ≈ 0, so little or no smoothing is applied across it and the feature is preserved. Finally, using the forward Euler approximation in time, the diffusion equation of Eq. 11 is discretised as

$$X_{\Delta }^{(t + 1)} = X_{\Delta }^{(t)} + \tau \left[ \frac{\partial }{\partial x}\left( D_{11} (X_{\Delta } )_{x}^{(t)} \right) + \frac{\partial }{\partial x}\left( D_{12} (X_{\Delta } )_{y}^{(t)} \right) + \frac{\partial }{\partial y}\left( D_{21} (X_{\Delta } )_{x}^{(t)} \right) + \frac{\partial }{\partial y}\left( D_{22} (X_{\Delta } )_{y}^{(t)} \right) \right]$$
(14)

The iteration step τ is chosen to be smaller than 0.5/\(N_{d}\) [37], where \(N_{d}\) is the number of signal dimensions. Because the filter is applied separately to the tilt and the slant pattern features, \(N_{d}\) = 1, so τ is chosen to be smaller than 0.5. Central finite differences are used to approximate the spatial derivatives in Eq. 14.
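A compact explicit implementation of Eqs. 11–14 is sketched below. K = 0.4 and τ < 0.5 follow the text; the Gaussian smoothing scale `rho` of the structure tensor, the iteration count and the use of `np.gradient` (central differences in the interior, one-sided at the borders) for all derivatives are assumptions. Array columns are treated as the x-axis and rows as the y-axis; the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def anisotropic_diffusion(X, K=0.4, tau=0.2, n_iter=20, rho=1.0):
    """Explicit scheme for Eqs. 11-14 on a 2D tilt or slant disruption map X."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        Xy, Xx = np.gradient(X)                     # derivatives along rows (y), columns (x)
        # smoothed structure tensor J = G_rho * (grad X grad X^T)
        J11 = gaussian_filter(Xx * Xx, rho)
        J12 = gaussian_filter(Xx * Xy, rho)
        J22 = gaussian_filter(Xy * Xy, rho)
        # angle of the dominant eigenvector v1 (direction of strongest variation)
        ang = 0.5 * np.arctan2(2 * J12, J11 - J22)
        c, s = np.cos(ang), np.sin(ang)
        chi = np.exp(-np.hypot(Xx, Xy) / K)         # Eq. 13
        # D = chi * v1 v1^T + v2 v2^T with v1 = (c, s), v2 = (-s, c), as in Eq. 12
        D11 = chi * c ** 2 + s ** 2
        D12 = (chi - 1) * c * s
        D22 = chi * s ** 2 + c ** 2
        # forward Euler step on div(D . grad X), Eq. 14
        jx = D11 * Xx + D12 * Xy
        jy = D12 * Xx + D22 * Xy
        X += tau * (np.gradient(jx, axis=1) + np.gradient(jy, axis=0))
    return X
```

Because v2 keeps a diffusivity of 1, smoothing always proceeds along edges; only diffusion across strong variations (in the v1 direction) is suppressed.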

2.3 ABCD features

2.3.1 Asymmetry

To determine the asymmetry of a lesion, we first find the centre of the lesion region, which is defined through moments (see Appendix 1). The two principal centroidal axes, which are 90° apart, are used to approximate the best axes of symmetry. Reflecting the lesion about each of the two axes yields two non-overlapping areas, which are zero if the lesion is perfectly symmetrical and nonzero (as in most cases) if it is asymmetrical. The smaller of the two, \(\Delta S_{\min}\), is used to calculate the asymmetry index (AI), defined as the ratio of the non-overlapping area to the original lesion area:

$${{\text{AI}} = \frac{{\Delta S_{\hbox{min} } }}{{S_{\text{l}} }} \times 100}$$
(15)
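The asymmetry index can be sketched as follows for a binary lesion mask. The centroid and principal-axis orientation come from first- and second-order moments; reflection about each axis is done by nearest-neighbour sampling of the mask, and the non-overlapping area ΔS is taken as the pixel count of the symmetric difference between the original and reflected masks. These implementation details are assumptions rather than the authors' exact procedure, and the function name is hypothetical.

```python
import numpy as np


def asymmetry_index(mask):
    """AI of Eq. 15 for a 2D boolean lesion mask (a sketch under the stated assumptions)."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    xc, yc = cols.mean(), rows.mean()                  # centroid (first-order moments)
    mu20 = np.mean((cols - xc) ** 2)                   # second-order central moments
    mu02 = np.mean((rows - yc) ** 2)
    mu11 = np.mean((cols - xc) * (rows - yc))
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)    # major principal axis angle

    yy, xx = np.indices(mask.shape)
    dx, dy = xx - xc, yy - yc
    areas = []
    for ang in (theta, theta + np.pi / 2):             # the two principal centroidal axes
        ux, uy = np.cos(ang), np.sin(ang)
        t = dx * ux + dy * uy                          # projection onto the axis
        # reflection of each pixel about the axis: c + 2 (d . u) u - d
        rx = np.rint(xc + 2 * t * ux - dx).astype(int)
        ry = np.rint(yc + 2 * t * uy - dy).astype(int)
        inside = (rx >= 0) & (rx < mask.shape[1]) & (ry >= 0) & (ry < mask.shape[0])
        reflected = np.zeros_like(mask)
        reflected[inside] = mask[ry[inside], rx[inside]]
        areas.append(np.count_nonzero(mask ^ reflected))  # non-overlapping area
    return min(areas) / mask.sum() * 100
```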

2.3.2 Border

The border irregularity index (BI) is defined as the roundness ratio [29]:

$${{\text{BI}} = \frac{{P^{2} }}{{4\pi S_{\text{l}} }}}$$
(16)

where P and \(S_{\text{l}}\) denote the perimeter and the area of the lesion, respectively. If \(x_{i}\), i = 1, 2,…, N, are sample points on the boundary, the perimeter is given by

$${P = \sum\limits_{i = 1}^{N - 1} \left\| {x_{i + 1} - x_{i} } \right\| + \left\| {x_{N} - x_{1} } \right\|}$$
(17)

where ||.|| denotes the Euclidean distance. In a digital image, the area of the lesion can be evaluated by counting the number of pixels within the lesion. The ratio reaches its minimum value of 1 for a circular border and increases as the border deviates from a circle, indicating increasing border irregularity.
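Eqs. 16 and 17 translate directly into NumPy. The sketch below assumes the boundary samples are already ordered along the contour and that the area is supplied as a pixel count, as described above; the function names are hypothetical.

```python
import numpy as np


def perimeter(points):
    """Closed-polygon perimeter of Eq. 17 for an (N, 2) array of ordered boundary samples."""
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])              # append x_1 to close the contour
    return np.linalg.norm(np.diff(closed, axis=0), axis=1).sum()


def border_index(points, area):
    """Roundness ratio BI = P^2 / (4 pi S_l) of Eq. 16; area is the lesion pixel count."""
    return perimeter(points) ** 2 / (4 * np.pi * area)
```

For example, a 10 × 10 square gives P = 40 and BI = 1600/(400π) = 4/π ≈ 1.27, while a finely sampled circle gives BI ≈ 1.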

2.3.3 Colour

To reduce variations between individuals and other environmental effects, “relative colour” rather than “absolute colour” is used here [36]. It is defined as the normalised value of a colour component within the lesion minus the normalised value of that colour component in the surrounding skin:

$$\left[ {\begin{array}{*{20}c} {r^{{\prime }} } \\ {g^{{\prime }} } \\ {b^{{\prime }} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {r_{\text{lesion}} - r_{\text{skin}} } \\ {g_{\text{lesion}} - g_{\text{skin}} } \\ {b_{\text{lesion}} - b_{\text{skin}} } \\ \end{array} } \right]$$
(18)

where (r′, g′, b′) denote the relative colours in the red, green and blue components, \((r_{\text{lesion}}, g_{\text{lesion}}, b_{\text{lesion}})\) represent the lesion colours, and \((r_{\text{skin}}, g_{\text{skin}}, b_{\text{skin}})\) represent the average colour values of the surrounding skin, computed following [8]. Variegated colours within the lesion imply high variances in the respective red (R), green (G) and blue (B) components, so the three colour features are chosen as the standard deviations \(\sigma_{r}\), \(\sigma_{g}\), \(\sigma_{b}\) of the relative red, green and blue colours.
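The three colour features can be sketched as below. The text does not specify the normalisation, so chromaticity normalisation (r = R/(R + G + B)) is assumed here, and the skin reference is a plain mean over a hypothetical `skin_mask` rather than the method of [8]; the function name is also hypothetical.

```python
import numpy as np


def relative_colour_features(rgb, lesion_mask, skin_mask):
    """(sigma_r, sigma_g, sigma_b) of the relative colours of Eq. 18.

    Assumptions: "normalised" means chromaticity r = R / (R + G + B), and the
    skin colour is the plain mean over skin_mask.
    """
    rgb = np.asarray(rgb, dtype=float)
    chroma = rgb / np.maximum(rgb.sum(axis=-1, keepdims=True), 1e-12)
    skin = chroma[skin_mask].mean(axis=0)           # (r_skin, g_skin, b_skin)
    relative = chroma[lesion_mask] - skin           # (r', g', b') per lesion pixel
    return relative.std(axis=0)
```

In this sketch the skin mean shifts every lesion pixel by the same constant, so it changes the relative colour values but not their standard deviations; the normalisation is what affects the three σ features.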

2.3.4 Diameter

The diameter of the lesion in pixels is defined as the longest distance between two sample points on the lesion boundary, subject to the line segment between them passing through the centre of mass \((m_{c}, n_{c})\):

$${D = \hbox{max} \left\{ {\left\| {x_{i} - x_{j} } \right\|{\text{where }}(m_{c} ,n_{c} ) \in \overline{{x_{i} x_{j} }} } \right\}}$$
(19)

where \(x_{i}\) and \(x_{j}\) denote the two boundary sample points, and the line segment between \(x_{i}\) and \(x_{j}\) must pass through the centre of mass. Given the image magnification, the diameter calculated by Eq. 19 can be converted from pixels to millimetres.
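Because boundary samples rarely give a segment passing exactly through the centre of mass, a practical version of Eq. 19 needs a tolerance. The sketch below assumes a hypothetical tolerance of half a pixel on the distance from the centre to each candidate segment and searches all pairs of boundary points by broadcasting; the function name is hypothetical.

```python
import numpy as np


def lesion_diameter(points, centre, tol=0.5):
    """Longest chord (Eq. 19) whose segment passes within tol pixels of the centre."""
    pts = np.asarray(points, dtype=float)
    c = np.asarray(centre, dtype=float)
    a = pts[:, None, :]                              # x_i
    ab = pts[None, :, :] - a                         # x_j - x_i for every pair
    denom = np.maximum(np.sum(ab ** 2, axis=-1), 1e-12)
    # parameter of the point on segment x_i x_j closest to the centre
    t = np.clip(np.sum((c - a) * ab, axis=-1) / denom, 0.0, 1.0)
    closest = a + t[..., None] * ab
    through = np.linalg.norm(closest - c, axis=-1) <= tol
    lengths = np.linalg.norm(ab, axis=-1)
    return lengths[through].max()
```

The pairwise search is O(N²) in the number of boundary samples, which is acceptable for typical lesion contours.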

2.4 Feature selection

The 10 features used for the combination study are the asymmetry index (AI), the border irregularity index (BI), the standard deviations \(\sigma_{r}\), \(\sigma_{g}\), \(\sigma_{b}\) of the relative red, green and blue colours, the diameter D, the tilt pattern features OT and MT, and the slant pattern features OS and MS.

Although a large number of features are available for lesion classification, not all of them contribute equally to solving the classification problem. The best classification result is sometimes obtained not with the complete set of input features {x(1), x(2),…, x(M)}, where M is the number of features, but with a subset {x(1), x(2),…, x(m)}, where m < M. The purpose of feature selection is to select the combination of features that gives the best classification results. One way is to evaluate all possible combinations of the input features exhaustively; however, the number of non-empty subsets grows as 2^M − 1, so the computational cost of exhaustive search quickly becomes prohibitive.

One commonly used feature selection scheme is forward selection. The procedure begins by evaluating the classification performance of every subset consisting of a single input feature, to find the best individual feature, X(1). Next, it finds the best two-feature subset consisting of X(1) and one other feature from the remaining (M − 1) input features, so a total of (M − 1) pairs are evaluated. Subsets of three or more features are built in the same way, each time adding the remaining feature that gives the best performance. In forward selection, the best subset with m features is the m-tuple X(1), X(2),…, X(m), and the overall best feature set is the best of the subsets obtained over all M steps.
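The greedy procedure above can be sketched generically in plain Python. The `score` callable is a hypothetical hook (for instance, the LOOCV overall classification rate of a feature subset), and ties are broken by the lowest feature index.

```python
def forward_selection(n_features, score):
    """Greedy forward selection.

    score(subset) -> float, higher is better (e.g. a LOOCV classification rate).
    Returns the (subset, score) history for sizes 1..n_features and the best pair.
    """
    selected, remaining, history = [], list(range(n_features)), []
    while remaining:
        # add the remaining feature that gives the best score together with those already chosen
        best_f = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((list(selected), score(selected)))
    best_subset, best_score = max(history, key=lambda h: h[1])
    return history, best_subset, best_score
```

Each step scores at most M candidates, so the whole search costs O(M²) subset evaluations instead of the 2^M − 1 required by exhaustive search.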

2.5 Lesion classification

Because feature selection is the main focus of this paper, we did not choose a complex classification system such as ensemble classifiers [31]; the design of ensemble classifiers [21] is a separate topic. Instead, a single support vector machine (SVM) classifier [4] is used in this combination study, as it is simple and efficient while offering good classification power. Specifically, a nonlinear SVM with a multilayer perceptron kernel function (i.e. a sigmoid kernel of the form tanh(γ x·y + c)) is chosen. In SVM theory, the input vectors are mapped nonlinearly (implicitly, through the kernel) into a feature space of much higher dimension than the original one, in which the two classes are more likely to be separable by a hyperplane. The support vectors are the transformed training vectors that lie closest to the hyperplane; they are the most informative for defining the optimal separating hyperplane and the most difficult patterns to classify. Among the many hyperplanes that might separate the data, only the one that maximises the margin between the two classes is used for classification. Learning is therefore formulated as an optimisation problem that maximises the distance from the hyperplane to the support vectors or, equivalently, the distance between the two parallel hyperplanes that bound the two classes.

3 Results

A total of 46 lesion subjects were collected over a period of 2 years at the collaborating dermatological clinics using the Skin Analyser. Consent forms were signed by the participating patients. For confidentiality, each lesion was assigned a unique number and kept anonymous. The research ethics committee of the NHS (UK) approved our methods of using the clinical subjects in this work. Of the 46 lesions, 12 are MMs and 34 are from nine other types of benign lesions. The 34 benign lesions include non-melanocytic lesions, namely four dermatofibromas (DFs), three hyperkeratotic squamous papillomas (HSP) and eight seborrhoeic keratoses (SKs), as well as melanocytic lesions, namely five intra-dermal naevi (IN), two dysplastic naevi (DN), eight compound naevi (CN), two congenital naevi (CGN), one junctional naevus (JN) and one blue naevus (BLN). Pigmented non-melanocytic lesions have largely been ignored by previous computer-based diagnosis systems. However, some of them can be mistaken for melanocytic lesions even by experienced specialists [28], so they should be included when testing the accuracy of classification systems. All of the acquired lesions are used in the classification experiments; no lesions were selectively chosen for classification.

Because the sample size is relatively small, owing to patients' attendance rate at the collaborating clinics, a rigorous method is needed to test the classification results. Leave-one-out cross-validation (LOOCV) is chosen because it gives an almost unbiased estimate of the classification performance of each feature or feature subset. In LOOCV, the classifier is tested on one sample after being trained on all the remaining samples, so the test sample is never part of its own training set. The LOOCV performance of a feature or feature subset is the average classification result over all samples.

Some lesion classes, including BLN, CGN, JN and HSP, have very few samples, so their classification results may not represent the true discriminating power of a feature; the 34 benign lesions are therefore split into four sample groups. Sample group 1 includes only melanocytic lesions, one JN and eight CN, grouped because both are typically small, smooth and slightly raised. Sample group 2 includes only non-melanocytic lesions, the eight SKs, which are among the most common classes of benign lesions and have a distinct appearance from other benign lesions; lesions in this class can vary significantly in size, shape, colour and texture. Sample group 3 combines five IN (melanocytic) with four DFs (non-melanocytic), grouped because both have a raised, nodular shape. Sample group 4 combines the non-melanocytic lesions of three HSP with the melanocytic lesions of two DN, two CGN and one BLN.
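The LOOCV protocol, together with the sensitivity/specificity (overall rate) figures reported in Tables 1 and 2, can be sketched as follows, treating MM as the positive class (label 1). The classifier factory is a parameter: scikit-learn's `sklearn.svm.SVC(kernel="sigmoid")` provides a tanh (MLP-type) kernel and could be passed in, but the `NearestCentroid` class here is a hypothetical, dependency-free stand-in used only so that the sketch runs on its own.

```python
import numpy as np


def loocv_rates(X, y, make_classifier):
    """Leave-one-out sensitivity, specificity and overall rate (MM = class 1).

    make_classifier() must return an object with fit(X, y) and predict(X).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pred = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i               # every sample except the i-th
        clf = make_classifier()
        clf.fit(X[train], y[train])
        pred[i] = clf.predict(X[i:i + 1])[0]
    tp = np.sum((pred == 1) & (y == 1))
    fn = np.sum((pred == 0) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    return tp / (tp + fn), tn / (tn + fp), np.mean(pred == y)


class NearestCentroid:
    """Hypothetical stand-in classifier: assigns each sample to the closest class mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == k].mean(axis=0) for k in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=-1)
        return self.classes_[d.argmin(axis=1)]
```

A fresh classifier is created for every fold, so no state leaks from one training set into the next test sample.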

The discriminating capability of each feature, in terms of its classification performance for each sample group using a nonlinear SVM classifier with LOOCV, is listed in Table 1. Among the 10 features, the best classification performance is achieved by MT for group 1, MT and MS for group 2, asymmetry for group 3 and OS for group 4. Judged by the overall classification performance for all lesion samples, OS is the best of the 10 features. It is not surprising that the asymmetry feature performs best for group 3 and the relative red colour feature performs well for group 4: the IN and DF lesions in group 3 mainly have a round or nodular shape, while the BLN and CGN lesions in group 4 mainly have a uniform, even colour distribution.

Table 1 Classification performance ranking in percentage [listed as sensitivity/specificity (overall classification rate)] for the 4 sample groups and overall samples for the 10 features using a nonlinear SVM classifier

For single features, the proposed 3D skin surface texture features give the best classification results for groups 1, 2 and 4, and their results for group 3 are comparable to that of the asymmetry feature. Judged by overall performance, the proposed OS feature gives the best classification results of all 10 features. In general, the 3D features show better classification results than the 2D ABCD features, a finding consistent with [38], which indicates that the 3D features outperform the well-established border [22] and colour features in single-classifier systems. The authors acknowledge that the 2D features used in this study are classic but rather simplistic compared with the 3D features, so a different set of 2D features might give different classification results. However, since each feature focuses on only one property of pigmented lesions, the purpose of this paper is not to select a single gold-standard feature but to assess the discriminating power of combined features; the conclusions drawn from the single-feature classification experiment should therefore be seen as indicative rather than conclusive.

To choose among the possible feature subsets, the forward selection scheme described in Sect. 2.4 is used to select the optimal subset of size 1, 2,…, m. Because OS is the best feature in the one-feature experiment, it is retained when selecting the best two-feature subset. The best two features are then retained when selecting the best three features, and the procedure repeats until the best combination of m features (here m = 6) has been found. Table 2 lists the classification performances both for the combinations that include 3D features (indexed as “a”) and for the combinations of 2D features only (indexed as “b”). For the former, combinations beyond six features are not listed because the forward selection cannot select any new features beyond the existing six, and including more features does not improve the classification results, indicating redundancy. The classification results improve as the number of features increases up to five and then degrade with six features.
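The greedy forward scheme described above can be sketched as below. It is a minimal illustration under stated assumptions: `score` is a hypothetical callable returning the overall classification rate of a feature subset (in this study that rate would come from the nonlinear SVM with LOOCV), and ties are broken by the order in which the features are listed.

```python
def forward_selection(features, score, max_size):
    """Greedy forward selection: at each step keep the previously selected
    features and add the single remaining feature that maximises `score`.
    Returns the (subset, score) pair found at every subset size."""
    selected, remaining, history = [], list(features), []
    while remaining and len(selected) < max_size:
        # max() keeps the first candidate on ties, so list order breaks ties.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


def best_subset(history):
    """Pick the subset size with the highest score across all steps."""
    return max(history, key=lambda item: item[1])


# Hypothetical additive score, used only to exercise the search.
weights = {"OS": 5.0, "AI": 3.0, "BI": 1.0}
toy_score = lambda subset: sum(weights[f] for f in subset)
hist = forward_selection(["AI", "BI", "OS"], toy_score, max_size=3)
print(hist)  # [(['OS'], 5.0), (['OS', 'AI'], 8.0), (['OS', 'AI', 'BI'], 9.0)]
```

Recording the score at every size, rather than only the final subset, is what allows the peak at five features and the drop at six to be read off, as in Table 2.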

Table 2 Classification performances in percentage [listed as sensitivity/specificity (overall classification rate)] of the 4 sample groups and overall using a nonlinear SVM classifier for the best feature subsets (with feature size 2–6) selected by the forward feature selection scheme

For combined features, the best classification result is achieved by the five-feature combination, which gives a very promising overall classification rate of 87.8 %. This is a substantial improvement over (1) the overall 78.0 % achieved by the best single feature, (2) the overall 83.1 % achieved by the best combination of 2D features and (3) the overall 79.3 % achieved by the best combination of 3D features. At the level of individual groups, the best 2D and 3D combination also outperforms the best 2D-only combination in two out of four groups, and the two combinations perform equally in one group. In the remaining group (group 4), the 2D-only combination performs only slightly better in specificity, and the 91.7 % specificity of the 2D and 3D combination is also a very good result. Both combinations achieve 100 % sensitivity in this group.
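The size of these improvements can be made explicit as percentage-point differences in overall classification rate, using only the figures given above:

\[
\begin{aligned}
87.8\,\% - 78.0\,\% &= 9.8 \text{ percentage points (vs. the best single feature)},\\
87.8\,\% - 83.1\,\% &= 4.7 \text{ percentage points (vs. the best 2D-only combination)},\\
87.8\,\% - 79.3\,\% &= 8.5 \text{ percentage points (vs. the best 3D-only combination)}.
\end{aligned}
\]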

For subsets of 2, 3, 4, 5 and 6 features, the combinations of 2D and 3D features selected by the forward selection scheme perform better than those of 2D features only. The best 2D and 3D combination has five features: OS, AI and the three colour components, without BI and diameter. The best 2D combination also has five features: AI, BI and all three colour component features, without diameter. This suggests that (1) using all the ABCD features in either type of combination does not give the best classification result and (2) the 3D skin surface texture features can improve the 2D ABCD-based classification when used in combination with the colour and asymmetry features. The 3D features can therefore provide complementary, useful and highly discriminating information. Since the ABCD features represent different properties of a lesion, it is worth examining whether the two 2D features not selected by the feature selection (diameter and BI) can still improve the classification results of individual groups in the 2D and 3D feature combination.

The best results for the five- and six-feature combinations of 2D and 3D features that include the diameter are listed in Table 3 as \({\text{a}}^{1*}\) and \({\text{a}}^{1**}\), respectively. Some interesting observations can be made here. The best five-feature result performs better overall than the six-feature one. However, the latter achieves 100 % sensitivity and high specificity (83.3, 83.3 and 91.7 %, respectively) for 3 out of 4 groups. The only group for which it performs worse than both the former and the 2D-only combination is group 2, which consists of SKs. SKs, however, form a benign lesion class with a distinct visual appearance that differs from that of other benign lesions, and they tend to be larger than many other benign lesions. In fact, the diameter feature alone performed poorly in group 2 (with accuracy below 50 %, as listed in Table 1), and it is this weakness that decreases the overall performance of the combination. In the clinic, however, the majority of SKs can be identified successfully by trained dermatologists. If SKs are excluded manually in the first place, this 2D and 3D feature combination is therefore valuable because it can help doctors distinguish the more difficult lesions in groups 1, 3 and 4. This again shows that including the 3D feature (OS) in the combination can improve the 2D ABCD classification.

Table 3 Classification performances in percentage (listed as sensitivity/specificity (overall classification rate)) between 2D + 3D feature combinations and only 2D feature combinations using a nonlinear SVM classifier

In the next experiment, we assessed whether BI can be useful in the 2D and 3D combination. The best results for the five- and six-feature combinations including BI are listed in Table 3 as \({\text{a}}^{2*}\) and \({\text{a}}^{2**}\), respectively. In both cases, the combination with BI shows very promising results for groups 2, 3 and 4 (all with 100 % sensitivity and high specificity). Notably, it is able to classify group 2, which none of the other combinations has classified satisfactorily so far. The group for which this combination does not perform well is group 1. Group 1 consists of junctional and compound naevi, which are typically small; they are therefore more likely to be classified correctly with the assistance of the diameter feature. However, including the diameter would likely cause difficulties in classifying other benign lesions such as SKs, which tend to be large, are more likely to exceed 6 mm and may therefore be classified as MMs. On the other hand, shape-based features such as border irregularity are likely to suffer more from noise for small lesions than for larger ones, which can prevent correct classification. Indeed, in Table 1, the border feature alone performed poorly for group 1 (overall accuracy below 50 %); this is the only group in which the feature largely fails to differentiate correctly. Most likely, many CN and JN are estimated as having large border irregularities because of noise, which prevents them from being distinguished from many MMs.

4 Discussion

This section discusses the strengths and weaknesses of the individual features and feature combinations.

4.1 Asymmetry

Asymmetry has proved to be a useful feature, particularly for discriminating round and nodular lesions, i.e. IN and DFs in group 3, from MM. On the other hand, the asymmetry feature has shown poor performance in discriminating the benign lesions in group 4, which include CGN and DN, lesions that even the dermatologists at the collaborating clinic consider difficult. Reasonably good classification is achieved for sample group 4. Overall, the asymmetry feature has the second-best classification performance among the ABCD features, after relative red.

4.2 Border

Border irregularity performed reasonably well for sample group 4 but poorly for the others. The simple border feature used in this paper is therefore not sufficient for differentiating between MM and benign lesions. Although more sophisticated border features [1, 22, 27] may lead to better classification results, they still suffer from the following drawbacks. Firstly, although most MMs have irregular border profiles, many benign lesions also have large border irregularity indices [9]. Secondly, the border feature is very sensitive to imaging noise [19, 33, 34]; this is particularly true for small lesions, where the signal-to-noise ratio is low. Thirdly, the ground-truth border profiles drawn by dermatologists may differ from person to person [9], seriously affecting the subsequent lesion analysis and classification. A recent study [38] found that, with a single nonlinear classifier, the border features of [22] gave inferior classification performance to both the 3D features and the 2D colour features.

4.3 Colour

Colour has also shown the best classification performance for sample group 4. This is understandable as group 4 contains BLN and CGN (benign lesions with mainly uniform and even colour), which can be easily distinguished from most variegated-coloured MMs. Overall, the standard deviation of the relative red colour has demonstrated the best overall classification performance amongst all the colour features, which is consistent with other research on relative colour features [6]. It is also the best feature amongst all the ABCD features.

4.4 Diameter

Being the simplest and most straightforward of the ABCD features, diameter shows some promise in classifying groups 1 and 3. The result for group 1 is particularly understandable, as this group consists of CN and JN, which are typically small compared with many MMs. However, diameter alone is not capable of differentiating between MMs and benign lesions, because some benign lesions vary widely in size, such as SKs in group 2, IN and DFs in group 3 and HSP in group 4, which makes correct classification difficult.

4.5 Single feature versus combined features

Comparing Tables 1 and 2, all five combinations of 2D and 3D features selected by the forward selection scheme outperform the overall classification result of every single feature. A sign test [13] is used here to test our claim that the former are superior to the latter in classification performance. The null hypothesis is that their classification performances are equivalent. Counting wins, losses and ties, the combinations win all five comparisons. This gives a one-sided p value of \(p = \frac{1}{2^{5}}\binom{5}{0} = \frac{1}{32} \approx 0.031\), which is small enough to reject the null hypothesis. The classification performances of the former and the latter therefore differ, and the classification results show that the former are indeed better. It is therefore fair to say that each single feature has its own limitations and shortcomings for lesion classification; no single gold-standard feature can give the best classification between MMs and benign lesions without assistance from other features.
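The p value of 0.031 quoted above can be reproduced with an exact binomial calculation. The sketch below is an illustration, not the authors' code; it assumes ties have already been discarded, as is usual for the sign test, and it shows that 0.031 is the one-sided value, while the two-sided value for the same counts is twice as large.

```python
from math import comb


def sign_test_p(wins, losses, two_sided=False):
    """Exact sign test. Under H0 each non-tied comparison is a fair coin flip,
    so the number of losses follows Binomial(n, 1/2) with n = wins + losses."""
    n = wins + losses
    if n == 0:
        return 1.0
    # One-sided: probability of losing this few comparisons (or fewer) by chance.
    one_sided = sum(comb(n, i) for i in range(losses + 1)) / 2 ** n
    if not two_sided:
        return one_sided
    # Two-sided: double the smaller tail, capped at 1.
    smaller_tail = sum(comb(n, i) for i in range(min(wins, losses) + 1)) / 2 ** n
    return min(1.0, 2 * smaller_tail)


# Five wins and no losses, as in the comparison above.
print(sign_test_p(5, 0))                  # 0.03125
print(sign_test_p(5, 0, two_sided=True))  # 0.0625
```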

4.6 2D feature combination versus 2D and 3D feature combination

Judging by the performances in Table 2, the 2D and 3D feature combinations outperform their 2D-only counterparts for every combination size from 2 to 6 features. A sign test is again used to test our claim that the former are better than the latter. Here, the null hypothesis \(H_0\) is that the two types of combination are equivalent in performance, and the alternative hypothesis \(H_1\) is that the combinations with the 3D features perform better than those with only the 2D features. The sign test offers a straightforward way to compare overall performance. For all five combination sizes, the combination with the 3D features achieves a better overall performance than the one without, so the one-sided p value is \(p = \frac{1}{2^{5}}\binom{5}{0} \approx 0.031\), which provides enough evidence to reject the null hypothesis in favour of \(H_1\). Together with the classification performances in Table 2, this further supports our claim that the combinations with the 3D features are better than the combinations with only the 2D features.

4.7 3D feature combination only

Combinations of 3D features only are also studied here for comparison. The best classification in this category was obtained with two features (OS + MS), and adding more 3D features does not improve the classification result. Adding the most disrupted slant pattern feature (MS) improves both sensitivity and specificity slightly compared with using only the overall disrupted feature (OS). This is also a promising result, as both features reflect surface variations (disruptions) along the z-axis (slant pattern). They are therefore features unique to 3D and can potentially provide more information complementary to the 2D features than the tilt pattern features (OT + MT), which reflect surface variations in the 2D xy plane. Because of the limited number of 3D features available, we are unable to compare the classification performance with the other two types of feature set [i.e. (1) 2D features alone and (2) combined 2D and 3D features] beyond two features. It therefore remains to be seen whether more useful 3D features can be found to further improve classification performance.

4.8 Values of border and diameter

From the results in Table 3, it can be seen that the two features not selected by the forward selection in Table 2, (1) border irregularity and (2) diameter, have also proved their value. If used separately in combination with the AI, colour and 3D features, they can improve the correct classification of non-melanocytic benign lesions (SKs) and of melanocytic benign lesions (CN and JN), respectively, against MMs. If non-melanocytic SKs are regarded as less difficult for trained dermatologists to identify correctly and can be excluded manually beforehand, then including the diameter feature in the 2D and 3D combination is more significant for assisting dermatologists in recognising the other, more difficult melanocytic and non-melanocytic benign lesions. If classifying SKs automatically is also important to reduce the cost of human involvement, a more sophisticated classification system with a multistage ensemble design may be the right way forward. In this case, the border feature could be used in the first stage to rule out non-melanocytic SKs as malignant, and the diameter feature could be used to exclude many small benign melanocytic lesions such as CN and JN.
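One way to read the multistage design suggested above is as a cascade in which each stage either commits to a label or defers to the next stage. The sketch below is hypothetical throughout: the stage functions, the score names `sk_score`, `small_naevus_score` and `mm_score`, and the 0.5 thresholds are placeholders for classifiers that would in practice be trained on the border-based and diameter-based feature subsets; none of them comes from this study.

```python
from typing import Callable, Dict, Optional, Sequence

Lesion = Dict[str, float]
# A stage returns a final label, or None to defer to the next stage.
Stage = Callable[[Lesion], Optional[str]]


def run_cascade(stages: Sequence[Stage], final: Callable[[Lesion], str], lesion: Lesion) -> str:
    """Apply the stages in order; the first stage that commits decides."""
    for stage in stages:
        label = stage(lesion)
        if label is not None:
            return label
    return final(lesion)


def exclude_sk(lesion: Lesion) -> Optional[str]:
    # HYPOTHETICAL stage 1: score from a classifier that uses the border feature.
    return "benign (SK)" if lesion["sk_score"] > 0.5 else None


def exclude_small_naevus(lesion: Lesion) -> Optional[str]:
    # HYPOTHETICAL stage 2: score from a classifier that uses the diameter feature.
    return "benign (CN/JN)" if lesion["small_naevus_score"] > 0.5 else None


def final_classifier(lesion: Lesion) -> str:
    # HYPOTHETICAL last stage, e.g. the best 2D + 3D combination.
    return "MM" if lesion["mm_score"] > 0.5 else "benign"


stages = [exclude_sk, exclude_small_naevus]
lesion = {"sk_score": 0.8, "small_naevus_score": 0.1, "mm_score": 0.9}
print(run_cascade(stages, final_classifier, lesion))  # benign (SK)
```

The design choice here is that early stages only ever remove lesions they are confident about, so lesions that no stage rules out still reach the final classifier.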

5 Conclusion

A computer-assisted diagnosis system for malignant melanoma consists of three steps: (1) data acquisition, (2) feature extraction/selection and (3) classification. Lesion classification can be improved at each of these three steps. This paper focuses on the second step, feature extraction and selection. An experimental study was conducted on the many possible combinations of the 3D features with the traditional 2D features in current use. Judged by classification performance using a single nonlinear SVM classifier, the feature combinations studied demonstrate that the 3D features are useful in improving existing classifications based purely on (1) single features and (2) combinations of 2D features.

Out of all the feature combinations, the one combining the 3D feature OS (overall skin slant disruption) with the 2D features, namely the three colour channel features and the asymmetry index, has shown the best overall classification rate. However, the two unselected ABCD features, border irregularity and diameter, have also demonstrated their value. Including border in the 2D and 3D feature combination has shown promising results, with 100 % sensitivity and high specificity for 3 out of 4 lesion groups. The exception is the group of small compound and junctional naevi, for which noise is likely to hamper correct estimation of the border irregularity feature. Including diameter in the combination has also shown very satisfactory results, with 100 % sensitivity and high specificity for 3 out of 4 lesion groups. The exception is the group of SKs, which tend to be large and are therefore more likely to be mistaken for MMs, many of which are also large.

Future work could also improve the third step, classification, by using more sophisticated classifiers, such as multistage ensemble classifiers with each stage designed to exclude either SKs or CN/JN. Another motivation for an ensemble design is that, in the current study, a classifier tends to perform well on a class of lesions if it was trained on lesions of the same class or of classes with a similar appearance. In a clinical trial where the lesion class is not known beforehand, a confidence vote over multiple classifiers seems reasonable: each classifier is trained against different lesion classes, and their confidence scores are combined to determine the final result. We also acknowledge that the data set used in this study is relatively small compared with others used in the literature, so a larger data set is desirable to obtain more reliable results in future studies. Nevertheless, this study clearly demonstrates the added value of the 3D features in improving the existing 2D ABCD-based computer-assisted diagnosis of malignant melanoma.
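The confidence vote suggested above can be sketched as a confidence-weighted vote: each classifier reports a label and a confidence, confidences are summed per label, and the label with the largest total wins. This is only one plausible reading of the proposal; the classifier outputs below are invented, and how the individual confidences would be calibrated is left open.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def confidence_vote(outputs: Iterable[Tuple[str, float]]) -> Tuple[str, float]:
    """Sum each classifier's confidence per label; return the winning label
    and its share of the total confidence."""
    totals = defaultdict(float)
    for label, confidence in outputs:
        totals[label] += confidence
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Three hypothetical classifiers, each trained against different lesion classes.
label, share = confidence_vote([("MM", 0.5), ("benign", 0.75), ("benign", 0.75)])
print(label, share)  # benign 0.75
```

Returning the winning share alongside the label gives a simple indication of how strongly the ensemble agrees, which could be used to flag borderline cases for manual review.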