1 Introduction

Body temperature regulation and protection against infection and injury are two crucial functions of the skin. Melanoma can develop from malignant skin cells and affect body areas that typically receive little to no exposure to sunlight [1]. Approximately 5.4 million new cases of skin cancer are reported each year [2]. Some studies project that skin cancer cases in the USA will rise from 95,360 in 2017 to 207,390 in 2021 [3].

Early detection and prevention decrease the mortality rate of skin cancer [3]. Dermatologists with the necessary education and expertise can diagnose skin cancer. Assessing skin lesions also requires clinical information about the patient, because similar pixel intensities and textures demand the visualisation of features that are undetectable to the unaided eye [4].

Dermatologists can identify skin cancer using conventional criteria such as colour, diameter, and asymmetry. Imaging methods save time and money while enabling image inspection that is more precise than conventional approaches [5].

Fuzzy logic deals with data arising from perception and computational recognition that involve uncertainty, partial truth, inaccuracy, or ambiguity. In fuzzy logic, realistic estimates of a variable are real numbers in the range of zero to one, both inclusive [6, 7]. It is used to cope with imperfect real-world notions whose truth value can range from completely true to completely false. This contrasts with basic Boolean operations, in which the value of a variable is restricted to 0 or 1. Systems for decision-making, pattern recognition, control, and optimisation can be developed using fuzzy logic [8]. The fundamental idea comes from the visual analysis performed in dermoscopy: the infection or lesion site is distinct from normal, healthy skin, and this disparity enables better segmentation techniques. For many computational problems, fuzzy logic accommodates imprecise human intervention [9]. By resolving conflicts and evaluating prospective improvements, it also has beneficial consequences for a number of criteria. According to statistics, there were 8,441 fatalities from skin lesions in the USA in 2009 [8].

Dermoscopy is a non-invasive method for skin imaging and has become a standard tool for the diagnosis of skin lesions in recent years. Optical magnification is applied to the area of interest, so the subsurface appears significantly more clearly in these images than in typical photographs [9]. A known issue is that dermoscopy lowers diagnostic accuracy in inexperienced dermatologists [10]. New computerised tools that interpret images, rather than methods that merely correct these diagnostic flaws, are required; the errors result from the subjectivity of visual interpretation and the intricacy of the cases [11]. Future technology will allow many lesion objects to be detected centrally.

There are two approaches to AI modelling: rule-based and learning-based. The first, rule-based technique produces outputs specified by rules coded by humans. The second makes use of learning-based artificial intelligence, a notion employed in a variety of fields, including mining, ecology, and urban planning. Machine learning and deep learning are the two types of artificial intelligence now available [12]. Machine learning makes data identification and prediction possible. In general, the accuracy of the representation of the input data has a significant impact on how effective ML algorithms are: a good data representation improves performance over a bad one. Deep learning (DL) is derived from traditional neural networks, although it outperforms earlier models. In addition, DL constructs multilayer learning models using both graph and transformation techniques. These methods use a multilayer architecture for their data representation, with the first layers extracting low-level features and the last layers extracting high-level features. Deep learning also offers a number of benefits in big-data research [13]. The convolutional neural network (CNN) is one of the DL prototypes that has proven to perform very well in video and image processing on GPU (graphics processing unit) computing systems [14]. Faster region of interest (ROI) selection and CNN-based classification techniques improve performance and execution time on training and test datasets [15]. Melanoma can thus be classified with high probability [16].

The following is an overview of the main contributions of this work:

  • A fuzzy-based capsule neural network (fuzzy CapsNet) model is proposed, which achieves better loss performance.

  • This work assesses the generative model's precision and evaluates the findings of the enhanced fuzzy-CapsNet method against those of the current lesion classification methods.

  • In addition, this study sought to comprehend the model's efficiency in categorising lesion segments with higher accuracy and faster processing times than other models.

2 Literature survey

When applying CNN transfer learning, the input image size influences the accuracy of skin lesion classification. Image cropping was furthermore demonstrated to outperform image scaling. A straightforward ensemble technique that combines the outputs of three fine-tuned CNNs over six scale-cropped images yields the best classification performance [17]. An improved whale optimization technique applied to a CNN has also been reported to improve efficiency; this method lessens the difference between a network's output and the desired output by determining the best weights and biases for the network [18].

The proposed DSNet is a network that automatically segments semantic information for skin lesions; depthwise separable convolutions are employed to lower the network's weight and parameter count [19]. The original model's parameters serve as the initial values for a pretrained AlexNet with transfer learning, while the weights of the final three layers are initialised at random. The recently released ISIC 2018 dataset [20] was used to evaluate the suggested method.

Current computer-assisted diagnosis using thermal imaging often involves a number of steps, including image segmentation, feature extraction, and classification [18,19,20]. Based on a feature-based image matching pipeline, feature detection, description, and matching strategies, ranging from handcrafted to trainable methods, are first discussed, together with an overview of the theoretical and practical development of these techniques. Pattern analysis can frequently be used to explain how skin lesions develop their horn-like appearance [21]. Lesion segmentation utilising histogram thresholding has been proposed for a skin lesion detection system; using the ABCD rule, a model is built that can segment and categorise skin images [22].

In a new skin lesion detection approach dubbed HBCENCM, the histogram-based cluster estimation (HBCE) algorithm is employed to predict the number of clusters needed in the NCM technique [21]. A number of computer algorithms have been created to categorise different forms of skin lesions utilising image processing, feature descriptors, and pattern analysis; these algorithms are based on existing machine learning techniques.

A deep learning-based approach to segmentation and classification has been suggested [22]. A Mask R-CNN architecture with a Resnet50 feature pyramid network (FPN) is used to segment skin lesions, after which the final mask is created by mapping the connected-layer-based features. In the classification stage, activations are based on higher feature representations, constructed as a 24-layer convolutional neural network architecture. A softmax classifier then finalises the classification after receiving the best CNN features [23].

The CAD system functions in an experimental scenario, but the study [24] states that it needs rigorous validation in a real clinical setting, because there are numerous tiers of hyperparameters that require human tuning and configuration. A strategy was put forward for classifying tumours as malignant or benign by analysing digital dermoscopy images. A three-step process is used: the lesions are first extracted by an autonomous similarity-based graph neural network (SGNN); border, colour, and texture features are then compiled; and the lesion objects are classified by a network ensemble classifier. To analyse dermoscopy images, a combination of Fisher vectors (FV), deep convolutional neural networks (CNN), and linear support vector machines has also been recommended [25].

3 Dataset

In this investigation, we made use of the ISIC dataset from the 2017 ISBI Challenge on Skin Lesion Analysis Towards Melanoma Detection. The dataset includes over 2000 usable dermoscopy images. The images exhibit a variety of noise abnormalities, including low contrast, hair, black boxes, vignetting around the image, and noise that makes it difficult to distinguish lesion boundaries. These images therefore provide a platform on which the effectiveness of automatic image preparation models can be thoroughly tested. Various noise artefacts present in a dermoscopy image are shown in Fig. 1.

Fig. 1

Sample images of ISIC 2017 skin cancer dataset

The International Skin Imaging Collaboration's 2018–2019 archives and PH2 (accessed 18 March 2022) are two skin imaging datasets that were used to identify skin melanoma. ISIC 2018 comprises 10,015 training images and 1512 test images, with lesion classes covering melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, and vascular lesions. The ISIC 2019 dataset, which consists of 25,331 images, is divided into nine categories: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, vascular lesion, squamous cell carcinoma, and none of the others. It also includes 8238 test images.

The 2019 ISIC archive contains 25,331 labelled dermoscopy images, divided into 8 different categories (i.e. types of skin lesions), as shown in Fig. 2. The images were gathered at various locations over time using a range of instruments. Image resolutions range from 450 × 600 to 1024 × 1024 pixels.

Fig. 2

Sample images of ISIC 2019 skin cancer dataset

The PH2 benchmark dataset is used to assess how well the suggested system performs. It consists of 200 8-bit RGB images with a resolution of 768 × 560 pixels. The skin lesions visible in the images comprise 80 common naevi, 80 atypical naevi, and 40 melanomas, as shown in Fig. 3. Melanocytic lesions are more prevalent in the PH2 and ISIC datasets used to diagnose melanoma; both favour melanocytic lesions over non-melanocytic ones. The training images are not fully representative of real-world data because some images in the dataset are clinical skin photographs rather than dermoscopic images.

Fig. 3

Collection of images from PH2 dermoscopic dataset with various noise artefacts

4 Preprocessing: automated image preprocessing model

To support efficient lesion identification, our proposed methodology consists of three parts: a model for removal of dark frames and the vignette effect, a model for removal of thin and thick hair artefacts, and a model for removal of surgical ink artefacts.

4.1 Removal of dark frames and vignette effect artefacts model

Dark frames and vignette effects are often observed in dermoscopy images. Vignetting is a darkening effect caused by a reduction of brightness and saturation around the periphery (corners) of the image. It is important to remove these noise artefacts because they are often treated as part of the lesion region and degrade the performance of lesion edge estimation. In our proposed methodology, we first determine the height and width of the original image and then crop it by performing image slicing over a rectangular bounding box.

This removes the unwanted vignette effect and dark frames from dermoscopy images. After cropping, the resulting images may vary in size, and images obtained from the dataset may not be in standard sizes. It is always better to rescale images before they are fed to any computer-aided classification model, so we resize each image to the expected size of 255 × 255 pixels. Figure 4 shows an input image cropped and resized to the required 255 × 255 pixels.
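A minimal sketch of this cropping-and-resizing step is shown below using OpenCV. The 8% border width is an illustrative assumption rather than a value specified in this work, and the file path in the usage comment is hypothetical.

```python
import cv2

def remove_dark_frame(image, crop_ratio=0.08):
    """Crop a fixed border to suppress dark frames and vignetting, then rescale
    to the 255 x 255 size used in this work. The 8% border is an assumed value."""
    h, w = image.shape[:2]
    dy, dx = int(h * crop_ratio), int(w * crop_ratio)
    cropped = image[dy:h - dy, dx:w - dx]          # rectangular bounding-box slice
    return cv2.resize(cropped, (255, 255), interpolation=cv2.INTER_AREA)

# Hypothetical usage:
# img = cv2.imread("ISIC_0000000.jpg")
# preprocessed = remove_dark_frame(img)
```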

Fig. 4

a Original image, b Cropped image, and c Pixel resized image

4.2 Removal of thin and thick hair artefacts model

In recent years, many techniques such as DullRazor, mathematical morphology operations, inpainting, and thresholding have been developed to eliminate hair artefacts from dermoscopy images [26]. In this phase of our proposed methodology, we develop a model that combines several imaging techniques to eliminate both thin and thick hair noise from dermoscopy images without affecting image quality. This model can further be utilised by researchers to aid proper segmentation of the lesion region and accurate melanoma classification.

In this step of the suggested procedure, a colour image in the RGB colour space is transformed to greyscale. Converting colour images to greyscale helps dermatologists distinguish lesion margins and other crucial details more easily. Greyscale images are also inherently less complex than colour images, simply because each pixel stores a single grey tone and therefore requires less data. The most popular way to turn an RGB colour image into a greyscale image is the weighting approach, which weights the red, green, and blue channels according to each wavelength.

The black-hat transform highlights darker objects (hair) against a lighter background (skin). To distinguish hair structures in the greyscale image, it is crucial to choose the shape and size of the structuring elements properly. Elliptical structuring elements capture the hair structure across the rest of the image effectively. Empirically, it has been found that the effectiveness of hair removal is greatly influenced by the proper choice of the structuring element's core (size) diameter (Fig. 5).
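The hair-removal pipeline of Fig. 5 can be sketched as follows with OpenCV. The 17-pixel elliptical kernel, the binary threshold of 10, and the inpainting radius are illustrative assumptions rather than the exact values used in this work.

```python
import cv2

def remove_hair(image, kernel_size=17, threshold=10):
    """Black-hat based hair removal: greyscale conversion, black-hat filtering,
    binary thresholding, and inpainting of the detected hair mask."""
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)                 # weighted RGB-to-grey conversion
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    blackhat = cv2.morphologyEx(grey, cv2.MORPH_BLACKHAT, kernel)  # dark hair on lighter skin
    _, mask = cv2.threshold(blackhat, threshold, 255, cv2.THRESH_BINARY)
    return cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)          # fill hair pixels from neighbours
```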

Fig. 5

a Original image with hair, b Greyscale image, c Black-hat filtered image, d Binary thresholding applied on image, and e Image without hair objects

4.3 Removal of surgical ink markers artefacts model

Dermatologists most commonly mark worrisome skin lesions with purple ink blots. When we examined a dataset of dermoscopic images available for melanoma identification, ink markers were found to be more prominent in the malignant-lesion images than in the benign ones, which also affected accurate segmentation. Before adopting a machine learning system, it is crucial to realise that these ink marks may be incorrectly attributed to the skin lesion, raising false-positive results and preventing accurate diagnosis. We therefore suggest an automated marker ink removal model to effectively remove purple ink traces from dermoscopic images. In the suggested technique, we first transform RGB images into the hue-saturation-value colour space, which facilitates colour-based segmentation of purple or blue ink marks. The HSV representation allows colour (hue) to be extracted independently of saturation and pseudo-illumination. Because of how the bands were placed, the majority of the blue or purple ink markers in the dataset fell into the lower and upper purple bands. To create a final image free of ink smudges, the inpainting procedure is repeated to recover the original image from the masked image. Figure 6 compares an image before and after applying the automated model.
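The following sketch illustrates the HSV masking and inpainting step with OpenCV. The hue and saturation bounds for the purple bands and the dilation of the mask are illustrative assumptions, not the exact bands used in this work.

```python
import cv2
import numpy as np

def remove_ink_markers(image):
    """Remove purple/blue surgical-ink marks via HSV masking and inpainting."""
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    lower = np.array([120, 60, 60])                                   # assumed lower purple band
    upper = np.array([170, 255, 255])                                 # assumed upper purple band
    mask = cv2.inRange(hsv, lower, upper)
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=1)  # cover marker edges
    return cv2.inpaint(image, mask, 5, cv2.INPAINT_TELEA)
```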

Fig. 6

a Images with ink markers and b Preprocessed image without markers

5 Adaptive fuzzy Gray-Level Co-Occurrence Matrix (GLCM) segmentation

In most cases, the gray-level co-occurrence matrix (GLCM) technique effectively eliminates artefacts using simple arithmetic, and it also allows textures in images to be easily differentiated. To facilitate analysis, images can be divided into regions. Hence, combining cleanup methods with segmentation, we employ GLCMs to extract features [9]. With GLCMs, the frequency of pixel pairs can be measured precisely over a given differential zone. Here one pixel value is denoted k and another pixel value is denoted l, with a neighbourhood distance of m in the \(\vartheta\) direction. Typically m takes a single value, and \(\vartheta\) takes the chosen direction. The directional values obtained can then extract the attributes of the images that are used for the segmentation process.

According to the following Eq. (1), the GLCM procedure is:

$$R\left(k,l\right)=\frac{G\left(k,l,m,\vartheta \right)}{{\sum }_{k=1}^{H}{\sum }_{l=1}^{H}G\left(k,l,m,\vartheta \right)}$$
(1)

where \(G(k,l,m,\vartheta )\) is the frequency with which the pixel values k and l co-occur at distance m in direction \(\vartheta\), and \(R(k,l)\) is the normalised component at row k and column l.
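As a minimal sketch (assuming a uint8 greyscale image and the scikit-image library), the normalised GLCM of Eq. (1) and a few derived texture descriptors can be computed as follows; the choice of descriptors is illustrative.

```python
from skimage.feature import graycomatrix, graycoprops

def glcm_features(grey_image, distance=1, angle=0.0):
    """Normalised GLCM R(k, l) for offset (m, theta) = (distance, angle),
    plus a few texture descriptors. grey_image must be a uint8 array."""
    glcm = graycomatrix(grey_image, distances=[distance], angles=[angle],
                        levels=256, symmetric=True, normed=True)  # normed=True applies Eq. (1)
    return {
        "contrast": graycoprops(glcm, "contrast")[0, 0],
        "homogeneity": graycoprops(glcm, "homogeneity")[0, 0],
        "energy": graycoprops(glcm, "energy")[0, 0],
    }
```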

Algorithm

Here, consider \(A=\left\{{a}_{1}, {a}_{2},\dots {a}_{m}\right\}\) as the given input attributes.

Step 1 Choose the inputs randomly. Here the number of clusters is taken as four, i.e. \(A=\left\{{a}_{1}, {a}_{2}, {a}_{3},{a}_{4}\right\}\).

Step 2 Find any irregularities.

Step 3 Calculate the fuzzy clustered region membership \({\lambda }_{\mathrm{FC}}\):

$${\lambda }_{\mathrm{FC}}=1\Big/\sum_{l=1}^{N}{\left(\frac{{r}_{\mathrm{fc}}}{{r}_{\mathrm{fl}}}\right)}^{2/(y-1)}$$

where the similarity is calculated between any two pixels, \({r}_{\mathrm{fc}}\) represents the set of pixels belonging to the ith region found by the proposed algorithm, and \({r}_{\mathrm{fl}}\) represents the set of pixels belonging to the ith region in the segmented ground truth image.

Step 4 Segmentation process begins

6 Dice coefficient

Dice coefficient:

The Dice coefficient is commonly used as a statistic to assess how well segmentation results match the ground truth; it is mostly used to compute the Dice distance between two regions and thereby determine how similar they are. The Dice coefficient cannot be greater than 1 and usually ranges from 0 to 1. Dice loss addresses the issue of a foreground ratio that is too low. This metric, which assesses the degree of overlap between two samples, has values ranging from 0 to 1 (where 1 denotes complete overlap), and is described as follows:

$$\mathrm{Dice}=\frac{2\mathrm{TP}}{\mathrm{FN}+2\mathrm{TP}+\mathrm{FP}}$$
(2)

Intersection over Union (IoU):

Intersection over Union (IoU) measures the agreement between the predicted region and the ground truth. To detect objects of varied sizes and shapes using IoU, the extent of the detected objects in the training-set images must be delineated and compared against the ground truth.

$$\mathrm{IoU}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}$$
(3)
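Assuming binary prediction and ground-truth masks stored as NumPy arrays, Eqs. (2) and (3) can be computed directly, as in the sketch below.

```python
import numpy as np

def dice_and_iou(pred_mask, gt_mask):
    """Dice (Eq. 2) and IoU (Eq. 3) from binary segmentation masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (fn + 2 * tp + fp) if (fn + 2 * tp + fp) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return dice, iou
```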

7 Fuzzy-based capsule neural network (F-CapsNet) classification

7.1 Proposed system

Perceptrons are frequently utilised for linear classification tasks in which a hyperplane must be fitted to the training dataset. Using this fitted hyperplane, the perceptron can classify brand-new, unseen samples. This is accomplished by applying the training dataset to minimise the error function \(E\left(w\right)=-\sum_{i\in M}{t}_{i}{w}^{T}{x}_{i}\), where M is the set of incorrectly classified samples and \({t}_{i}\in \{-\mathrm{1,1}\}\) is the sample's class. The classes are entirely separated by the hyperplane if \(E(w) = 0\). This minimisation procedure is typically repeated in each iteration in order to reach the minimum of \(E(w)\). To update the weights, the weight vector at iteration k + 1 is obtained as \({w}_{k+1}={w}_{k}+\Delta w\). The learning rule used to determine the weight increment at each step is shown in Eq. (4).

$$\Delta {w}_{i}=\eta \left({\mathrm{true}}_{j}-{\mathrm{pred}}_{\mathrm{j}}\right){x}_{i}^{j}$$
(4)

where \({\mathrm{true}}_{j}\) denotes the class label that was actually assigned, \({\mathrm{pred}}_{j}\) denotes the predicted class label, and \(\eta\) denotes the learning rate. Initialising the weights to small random values (or 0) is the first step in the perceptron learning process. The weights are then modified until the smallest error is attained after calculating the output values for each training input sample (i.e. backpropagation).

Going forward, each layer of the network is connected to the layer above it. Feedforward neural networks are frequently utilised for learning such mappings. Complex tasks can be learned because the architecture of multilayer perceptrons extracts increasingly significant characteristics from input patterns. In order to improve model predictions (i.e. minimise network error), gradient descent can be used to locate local minima of a function, as shown in Eq. (5):

$${w}_{\mathrm{new}}={w}_{\mathrm{old}}-\eta \times \frac{\mathrm{d}}{\mathrm{d}w}F(w)$$
(5)

where \(\mathrm{d}/\mathrm{d}w\, F(w)\) is the gradient (derivative) of the objective function F(w), w is the weight, and \(\eta\) denotes the learning rate.
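As a small illustration (with hypothetical NumPy inputs and learning rates), the weight updates of Eqs. (4) and (5) can be written as follows.

```python
import numpy as np

def perceptron_step(w, x, true_label, pred_label, eta=0.01):
    """One weight increment following Eq. (4)."""
    return w + eta * (true_label - pred_label) * x

def gradient_descent_step(w, grad_F_at_w, eta=0.01):
    """One gradient-descent update following Eq. (5); grad_F_at_w is dF/dw evaluated at w."""
    return w - eta * grad_F_at_w
```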

A fuzzy set \(A \subseteq U\) is defined as a collection of ordered pairs \((x_i, \mu_A(x_i))\), where \(U = \{x_1, x_2, x_3, \dots, x_n\}\) specifies the universe of discourse. For every \(x_i \in U\), the membership function of A, \(\mu_A: U \rightarrow [0, 1]\), defines the degree of membership of \(x_i\) in A, with \(\mu_A(x_i) \in [0, 1]\). Such fuzzy sets are unable to model a wide range of uncertainty because their membership functions are crisply defined. Second-generation (type-II) fuzzy sets, on the other hand, have membership functions that are themselves fuzzy and may express various degrees of uncertainty. The type-II fuzzy set A′ is described by the type-II membership function \(\mu_{A'}(x, \mu)\), where \(x \in U\) and \(\mu \in [0, 1]\), as defined in Eq. (6):

$${A}^{{\prime}}=\left\{\left(x,\mu \right),{\mu }_{{A}^{{\prime}}}\left(x,\mu \right)|\forall x\epsilon U,\mu \epsilon \left[\mathrm{0,1}\right]\right\}$$
(6)

where 0 ≤ μA′(x, μ) ≤ 1. Before the degree of membership is determined, a type-I fuzzy set is extended through the type-II fuzzy set's footprint of uncertainty (FOU), which is the region between the lower (child) and upper (parent) membership functions and is created by assigning the child and parent components of each element. A type-II fuzzy set can therefore be written as in Eq. (7):

$${A}^{{\prime}}=\left\{\left(x,{\mu }_{U}\left(x\right),{\mu }_{L}\left(x\right)\right)\,\middle|\,{\mu }_{L}\left(x\right)\le \mu \left(x\right)\le {\mu }_{U}\left(x\right),\ \mu \epsilon \left[\mathrm{0,1}\right]\right\}$$
(7)

where μL and μU denote the lower and upper degrees of membership, respectively, of the initial membership function μ(x), defined as in Eqs. (8) and (9):

$${\mu }_{L}\left(x\right)={\left[\mu \left(x\right)\right]}^{\alpha }$$
(8)
$${\mu }_{U}\left(x\right)={\left[\mu \left(x\right)\right]}^{1/\alpha }$$
(9)

where α ∈ (1, ∞). Considering that larger values have no significance for the image data in this instance, α is set to 2.
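A small sketch of the lower and upper membership bounds of Eqs. (8) and (9), with α = 2 as stated above:

```python
import numpy as np

def type2_bounds(mu, alpha=2.0):
    """Lower and upper type-II membership functions of Eqs. (8) and (9)."""
    mu = np.asarray(mu, dtype=float)
    return mu ** alpha, mu ** (1.0 / alpha)

# e.g. type2_bounds([0.2, 0.5, 0.9]) -> (array([0.04, 0.25, 0.81]), array([0.447, 0.707, 0.949]))
```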

The image \(X=\{{x}_{1},{x}_{2},\dots ,{x}_{i},\dots ,{x}_{n}\}\) is made up of n pixels with features \(f\left({x}_{1},{y}_{1}\right),\dots , f\left({x}_{i},{y}_{j}\right)\), \(i\in \left[1,\dots ,N\right]\), \(j\in [1,\dots ,N]\), and \(V=\{{v}_{1},\dots ,{v}_{c}\}\) is the set of c cluster centres in the p-dimensional feature space.

X is divided into c clusters by minimising an objective function J via the fuzzy clustering approach, as given in Eq. (10):

$$J=\sum_{j=1}^{n}\sum_{i=1}^{c}{\left({u}_{ij}\right)}^{m}{\left|\left|{x}_{j}-{v}_{i}\right|\right|}^{2}$$
(10)

where 1 ≤ m ≤ ∞ is the fuzzifier, set to 2, \(v_i\) is the ith centroid corresponding to cluster \(C_i\), \(u_{ij} \in\) [0, 1] is the fuzzy membership of \(x_j\) to cluster \(c_i\), and \(\Vert \cdot \Vert\) is the distance norm, such that, as in Eq. (11):

$${v}_{i}=\frac{1}{{n}_{i}}\sum_{j=1}^{n}{\left({u}_{ij}\right)}^{m}{x}_{j} \quad\mathrm{where}\quad {n}_{i}=\sum_{j=1}^{n}{\left({u}_{ij}\right)}^{m}$$
(11)

The method starts by randomly selecting c items to represent the means (centres) of the c clusters. The membership value \(u_{ij}\) is calculated from the relative (Euclidean) distance between the item \(x_j\) and each centre. The cluster centres are then recomputed from the memberships of all objects. Finally, the procedure terminates when the centres of the current iteration and the previous iteration are identical.
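A minimal fuzzy c-means sketch of Eqs. (10) and (11) is given below, assuming a NumPy feature matrix with one row per pixel; the random initialisation and convergence tolerance are illustrative choices.

```python
import numpy as np

def fuzzy_c_means(X, c=4, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Alternate the membership and centroid updates of Eqs. (10)-(11) until the
    memberships stop changing. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, X.shape[0]))
    u /= u.sum(axis=0)                                            # memberships sum to 1 per sample
    for _ in range(max_iter):
        um = u ** m
        centres = um @ X / um.sum(axis=1, keepdims=True)          # Eq. (11)
        dist = np.linalg.norm(X[None, :, :] - centres[:, None, :], axis=2) + 1e-12
        u_new = 1.0 / (dist ** (2 / (m - 1)) * (1.0 / dist ** (2 / (m - 1))).sum(axis=0))
        if np.linalg.norm(u_new - u) < tol:
            return centres, u_new
        u = u_new
    return centres, u
```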

The multilayer perceptron proposed in this work incorporates the membership of each input sample in the class of interest (normal or abnormal) throughout the learning process. By minimising the impact of ambiguous features (i.e. features with membership near 0.5), the membership values are also beneficial for gradient descent during weight updating (learning).

7.2 Capsule network classifier

Capsule networks put forth the idea of the capsule. Capsules are vectors formed from clusters of neurons, where each neuron represents one instantiation parameter of the capsule. The number of neurons determines the capsule's size, and the capsule's length represents the probability that a specific object is present.

In order to create hierarchical representations, CapsNet passes images through layers, just as a CNN does. The primary and secondary capsule layers, however, are the only capsule layers in the original form of CapsNet, as opposed to the many levels in deep CNNs. Low-level features are found in the base hierarchy, while the second layer can forecast the existence and pose information of objects in the image. A capsule is a collection of neurons whose outputs are interpreted as several characteristics of a single object; each capsule carries both a pose matrix and an activation probability, performing the same tasks an ordinary neural network would. The likelihood that the entity the capsule represents is present in the current input can be calculated from the length of the capsule's output vector. Capsules can be arranged in layers. In our architecture, the standard capsule layer (appearance change, compressed output of the last convolutional layer) and the CancerCaps layer (i.e. capsules representing 4 types of images) were used. The default capsule layer can be preceded by a large number of convolutional layers. There is, however, no max-pooling layer; instead, to reduce the dimensionality, we employ convolutions with a stride greater than 1 (if the stride is 2, the dimensionality is reduced by a factor of 2). The CancerCaps output is used to identify the input image's class.

CapsNet is a multilayer network. The lowest-layer capsules are referred to as primary (elementary) capsules. Each takes a fragment of an image as its input (which can be thought of as a receptive field). The subsequent detection process finds particular patterns (circles, triangles, ellipses, etc.). These capsules are implemented with convolutional layers, which contain vectors that serve as storage for feature information. In other words, a neuron's activation represents the instantiation parameters of an entity (pose, size, position, orientation, etc.). The upper capsule is a routing capsule, which uses dynamic routing to find larger and more complicated objects.

The convolutional layer extracts the essential elements of the image using a conventional convolution with a ReLU activation function. After an initial preprocessing step, the input image given to the input layer is shrunk to 28 × 28; each pixel contributes a single value \(x_i\), so the input layer is 28 × 28 × 1 in size. The next layers in CapsNet are the convolutional and encapsulation layers, used mostly for the entity detection needed for classification. Typically, a fully connected layer with softmax activation makes up the last layer in the suggested manner. The softmax activation function returns, for each input, the likelihood that it belongs to one of the two mutually exclusive classes. Dermoscopy analysis and inspection can be improved by automatically classifying skin cancer from photographs of the target lesions, as skin lesions may show only small changes in appearance, as shown in Figs. 7 and 8.

Fig. 7

Segmentation results. a Input image, b Masked image, c Segmented image, and d Hair removed Image

Fig. 8

Type-II fuzzy set

Figure 9 depicts a simple capsule network with two convolutional layers and one capsule layer. The first convolutional layer extracts low-level features from the initial 28 × 28 × 1 input image; it has 256 channels in total, each processing the ISIC image with a 9 × 9 convolution kernel. The second layer consists of convolutional capsules, with 6 × 6 × 32 capsules in total, each producing an 8-dimensional vector. The collection of 32 capsule groups forms the primary capsule layer, and each basic group of capsules contains eight convolutional units.

Fig. 9

A basic CapsNet architecture

The third layer, the digit (class) capsule layer, is a fully connected layer with 10 capsules in total; each capsule is a 16-dimensional vector that accepts input from the layer below and executes the classification operation. In the final layer, the probability that an object is present, i.e. the likelihood of each classification result, is calculated by measuring the length of each capsule.

A reconstruction loss is applied at the network's end to encourage the class capsules to encode the instantiation parameters of the input. During training, only the capsule of the correctly predicted class is used to recreate the input image, with all other capsule vectors set to zero. To reduce the sum of squared differences between the pixels of the reconstructed image and their counterparts in the original image, the output of the class capsule layer is fed to a decoder with three fully connected layers.
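The PyTorch sketch below mirrors the layer sizes described above: a 256-channel 9 × 9 convolution on a 28 × 28 × 1 input, 32 groups of 8-dimensional primary capsules over a 6 × 6 grid, and 16-dimensional class capsules whose lengths give class probabilities. It is an illustration only; the linear mapping to class capsules stands in for dynamic routing, and the four-class output is an assumption based on the CancerCaps description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Capsule squashing non-linearity: preserves orientation, maps length into (0, 1)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

class TinyCapsNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 256, kernel_size=9)                   # 28x28 -> 20x20, 256 channels
        self.primary = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)  # -> 6x6, 32 groups of 8-D capsules
        self.class_caps = nn.Linear(6 * 6 * 32 * 8, num_classes * 16)   # stands in for dynamic routing

    def forward(self, x):                                  # x: (batch, 1, 28, 28)
        x = F.relu(self.conv1(x))
        x = squash(self.primary(x).view(x.size(0), -1, 8))              # primary capsule vectors
        caps = self.class_caps(x.flatten(1)).view(x.size(0), -1, 16)    # one 16-D capsule per class
        return caps.norm(dim=-1)                           # capsule lengths = class probabilities

# probs = TinyCapsNet()(torch.randn(2, 1, 28, 28))   # -> tensor of shape (2, 4)
```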

7.3 Fuzzy-based CapsNet (F-CapsNet)

The sigmoid function is suitable for binary classification and provides continuous values in the range [0, 1] that represent the probability of a class in the binary classification problem. As the sigmoid function introduces nonlinearity in the hidden layers, it allows the neural network to learn more complex features.

$$\mathrm{sig}\left(x\right)=\frac{1}{1+{\mathrm{e}}^{-x}}$$
(12)

Assuming that \(\varphi\) is the sigmoid activation function, the type-II fuzzy sigmoid activation function can be represented as in Eqs. (13) and (14).

$${\varphi }_{L}\left(x\right)={\left[\frac{1}{1+{\mathrm{e}}^{-x}}\right]}^{\alpha }$$
(13)
$${\varphi }_{U}\left(x\right)={\left[\frac{1}{1+{\mathrm{e}}^{-x}}\right]}^{1/\alpha }$$
(14)

where \({\varphi }_{L}\) and \({\varphi }_{U}\) are the lower and upper sigmoid activation functions, respectively.

The proposed fuzzy gradient descent is defined as Eq. (15),

$$w=w-\mathrm{mean}({\left|{u}_{1}-{u}_{2}\right|}^{2})\times \eta \times \frac{\mathrm{d}}{\mathrm{d}w}\varphi$$
(15)

where w denotes the weight and \(u_1\) and \(u_2\) denote the class-1 and class-2 memberships of the neurons, respectively. The squared difference of the memberships, reduced to a single number by the mean, is employed as a fuzzy parameter, and \(\varphi\) is the type-II fuzzy sigmoid function. Because \(|u_1 - u_2|^2\) evaluates to 0 for fully ambiguous (fuzzy) nodes, such nodes contribute nothing to the weight update. Incorporating degrees of membership in the optimisation therefore determines how input samples contribute to the learning process according to their ambiguity: more ambiguous features have less effect on learning, which instead relies on less ambiguous features. The cost function used in our work is simply the difference between the actual values and the predicted values.
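A compact sketch of the type-II fuzzy sigmoid bounds of Eqs. (13) and (14) and the fuzzy gradient step of Eq. (15) is given below; the gradient argument and the learning rate are placeholders.

```python
import numpy as np

def fuzzy_sigmoid_bounds(x, alpha=2.0):
    """Lower and upper type-II fuzzy sigmoid activations, Eqs. (13)-(14)."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
    return s ** alpha, s ** (1.0 / alpha)

def fuzzy_gradient_step(w, grad_phi, u1, u2, eta=0.002):
    """Fuzzy gradient-descent update of Eq. (15): samples whose two class
    memberships are close (ambiguous) contribute little to the weight change."""
    fuzzy_factor = np.mean(np.abs(u1 - u2) ** 2)
    return w - fuzzy_factor * eta * grad_phi
```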

After applying a linear mapping to the feature vector to calculate the feature score, the CNN then uses that information to calculate the loss. Losses should be kept to a minimum for accuracy. The classification of lesion segments based on intensity and score vectors performs better with improved loss functions of the existing CNN algorithms.

CNNs are extremely powerful for image analysis since they can recognise patterns in images. Training a CNN requires labelled data; consequently, this is known as supervised learning. A CNN has two parts: hidden layers and fully connected layers. The hidden layers extract characteristics, while the fully connected layers are employed for classification. Unlike other neural networks, CNNs are not fully connected, which makes the model simpler to train and less complex.

In this architecture, the CNN's hidden layers are called capsule layers, and there are other layers in addition to the capsules. As with any hidden layer, the capsule layer transforms the input and sends the result to the following layer. The number of pattern-detection filters in each capsule layer must be specified. The patterns picked up by filters can be edges, shapes, textures, and objects in an image; edge detection, for example, is the process through which a filter recognises an edge. Filters can recognise rectangles, circles, edges, and other shapes, and filter complexity increases with network depth, so more advanced filters can find specific objects. A filter is represented by a small matrix with random initial values. When the capsule layer receives an input, the filter slides over the image in steps of its own pixel size until it has covered all of the image's pixel blocks; such slides are referred to as encapsulations. After covering the entire image, this layer produces a new representation formed from the filter responses to the original input, which becomes the input to the subsequent layer. The network's deeper layers have more sophisticated filters.

8 Experimental results

8.1 Training and testing PH2, ISBI 2017 and ISIC 2019 dataset

In this part, the performance metrics for the melanoma detector are presented together with the experimental findings. The classifier was tested using 20,250 images of lesions that were either melanoma or non-melanoma. The control dataset serves as the classifier's training ground, and the PH2, ISBI 2017, and ISIC 2019 datasets serve as its testing grounds. The test data comprise approximately 2530 melanoma and non-melanoma images. The proposed feature distributions of selected malignant and benign images across the three datasets used for training, validation, and testing are given in the tables. For 24-bit RGB skin images, the resolution spans from 540 × 722 to 4499 × 6748, as shown in Fig. 10.

Fig. 10

The proposed F-CapsNet model for classification

Table 1 lists the 80 atypical moles, 80 normal moles, and 40 melanoma cases from the PH2 dataset, which contains 200 images in total. ISBI 2017 comprises 2750 images in total: 2000 training, 600 test, and 150 validation images; the validation values are displayed in Table 2. The ISIC 2019 dataset initially comprises 25,331 images, broadly split into 4522 melanoma and 20,809 non-melanoma image categories, as shown in Table 3. The 12,778 non-melanoma images and the 4522 melanoma images were divided into three parts, training, testing, and validation, with 10% of each portion used for testing. The distributions of selected melanoma and non-melanoma images from the three datasets used for training, validation, and testing are given in Table 4.

Table 1 PH2 dataset distribution for validation
Table 2 ISBI 2017 dataset distribution for validation
Table 3 ISIC 2019 dataset distribution for validation
Table 4 PH2, ISBI 2017, and ISIC 2019 datasets distribution for validation

8.2 System requirements

A computer with an i7 processor, the simulation programme MATLAB 2018a, 32 GB of RAM, and a 4 GB GPU is used for all work and calculations.

The improved technique is automated image preprocessing oriented to hair removal. The proposed method can be divided into two stages. In the first stage, we analysed the skin lesion dataset and implemented a method to classify skin lesion images into high- and low-contrast images based on automated intelligent histograms. In the second stage, only the low-contrast images were preprocessed. The low contrast of the lesion area is then improved by using Laplacian filtering (FlLpF) along with HSV colour transformation, contrast stretching, and log transformation, as shown in Fig. 11.

Fig. 11

Results of skin lesion automated image preprocessing: first row, dermoscopic images; second row, contrast-stretched output images

After examining the lesion location, we analysed the segmentation technique based on accuracy, JAC, and DIC indicators on the two datasets, as shown in Table 5 and Fig. 12.

Table 5 Results of modified AD-GLCM segmentation metrics (%) on PH2 dataset
Fig. 12

The results of the proposed segmentation against the ground truth image: a Original image, b Segmented view, c The segmented region drawn on the original image, and d Ground truth image

It has been found that simulations using the PH-2 dataset led to the highest average classification accuracy of 98.45%, with JAC and DIC values of 90.15% and 94.77%, respectively.

For simulations using the ISIB2017 dataset in Table 6, the average accuracy, JAC, and DIC are 98.79%, 92.57%, and 95.144%, respectively. These values surpass those obtained on the PH-2 and ISIC-2019 datasets.

Table 6 Results of modified AD-GLCM segmentation metrics (%) on ISIB2017 dataset

To assess the results, we compare them with deep learning systems already in use that have been tested on the ISIC-2019 database, as given in Table 7. JI of 0.716 and DC of 0.796 were both reached using AD-GLCM segmentation. Compared to the frameworks below, our suggested approach is more in-depth, but we also obtain better results by including balanced data and lowering additional variance.

8.3 Classification result

The proposed F-CapsNet classifier is compared with other classification techniques, including decision tree, SVM, and KNN, in terms of the recognition results. Accuracy, sensitivity, specificity, and AUC were compared, along with the program performance, using the execution time (ms) as a benchmark. Patient images from the PH2, ISBI 2017, and ISIC 2019 datasets are compared against the classifiers mentioned in Table 7.

Table 7 Results of modified AD-GLCM segmentation metrics (%) on ISIC 2019 dataset

All 512 × 512 images across the three distinct datasets are classified using the F-CapsNet classifier. F-CapsNet has been trained with the following settings: group size = 64, subsection = 16, increment = 0.6, distribution = 0.0008, and learning rate = 0.002. The simulation was run for 50 epochs.

Tables 8, 9 and 10 offer an in-depth comparison with the state of the art on the PH2, ISBI-2017, and ISIC 2019 datasets. For all presented datasets, the suggested method produces the highest classification accuracy. Using colour and texture features, the PH2 dataset could previously be classified with a maximum accuracy of only 97.51%, whereas the suggested method achieves a classification accuracy of 98.42%. This shows that the proposed F-CapsNet classifier's results are much better than those of existing algorithms, and the outcomes confirm the robustness and higher efficiency of the F-CapsNet classification method.

Table 8 State of art comparison of the F-CapsNet classifiers on PH2 dataset
Table 9 State of art comparison of the F-CapsNet classifiers on ISIB 2017 dataset
Table 10 State of art comparison of the F-CapsNet classifiers on ISIC 2019 dataset

The tabular data demonstrate how the aforementioned studies have enhanced existing practices and produced more accurate lesion classification outcomes. The findings show that the suggested method beats all conventional deep learning techniques when compared with these contemporary classification methods. With an accuracy score of 98.42% and a specificity score of 97.79% on the PH2 dataset, the method outperformed the leading contribution. The JAC and Dice values of 90.15% and 94.77% fared substantially better than the rest; only Xie's work managed to obtain an accuracy of 96.56%. The proposed task's average computation time is 4.68 ms.

As demonstrated in the tabular data, the suggested method beats all conventional deep learning methods when results are compared with those of current contemporary classification methods. On the ISIB 2017 dataset, the method achieved an accuracy score of 99.16% and a sensitivity score of 97.58%, performing better than the best contribution. The proposed work takes 9.81 ms on average to compute.

Also, compared to other methods, the maximum accuracy achieved by the proposed F-CapsNet method on the ISBI-2017 dataset is 99.16%, the accuracy achieved with ISIC 2019 is 99.45%, and the accuracy on the PH2 dataset is 99.42%. The findings show that the technique not only successfully distinguishes benign moles from malignant melanoma automatically, but also consistently outperforms other cutting-edge baselines. The comparison demonstrates unequivocally that the suggested classification approach beats all previously used techniques for the other classifiers. In comparison to previous effective classifiers, this one not only produces outstanding results for all parameters but also reduces the time needed to detect melanoma using the suggested strategy. Skin lesion identification takes less time and is more effective when F-CapsNet is used as the classifier. The accuracy of the suggested method is increased by using a preprocessing model for image refinement following automatic hair removal and an appropriate segmentation technique.

8.4 Discussion

It is suggested in this paper to use F-CapsNet to automate the detection of skin lesions. In order to segment lesions, we apply adaptive fuzzy GLCM and fuzzy logic for border detection and segmentation. At several points during the lesion categorization process, we compare the upgraded F-CapsNet with the best available technology. It turns out that the suggested model is superior to other models in terms of classification accuracy and speed, with fewer false positives and false negatives.

Choosing discriminative features has a significant impact on the accuracy of skin lesion identification and classification [27]. There is not yet much information on this subject in the literature, and uncertainties in lesion boundary detection are not covered in detail. For instance, the authors classified, segmented, and refined dermoscopic images with an accuracy of 91.82% using a fixed wavelet grid network and orthogonal matching. An accuracy of 93.83% was achieved in colour-texture extraction from dermoscopic images by combining SVM, SMOTE, and an ensemble classifier. The GLCM (grey-level co-occurrence matrix) approach can also be used to extract colour, texture, and SVM features.

By using threshold-based segmentation, ABCD feature extraction, and multiscale lesion detection approaches, numerous studies have increased the accuracy of skin malignancy prediction. The skin lesion categorisation problem has been addressed using a multitrack CNN model, which achieved accuracy rates of 85.8% and 79.15% for 5-class and 10-class classification, respectively [28]. On the other hand, ensemble-based deep learning has enhanced performance in the classification of skin lesions with an accuracy of about 90%. The fact that a single model was used in all of the aforementioned investigations may have limited the models' accuracy; stacking many models helps to increase accuracy.

In our investigation, we used two simultaneous methods based on Delaunay triangulation to find skin lesions. Melanoma is detected and classified using a backpropagation multilayer neural network employing three-dimensional colour-texture properties of dermoscopic images. For the ImageNet dataset, the transfer learning approach utilising CNN models produced an accuracy of 88.33% thanks to pretrained models such as Resnet-101, BASNet big, and GoogLeNet. The drawback of all these approaches is that accurate medical diagnosis necessitates long-term real-time analysis; these restrictions are removed by our method of lesion boundary detection via fuzzy image processing. They deployed the Resnet50 model with no data pretreatment or manual feature selection, which drastically decreased model accuracy and lengthened processing times. Using the F-CapsNet method described here, 99.45% accuracy was attained on the ISBI-2017 dataset, 99.42% on the PH2 dataset, and 99.16% on the ISIC 2019 dataset. To enhance classification performance, we reduce overfitting on the SVM classifier training dataset and use the same dataset for both new and old models. Lesion categorisation was enhanced, processing time was cut by 20–30 ms, and accuracy rose by 2–3%.

9 Conclusion

The effectiveness of a melanoma detection system is examined in this research in relation to the effect of noise artefacts. The three-step automatic image preprocessing method, adaptive fuzzy GLCM segmentation, and classification using a fuzzy-based capsule network are the guiding concepts of the suggested model. We provide an automatic image preprocessing approach that can successfully eliminate impediments including vignetting effects, hair, and ink marker traces in degraded images. In this study, we examine the performance of adaptive fuzzy GLCM in detecting skin cancer from skin lesion images and evaluate its effectiveness and efficiency. The experimental findings demonstrate how highly efficient the proposed mechanism is. An extensive simulation study was carried out, and the outcomes were assessed at several points in order to ensure improved results for the F-CapsNet approach. For the classification of medical images, this research suggests an enhanced capsule network. The F-CapsNet technique's effectiveness is assessed using the ISIC 2017 Challenge, ISIC 2019 Challenge, and PH2 datasets. The suggested technique has an average accuracy of 99.16 per cent on the ISBI 2017 test dataset and 99.45 per cent on the ISIC 2019 test dataset. Additionally, the PH2 test dataset shows that the suggested approach has an average accuracy of 98.42%.