Introduction

Cerebrovascular disease (CVD) is one of the leading causes of death and permanent disability in the world [1]. CVD due to carotid artery (CA) disease is the most common cause of death in developed countries after heart disease and cancer [2]. According to North American Symptomatic Carotid Endarterectomy Trial data, the risk of ischemic CVD in the first year is 11% for CA stenosis of 70–79%, and this risk reaches 35% when stenosis is 90% or more [3, 4]. Approximately 75% of all patients with CA stenosis are asymptomatic. Several studies have shown that patients with internal CA stenosis of 50–99%, 60–99%, or 70–99%, depending on whether they are symptomatic, benefit more from endarterectomy than from medical treatment [5,6,7]. Internal CA stenosis is one of the most important causes of ischemic cerebral palsy [8]. About 80% of ischemic events are caused by atherosclerosis.

CVD refers to conditions in which a region of the brain is permanently or temporarily affected by ischemia or bleeding because of primary pathological damage to one or more of its blood vessels [9]. These diseases are caused by obstruction or bleeding of the vessels feeding the brain and produce symptoms related to the damaged brain area. The CAs run along both sides of the neck and carry oxygenated blood to the brain. The first branch leaving the main vessel divides into two vessels: one goes to the right arm and the other is the right CA, which carries blood to the right side of the brain. The left CA, which supplies the left side of the brain, leaves the main vessel as a single vessel, as shown in Fig. 1a [10].

Fig. 1 a Carotid artery interior structures b Ischemic stroke and hemorrhagic stroke [10, 11]

CA disease is atherosclerotic disease of the carotid arteries, which run along both sides of the neck, and it may result in stroke. A sudden occlusion of these vessels results in stroke [12]. CA disease is a form of CVD that results mainly from atherosclerosis. Fat particles, cholesterol, calcium, and other substances and cells accumulate in the artery wall and form a deposit called plaque. These plaques grow over time or promote clotting on their irregular surface, and can cause complete occlusion of the vessel. Fully occluded carotid arteries, or small clots that detach from the atherosclerotic plaque and obstruct the small vessels in the brain, can cause stroke [12].

If early measures and intervention are not undertaken, stroke becomes the leading cause of disability [13]. Every year approximately 16 million people in the world suffer a stroke [9]. In Turkey, the number of deaths due to CVD was 64,780 at the beginning of the 2000s. The distribution of the five leading causes of death in Turkey is shown in Table 1.

Table 1 Number and percentage distribution of the five leading causes of death (Turkey, 2004) [14]

About 80% of all strokes are occlusive strokes and 20% are brain hemorrhages [15]. Both types [11] are illustrated in Fig. 1b.

Ultrasonography (USG) is the most sensitive and reliable method for the morphological evaluation of the CA. In addition, color and spectral Doppler USG can show flow changes caused by vascular lesions in real time [1]. CA USG is a common imaging study for the diagnosis of CA disease [16]. Doppler USG examines vascular structures with sound waves and provides hemodynamic information about the carotid and vertebral arteries. Although its sensitivity is 92.6% and its specificity is 97%, angiography is accepted as the gold standard [17]. The Doppler equation is defined as:

$$ f_d = f_t - f_r = \frac{2 f_t v \cos\theta}{c} $$
(1)

where f_d is the Doppler frequency shift, f_t and f_r are the transmitted and received ultrasound frequencies, v is the speed of the target, c is the speed of sound in the medium, and θ is the angle between the ultrasound beam and the direction of movement of the target [18].
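
As a rough numerical illustration of Eq. (1), the sketch below computes the Doppler shift and then inverts the equation to recover the target speed. The function names, the 5 MHz transmit frequency, the 0.5 m/s flow speed, the 60° beam angle and the default sound speed of 1540 m/s (a value commonly assumed for soft tissue) are illustrative assumptions, not values taken from this study.

```python
import math


def doppler_shift(f_t, v, theta_deg, c=1540.0):
    """Doppler shift f_d = 2 * f_t * v * cos(theta) / c, in Hz (Eq. 1).

    c defaults to 1540 m/s, an assumed speed of sound in soft tissue.
    """
    return 2.0 * f_t * v * math.cos(math.radians(theta_deg)) / c


def velocity_from_shift(f_d, f_t, theta_deg, c=1540.0):
    """Solve Eq. 1 for the target speed v, in m/s."""
    return f_d * c / (2.0 * f_t * math.cos(math.radians(theta_deg)))


# Hypothetical example values (not from the study).
f_d = doppler_shift(f_t=5e6, v=0.5, theta_deg=60.0)
v_back = velocity_from_shift(f_d, f_t=5e6, theta_deg=60.0)
print(f"Doppler shift: {f_d:.1f} Hz, recovered speed: {v_back:.3f} m/s")
```

Because the shift scales with cos θ, it vanishes as the beam approaches 90° to the flow, which is why the insonation angle matters when velocities are estimated from the measured shift.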

Various methods for the early diagnosis and treatment of CA disease have been explored in different studies. Most of these studies performed segmentation on varying numbers of patient images using different machine learning algorithms [19,20,21,22,23,24,25,26,27]. Segmentation is an image analysis problem: it prepares the image for the display and diagnostic stages of image processing [28]. In addition, some studies carried out Intima Media Thickness (IMT) measurements for automatic artery recognition and for predicting lumen diameter changes [29,30,31]. Moreover, a cloud-computing-based platform was developed for risk assessment and stroke follow-up of CA patients [32]. Risk assessment is the most important stage in managing the disease. For this stage, classification studies were carried out using various methods such as Neural Networks (NN) [33,34,35], Support Vector Machine (SVM) [36, 37], and Enhanced Activity Index [38].

DL algorithms have begun to be used in a range of medical studies. Automatic abdominal multi-organ segmentation [39], classification and segmentation of microscope images [40], segmentation of biomedical images [41], brain tumor segmentation [42], identification of metastatic breast cancer [43], mitosis detection in breast cancer histology images [44], detection of diabetic retinopathy in retinal fundus photographs [45], diagnosis of melanoma skin lesions [46], false positive reduction in the detection of pulmonary nodules [47], and automated detection/diagnosis of seizures using electroencephalogram signals [48] are some of these studies. Although deep architectures are known to perform impressively in these areas, DL is not yet used widely enough in the medical field. DL is widely used in areas such as computer vision, image processing, natural language processing and speech recognition. Such approaches are necessary in automated medical decision-making systems because non-automated processes are more expensive, labor-intensive, and subject to human error.

IMT classification of the CA has not yet been studied with DL methods; DL studies so far have been carried out only at the image segmentation level. In this study, a model based on Convolutional Neural Networks (CNN), a DL algorithm, is proposed to detect increased IMT and arterial narrowing, which is one of the causes of CA disease.

Deep learning

Deep learning is a new and promising area of machine learning for solving artificial intelligence problems. It is a subfield of machine learning and an application area of Deep Neural Networks (DNN). Instead of algorithms customized for each study, DL aims at solutions that are learned from data sets and that scale to larger data sets. Artificial Neural Networks (ANN) were inspired by the human brain. They are information processing structures consisting of processing elements, each with its own memory, connected to each other by adjustable weights. ANNs, with advantages such as learning, generalization, non-linearity, fault tolerance, adaptability and parallelism, are used in many application areas, including medical applications such as image and signal processing and disease prediction, as well as engineering, production, finance, optimization and classification [49]. A DNN has two or more hidden layers and builds increasingly extensive relationships from simple to complex features of the data. Each layer tries to establish a relationship between itself and the previous layer. Thus, the inputs are examined in more detail and more accurate decisions are made. As shown in Fig. 2a, an ANN produces an output by applying weights and an activation function to the given inputs. Figure 2b shows a DNN structure with three hidden layers.

Fig. 2 a ANN structure b DNN structure
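
To make the weighted-sum-plus-activation idea concrete, the following minimal sketch passes an input vector through a small fully connected network. The layer sizes, weights and biases are made-up illustrative values and the helper names (`dense_layer`, `forward`) are hypothetical; this is not the network used in the study.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def forward(inputs, layers):
    """Feed the inputs through a list of (weights, biases, activation) layers."""
    for weights, biases, activation in layers:
        inputs = dense_layer(inputs, weights, biases, activation)
    return inputs


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], sigmoid)
output = ([[1.0, -1.0]], [0.0], sigmoid)

y = forward([1.0, 1.0], [hidden, output])
print(y)
```

Adding more entries to the `layers` list gives a deeper network in the sense of Fig. 2b: each layer only sees the outputs of the layer before it.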

Different activation functions can be used when forming the structure of a DNN. The choice may vary according to the type, structure and size of the data and to the model. The activation function determines the output that a cell produces in response to its input. A non-linear function is usually selected. The main activation functions used are as follows:

$$ \mathrm{Sigmoid}\ \mathrm{activation}\ \mathrm{function}:f(x)=\frac{1}{1+{e}^{-x}} $$
(2)
$$ \mathrm{TanH}\ \mathrm{activation}\ \mathrm{function}:\tanh (x)=\frac{2}{1+{e}^{-2x}}-1 $$
(3)
$$ \mathrm{ReLU}\ \mathrm{activation}\ \mathrm{function}:f(x)=\left\{\begin{array}{ll}0 & \mathrm{for}\ x<0\\ x & \mathrm{for}\ x\ge 0\end{array}\right. $$
(4)

The curves of the Sigmoid, TanH and ReLU functions are shown in Fig. 3a, b and c, respectively.

Fig. 3 a Sigmoid activation function b TanH activation function c ReLU activation function
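
Equations (2) to (4) translate directly into code. The sketch below is a plain-Python illustration (the function names are our own) that evaluates the three functions at a few sample points, which is enough to reproduce the general shape of the curves in Fig. 3.

```python
import math


def sigmoid(x):
    """Eq. 2: f(x) = 1 / (1 + e^(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_activation(x):
    """Eq. 3: tanh(x) = 2 / (1 + e^(-2x)) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def relu(x):
    """Eq. 4: 0 for x < 0, x for x >= 0."""
    return x if x >= 0 else 0.0


# Tabulate the three activations on a few sample inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  "
          f"tanh={tanh_activation(x):+.3f}  relu={relu(x):.1f}")
```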

CNN is the best-known DL algorithm and is often used in image classification problems. In each convolutional layer of a CNN, kernels with dimensions of 3 × 3, 5 × 5 or 7 × 7 are mostly used, as shown in Fig. 4a. A pooling operation is then applied to the outputs of these kernels, as shown in Fig. 4b. Pooling filters the data within each window. Max pooling is the most commonly used pooling method; it takes the largest value in each window of the matrix.

Fig. 4 a Convolutional layer (3 × 3) b Pooling (2 × 2)
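
The two operations in Fig. 4 can be sketched on a toy example as follows. This is an illustrative plain-Python version with hypothetical helper names, a made-up 6 × 6 input and a made-up 3 × 3 kernel; it slides the kernel without flipping it, as CNN libraries typically do, and uses no padding.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 6 x 6 "image" whose pixel values increase row by row.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
# A 3 x 3 kernel that simply copies the centre pixel of each window.
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

features = conv2d_valid(image, kernel)   # 4 x 4 feature map
pooled = max_pool(features, size=2)      # 2 x 2 after max pooling
print(features)
print(pooled)
```

A learned kernel would have non-trivial weights, but the data flow is the same: the 3 × 3 convolution shrinks the 6 × 6 input to 4 × 4, and 2 × 2 max pooling halves each dimension again.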

The convolution of two functions f and g over the finite range [0, t] is defined as follows:

$$ \left[f\ast g\right](t)\equiv {\int}_0^tf\left(\tau \right)g\left(t-\tau \right) d\tau $$
(5)

where [f ∗ g](t) denotes the convolution of the functions f and g [50]. Alternatively, convolution is most often calculated over an infinite range as:

$$ \left[f\ast g\right](t)\equiv {\int}_{-\infty}^{\infty }f\left(\tau \right)g\left(t-\tau \right) d\tau $$
(6)
$$ ={\int}_{-\infty}^{\infty }g\left(\tau \right)f\left(t-\tau \right) d\tau $$
(7)
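
The equality of Eqs. (6) and (7), that is, f ∗ g = g ∗ f, can be checked numerically on the discrete analogue of the integral, where the integral becomes a sum over sample indices. The sketch below uses two short made-up sequences; the function name `convolve` is our own.

```python
def convolve(f, g):
    """Discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, f_i in enumerate(f):
        for j, g_j in enumerate(g):
            out[i + j] += f_i * g_j
    return out


# Two short illustrative sequences.
f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]

fg = convolve(f, g)
gf = convolve(g, f)
print(fg)
print(gf)
print("commutative:", fg == gf)
```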

Proposed strategy

A DL model has been created for the classification of images. In order to improve the performance of the model, hyper-parameters in the DL model were optimized with repeated analysis/test studies. The following steps were followed when creating a DL model.

In the first stage, called definition, the libraries required for the DL model, such as numpy, os, matplotlib and sklearn, were imported into the program. The image resolution was fixed at 128 × 128 and the number of image channels was set to one, since grayscale images are used. The path from which the images were read was identified, and a second path was defined in which the new images were saved after preprocessing. Parameters to be used in the model, such as batch size, number of classes, number of epochs, number of filters, pool size and convolution filter size, were also defined in this stage.

At the image pre-processing stage, the images taken from the image folder were resized to 128 × 128 resolution, converted to grayscale and saved in the folder at the second path. The images were saved sequentially under the names of the classes to which they had previously been assigned. All images were then flattened as floats and stored together in a single matrix. The stored images were labeled “1” up to the 203rd image, and the remaining images were labeled “0”. After labeling, the images in memory were shuffled randomly so that the sequential order of the list was broken, to prevent the model from memorizing the data and to increase accuracy.
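
The flattening, labeling and shuffling steps described above can be sketched as follows. This is an illustration only: it uses small synthetic stand-in images (8 × 8 instead of 128 × 128, with each image filled with its own index so it can be traced after shuffling), skips file reading and resizing, and the helper names and the random seed are assumptions rather than details from the study.

```python
import random

SIDE = 8            # stand-in size; the study uses 128 x 128 grayscale images
N_IMAGES = 501      # total number of images in the study
N_POSITIVE = 203    # images 1..203 are labeled "1", the rest "0"


def flatten_as_float(image):
    """Turn a 2-D grid of pixel values into one flat list of floats."""
    return [float(pixel) for row in image for pixel in row]


def label_and_shuffle(samples, n_positive, seed=0):
    """Label the first n_positive samples 1 and the rest 0, then shuffle both together."""
    labels = [1] * n_positive + [0] * (len(samples) - n_positive)
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)   # seed is an assumption, for repeatability
    return [samples[i] for i in order], [labels[i] for i in order]


# Synthetic stand-in data: image i is filled with the value i.
images = [[[i] * SIDE for _ in range(SIDE)] for i in range(N_IMAGES)]
samples = [flatten_as_float(image) for image in images]

X, y = label_and_shuffle(samples, N_POSITIVE)
print(len(X), len(X[0]), sum(y))
```

Shuffling a single index list and applying it to both the samples and the labels keeps every image paired with its own label, which is the property that matters here.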

The image array obtained from the last step of pre-processing was divided into training and test sets: 80% of the images were used for training and the remaining 20% for testing. The images were selected randomly while preserving the proportions of images labeled “1” and “0” in the whole data set. The training set contains 400 IMT US images, of which 162 are labeled “1” and the remaining 238 are labeled “0”. The test set, also selected randomly, contains 101 IMT US images, of which 41 are labeled “1” and the remaining 60 are labeled “0”. In this way, the aim is to improve the learning sensitivity of the model on the training images and to increase its accuracy. It also reduces the risk of the model memorizing or over-fitting the data.
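
A class-preserving (stratified) 80/20 split of this kind can be sketched in plain Python as below. The function name, the rounding rule and the seed are assumptions for illustration; the study itself does not state how the split was implemented. With 203 images labeled “1” and 298 labeled “0”, rounding 20% of each class gives the class counts reported above.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_test = round(test_fraction * len(idx))   # per-class test size
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


labels = [1] * 203 + [0] * 298
train_idx, test_idx = stratified_split(labels)

train_pos = sum(labels[i] for i in train_idx)
test_pos = sum(labels[i] for i in test_idx)
print(len(train_idx), train_pos, len(train_idx) - train_pos)
print(len(test_idx), test_pos, len(test_idx) - test_pos)
```

In a project that already depends on scikit-learn, `train_test_split` with its `stratify` argument is the usual way to obtain the same kind of split.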

The model in the study was built by determining the optimum parameters after repeated tests. CNN, the best-known and most frequently used architecture for image classification problems, is used in the model. The model is designed sequentially. First, a convolution layer with 256 filters of size “3 × 3” was added to the model. There are eight convolution layers in total. After each convolution layer, the “ReLU” activation function was applied and its result became the input of the next layer. After every two convolution layers, a max-pooling layer with “2 × 2” pooling was added to prevent over-fitting. After each pooling layer, a dropout layer with a rate of “0.5” was added. The resulting feature maps were then flattened and passed through fully connected layers with 256, 128, and 2 outputs, respectively. ReLU activation and dropout layers were added between the fully connected layers. The softmax activation function was used in the last layer of the model. When compiling the model, binary cross-entropy was selected as the loss, which is computed as follows:

$$ L=-\frac{1}{n}{\sum}_{i=1}^n\left[{y}^{(i)}\log \left({\hat{y}}^{(i)}\right)+\left(1-{y}^{(i)}\right)\log \left(1-{\hat{y}}^{(i)}\right)\right] $$
(8)

where n is the number of samples, y^(i) is the true label of sample i, and ŷ^(i) is the output of the related neuron [51]. RMSprop (learning rate, lr = 0.00001) was selected as the optimizer and accuracy was used as the metric. A summary of the model is given in Table 2.
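
Equation (8) can be evaluated directly, as in the following sketch. The labels and predicted probabilities are made-up values, and the small clipping constant `eps` is an added assumption that keeps the logarithm finite when a prediction is exactly 0 or 1.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy (Eq. 8) over n samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.1, 0.8, 0.3]

loss = binary_cross_entropy(y_true, y_pred)
print(f"loss = {loss:.4f}")
```

The loss approaches zero as the predicted probabilities approach the true labels, and a completely uninformative prediction of 0.5 for every sample gives ln 2.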

Table 2 Summarized model parameters
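
The layer order described in this section can be written down and traced without any DL library, as in the sketch below. Several details are assumptions made only for illustration, because the text does not state them: that every convolution layer uses 256 filters, that the convolutions use either “same” or “valid” padding, and that the dropout layers between the fully connected layers use the same 0.5 rate. The helper names are hypothetical.

```python
def build_layer_list(n_conv=8, convs_per_block=2, filters=256):
    """List the layers in the order described in the text."""
    layers = []
    for i in range(1, n_conv + 1):
        layers.append(("conv3x3", filters))      # assumed: same filter count everywhere
        layers.append(("relu", None))
        if i % convs_per_block == 0:             # after every two convolutions
            layers.append(("maxpool2x2", None))
            layers.append(("dropout", 0.5))
    layers.append(("flatten", None))
    for units in (256, 128):
        layers.append(("dense", units))
        layers.append(("relu", None))
        layers.append(("dropout", 0.5))          # assumed rate for the dense part
    layers.append(("dense", 2))
    layers.append(("softmax", None))
    return layers


def trace_shapes(layers, size=128, channels=1, padding="same"):
    """Follow the tensor shape through the layers for a size x size input."""
    shape = (size, size, channels)
    trace = []
    for name, arg in layers:
        if name == "conv3x3":
            h, w, _ = shape
            if padding == "valid":               # a 3 x 3 kernel trims 2 pixels
                h, w = h - 2, w - 2
            shape = (h, w, arg)
        elif name == "maxpool2x2":
            h, w, c = shape
            shape = (h // 2, w // 2, c)
        elif name == "flatten":
            h, w, c = shape
            shape = (h * w * c,)
        elif name == "dense":
            shape = (arg,)
        trace.append((name, shape))
    return trace


layers = build_layer_list()
for padding in ("same", "valid"):
    trace = trace_shapes(layers, padding=padding)
    flat = next(shape for name, shape in trace if name == "flatten")
    print(padding, "-> flattened size", flat[0], "-> output", trace[-1][1])
```

Under the “same” assumption the four pooling steps reduce 128 × 128 to 8 × 8 before flattening; under “valid” the maps shrink further to 4 × 4. The actual values for the study are those listed in Table 2.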

Results and Discussion

To test the proposed model, 501 images of 153 patients treated at the Radiology Department of Ankara Training and Research Hospital were obtained between June 2018 and January 2019. The images were collected under the Ethics Approval Certificate of the Gazi University Ethics Commission dated 08/05/2018 and numbered 2018–217. A Toshiba Aplio 400 ultrasound device was used for imaging. The images were classified as “IMT: 1” and “IMT: 0” by two expert radiologists from the Department of Radiology at Ankara Training and Research Hospital. The summary and features of the database are shown in Table 3.

Table 3 Image database

The DL model was run on a system with an i5 3.50 GHz processor, 16 GB RAM and an Nvidia graphics card (GP102, TITAN Xp). The model was built with the Keras DL library on TensorFlow, in the Python programming language, on the Ubuntu operating system. The Scientific Python Development Environment (Spyder) was used as the development interface.

After the proposed model was run, its graphs and weights were recorded and its accuracy and loss were visualized. To evaluate the performance of the model, a Receiver Operating Characteristic (ROC) curve and a confusion matrix were also created. The weights were saved after training for use in the prediction model.

The accuracy of the model is 89.1% and its loss is 0.292. The loss function is an important indicator for a CNN model because it measures the inconsistency between the predicted value and the actual label. It is a non-negative value, and the robustness of the model increases as the loss decreases [51]. The accuracy and loss graphs of the model are shown in Fig. 5a and b, respectively.

Fig. 5 a Accuracy graph of the model b Loss graph of the model

The model starts to learn from the training data after 25 epochs; from this point on, its accuracy increases and its loss decreases. The classification results can also be represented in a confusion matrix, also known as a contingency table [52,53,54]. It is a square matrix (G × G) whose rows and columns represent the experimental and predicted classes, respectively. The confusion matrix contains all the information related to the distribution of samples within the classes and to the classification performance [55]. Measures such as accuracy, error rate (misclassification rate), true positive rate (also known as sensitivity or recall), false positive rate, true negative rate (also known as specificity), precision, prevalence and F1 score can be calculated from the confusion matrix.

The confusion matrix shows where the classification model is confused when it makes predictions. It gives insight not only into the errors made by the classifier but, more importantly, into the types of errors being made. There are two possible predicted classes: “positive” and “negative”. For this carotid classifier, “positive” means “IMT: 1” and “negative” means “IMT: 0”. The confusion matrix for the CA IMT case is shown in Fig. 6.

Fig. 6 Confusion matrix of the model

According to the results shown in Fig. 6, out of the 101 test cases, the classifier predicted “positive” 36 times and “negative” 65 times. In reality, 41 images in the sample are positive and 60 are negative. In the confusion matrix, TP means the observation is positive and is predicted to be positive. FN means the observation is positive but is predicted to be negative. TN means the observation is negative and is predicted to be negative. FP means the observation is negative but is predicted to be positive.

Accuracy indicates how often the classifier is correct overall. For this carotid classifier (cc), accuracy is calculated as follows:

$$ \begin{array}{l}\mathrm{Accuracy}\ (\mathrm{Acc})=\frac{TP+TN}{TP+TN+FP+FN}\\ {\mathrm{Acc}}_{cc}=\frac{90}{101}=0.891\end{array} $$
(9)

The error rate, also known as the misclassification rate, measures how often the classifier predicts incorrectly. It is equal to one minus accuracy and is calculated as follows:

$$ \begin{array}{l}\mathrm{Error}\ \mathrm{Rate}\ (\mathrm{ER})=\frac{FP+FN}{TP+TN+FP+FN}\\ {\mathrm{ER}}_{cc}=\frac{11}{101}=0.109\end{array} $$
(10)

The true positive rate indicates how often the classifier predicts positive when the observation is actually positive. It is also known as “Sensitivity” or “Recall”. Recall is the ratio of the number of correctly classified positive examples to the total number of positive examples. High recall (a small number of FN) indicates that the positive class is recognized well. Recall is usually used when the goal is to limit the number of FN and is calculated as follows:

$$ \begin{array}{l}\mathrm{Recall}=\frac{TP}{TP+FN}\\ {\mathrm{Recall}}_{cc}=\frac{33}{41}=0.805\end{array} $$
(11)

The false positive rate indicates how often the classifier predicts positive when the observation is actually negative. It is calculated as follows:

$$ \begin{array}{l}\mathrm{False}\ \mathrm{Positive}\ \mathrm{Rate}\ (\mathrm{FPR})=\frac{FP}{FP+TN}\\ {\mathrm{FPR}}_{cc}=\frac{3}{60}=0.05\end{array} $$
(12)

The true negative rate indicates how often the classifier predicts negative when the observation is actually negative. It is also known as “Specificity” and is equal to one minus the FPR. Specificity is calculated as follows:

$$ \begin{array}{l}\mathrm{True}\ \mathrm{Negative}\ \mathrm{Rate}\ (\mathrm{TNR})=\frac{TN}{FP+TN}\\ {\mathrm{TNR}}_{cc}=\frac{57}{60}=0.95\end{array} $$
(13)

Precision, also known as positive predictive value, measures how often a positive prediction is correct. It is obtained by dividing the number of correctly classified positive examples by the total number of predicted positive examples. High precision (a small number of FP) indicates that an example labeled as positive is indeed positive. Precision is calculated as follows:

$$ \begin{array}{l}\mathrm{Precision}=\frac{TP}{TP+FP}\\ {\mathrm{Precision}}_{cc}=\frac{33}{36}=0.917\end{array} $$
(14)

Prevalence indicates how often the “positive” class actually occurs in the sample. It is calculated as follows:

$$ \begin{array}{l}\mathrm{Prevalence}=\frac{TP+FN}{TP+TN+FP+FN}\\ {\mathrm{Prevalence}}_{cc}=\frac{41}{101}=0.406\end{array} $$
(15)

The F1-score is the harmonic mean of precision and recall. It measures how well the classifier performs and is often used to compare classifiers. If only recall is optimized, the algorithm will tend to predict most examples as positive, which results in many false positives and hence low precision. Conversely, if only precision is optimized, the model will predict very few examples as positive and recall will be very low. The F1-score is therefore useful when both precision and recall need to be taken into account. It is calculated as follows:

$$ \begin{array}{l}F1\ \mathrm{Score}=\frac{2TP}{2TP+FP+FN}\\ F{1}_{cc}=\frac{66}{77}=0.857\end{array} $$
(16)
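
All of the measures in Eqs. (9) to (16) follow from the four cells of the confusion matrix. The sketch below recomputes them from the counts used in the equations above (TP = 33, FN = 8, FP = 3, TN = 57); the dictionary layout and key names are our own.

```python
# Cell counts of the confusion matrix for the 101 test images.
TP, FN, FP, TN = 33, 8, 3, 57
total = TP + TN + FP + FN

metrics = {
    "accuracy": (TP + TN) / total,                # Eq. 9
    "error_rate": (FP + FN) / total,              # Eq. 10
    "recall": TP / (TP + FN),                     # Eq. 11 (sensitivity)
    "false_positive_rate": FP / (FP + TN),        # Eq. 12
    "specificity": TN / (FP + TN),                # Eq. 13 (true negative rate)
    "precision": TP / (TP + FP),                  # Eq. 14
    "prevalence": (TP + FN) / total,              # Eq. 15
    "f1": 2 * TP / (2 * TP + FP + FN),            # Eq. 16
}

for name, value in metrics.items():
    print(f"{name:>20}: {value:.3f}")
```

Two identities make a quick sanity check: the error rate equals one minus accuracy, and specificity equals one minus the false positive rate.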

After all parameters were calculated from the confusion matrix, the overall average performance measures of the model over both classes, “IMT:0” and “IMT:1”, are shown in Table 4.

Table 4 Performance measures

According to the confusion matrix, the sensitivity of the model is 89% and its specificity is 88%. The model was tested on 101 images. For testing after training, the number of images in each class was chosen to preserve the class ratio of the whole data set. As seen in Table 5, the model correctly predicted the class of eight of the nine test images listed.

Table 5 Test and predict results

ROC analysis supports clinical decision-making when the diagnostic process would take a long time, be costly, or require special methods, equipment and qualified personnel: it determines appropriate cut-off values for indicators that can be obtained quickly, cheaply and easily [56]. Sensitivity and specificity curves allow the success of different tests in reaching the correct clinical diagnosis to be compared. The ROC curve for the CA IMT case is shown in Fig. 7.

Fig. 7 ROC curve for model
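
An ROC curve is built by sweeping the decision threshold over the predicted scores and recording the false positive rate and true positive rate at each step. The sketch below shows the idea on six made-up scores; it does not use the outputs of the model in this study, and the function names are our own.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, from the strictest threshold to the most lenient."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical true labels and predicted scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(f"AUC = {auc:.3f}")
```

A curve that hugs the top-left corner, with an area close to 1, indicates that some threshold separates the two classes well; the diagonal, with an area of 0.5, corresponds to random guessing.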

The layer outputs produced while the CNN model processed the IMT USG images were also saved to illustrate feature extraction. The original input image and the activation outputs of the 1st, 5th, 10th, 15th, and 20th layers are shown in Fig. 8.

Fig. 8 Feature extraction a Input Image b 1st Layer c 5th Layer d 10th Layer e 15th Layer f 20th Layer

Various studies have been carried out on CA images in the literature as shown in Table 6. These studies can be examined in two different sections:

  • CA Intima Media Segmentation Studies

  • CA Intima Media Classification Studies

Table 6 Carotid artery image processing studies

According to Table 6, IMT classification of the CA has not yet been studied with DL methods; DL studies so far have been carried out only at the image segmentation level. Classification studies were mostly carried out with the machine learning methods SVM and NN, and they produced differing results. The accuracy of the NN-based classification studies was in the range of 71–73% when working with more than 200 images [33, 34], while it was 99.1% in a study with 54 images [35]. This suggests that, with NN, accuracy decreases as the number of images increases. With DL methods, in contrast, more input data tends to mean better accuracy, because the model learns the features itself from the input data. The CNN model proposed in this study achieved a higher accuracy of 89.1% with more image data. The situation for SVM is slightly different. In studies using between 270 and 350 images, accuracies ranging from 73% to 83% were obtained with SVM [36, 37]. Although the SVM methods produced better results than NN, the CNN method proposed in this study achieved better results than the SVM methods. These results indicate an improvement over the previous studies listed in Table 6. Although parameters such as the number of patients, the number of images and the image quality differ across the studies in Table 6, our model compares favorably with the methods used in the other studies.

Conclusion

In this study, a new method for the classification of CA IMT on ultrasound images was proposed. The proposed method classifies IMT to support the early diagnosis and treatment of CVD. It is based on a model from the field of DL, a new subfield of machine learning.

The model was tested on a database of 501 images from 153 patients treated at Ankara Training and Research Hospital over an 8-month period. A CNN, which is frequently used in image classification problems, was used in the model. The predictions of the model were compared with the classification made by the doctors. The results showed that the DL model achieved a classification accuracy of 89.1%, with 89% sensitivity and 88% specificity for IMT classification. The CNN model therefore performed well in classifying CA IMT. To summarize, the main contributions of the proposed method are that it continues previous segmentation studies into classification, that it performs well on a large number of images and across differences in image quality, and that it classifies IMT with good reliability and precision.

These strengths matter for a method designed to help prevent CVD. The study showed that DL methods can produce effective results in medical research.