Introduction

A brain tumor is an abnormal growth of cells in the brain. Brain tumors are broadly classified as benign or malignant: benign tumors are non-cancerous and malignant tumors are cancerous [1]. Tumors are analyzed according to a grading system in which grades 1 and 2 correspond to benign tumors and grades 3 and 4 to malignant tumors. Malignant tumors grow quickly, damage the brain, and can spread to other parts of the brain and the spinal cord [2]. Identifying the position and type of a tumor at an early stage is therefore very important in the medical field. A typical brain image with a benign tumor is shown in Fig. 1.

Fig. 1 Typical MR brain image with benign tumor

Magnetic resonance (MR) imaging provides high-quality brain images. MR imaging includes structural MRI (sMRI) and functional MRI (fMRI): sMRI captures the anatomical structure of the brain, whereas fMRI captures its metabolic function [3, 4]. In this paper, sMRI is used to identify brain tumors. The brain tumor detection approach has three stages: preprocessing, classification, and post-processing. The two-dimensional discrete wavelet transform (2D-DWT) and principal component analysis (PCA) are used in the preprocessing stage. The wavelet transform plays a vital role in MR image feature extraction because it allows MR images to be analyzed at different levels of resolution [5]. The features of the MR images are extracted using the 2D-DWT, but the resulting feature vector requires a large amount of storage [6]. PCA is used to overcome this limitation: it efficiently reduces the dimension of the feature vector and passes the most important features to the classifier [7].

Most researchers have used machine learning algorithms to detect brain tumors from MR images. Algorithms such as the support vector machine (SVM), decision trees (DT), random forest (RF), Naive Bayes (NB), and hidden Markov models (HMM) play major roles in biomedical image processing. SVM is a popular linear classifier that provides high accuracy and classifies images efficiently when the dataset is small [8, 9], but it fails to classify images reliably when the dataset is too large. Moreover, the SVM classifier performs well only for linearly separable data, whereas most tumor features, such as intensity, shape, and texture, are nonlinear. In this paper, the proposed kernel SVM (KSVM) classifies the nonlinear MR image features by mapping the extracted features into a higher-dimensional space. This mapping allows KSVM to obtain a hyperplane that separates the tumor features, and it integrates multiple extracted features into a single classification framework. KSVM uses various kernels, such as the linear, polynomial, and Gaussian radial basis (GRB) kernels, to capture the nonlinear relationship between the MR image features. Unlike SVM, the proposed KSVM is robust to noise. To reduce overfitting, stratified K-fold cross-validation is used with KSVM [10].

This paper is organized into seven sections. The two-dimensional discrete wavelet transform is presented in Sect. "Discrete Wavelet Transform". Dimensionality reduction using PCA is introduced in Sect. "Principal Component Analysis". Kernel support vector machine classification is described in Sect. "Kernel Support Vector Machine". Section "K-Fold Cross-validation" presents stratified K-fold cross-validation. The results are discussed in Sect. "Results and Discussion". Section "Conclusion" concludes the study with a quantitative analysis.

Discrete Wavelet Transform

Feature extraction is the procedure of extracting statistical data from an MR image, such as edges, contrast, regions of interest, color features, shape, ridges, and texture [11]. In this paper, the DWT is used to obtain qualitative features such as the gray-level co-occurrence matrix and the wavelet coefficients. In general, the DWT converts a spatial-domain image into the wavelet domain, which provides frequency information together with location (time) information. The primary advantage of the DWT is that it gives high temporal resolution for high-frequency components and good frequency resolution for low-frequency components. The DWT extracts the MR image features using analysis filter banks and a decomposition operation, with a high-pass and a low-pass filter at every decomposition level. The high-pass and low-pass filters give the detail and approximation coefficients of an MR image, respectively. The 2D-DWT is derived from two separate 1D DWTs [12]. In multi-resolution analysis, the 2D-DWT decomposes the MR image into two levels. At every decomposition level, the DWT produces four subbands: Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH). The LL subband contains the approximation coefficients, whereas the LH, HL, and HH subbands contain the detail coefficients. More accurate approximation coefficients can be obtained by increasing the number of decomposition levels.

$$\begin{aligned} \xi (k, l)&= \xi (k)\xi (l) \end{aligned}$$
(1)
$$\begin{aligned} \psi ^{H}(k, l)&= \psi (k)\xi (l) \end{aligned}$$
(2)
$$\begin{aligned} \psi ^{V}(k, l)&= \xi (k)\psi (l) \end{aligned}$$
(3)
$$\begin{aligned} \psi ^{D}(k, l)&= \psi (k)\psi (l) \end{aligned}$$
(4)

\(\xi (.)\) and \(\psi (.)\) are the \(1\)-D scaling function and wavelet function, respectively. \(\psi ^{H}\) captures variation along the columns, \(\psi ^{V}\) captures variation along the rows, and \(\psi ^{D}\) captures variation along the diagonals (refer to Eqs. (1–4)) [13]. Equations (5, 6) show the scaled and translated basis functions of the 2D-DWT.

$$\begin{aligned} \xi _{j,p,q}(k,l)&= 2^{\frac{j}{2}}\,\xi (2^{j}k-p, 2^{j}l-q) \end{aligned}$$
(5)
$$\begin{aligned} \psi ^{i}_{j,p,q}(k, l)&= 2^{\frac{j}{2}}\,\psi ^{i}(2^{j}k-p, 2^{j}l-q), \quad i \in \{H, V, D\} \end{aligned}$$
(6)

Where \(k, l\) are the spatial coordinates, \(p, q\) are the translation quantities, and \(j\) is the scale. The transform of an image \(f(k, l)\) of size \(P \times Q\) is given in Eqs. (7, 8).

$$\begin{aligned} Z_{\phi }(j_{0}, p, q)&=\frac{1}{\sqrt{PQ}}\sum _{k=0}^{P-1}\sum _{l=0}^{Q-1}f(k, l)\,\xi _{j_{0}, p, q}(k,l) \end{aligned}$$
(7)
$$\begin{aligned} Z^{i}_{\psi }(j, p, q)&=\frac{1}{\sqrt{PQ}}\sum _{k=0}^{P-1}\sum _{l=0}^{Q-1}f(k, l)\,\psi ^{i}_{j, p, q}(k,l) \end{aligned}$$
(8)

Where \(Z_{\phi }(j_{0}, p, q)\) are the approximation coefficients, \(Z^{i}_{\psi }(j, p, q)\) are the detail coefficients in direction \(i\), and \(j_{0}\) is an arbitrary starting scale.
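As an illustration of the filter-bank decomposition described in this section, the following Python sketch performs one level of a 2D-DWT with the Haar wavelet, the simplest choice of scaling and wavelet functions. The Haar wavelet, the function names, and the subband naming convention are assumptions made for this example; the paper does not state which wavelet family was used.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar transform of an even-length sequence.

    Returns (low, high): pairwise scaled sums (approximation) and
    pairwise scaled differences (detail).
    """
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def haar_rows(matrix):
    """Apply haar_1d to every row and collect the low and high halves."""
    lows, highs = [], []
    for row in matrix:
        low, high = haar_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def haar_2d(image):
    """Single-level separable 2D Haar DWT of an image with even height and width.

    Naming assumption: the first letter is the filter applied along each row,
    the second letter is the filter applied along each column. Other texts
    use the opposite order.
    """
    row_low, row_high = haar_rows(image)            # filter along rows
    ll_t, lh_t = haar_rows(transpose(row_low))      # then along columns
    hl_t, hh_t = haar_rows(transpose(row_high))
    return {
        "LL": transpose(ll_t),  # approximation coefficients
        "LH": transpose(lh_t),  # detail coefficients
        "HL": transpose(hl_t),  # detail coefficients
        "HH": transpose(hh_t),  # diagonal detail coefficients
    }


# Toy 4 x 4 "image" (hypothetical values, not MR data).
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
subbands = haar_2d(image)
for name, band in subbands.items():
    print(name, band)
```

Each subband has half the height and half the width of the input. Applying `haar_2d` again to the LL subband would give the second decomposition level.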

Principal Component Analysis

Excessive features increase the computational complexity and lead to the curse of dimensionality, so the number of features must be reduced [14]. Dimensionality reduction can be performed using principal component analysis (PCA), linear discriminant analysis (LDA), or multifractal detrended fluctuation analysis (MFDFA). Among these algorithms, PCA is an efficient and powerful tool and a popular non-parametric method for dimension reduction. PCA is an unsupervised learning algorithm and does not require class labels. In contrast, LDA requires a label for each data point, assumes that the data follow a multivariate normal distribution, and assumes that the covariance matrices of all classes are equal. MFDFA reduces the data by calculating the fluctuation function of the input data and fitting a power-law function to it. These calculations require much more time than PCA, and MFDFA is sensitive to noise. Because of these limitations, LDA and MFDFA are not used for data reduction in this paper. PCA achieves dimensionality reduction by transforming the extracted features into a new, smaller set of features ordered by the variance of the input data [15]. Because it is driven by variance, it extracts most of the relevant information from complex datasets. The input data must be normalized to zero mean and unit variance before dimensionality reduction is performed. After performing PCA, the following features were formed [16].

Contrast (C): The contrast of the MR image is given by Eq. (9).

$$\begin{aligned} C =\sum _{k=0}^{P-1}\sum _{l=0}^{Q-1}(k-l)^2 f(k,l) \end{aligned}$$
(9)

Energy (E): The energy of the image is given in Eq. (10).

$$\begin{aligned} E =\sqrt{\sum _{k=0}^{P-1}\sum _{l=0}^{Q-1}f^{2}(k,l)} \end{aligned}$$
(10)

Correlation (COR): The correlation measures the spatial dependencies between neighboring pixels. It is given in Eq. (11).

$$\begin{aligned} \gamma _{fg} = \frac{\sum _{x=0}^{M-1}\sum _{y=0}^{N-1}(f_{min}-\bar{f})(g_{min}-\bar{g})}{\sqrt{\sum _{x=0}^{M-1}\sum _{y=0}^{N-1}(f_{min}-\bar{f})^2\sum _{x=0}^{M-1}\sum _{y=0}^{N-1}(g_{min}-\bar{g})^2}} \end{aligned}$$
(11)

Where \(f_{min}\) is the minimum pixel in the image \(f(x, y)\), \(g_{min}\) is the minimum pixel in the image \(g(x, y)\), and \(\bar{f}\) and \(\bar{g}\) are the means of the input and output images, respectively.

Homogeneity (HOM): The homogeneity measures the local uniformity of the MR image and is given in Eq. (12).

$$\begin{aligned} H =\sum _{k=0}^{P-1}\sum _{l=0}^{Q-1}\frac{1}{1+(k-l)^2} f(k,l) \end{aligned}$$
(12)

Entropy (ENT): The entropy measures the randomness of the MR image and is given in Eq. (13).

$$\begin{aligned} ENT =-\sum _{k=0}^{P-1}\sum _{l=0}^{Q-1}f(k,l)\log _{2}f(k,l) \end{aligned}$$
(13)

These extracted qualitative features were given to the Kernel SVM for classification of brain tumor images.
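The measures in Eqs. (9), (10), (12), and (13) can be evaluated directly from a matrix \(f(k, l)\). The Python sketch below assumes that \(f\) is a normalized matrix whose entries are non-negative and sum to one (for example, a normalized gray-level co-occurrence matrix). This normalization and the function names are assumptions made for illustration. The entropy is written in the conventional Shannon form with a leading minus sign, and zero entries are skipped because \(\log _{2} 0\) is undefined.

```python
import math


def contrast(f):
    """Eq. (9): sum of (k - l)^2 * f(k, l) over all entries."""
    return sum((k - l) ** 2 * v for k, row in enumerate(f) for l, v in enumerate(row))


def energy(f):
    """Eq. (10): square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in f for v in row))


def homogeneity(f):
    """Eq. (12): entries weighted by 1 / (1 + (k - l)^2)."""
    return sum(v / (1 + (k - l) ** 2) for k, row in enumerate(f) for l, v in enumerate(row))


def entropy(f):
    """Eq. (13) in Shannon form; zero entries contribute nothing."""
    return -sum(v * math.log2(v) for row in f for v in row if v > 0)


# Hypothetical 2 x 2 normalized matrix with all entries equal.
uniform = [[0.25, 0.25], [0.25, 0.25]]
features = {
    "contrast": contrast(uniform),
    "energy": energy(uniform),
    "homogeneity": homogeneity(uniform),
    "entropy": entropy(uniform),
}
print(features)
```

For this uniform matrix the entropy reaches its maximum for four entries (2 bits), which matches the reading of entropy as a measure of randomness.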

Kernel Support Vector Machine

Machine learning algorithms such as the support vector machine (SVM), decision trees (DT), random forest (RF), Naive Bayes (NB), and hidden Markov models (HMM) play a major role in brain tumor classification and in biomedical image processing. MR images have a large number of image features or voxels, resulting in high-dimensional data, which the RF algorithm generally classifies poorly. HMMs are mostly used to analyze temporal or sequential data, such as time series or brain activity patterns, and are therefore more suitable for functional MR images than for structural MR images. In this paper, structural MR images were used to identify brain tumors. The presence of a tumor in the brain is determined from imaging features such as shape, intensity, and texture, which are complex and nonlinear. The nonlinear relationship between the input features and the class labels can be handled effectively by SVM. Among the improved SVM algorithms, the kernel SVM (KSVM) is the most effective and popular. KSVM is widely used in biomedical image processing, bioinformatics, and natural language categorization. It provides a global and unique solution to a convex quadratic optimization problem with tunable parameters [17]. The flowchart of the proposed algorithm is shown in Fig. 2.

Fig. 2 Workflow of the proposed algorithm

In general, SVM uses a hyperplane to classify the input data. The traditional SVM algorithm fails when the classes cannot be separated by a hyperplane in the input space. In that case, the kernel strategy is used with SVM: KSVM is obtained by replacing the dot product with a nonlinear kernel function [18]. The kernel function is given in Eq. (14).

$$\begin{aligned} k(p_{i}, p_{j}) = \varepsilon (p_{i})\varepsilon (p_{j}) \end{aligned}$$
(14)

Where \(p_{i}\) and \(p_{j}\) are n-dimensional vectors, and \(\varepsilon (p_{i})\) and \(\varepsilon (p_{j})\) are their images under the mapping function \(\varepsilon\).

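The weight vector \(w\) in the transformed space is given in Eq. (15).

$$\begin{aligned} w = \sum _{i}\eta _{i}\lambda _{i}\varepsilon (p_{i}) \end{aligned}$$
(15)

Where \(\eta _{i}\) is the coefficient (Lagrange multiplier) of the i-th training sample, \(\lambda _{i}\) is its class label, and \(\varepsilon (p_{i})\) is the image of \(p_{i}\) under the mapping function.

The dot product of \(w\) with a mapped input \(\varepsilon (p)\) is evaluated through the kernel, as represented in Eq. (16).

$$\begin{aligned} w\cdot \varepsilon (p) = \sum _{i}\eta _{i}\lambda _{i}k(p_{i}, p) \end{aligned}$$
(16)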

In the transformed space, KSVM finds the maximum-margin hyperplane. The KSVM transformation is nonlinear, and the input data are mapped to a higher-dimensional space [19].

KSVM uses the following kernels:

Linear kernel: It is represented in Eq. (17).

$$\begin{aligned} k(p_{i}, p_{j})= p_{i}^{T}p_{j} \end{aligned}$$
(17)

Polynomial kernel: It is represented in Eq. (18).

$$\begin{aligned} k(p_{i}, p_{j})=( \alpha p_{i}^{T}p_{j}+a)^b \end{aligned}$$
(18)

Where \(a\) and \(b\) are constants (the offset and the degree of the polynomial) and \(\alpha\) is a scaling parameter of the kernel function.

Gaussian radial basis kernel: It is used when there is no prior knowledge about the input data. It is represented in Eq. (19).

$$\begin{aligned} k(p_{i}, p_{j})= e^{\left( -\frac{\Vert p_{i}-p_{j}\Vert ^2}{2\sigma ^2}\right) } \end{aligned}$$
(19)

Where \(\Vert p_{i}-p_{j}\Vert\) is the Euclidean distance between \(p_{i}\) and \(p_{j}\), and \(\sigma ^2\) is the variance. The kernel parameters must be tuned before training. The training database contains both abnormal and normal images.
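The three kernels in Eqs. (17–19) translate directly into code. The Python sketch below also trains a kernel perceptron, a much simpler stand-in for the quadratic-programming solver of an actual KSVM, on the four XOR points to illustrate why a nonlinear kernel matters: the linear kernel cannot separate these points, while the Gaussian kernel can. The default parameter values, the function names, and the use of a perceptron are assumptions made for this illustration and are not taken from the paper.

```python
import math


def linear_kernel(p, q):
    """Eq. (17): plain dot product."""
    return sum(a * b for a, b in zip(p, q))


def polynomial_kernel(p, q, alpha=1.0, a=1.0, b=2):
    """Eq. (18): (alpha * p.q + a) ** b, with illustrative defaults."""
    return (alpha * linear_kernel(p, q) + a) ** b


def gaussian_kernel(p, q, sigma=0.5):
    """Eq. (19): the negative exponent makes similarity decay with distance."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision(points, labels, coeffs, kernel, p):
    """Kernel expansion: sum_i coeff_i * label_i * k(p_i, p)."""
    return sum(c * y * kernel(pi, p) for c, y, pi in zip(coeffs, labels, points))


def train_kernel_perceptron(points, labels, kernel, max_epochs=20):
    """Kernel perceptron (NOT an SVM solver): bump a sample's coefficient
    whenever that sample is misclassified; stop after a mistake-free epoch."""
    coeffs = [0.0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (p, y) in enumerate(zip(points, labels)):
            if y * decision(points, labels, coeffs, kernel, p) <= 0:
                coeffs[i] += 1.0
                mistakes += 1
        if mistakes == 0:
            break
    return coeffs


def training_accuracy(points, labels, coeffs, kernel):
    hits = 0
    for p, y in zip(points, labels):
        predicted = 1 if decision(points, labels, coeffs, kernel, p) > 0 else -1
        hits += predicted == y
    return hits / len(points)


# XOR: the two classes cannot be split by any straight line.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, 1, 1, -1]
for name, kernel in [("linear", linear_kernel), ("gaussian", gaussian_kernel)]:
    coeffs = train_kernel_perceptron(xor_points, xor_labels, kernel)
    print(name, training_accuracy(xor_points, xor_labels, coeffs, kernel))
```

The decision function has the same form as the kernel expansion used by KSVM. Only the way the coefficients are obtained differs, since an SVM chooses them by maximizing the margin.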

K-Fold Cross-validation

Overfitting is a common problem in ML classifiers, and cross-validation is introduced to mitigate it. Cross-validation methods are broadly classified into three types: leave-one-out validation, \(k\)-fold cross-validation (KFCV), and random sub-sampling. The second method is widely used because of its simplicity. In KFCV, the data are divided into k subsets (folds); \(k-1\) folds are used for training and the remaining fold is used for testing [20]. The procedure is repeated so that each fold serves once as the test set, and the errors over the k folds are averaged.

In \(k\)-fold cross-validation, the folds are not guaranteed to contain equal proportions of samples from each class. This limitation is removed by stratified \(k\)-fold cross-validation (SKFCV). The choice of k plays a crucial role in SKFCV. If k is large, the variance of the error estimator is high and its bias is low; if k is small, the variance is low and the bias is high. The computational time also grows in proportion to k. In this paper, k was chosen arbitrarily as five. After a number of iterations, the optimum value of k can be chosen according to the highest classification accuracy.
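A minimal Python sketch of stratified splitting is given below. It uses the class sizes of the evaluation dataset in this paper (70 abnormal and 10 normal images) with five folds. The round-robin assignment, the fixed random seed, and the function names are assumptions made for illustration and do not describe the MATLAB implementation.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split sample indices into k folds that preserve class proportions.

    The indices of each class are shuffled and then dealt to the folds in
    round-robin order, so every fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        yield train, test


# Class sizes taken from the evaluation dataset: 70 abnormal, 10 normal.
labels = ["abnormal"] * 70 + ["normal"] * 10
folds = stratified_k_fold(labels, k=5)
for train, test in train_test_splits(folds):
    normal_in_test = sum(1 for index in test if labels[index] == "normal")
    print(len(train), len(test), normal_in_test)
```

With these class sizes every fold holds 14 abnormal and 2 normal images, so each test fold keeps the 7:1 class ratio of the full dataset.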

Results and Discussion

The proposed algorithm was tested on an Intel i7 processor (8 GHz) with 2 GB RAM and the Windows 10 operating system. The programs were run in MATLAB 2022a.

Dataset: The dataset was downloaded from Harvard Medical School, US. The evaluation dataset contains 80 MR images, including 70 abnormal and 10 normal images. Dataset details are shown in Table 1.

Table 1 Details of training and validation dataset

In the preprocessing stage, the DWT is applied to extract features such as correlation, homogeneity, entropy, and skewness. These features were extracted by performing a single-level DWT decomposition. The two-dimensional DWT decomposes the MR image into four subbands: approximation, horizontal, vertical, and diagonal coefficients. The input MR image with a benign tumor and its four coefficient subbands are shown in Figs. 3 and 4, respectively. The input MR image without a tumor and its first-level decomposition are shown in Figs. 5 and 6, respectively.

Fig. 3 Input MR image with tumor

Fig. 4 First level DWT decomposition from MR image with tumor

Fig. 5 Input MR image without tumor

Fig. 6 First level DWT decomposition from MR image without tumor

The features extracted using the 2D-DWT are reduced using PCA. The 2D-DWT extracted 65536 features from the input MR images, and PCA reduced these 65536 features to 1024. The confusion matrix for the different kernels is shown in Table 2, and the dimensionality reduction details are given in Table 3.

Table 2 Confusion matrix for different kernels
Table 3 Details of dimensionality reduction
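The reduction step can be sketched in plain Python as follows. The example standardizes each feature to zero mean and unit variance, forms the covariance matrix, and extracts only the leading principal component by power iteration. A full PCA that keeps many components would instead rely on a complete eigendecomposition or singular value decomposition from a numerical library. The toy data and all function names are hypothetical.

```python
import math


def standardize(rows):
    """Scale every feature (column) to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        variance = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(variance) or 1.0)  # guard constant features
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance matrix of already-centered data."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]


def leading_component(cov, iterations=200):
    """Power iteration: repeated multiplication by cov converges to the
    eigenvector with the largest eigenvalue (assuming the start vector is
    not orthogonal to it and cov is not the zero matrix)."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, eigenvalue


def project(rows, component):
    """Coordinates of each sample along the given component."""
    return [sum(x * c for x, c in zip(r, component)) for r in rows]


# Toy data: the second feature is exactly twice the first, so a single
# component carries all of the variance.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
z = standardize(samples)
cov = covariance(z)
component, eigenvalue = leading_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(component, explained, project(z, component))
```

In this toy case two features collapse to one without loss, which is the same effect PCA exploits when many wavelet coefficients are strongly correlated.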

The following parameters are used as indices to check the performance and validation of the proposed algorithm and are given in Eqs. (20–22).

$$\begin{aligned} Accuracy&= \frac{TP+TN}{TP+FP+TN+FN} \end{aligned}$$
(20)
$$\begin{aligned} Sensitivity&= \frac{TP}{TP+FN} \end{aligned}$$
(21)
$$\begin{aligned} Specificity&= \frac{TN}{TN+FP} \end{aligned}$$
(22)

Where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The performance analysis of the chosen algorithm with different kernels is shown in Table 4.

Table 4 Performance analysis of the kernel support vector machine with different kernels
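The three indices can be computed from the entries of a confusion matrix as in the Python sketch below. Specificity is computed in its standard form, TN / (TN + FP). The counts in the example are hypothetical values chosen only to exercise the formulas; they are not taken from Table 2.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all images classified correctly, Eq. (20)."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of abnormal (positive) images detected, Eq. (21)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of normal (negative) images correctly rejected, Eq. (22)."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts, NOT the paper's results.
tp, fn, tn, fp = 45, 5, 40, 10
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```

Reporting sensitivity and specificity alongside accuracy matters for an imbalanced dataset, because accuracy alone can remain high even when most images of the smaller class are misclassified.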

The proposed algorithm obtained good results for the training and validation images. Among the three kernels, the GRB kernel achieved the highest sensitivity and specificity. The classification accuracy was \(93.75\%\) for the linear kernel, \(96.25\%\) for the polynomial kernel, and \(98.75\%\) for the GRB kernel, so the GRB kernel outperformed the other two kernels.

The proposed algorithm was compared with RF, HMM, and SVM and obtained good results for the training and validation images [21]. Among the four algorithms, KSVM with the GRB kernel performed best, with a sensitivity of \(99\%\), a specificity of \(98\%\), and an accuracy of \(98.75\%\). The comparison of the various algorithms with the proposed algorithm is shown in Table 5.

Table 5 Performance analysis of the proposed algorithm with various classification algorithms

Conclusion

Classification of benign tumors from sMRI using a computer-aided diagnosis system remains a challenge. The proposed algorithm successfully classified the tumor images in the dataset. The developed algorithm employs DWT, PCA, and KSVM: DWT extracts the statistical features of the MR images, PCA reduces the dimension of the extracted features, and KSVM classifies the abnormal MR images. The dot product in SVM is replaced by kernels in KSVM. Three kernels were evaluated to identify the one with the highest classification accuracy. The performance analysis showed that, among the three kernels, the GRB kernel achieved the highest accuracy of \(98.75\%\). The limitation of the proposed algorithm is its processing time, which is high compared with SVM. Future work will focus on reducing the processing time.