1 Introduction

Computer-based automated diagnosis of diseases from biomedical images is a promising field of research that combines the strengths of the medical and engineering domains [8]. Because ultrasonic imaging is non-invasive, economical and able to visualize soft tissue, it is well suited to diagnosing abdominal organs such as the liver, spleen, pancreas and gall bladder [17]. Specifically, liver diseases such as fatty liver, hepatitis, cyst, metastasis, cirrhosis and hemangioma can be effectively diagnosed from ultrasonic images [19]. These diseases are described in Table 1. Despite these advantages, distinguishing normal cells from infected cells of the liver is hampered by the low contrast, close resemblance and hazy nature of the images [2, 16, 18]. The ultrasonic image of one liver disease may closely resemble that of another, while images of the same liver disorder may exhibit different textures. This ambiguity leads to varying diagnoses of the liver disorder among radiologists [20]. Further, the diagnosis is influenced by the comfort and experience of the radiologist, who is responsible for establishing the contrast and gain settings during the capture of ultrasonic images [9]. Furthermore, the patient’s body condition, the probe application and the absence of a quality ultrasound machine also affect the quality and resemblance of the images [12]. Hence, an automated classification scheme that can handle texture and wavelet features in diagnosing liver diseases with improved accuracy becomes essential.
In this paper, a novel texture and wavelet features-based automated liver disease diagnosis mechanism is proposed that uses an improved active contour for segmentation, a shift variant bi-orthogonal wavelet transform for filtering and an integrated random forest-based classifier. The core objective of the proposed scheme is to enhance classification accuracy so that diagnosis is achieved rapidly and reliably.

Table 1 Description of various liver diseases

The remainder of this paper is organized as follows. Section 2 reviews the literature to elucidate the merits and limitations of existing liver disease diagnosis schemes. Section 3 outlines the proposed methodology and explains each method of the automatic liver disease diagnosis scheme in detail. Section 4 presents the simulation results and analyses of the proposed scheme in terms of sensitivity, specificity and accuracy. Section 5 concludes the paper and discusses future work.

2 Related work

Traditionally, accurate classification of liver diseases relies on the selection of discriminative features and on Region of Interest (ROI) determination. However, estimating the ROI automatically from ultrasound images is complex because the diseased part of the liver used for analysis does not possess continuous or definite boundaries. An early automated approach for liver disease detection used a square ROI of 30 × 30 pixels [14]. This method used 180 images comprising 80 normal liver images and 100 fatty liver images. It also integrated nearly seven texture features during fusion of the segmented images and reported a classification rate of 95%. Alivar et al. [1] then proposed a scheme that used an ROI size of 64 × 64 pixels during segmentation. The ROI images were used to compute wavelet packet, Gray Level Co-occurrence Matrix (GLCM) and Gabor transform features. These three methods of feature extraction were essential for achieving a significant classification rate. The authors used a dataset of 76 ultrasonic image samples, of which 39 were fatty liver, 30 were normal and 7 were cirrhosis. Classification was based on the K-nearest neighbour technique and achieved an accuracy of 96.1%.

Further, liver ultrasound images have been analysed by applying various linear, non-linear and diffusion filters to suppress speckle noise and improve the classification rate using Gray Level Run Length Matrix (GLRLM) features and Support Vector Machines (SVM) [10]. The dataset consisted of 58 images, of which 20 were cirrhosis, 17 were fatty, 10 were carcinoma, 6 were cyst and 5 were hepatitis. The overall accuracy of this system was 92.91%. In [11], classification and discrimination of the stages of fatty liver was performed with 29 subjects including 28 normal, 47 steatosis, 42 fibrosis, and 12 cirrhosis images. The ROI was selected from the focal zone of the back-scan–converted ultrasound images. Discrimination between normal and fatty liver was then performed using Wavelet Packet Transform (WPT) and Gray Level Co-occurrence Matrix (GLCM) features. This system achieved an overall accuracy of 94.91% using an SVM classifier. Furthermore, an automated liver disease classification scheme was proposed by Minhas et al. [13] that integrated statistical features and WPT for feature extraction. This scheme used an SVM-based multi-objective scheme for classification. Its classification rate was improved by using a ten-fold strategy and was confirmed to be greater than 95%, since it used an ROI segmentation of 64 × 64 pixels. In [6], focal liver lesions were analysed using 42 hybrid textural features selected by principal component analysis (PCA). This scheme achieved a diagnostic accuracy of 96% using a feed forward neural network. A thorough investigation of the aforementioned approaches shows that improper ROI selection is the major limitation in the literature. Moreover, although several methods have been proposed, none of them provides a fully automated procedure for six diseases, which motivated this research. These limitations form the basis for the formulation of the proposed work.

3 Proposed method

The proposed scheme, depicted in Fig. 1, starts with image acquisition, followed by improved active contour segmentation. A shift-invariant bi-orthogonal wavelet transform is then applied to the segmented images to derive the diagonal, horizontal and vertical components, from which the component images are obtained. Next, grey level run length matrix (GLRLM) features are extracted from the component images, and classification is performed using random forests. The random forest classification is further improved by incorporating a ten-fold validation procedure. Finally, the results obtained with the GLRLM features are compared with those of other feature extraction techniques, and the proposed method is compared with existing methods, to validate the proposed diagnosis scheme.

Fig. 1 Proposed framework for automated diagnosis of liver diseases

3.1 Improved active contour-based segmentation

During segmentation, the image is divided into distinct regions, each containing pixels with similar attributes. An improved active contour approach is used to separate the boundaries of the object from the image by enforcing constraints. The classical active contour method is improved by first defining a false edge. Once the basic active contour process has been carried out, the reference point of each determined region is checked. If the gradient value at a pixel of the segmented region is very small, the reference point is marked as a false point. The collection of false points constitutes the false edges. A force is then applied to the false edges in the tangential direction, which is upright, until the true edge is visible. This method is therefore suitable for ultrasonic images, which do not possess a standard boundary and whose existing boundaries may merge into neighbouring regions. The improved active contour is also based on the work of Chan and Vese [15], which focusses on reducing the fitting energy. The proposed scheme thereby avoids the limitations of the traditional segmentation schemes in the literature. In addition, the gradient is not used as the constraint for determining the boundaries of the regions, which makes the scheme suitable for segmenting blurred and noisy images such as ultrasonic images.

The Chan and Vese-based scheme uses two fitting-energy terms, as depicted in Eqs. (1) and (2)

$$ {E}_{Fit}={F}_1(C)+{F}_2(C)={\int}_{in(C)}{\left|{O}_1-{l}_1\right|}^2 dydx+{\int}_{out(C)}{\left|{O}_1-{l}_2\right|}^2 dydx $$
(1)

with

$$ {\mathit{\operatorname{inf}}}_c\left[{F}_1(C)+{F}_2(C)\right]\approx 0\approx {F}_1(C)+{F}_2(C) $$
(2)

where l1 and l2 refer to the mean intensities inside and outside the curve, respectively, of the image O1, which contains regions of piecewise-constant intensity, with C as the evolving curve.

The fitting energy expressed in Eq. (1) is then minimized by incorporating two terms pertaining to the area and the length of the evolving curve, as depicted in Eq. (2). The level set method is used to solve this special case of the minimal partition problem, which always arises when the improved active contour scheme based on the Chan and Vese method is applied.
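As an illustration of the two region terms in Eq. (1), the following minimal Python sketch evaluates the discrete fitting energy for a given contour, where the contour is represented by a boolean mask. The function names, the mask representation and the toy image are assumptions made for this example; the sketch omits the curve evolution and the level set solver and is not the implementation used in this work.

```python
def region_means(image, inside):
    """Return (l1, l2): mean intensity inside and outside the contour."""
    in_vals, out_vals = [], []
    for img_row, mask_row in zip(image, inside):
        for value, is_inside in zip(img_row, mask_row):
            (in_vals if is_inside else out_vals).append(value)
    # An empty region gets mean 0.0 (a convention assumed for this sketch).
    l1 = sum(in_vals) / len(in_vals) if in_vals else 0.0
    l2 = sum(out_vals) / len(out_vals) if out_vals else 0.0
    return l1, l2


def fitting_energy(image, inside):
    """Discrete version of the two region terms F1(C) + F2(C) in Eq. (1)."""
    l1, l2 = region_means(image, inside)
    energy = 0.0
    for img_row, mask_row in zip(image, inside):
        for value, is_inside in zip(img_row, mask_row):
            mean = l1 if is_inside else l2
            energy += (value - mean) ** 2
    return energy


# Hypothetical 2 x 4 image: dark left half, bright right half.
image = [[0, 0, 10, 10],
         [0, 0, 10, 10]]
good_mask = [[True, True, False, False],
             [True, True, False, False]]   # contour lies on the true boundary
bad_mask = [[True, False, False, False],
            [True, False, False, False]]   # contour misses one dark column

print(fitting_energy(image, good_mask))  # 0.0
print(fitting_energy(image, bad_mask))   # strictly larger than for good_mask
```

The energy vanishes when the contour coincides with the boundary between the two piecewise-constant regions and grows as the contour moves away from it, which is the behaviour the minimization relies on.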

3.2 Shift variant bi-orthogonal wavelet decomposition

The Shift Variant Bi-Orthogonal Wavelet Decomposition used in this automated approach captures the frequency and temporal information of the image signal at multiple resolution scales. It is used to analyse the image signals while preventing the spurious information that commonly arises in image analysis. In this analysis, the image signals are investigated using a varying number of scales and translations. Initially, the segmented images are converted into four shift sets, viz., Ss = {(0, 0), (0, 1), (1, 0), (1, 1)}, for the determination of image pairs. Each image pair then yields four sub-images during decomposition, which is achieved using Eq. (3) through the incorporation of filters f1 and f2.

$$ {I}_{i-1,k}={\sum}_k{g}_k{I}_{i,2l+k}{f}_{\left(i-1,k\right)} $$
(3)

The resulting sub-images correspond to three high-frequency sub-images of wavelet co-efficients and one low-frequency approximation sub-image. The low-frequency sub-images are then mixed according to the shift sets, using the origin point, which aids in better approximation of the original image. This process of mixing and shifting is performed up to ‘k’ levels, so that better multi-scale representations of the original image are achieved. Then compute the approximation co-efficient \( \left({M}_k^0\left(a,b\right)\right) \) of each resultant sub-image as the mean of the approximation pixel intensities \( {A}_k^0\left(a,b\right) \) and \( {B}_k^0\left(a,b\right) \) using Eq. (4). In addition, determine the wavelet co-efficient of the original image \( \left({D}_i^e\left(a,b\right)\right) \) as the weighted sum of squared intensities over a local neighbourhood through Eq. (5)

$$ {M}_k^0\left(a,b\right)=\frac{A_k^0\left(a,b\right)+{B}_k^0\left(a,b\right)}{2} $$
(4)
$$ {D}_i^e\left(a,b\right)={\sum}_{a^1,{b}^1\in k}^n{w}^e\left({a}^1,{b}^1\right){\left[{E}_i^e\left(a+{a}^1,b+{b}^1\right)\right]}^2 $$
(5)
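Equations (4) and (5) can be sketched in a few lines of Python. The window size, the uniform weights and the zero padding at the image border are assumptions made for this illustration, as are the function names; none of them is specified in the text.

```python
def mean_approximation(a_sub, b_sub):
    """Eq. (4): pixel-wise mean of two approximation sub-images."""
    return [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(a_sub, b_sub)]


def local_activity(coeffs, weights):
    """Eq. (5): weighted sum of squared co-efficients in a window around each pixel."""
    rows, cols = len(coeffs), len(coeffs[0])
    half = len(weights) // 2          # weights is a square window of odd size
    out = [[0.0] * cols for _ in range(rows)]
    for a in range(rows):
        for b in range(cols):
            total = 0.0
            for da in range(-half, half + 1):
                for db in range(-half, half + 1):
                    r, c = a + da, b + db
                    # Pixels outside the image contribute nothing (assumed zero padding).
                    if 0 <= r < rows and 0 <= c < cols:
                        total += weights[da + half][db + half] * coeffs[r][c] ** 2
            out[a][b] = total
    return out


# Hypothetical inputs: a constant 3 x 3 sub-image and a uniform 3 x 3 window.
coeffs = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
weights = [[1.0 / 9.0] * 3 for _ in range(3)]
activity = local_activity(coeffs, weights)
print(mean_approximation([[2, 4]], [[4, 8]]))  # [[3.0, 6.0]]
```

With these inputs the activity at the centre pixel is approximately 4, the squared co-efficient value, while border pixels receive smaller values because part of their window falls outside the image.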

Then estimate the similarity of the original images using the derived multiple scale representations using the Eq. (6)

$$ {M}_{i, AB}^e\left(a,b\right)=2\frac{\sum \limits_{a^1,{b}^1\in k}{w}^e\left({a}^1,{b}^1\right){E}_{i,A}^e\left(a+{a}^1,b+{b}^1\right){E}_{i,B}^e\left(a+{a}^1,b+{b}^1\right)}{F_{i,A}^e\left(a,b\right)+{F}_{i,B}^e\left(a,b\right)} $$
(6)

Furthermore, estimate the weights of the co-efficients using Eqs. (7) and (8) and then perform consistency verification to obtain the weights that can be used in the decision process during training and testing.

$$ {\alpha}_{i,A}^e=\sum \limits_{a^1,{b}^1}{w}^e\left({a}^1,{b}^1\right){\alpha}_{i,A}^e\left(a+{a}^1,b+{b}^1\right) $$
(7)
$$ {\alpha}_{i,B}^e\left({D}_i^e\left(a,b\right)\right)=\frac{1}{2}+\frac{1}{2}\left(\frac{1-{D}_{i, AB}^e\left(a,b\right)}{1-{F}_M}\right) $$
(8)
$$ {E}_{i,F}^e\left(a,b\right)={\alpha}_{i,A}^e\left(a,b\right){E}_{i,A}^e\left(a,b\right)+{\alpha}_{i,B}^e\left(a,b\right){E}_{i,B}^e\left(a,b\right) $$
(9)

Finally, determine the wavelet co-efficient of the decomposed image using Eq. (9) that performs the mean operation using the shift sets.

3.3 Feature extraction using GLRLM

In this scheme, eleven GLRLM features are extracted: short-run emphasis, long-run emphasis, run percentage, run-length non-uniformity, grey-level non-uniformity, low grey level run emphasis, high grey level run emphasis, long run low grey level run emphasis, long run high grey level run emphasis, short run low grey level run emphasis and short run high grey level run emphasis. Further, features related to Euclidean shape, color and, to some extent, texture contribute to the last level of this automated scheme of liver disease detection. The GLRLM features are extracted mainly to collect informative values of the image pixels for classification by enforcing constraints in the implementation. This GLRLM-based feature extraction scheme gathers a good diversity of features even from grey-scale, hazy and similar-looking ultrasonic images of the liver. The approach thus discriminates well between normal and diseased liver ultrasonic images in the spatial domain.
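To make the run-length features concrete, the following Python sketch builds a GLRLM for the 0 degree orientation only and computes three of the eleven features using their standard definitions (short-run emphasis, long-run emphasis and run percentage). The function names and the toy image are assumptions for this example; the remaining features and orientations follow the same pattern.

```python
from collections import Counter


def glrlm_horizontal(image):
    """Count runs at 0 degrees: key (grey level, run length) -> number of runs."""
    runs = Counter()
    for row in image:
        start = 0
        for idx in range(1, len(row) + 1):
            # A run ends at the end of the row or when the grey level changes.
            if idx == len(row) or row[idx] != row[start]:
                runs[(row[start], idx - start)] += 1
                start = idx
    return runs


def run_length_features(image):
    """Short-run emphasis, long-run emphasis and run percentage."""
    runs = glrlm_horizontal(image)
    n_runs = sum(runs.values())
    n_pixels = sum(len(row) for row in image)
    sre = sum(count / length ** 2 for (_, length), count in runs.items()) / n_runs
    lre = sum(count * length ** 2 for (_, length), count in runs.items()) / n_runs
    return {"SRE": sre, "LRE": lre, "RP": n_runs / n_pixels}


# Hypothetical 2 x 3 image with three runs: (1, length 2), (2, length 1), (3, length 3).
toy = [[1, 1, 2],
       [3, 3, 3]]
features = run_length_features(toy)
print(features["RP"])  # 0.5
```

Images dominated by long runs of the same grey level give a high long-run emphasis, whereas fine, rapidly varying textures give a high short-run emphasis.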

3.4 Hybrid classification scheme using random forest-based learning

Random forest-based learning is used for classification for five important reasons: i) it is capable of handling multi-class classification problems, ii) it generates decision trees that cast individual votes, mapping each input to the most probable class label, iii) it is fast and capable of estimating non-linear structures in the data through an optimal ensemble factor, iv) it can use both the categorical and the numerical data of the dataset and v) it prevents overfitting even when the number of decision trees added to the forest is high. The proposed automated liver disease diagnosis scheme uses three integrated components: an attribute evaluator, an instance filter and the forest-based learning algorithm for classification. The following three attribute evaluator methods are used in this work.

3.4.1 Correlation-based feature selection (CFS)

It is a simple attribute evaluator method that ranks feature subsets using a correlation-based heuristic evaluation function [4]. The function is biased towards subsets containing features that are highly correlated with the class but uncorrelated with each other [5]. Irrelevant features are ignored because they have low correlation with the class. Redundant features are excluded from the resulting feature subset because they are highly correlated with one or more of the other features.

3.4.2 Symmetrical uncertainty (SU)

SU [4] is an attribute evaluator method. It provides a symmetrical measure of the correlation between features and compensates for the bias of mutual information. SU is defined as the ratio between the Information Gain (IG) and the Entropy (H) of two features, i and j, such that

$$ SU\left(i,j\right)=2\cdot IG\left(i|j\right)/\left[H(i)+H(j)\right] $$
(10)

where IG is given by

$$ IG\left(i|j\right)=H(j)+H(i)-H\left(i,j\right) $$
(11)

where H(i) and H(i,j) represent the entropy and the joint entropy, respectively.
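Equations (10) and (11) can be illustrated for discrete features with a short Python sketch. The function names and the example feature vectors are assumptions for this illustration, and continuous features would need to be discretized first, a step not shown here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU of two discrete features following Eqs. (10) and (11)."""
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))        # joint entropy H(i, j)
    info_gain = h_x + h_y - h_xy           # Eq. (11)
    if h_x + h_y == 0:
        return 0.0                         # both features constant (convention assumed here)
    return 2.0 * info_gain / (h_x + h_y)   # Eq. (10)


# Hypothetical binary features.
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (identical features)
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent features)
```

SU therefore ranges from 0 for independent features to 1 for features that fully determine each other, regardless of which of the two is taken first.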

3.4.3 Gain ratio (GR)

The Gain Ratio attribute evaluator is used in the design of the improved-RFC approach because it improves on information gain by resolving the bias towards attributes with a larger set of values [7]. It measures the gain in information for the purpose of classification relative to the entropy of feature Fe.

$$ GR\left(C,{F}_e\right)=\left[H(C)-H\left(C|{F}_e\right)\right]/H\left({F}_e\right) $$
(12)

where H(C) represents the entropy of class C, H(C| Fe ) represents the entropy of class C given feature Fe and H(Fe) is the entropy of feature Fe.
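A minimal Python sketch of Eq. (12) for discrete data is given below. It uses the identity H(C|Fe) = H(C,Fe) − H(Fe) to obtain the conditional entropy; the function names and the example data are assumptions for this illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(labels, feature):
    """Gain ratio of a discrete feature with respect to the class labels, Eq. (12)."""
    h_c = entropy(labels)
    h_f = entropy(feature)
    # Conditional entropy via H(C|Fe) = H(C, Fe) - H(Fe).
    h_c_given_f = entropy(list(zip(labels, feature))) - h_f
    if h_f == 0:
        return 0.0  # a constant feature carries no information (convention assumed here)
    return (h_c - h_c_given_f) / h_f


# Hypothetical labels and two features that both predict the class perfectly.
labels = [0, 0, 1, 1]
two_valued = ["a", "a", "b", "b"]
many_valued = ["p", "q", "r", "s"]   # one distinct value per sample

print(gain_ratio(labels, two_valued))   # 1.0
print(gain_ratio(labels, many_valued))  # 0.5
```

Both features have the same information gain of one bit, but the many-valued feature is penalized by its larger entropy, which is the bias correction that motivates using GR instead of plain information gain.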


Initially, the three attribute evaluators are used to select the relevant attributes for effective classification. An instance filter is then applied to balance the class distributions of the multi-class problem; it re-samples the data if the data is not uniformly distributed. Input data and target data are needed to train the classifier. The classifier divides the input sample data into training and testing samples. The training samples are used to train the classifier, and the testing samples provide an independent measure of classifier performance during and after training. In this work, 80% of the data is used for training and 20% for testing. Finally, the traditional random forest classification method is applied to the uniformly distributed data obtained from the instance filter step. This classification step improves the accuracy to 97.8%, which is superior to other existing liver disease diagnosis schemes.
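The 80%/20% hold-out split can be sketched as follows. The sketch assumes a stratified split, in which roughly 20% of each class is held out, and a fixed random seed; neither is stated in the text, and the function name is hypothetical. The class counts are those of the dataset described in Section 4.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with about test_fraction of each class held out."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one test sample per class (assumption of this sketch).
        n_test = max(1, round(test_fraction * len(indices)))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


class_counts = {"normal": 45, "fatty": 50, "hepatitis": 19, "cyst": 18,
                "metastasis": 13, "cirrhosis": 15, "haemangioma": 20}
labels = [name for name, count in class_counts.items() for _ in range(count)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because the rounding is applied per class, the test set is only approximately 20% of the 180 images; stratification is chosen here so that the smaller classes, such as metastasis, are still represented in the test set.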

4 Result analysis and discussions

The proposed automated diseased liver detection scheme is evaluated on 180 images collected from SRM Medical College Hospital and Research Centre (SRM MCHRC), Chennai. These images were captured with a LOGIQF series ultrasound system using a curvilinear probe at a resolution of 540 × 450 pixels. The 180 images comprise 45 cases of normal liver, 50 cases of fatty liver, 19 cases of hepatitis, 18 cases of cyst, 13 cases of metastasis, 15 cases of cirrhosis and 20 cases of haemangioma. Nearly 500 iterations are carried out to separate the ROI from the images, and 100 iterations are used to segment the smaller regions of the ROI from the original images. Figure 2 shows the automatic selection of the ROI in liver ultrasound images. The black rectangle in Fig. 2 marks the automatically selected region of interest, of size 64 × 64 pixels, for the various diseases. To avoid distortion, these sub-images were taken near the center lobe of the ultrasound image. Care was taken to avoid hepatic vessels, bile stores in the liver and areas of echo non-homogeneity while selecting the ROI. The selected regions are representative regions on which diagnosis can be based, and they are used in the feature extraction process. The GLRLM features are extracted at orientations of 0, 45, 90 and 135 degrees, such that 1980 features are collected from the 180 images involved in shift variant bi-orthogonal wavelet decomposition. The proposed scheme was simulated using Matlab R2017b. Since no comparison is available on a common database, the performance of three methods has been compared on the liver US images available to us. The significance of the proposed approach is measured in terms of sensitivity, specificity and accuracy [21], which are given by:

$$ Sensitivity=\frac{TP}{\left( TP+ FN\right)}\times 100 $$
(13)
$$ Specificity=\frac{TN}{\left( TN+ FP\right)}\times 100 $$
(14)
$$ Accuracy=\frac{\left( TP+ TN\right)}{\left( TP+ FN\right)+\left( FP+ TN\right)}\times 100 $$
(15)

where TP represents the number of positive examples classified correctly, FP represents the number of negative examples misclassified as positive, FN represents the number of positive examples misclassified as negative, and TN represents the number of negative examples classified correctly.
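The three measures can be computed directly from the confusion-matrix counts, using the standard definitions in which accuracy is the share of all examples classified correctly. The function name and the counts in the following Python sketch are hypothetical and are not results from this study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, accuracy) in percent."""
    sensitivity = 100.0 * tp / (tp + fn)            # share of positives detected
    specificity = 100.0 * tn / (tn + fp)            # share of negatives rejected
    accuracy = 100.0 * (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 positive and 100 negative examples.
print(diagnostic_metrics(tp=90, fp=5, fn=10, tn=95))  # (90.0, 95.0, 92.5)
```

For a multi-class problem such as the seven classes considered here, these counts are typically obtained per class in a one-versus-rest manner and then averaged, which is an assumption about the evaluation rather than a detail given in the text.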

Fig. 2 Examples of automated ROI selection

Table 2 depicts the results of the proposed classifier. A high overall classification rate and accuracy indicate that the proposed classifier correctly classifies the seven classes. The Improved Random Forest Classifier (IRFC) used in this automated detection technique is confirmed to deliver excellent classification accuracy. The improvement in the true positive rate of the proposed scheme is due to the eleven GLRLM features extracted and the classification module used. Figure 3 compares other feature extraction techniques with the proposed scheme. With GLRLM features, the accuracy is 93.4% for BPNN, 95% for SVM and 98.5% for IRFC. As a result, GLRLM features were confirmed as highly accurate and suitable features for the proposed classifier of ultrasound liver diseases. Some misclassifications made by the classifier are depicted in Fig. 4.

Table 2 Results of the proposed classifier

Fig. 3 Performance analysis of various feature extraction techniques

Fig. 4 The misclassified instances

Figure 5 shows the overall performance of various classifiers. The proposed method yields an accuracy of 97.8%, a sensitivity of 90.5% and a specificity of 99.5%. BPNN and SVM yield accuracies of 94.1% and 95.2%, respectively. Combined with the GLRLM features, the proposed classifier shows excellent accuracy when compared with BPNN and SVM [3].

Fig. 5 Overall performance of various classifiers

5 Conclusion

The presented texture and wavelet features-based automated liver disease diagnosis mechanism improves sensitivity, specificity and accuracy. The shift variant bi-orthogonal wavelet transform is confirmed to increase contrast during the diagnosis of liver disorders. The results show that GLRLM features perform best among the compared features, which include GLCM, invariant moments and intensity histogram. The proposed method is fast and robust to noise, and as an ensemble it can identify non-linear patterns in the data. Moreover, a major advantage of the proposed method is that it does not suffer from overfitting, even if more trees are appended to the forest. The proposed liver disease detection scheme achieved a classification accuracy of 97.8%, which is better than that of the baseline automated techniques used for comparison. The comparison also revealed that the GLRLM features together with the proposed classifier give excellent accuracy for automated liver disease diagnosis. In future, the work can be extended to extract features based on the differences in echogenicity of the liver from that of the spleen and renal cortex. These features can be used to improve diagnostic accuracy. The calculation of parameters such as shape index and liver area can also be automated to aid the radiologist.