1 Introduction

Medical imaging modalities such as X-ray, Computed Tomography (CT), Ultrasound and Magnetic Resonance Imaging (MRI) have allowed non-invasive insight into human internal organs. They have made it possible to visualize and observe the structure and function of various organs and cells, to detect abnormalities or dysfunction, and to assist in pathologic diagnosis [1]. The brain, one of the most complex and least accessible organs, is prone to complex abnormalities and can be described at a variety of complexity scales [2]; it is the primary beneficiary of these medical imaging techniques. A deeper understanding of the brain's anatomical structures plays a crucial role in improving the detection of brain lesions and diseases [3].

Skull stripping is an important pre-processing step for the analysis of neuroimaging data and MRI images [4, 5, 6]. It refers to the delineation and removal of non-cerebral tissue regions, such as the skull, scalp and meninges, from the soft tissues of the brain [7]. The accuracy of the skull stripping process affects the efficiency of tumor detection, pre-surgical planning, cortical surface reconstruction and brain morphometry [8], and skull stripping has been considered an essential step for brain segmentation [9]. Removing the skull region reduces the chance of misclassifying diseased tissues [10]. Skull stripping is challenging because of the complexity of the human brain, variability in the parameters of Magnetic Resonance (MR) scanners, and individual characteristics [11]. Poor-quality and low-contrast images also make it difficult to segment the images precisely [10].

The literature suggests that accurate and reliable quantification of skull stripping outcomes is one of the biggest challenges in the medical imaging domain [4]. To date, only a few evaluation criteria have been proposed to quantify the quality of skull stripping outcomes [6]. The common standard for validating skull stripping is manual delineation, which serves as the ground truth against which skull stripping outcomes are compared [12]. Manual delineation, although still considered the gold standard [13], is tedious, time-consuming and subjective owing to inter- and intra-expert variability [14].

A main issue is that obtaining validation data and comparison metrics for skull stripping is difficult because of the lack of a reliable ground truth [15]. Thus, even if a rich set of manual delineations is available, they may not reflect the ground truth, and the true gold standard may need to be estimated [16]. In addition, the subjectivity of human decisions could also introduce inaccuracies and inconsistencies [6].

This research therefore investigates the accuracy of the segmentation results of the proposed technique, Seed-Based Region Growing (SBRG), through a qualitative evaluation by three experienced radiologists. The non-cerebral tissue regions are delineated, segmented and removed using SBRG. The resulting images are then presented to the radiologists for performance assessment. The proposed qualitative evaluation technique is expected to offer a new way of evaluating skull stripping in MRI brain images.

The rest of this paper is organized as follows: Sect. 2 presents our materials and methods, including an overview of the SBRG method and a description of the proposed qualitative evaluation method. The results are presented and discussed in Sect. 3. Finally, we present our conclusion in Sect. 4.

2 Materials and Methods

Eighty axial Fluid Attenuated Inversion Recovery (FLAIR) MRI slices of normal and abnormal brains were acquired from Hospital Sungai Buloh, Selangor, Malaysia. The MRI brain images are limited to adult male and female subjects aged between 20 and 60 years.

2.1 The Seed-Based Region Growing Algorithm

The skull stripping process is performed using a Seed-Based Region Growing (SBRG) algorithm [17, 18], which was developed using Borland C++ Builder 6.0. SBRG is attractive especially for semantic object extraction as well as other image applications. Furthermore, the SBRG algorithm has been successfully implemented in various medical imaging applications [18].

The SBRG process begins by selecting a seed pixel located within the area to be delineated. The seed grows iteratively into the neighboring pixels within a window of 3 × 3 pixels to produce a region with similar mean values. The mean value, M, of the neighborhood (written as M × M in (1); here 3 × 3 pixels) is calculated as in (1).

$$ \text{Mean}\,(M) = \frac{\sum \text{grey-level pixel values in the } M \times M \text{ neighborhood}}{\text{number of pixels in the } M \times M \text{ neighborhood}} $$
(1)

For every growth step from the seed pixel to one of its neighbors, the calculated mean value, M, and the grey level of that neighbor, $G_{j}$, are compared using (2).

$$ \left| {G_{j} - M} \right| < T $$
(2)

If the absolute difference between the neighbor's grey level and the mean value is less than a predefined threshold, T, the neighbor pixel is included in the growing region. The threshold T is set to 10. The mean value is updated continually as the region grows, and the growing process is repeated until no further neighboring pixels satisfy (2).
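The growing procedure described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' Borland C++ implementation: the mean M is taken over the 3 × 3 window centred on the pixel currently being grown from (clipped at the image border), an explicit queue replaces the recursion, and the function names `window_mean` and `seed_region_grow` as well as the tiny test image are hypothetical.

```python
from collections import deque


def window_mean(image, r, c):
    """Mean grey level of the 3 x 3 window centred on (r, c), as in (1).

    The window is clipped at the image border (an assumption).
    """
    rows, cols = len(image), len(image[0])
    values = [
        image[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]
    return sum(values) / len(values)


def seed_region_grow(image, seed, threshold=10):
    """Return the set of (row, col) pixels grown from `seed`."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        mean = window_mean(image, r, c)  # M is recomputed at every growth step
        for nr in range(max(0, r - 1), min(rows, r + 2)):
            for nc in range(max(0, c - 1), min(cols, c + 2)):
                if (nr, nc) in region:
                    continue
                if abs(image[nr][nc] - mean) < threshold:  # criterion (2)
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region


# Hypothetical 3 x 5 image: a bright block (about 100) beside a dark block (0).
image = [
    [100, 100, 100, 0, 0],
    [100, 102, 101, 0, 0],
    [100, 100, 100, 0, 0],
]
region = seed_region_grow(image, seed=(1, 1), threshold=10)
print(sorted(region))
```

In this toy image the region covers the bright block and stops at the dark columns, because the grey-level difference there exceeds T = 10. A different reading of how the mean is updated (for example, a running mean over the whole region) would change only the `mean` line.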

2.2 Qualitative Evaluation Method

An unsupervised qualitative evaluation method is employed to evaluate skull stripping accuracy. A group of three experienced radiologists was asked to visually assess the accuracy of the 80 skull-stripped images produced by SBRG.

The accuracy level of skull stripping is divided into five categories: less delineation, slightly less delineation, correct delineation, slightly over delineation, and over delineation, as elaborated in Table 1.

Table 1 Accuracy level of skull stripping assessment

Based on the assessment conducted, the performance of skull stripping is then evaluated. Each accuracy level is assigned a weightage based on its significance. The weightage values are important because they are used later in the qualitative statistical analysis.

The reliability of agreement among the radiologists is also examined, in order to monitor how consistently the radiologists assess the skull stripping results. A statistical analysis method known as Fleiss Kappa is employed. Fleiss Kappa is a statistical measure for assessing the reliability of agreement among a number of raters when they assign categorical ratings to, or classify, a number of items [19].
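As an illustration of how such an agreement value can be computed, the sketch below implements the standard Fleiss Kappa formula from a table of per-item category counts. It is not taken from the study: the function names `fleiss_kappa` and `to_counts` are hypothetical, and the ratings are made-up weightages from three raters for four images, used only to show the data layout.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Proportion of all ratings that fall in each category.
    p = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(agreement) / n_items
    chance_agreement = sum(x * x for x in p)
    if chance_agreement == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (mean_agreement - chance_agreement) / (1.0 - chance_agreement)


def to_counts(ratings, categories=(1, 2, 3, 4, 5)):
    """Convert per-item lists of rater scores into a count table."""
    return [[item.count(cat) for cat in categories] for item in ratings]


# Made-up weightages from three raters for four images (illustration only).
ratings = [[5, 5, 5], [5, 4, 5], [4, 4, 4], [3, 4, 3]]
kappa = fleiss_kappa(to_counts(ratings))
print(round(kappa, 3))
```

Working from a count table rather than per-rater columns reflects the fact that Fleiss Kappa does not depend on which rater gave which rating, only on how many raters chose each category for each item.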

Finally, the significance of agreement among the raters is identified from the calculated Fleiss Kappa value. Richard and Gary [20] summarized the significance of agreement for Fleiss Kappa into several categories according to ranges of values, as tabulated in Table 2.

Table 2 Significance of Fleiss Kappa value

3 Results and Discussion

Skull stripping accuracy is measured by observing the mode of the accuracy levels assigned by each radiologist. From the overall analysis conducted, the percentage of accuracy is calculated using (3):

$$ \%\,\text{Accuracy} = \frac{\text{Rated Weightage}}{\text{Best Weightage Value} \times \text{No. of Data Images}} \times 100 $$
(3)
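A small worked example may help make (3) concrete. The sketch below assumes that "Rated Weightage" means the sum of the weightages one radiologist assigned over all images, that the best weightage value is 5, and that the ratio is multiplied by 100 to give a percentage. The function name `percent_accuracy` and the ratings are hypothetical, not data from the study.

```python
from statistics import mode


def percent_accuracy(ratings, best_weightage=5):
    """Equation (3): summed weightages over the best achievable total, in %."""
    return 100.0 * sum(ratings) / (best_weightage * len(ratings))


# Made-up weightages from one rater for four images (illustration only).
ratings = [5, 5, 4, 5]
print(mode(ratings), percent_accuracy(ratings))
```

Here the rated weightage is 19 out of a best possible 20, so the mode is 5 and the percentage of accuracy is 95.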

The variation in results among the radiologists is then evaluated using the standard deviation. The modes, percentages of accuracy and standard deviation obtained for the radiologists are tabulated in Table 3.

Table 3 Skull stripping accuracies among radiologists

From Table 3, it can be observed that the mode for every radiologist is 5 (correct delineation). Moreover, the percentage of accuracy for each radiologist shows good and consistent performance: 97, 95.3 and 92.5 % for Radiologist 1, Radiologist 2 and Radiologist 3, respectively. The overall standard deviation among the radiologists is low at 0.287, which indicates strong consistency among them.

Table 4 gives a breakdown of the qualitative performance ratings of the radiologists, in which the total occurrence of each weightage value is counted. The percentage of occurrence is evaluated using (4):

$$ \%\,\text{of Occurrence} = \frac{\text{No. of Weightage Occurrence}}{\text{No. of Raters} \times \text{No. of Data Images}} \times 100 $$
(4)

Table 4 Qualitative performances review for radiologists

From Table 4, the radiologists return the highest total occurrence for weightage 5 (correct delineation), with 191 occurrences. Weightage 4 (slightly less delineation) also accounts for a notable share, with 39 occurrences, followed by weightage 3 (slightly over delineation) with 10 occurrences. No occurrences of weightage 1 (over delineation) or weightage 2 (less delineation) are reported.
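The occurrence counts quoted above can be turned into percentages with (4). The short sketch below uses only figures stated in the text (191, 39, 10, 0 and 0 occurrences; three raters; 80 images) and assumes the ratio is multiplied by 100. The function name `percent_occurrence` is hypothetical.

```python
def percent_occurrence(count, n_raters, n_images):
    """Equation (4): share of all ratings that received a given weightage, in %."""
    return 100.0 * count / (n_raters * n_images)


# Occurrence counts per weightage as reported in the text for Table 4.
occurrences = {5: 191, 4: 39, 3: 10, 2: 0, 1: 0}
n_raters, n_images = 3, 80

# Sanity check: every rater rated every image exactly once.
assert sum(occurrences.values()) == n_raters * n_images

percentages = {
    w: percent_occurrence(n, n_raters, n_images) for w, n in occurrences.items()
}
for weightage, pct in percentages.items():
    print(f"weightage {weightage}: {pct:.2f} %")
```

The counts sum to 240 ratings (3 raters × 80 images), and the resulting shares are approximately 79.58 %, 16.25 % and 4.17 % for weightages 5, 4 and 3, respectively.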

Next, the reliability of agreement among the radiologists in assessing skull stripping performance, and its significance, are identified using Fleiss Kappa analysis, as tabulated in Table 5.

Table 5 Significance of Fleiss Kappa analysis among radiologists

Referring to Table 5, the Kappa value for the radiologists is fairly high at 0.686, which is categorized as substantial agreement. This level of agreement is considerably good for a qualitative performance analysis. Overall, the qualitative performance analysis in this study indicates that: (1) the overall performance of SBRG returns a “correct delineation” level of accuracy, which suggests that the SBRG skull-stripped images are suitable for further use in various medical image processing applications; (2) SBRG is an effective technique for skull stripping; (3) the substantial agreement among the radiologists suggests that the number of raters involved in the study is appropriate for the qualitative assessment of skull stripping; and (4) the proposed qualitative evaluation method may offer a new way of evaluating skull stripping in MRI brain images.

Table 6 shows samples of skull stripping rated as correct delineation by the radiologists.

Table 6 Samples of correct delineation of skull stripping

4 Conclusion

This research investigates the qualitative performance of skull stripping for Fluid Attenuated Inversion Recovery (FLAIR) Magnetic Resonance Imaging (MRI) brain images. The Seed-Based Region Growing (SBRG) segmentation technique is implemented to strip the skull region from the brain images. The skull-stripped images are then visually analyzed by a group of three experienced radiologists, who returned a “correct delineation” accuracy level as the overall outcome. Therefore, based on the qualitative analysis performed, it can be concluded that SBRG is an effective method for skull stripping, and that the proposed qualitative evaluation method may offer a new approach to evaluating skull stripping in MRI brain images.