Keywords

1 Introduction

Image mining is concerned with the extraction of useful information and knowledge from image data. The representation of the raw image data in a format that allows for the effective and efficient application of data mining is key to the success of any form of applied image mining. The domain of image mining can be divided into Region Of Interest (ROI) mining and whole image mining. Traditionally work on image mining has been directed at 2-D scenarios, however, increasingly more and more 3-D image data is available. Consequently there is also an increased interest in 3-D image mining; Volume Of Interest (VOI) mining.

This paper proposes and compares a number of techniques for VOI Based Image Classification (VOIBIC). VOIBIC entails a number of challenges. The first is the identification and isolation of the VOI. The second is how best to represent the VOI so that classification can be effectively and efficiently conducted. With regard to the work described in this paper three VOIBIC approaches are considered. The first is founded on the idea of using statistical metrics, the Statistical metrics based representation. This representation offers the advantage that it is straightforward and, although not especially novel, provides a benchmark. The second proposed representation is founded on the concept of point series (curves) describing the perimeter of a VOI, the Point Series representation. Two variations of this representation are considered: (i) Disc based and (ii) Spoke based. The third proposed representation is founded on a Frequent Subgraph Mining (FSM) technique whereby the VOI is represented using an Oct-tree structure to which FSM can be applied. The identified frequent subtrees can then be used to define a feature vector representation compatible with many classifier generation methods. This paper also considers augmenting the VOI data with meta data and determining the effect this has on classification performance. The presented evaluation used two 3-D MRI brain scan datasets: (i) epilepsy patients versus healthy people and (ii) musicians versus non-musicians. The VOI in this case were the lateral ventricles, a distinctive object in MRI brain scan data.

The rest of the paper is organised as follows. Section 2 provides a review of some previous work relating to VOI classification. This is followed in Sect. 3 with a description of the datasets to be adopted in this paper. The three VOI classifications approaches are described in Sect. 4. Section 5 provides a comparative evaluation of the operation of the proposed VOI classification approaches including comparison with some previous related work. Finally, the paper is summarised and concluded in Sect. 6.

2 Related Work

This section provides a review of some previous related work concerning 3-D MRI brain scan classification [22, 24, 29]. In [24] a method was proposed to classify brain tissue from 3-D MRI brain scans using a partial volume model. A histogram based representation was used to measure brain tissue intensity and noise variance in the image and then classify this into six tissue types using “a posteriori” classifier. Although the result was promising the performance of the classification process was relatively expensive. Both [22] and [29] proposed similar work based on Discrete Wavelet Transform (DWT) representations. Both proposed a classification method to classify normal and abnormal brains. DWT was used to extract features from 3-D MRI images and then Principle Component Analysis (PCA) was applied to reduce the number of dimensions. Their results were excellent in terms of both effectiveness and efficiency but it was argued that the difference between normal and abnormal brains might be too obvious. For work based on Tree based representations, [3] and [4] proposed an approach to classify brain tissue using a minimum spanning tree graph-theoretic approach. This was evaluated using four classes: (i) elderly, (ii) young normal individual brain, (iii) ischemia patients’ brain and (iv) Alzheimer’ brain. Some good results, in terms of effectiveness, were reported. In [21] an approach to classifying MRI brain scans was proposed according to different levels of education. The lateral and third ventricles of the brain were used and represented using an Oct-tree structure and FSM used to generate feature vectors. The classification results were promising in terms of classification effectiveness, but run time complexity was high.

3 Application Domain

The research presented in this paper is concerned with Volumes Of Interest (VOI) and more specifically at identifying and classifying VOI in 3-D Magnetic Resonance Imaging (MRI) scans of the human brain. The particular focus is the left and right (lateral) ventricles; the cerebrospinal fluid filled spaces at the centre of the brain [20]. Their function is: (i) to act as shock absorbers, (ii) to distribute nutrients to the brain and (iii) remove waste. There are in fact four ventricles in a human brain: two lateral ventricles (referred to as the left and right ventricles), a third smaller ventricle connected to both lateral ventricles and a fourth smaller ventricle that connects the third ventricle to the spinal cord. Only the left and right lateral ventricles are considered with respect to the focus of the presented research. This was because: (i) the lateral ventricles are relatively easy to identify within 3-D MRI brain scans, so facilitating automatic extraction; and (ii) they are much larger than the other two ventricles and consequently can be argued to be more significant. An example of a 3-D MRI brain scan is given in Fig. 1 where the “lateral ventricles” are the dark areas at the centre of the brain. Note that 3-D MRIs comprise a sequence of two dimensional (2-D) “slices” through the brain in each of the three cardinal planes: (a) Sagittal - SAG (left to right), (b) Coronal - COR (front to back) and (c) Transverse - TRA (top to bottom).

Fig. 1.
figure 1

Example of a 3-D brain MRI scan: (a) Sagittal (SAG) plane, (b) Coronal (COR) plane and (c) Transverse (TRA) plane

Two MRI brain scan datasets were used for evaluation purposes with respect to the research presented in this paper: (i) Epilepsy and (ii) Musician. For the Epilepsy dataset, the MRI brain scan volumes were obtained by the Magnetic Resonance and Image Analysis Research Centre (MRIARC), at the University of Liverpool, between the years 1999 and 2004. The Epilepsy dataset comprised 210 MRI brain scans. Of these 105 were from healthy people and the remaining 105 from epilepsy patients. For the Musician dataset, some of the MRI brain scan volumes were obtained by MRIARC between the years 1999 and 2004, and the remainder by the University of Heidelberg between the years 2000 and 2004. The Musician dataset comprised a total of 160 MRI brain scans. Of these 80 were from musicians and the remaining 80 from non-musicians.

4 Volume of Interest Classification

This section describes the three proposed VOI classification approaches. Note that the thresholding segmentation techniques presented in [26] was used to extract the VOI (lateral ventricles). An example of an extracted lateral ventricle is shown in Fig. 2. The three VOI classification approaches are presented in the following subsections.

Fig. 2.
figure 2

Example of an extracted right ventricle (figure created using Meshlab [1]).

4.1 Statistical Metrics Based Classification

In this section the Statistical metric based approach to VOIBIC is presented, whereby a number of statistical measures are used to represent VOIs. The measures are used to define a N-dimensional feature space, one dimension per metic, from which a feature vector representation can be extracted, one feature vector per volume (ventricle). The feature vectors generated are then used as input to a classifier generator. It was anticipated that this representation would be unlikely to provide effective results, however the approach would provide a benchmark with which the two alternative representations could be compared.

The usage of statistical techniques is widely reported in the literature. In the context of 2-D, example applications include [12] and [13]; and in 3-D, example applications include [11] and [25]. The metrics considered with respect to the work presented in this paper were:

  1. 1.

    Axis length ( l ): The axis length of the ventricles.

  2. 2.

    Axis width ( w ): The axis width of the ventricles.

  3. 3.

    Axis depth ( d ): The axis depth of the ventricles.

  4. 4.

    Maximum perimeter on xy plane ( \(p_{xy}\) ): The maximum perimeter of the ventricles in the xy plane.

  5. 5.

    Maximum perimeter on yz plane ( \(p_{yz}\) ): The maximum perimeter of the ventricles in the yz plane.

  6. 6.

    Maximum perimeter in the xz plane ( \(p_{xz}\) ): The maximum perimeter of the ventricles on xz plane.

  7. 7.

    Volume ( vol ): The volume of the ventricles (directly relating to the number of voxels, note that in the case of the ventricles 1 voxel \(= 1 \) mm\(^3\)).

  8. 8.

    Volume extent ( \(v_{ext}\) ): The value derived by dividing the volume (vol) by the size of the minimum bounding cube surrounding the volume (\(\frac{Vol}{l \times w \times d}\)).

Note that two feature vectors were generated, one for each lateral ventricle, per image.

4.2 Point Series Based Classification

The proposed Point series based VOIBIC approach is presented in this section. The point series represents the boundary of the VOI. Two techniques for generating the desired point series are considered: (i) Disc based and (ii) Spoke based. Regardless of which technique is used, the resulting point series can be translated into feature vectors compatible with a number of classification model generators. Alternatively they can be used directly using a K-Nearest Neighbour (KNN) approach. In the context of the first, a signature based approach is proposed for generating the model, founded on Hough signature extraction [14]. The key idea is to transform a shape into a parameter space where the shape can be represented in a spatially compact way, a Straight Line Hough Transform (SLHT) was used for the transformation (see [27] for further detail). For the KNN mechanism a similarity measure is required, the simplest is the Euclidean distance between a labelled comparator series and a previously unseen series. However, this requires that both series are of the same length. Instead it is proposed that the “warping path” distance, generated using Dynamic Time Warping (DTW) [2], can be used.

As noted above, two techniques for generating the desired point series are proposed: (i) Disc based and (ii) Spoke based. The first uses all the boundary voxels while the second uses a representative subset of the complete set of boundary voxels. The input in both cases is a binary-valued 3-D image comprising voxels labelled as being either black (belonging to the VOI) or white (not part of the VOI). The output in both cases is a point series describing the VOI boundary. Both the Disc based and Spoke based techniques operate with reference to a primary axis (see Fig. 3). Consequently for each image three point series can be generated for each ventricle. Thus, in total six, curves are generated for each MRI image, three describing the left ventricle and three describing the right ventricle. Each technique is described in further detail in the following two subsections.

Disc Based Technique. The Disc based representation is founded on the idea of generating a point series by considering a sequence of slices, slice by slice (along a primary axis), and collecting point information from the boundary where each slice and the volume of interest intersect. The intersection is usually described by a circular shape, as illustrated in Fig. 3a hence the technique is referred to as the “Disc” based technique. In this manner a sequence of disc boundaries is generated and concatenated together to form a single point series.

Fig. 3.
figure 3

Point series generation techniques

Spoke Based Technique. The Spoke based representation technique is illustrated in Fig. 3b. The technique involves measuring the distance from the geometric centroid of the VOI to points on the boundary, in a given plane. As such it is essentially a 2-D technique. The effect is that of a sequence of spokes of different length radiating from the centroid.

4.3 Tree Based Classification

The tree based VOIBIC approach is presented in this section. The approach uses the concept of Oct-trees decomposition akin to the Quad-trees decomposition used with respect to 2-D [5]. The Oct-tree is constructed by repeatedly subdividing the space into “octants” [17, 23]. The decomposition continues until homogeneous sub-regions are arrived at or a user defined “maximum depth” is reached. [8, 9]. The decomposition can be conducted according to a variety of image features such as colour or intensity. With respect to the lateral ventricles a binary encoding was used. The VOI was encapsulated in a Minimum Bounding Box (MBB). The voxels representing the VOI were allocated a “1” (black), while the remainder were allocated a“0” (white). The advantage of this representation is that it maintains information about the relative location and size of the VOI.

Trees are not readily compatible with classifier generation, they needs to be processed in some way so that a feature vector representation, compatible with classifier model generation, can be formulated. The idea presented here is to apply Frequent Subgraph Mining (FSM) to the Oct-tree data structure and then used the identified frequent sub-trees to define a feature space. A number of FSM algorithms have been proposed, such as: (i) gSpan [28], (ii) AGM [16] and (iii) FFSM [15]. The gSpan algorithm, coupled with the Average Total Weighting (ATW) scheme desribed in [18], was adopted. The output from the FSM was a set of frequently occurring sub-graphs together with their occurrence counts. Typically a large number of sub-graphs are generated many of which are redundant (do not serve to discriminate between classes). Feature selection techniques were applied to reduce the overall number of identified frequent sub-graphs.

5 Evaluation

The evaluation of the proposed VOIBIC approaches is presented here. The section is divided into three sub-sections (Sub-sects. 5.15.3): (i) comparison of the proposed VOIBIC approaches, (ii) statistical significance testing of the results and (iii) comparison with previous work.

5.1 Comparison of the VOIBIC Approaches

The comparison of the proposed VOIBIC approaches was undertaken in terms of: (i) classification effectiveness and (ii) efficiency. In terms of effectiveness many experiments were conducted, not shown here, using a variety of parameter settings. The results presented here, however, are those generated using best performing parameter settings (so as to consider each technique to its best advantage). The comparison is also divided into two parts: (i) without augmentation and (ii) with augmentation.

Tables 1 and 2 summarise the “best” classification results obtained for the Epilepsy and Musicians datasets respectively. In the tables the acronyms “SMB”, “DB”, “SB” and “TB” refer to the Statistical Metric Based, Disc Based, Spoke Based, and Tree Based approaches respectively. The abbreviations “Accu.”, “Sens.” and “Spec.” indicate classification accuracy, sensitivity and specificity. Best results are indicated in bold font. From the Tables it can be seen that for all the metrics considered the Tree based approach tended to be most effective. The exception is in the case of the sensitivity value; where the Spoke based approach, when applied to the Musician dataset, produced the best result. Tables 3 and 4 summarise the “best” classification effectiveness results obtained using the Epilepsy and Musician datasets augmented with age and gender data. From the tables it can be seen that the Tree and Spoke based techniques produced the best results.

Table 1. Classification effectiveness results for Epilepsy dataset (without augmentation)
Table 2. Classification effectiveness results for Musician dataset (without augmentation)
Table 3. Classification effectiveness results for Epilepsy dataset with augmentation (Epilepsy+)
Table 4. Classification effectiveness results for Musician dataset with augmentation (Musician+)

Figures 4 and 5 show the runtime results obtained, with respect to the best performing VOIBIC approaches considered previously, for the Epilepsy and Musician datasets respectively. The presented run times are the total TCV run time in each case, including: feature extraction, curve generation (for the Point series based approach), Oct-tree generation (for the Tree based approach), training and testing. All the experiments were conducted using a 2.9 GHz Intel Core i7 with 8 GB RAM on OS X (10.9) operating system.

The run time complexities of the four VOIBIC approaches are presented in Figs. 4 and 5. Figure 4 shows the run time complexity for the classification process when applied to the Epilepsy and Epilepsy+ (augmented) datasets, and Fig. 5 the run time complexity for the classification process when applied to the Musician and Musician+ (augmented) datasets.

Fig. 4.
figure 4

Run time complexity for the classification process using the Epilepsy and Epilepsy+ datasets (Color figure online)

Fig. 5.
figure 5

Run time complexity for the classification process using the Musician and Musician+ datasets (Color figure online)

From the figures it can be seen that the Statistical metrics based approach, as was to be expected, was the most efficient for all datasets (the run time was less than one minute; hence the bar is not visible in the figures); while the Tree based approach was the most expensive. It is obvious that there was a trade-off between effectiveness and efficiency. However it is possible to claim that the Spoke based approach produced the best overall results, in terms of both effectiveness and efficiency, compared to the other three approaches. In the following section the statistical significance of the results presented in this subsection is considered.

5.2 Statistical Significance Testing of the Results

This section presents the results obtained from conducting statistical significance testing. Given any competition, at some level of granularity, one of the competitors will always come first, the question is whether this outcome is statistically significant or not? There are a variety of mechanisms that can be used to establish statistical significance. The Friedman rank statistic [10] is well suited for comparing the significance of the results obtained with respect to the proposed VOIBIC approaches and was thus adopted with respect to the work presented in this paper.

The Friedman rank test, \(F_R\), is based on the total ranked performances of a given classification techniques applied to each dataset, and is calculated as follows:

$$\begin{aligned} F_R = \frac{12}{rc(c+1)} \bigg [\sum _{j=1}^c R^2_j - 3r(c+1) \bigg ] \end{aligned}$$
(1)

where r is the number of datasets used in the study, c is the number of classifiers used in the study and \(R^2_j\) is the square of the Total of the Ranks (TR) for classifier \(j, (j = 1, 2, 3, \ldots , c)\).

For the purpose of the evaluation the classification accuracies were used. Table 5 reports the classification accuracies for all four techniques when applied to the two datasets augmented with age and gender. In the table, the techniques achieving the highest classification accuracies with respect to each dataset. The overall highest ranked technique is indicated in bold font. The numbers in the parentheses indicate the rank of each technique. The Friedman test statistic and corresponding p-value are also shown.

If the value of \(F_R\) is larger than 6.00, the critical value for the Friedman rank test [10], the null hypothesis that there is no difference among proposed techniques can be rejected. With respect to the work presented in this paper, the calculated \(F_R\) from Eq. 1 is “13.5” (which is larger than 6.00). Thus the null hypothesis, at \(\alpha = 0.05\), is rejected. It can thus be concluded that there are significant statistical differences in classification accuracies among the proposed VOIBIC approaches.

Table 5. The best classification accuracy results, total rank and average rank for the proposed techniques

Given that the null hypothesis can be rejected a post hoc Nemenyi test [6] can be applied to highlight significant differences among the individual VOIBIC approaches. The Nemenyi post hoc test states that the performances of two or more classifiers are significantly different if their Average Ranks (AR) differ by at least a Critical Difference (CD), given by:

$$\begin{aligned} CD = q_{\alpha ,\infty ,c} \sqrt{\frac{c(c+1)}{12N}} \end{aligned}$$
(2)

Note that in Eq. 2, the value \(q_{\alpha ,\infty ,c}\) is based on the Studentised range statistic [6].

Figure 6 shows the critical difference diagram for the data presented in Table 5. Note that this is a modified version of the Demsar 2006 significant diagram [19]. This figure shows the classification approaches listed in ascending order of ranked performance on the y-axis, and the image classification approaches’ AR across two datasets displayed on the x-axis. The diagram displays the ranked performances of the classification techniques, along with the CD tail, to highlight any techniques which are significantly different to the best performing techniques. The CD value for the figure was calculated as per Eq. 2 and equals to “2.14”. The critical difference diagram clearly shows that the TB approach is the best performing classification approach with an AR value of 1.0; however, this result is not significant with respect to SB and DB, while it is significant with respect to SMB.

Fig. 6.
figure 6

Critical difference diagram for the proposed image classification approaches

5.3 Comparison with Previous Work

There have been a number of previous studies where data mining has been applied to MRI brain scan data. There are no similar studies using the nature of the lateral ventricles and the same data sets as used with respect to the work presented in this paper, however the work of Elsayed et al. [7, 9] and that of Long [21] is of note. The significance of the work presented in [7] and [9] is that this also used the Musician and Epilepsy MRI scan datasets but in the context of 2-D and with respect to corpus callosum, another easily identifiable object found in MRI brain scan data. The significance of the work presented in [21] is that it was directed at the 3-D analysis of the ventricles using a tree structure similar to that used with respect to the third technique presented in this paper, but in the context of Alzheimer’s disease detection and level of education. Thus some limited comparisons can be made with respect to the work of Elsayed et al. and Long. In [7, 9] the reported results were better than those reported in this paper; however, as already noted, the work of Elsayed et al. was directed at a 2-D representation of the corpus callosum. It might very well be the case that the corpus callosum is a better indicator of epilepsy and musical ability, hence the better results. Long and Holder [21] also reported (slightly) better results than those reported with respect to the work presented in this paper, but as noted above, the work was directed at Alzheimer’s disease detection and level of education, not epilepsy and musical ability. Long also considered not only the two lateral ventricles, but also the “third” ventricle, which may also make a difference.

6 Conclusion

In this paper three alternative approaches to conducting VOIBIC have been proposed: (i) Statistical metrics based, (ii) Point series (Disc based and Spoke based) and (iii) Tree based. Two datasets, Epilepsy and Musician, were used for evaluation purpose. The reported results indicated that the Spoke and Tree based approaches were the most effective in the context of classification performance. The results also indicated that there was a trade-off between classification effectiveness and efficiency, the more sophisticated approaches, although providing for a greater accuracy, were also the least efficient. The most efficient was the Statistical metrics based approach which produced the worst classification performance, while the Tree based approach was the least efficient but tended to produced the best classification performance. For most classification applications we are interested in the quality of the classification, not its efficiency, however the Spoke based approach would provide a good compromise if efficiency was of concern. Note that although the Tree based approach tended to produce (slightly) better accuracy values than the Spoke based approach, the statistical significance analysis presented in the paper indicated that there was no statistically significant difference between the Spoke based Point series approach and the Tree based approach in terms of classification accuracy. The evaluation reported in this paper also indicated that by augmenting the proposed representations with meta data improved classification effectiveness could be obtained, regardless of the representation classification paradigm or dataset used.

The research described in this paper has indicated a number of potential research directions for the future work. One potentially fruitful avenue would be to investigate alternative 3-D representation techniques such as meshes, instead of the standard feature vector format used in this paper. It would also be of interest to investigate further alternative classification models to which the proposed representation techniques can be applied.