1 Introduction

Image data are available from many domains, such as remote sensing, robotic vision, satellite imaging and computer-aided visualization. Among all categories of image data, medical image data are of the utmost importance and are highly sensitive. Proper methodologies should be developed for medical image processing so that healthcare stakeholders can provide better treatment and practise improved procedures in the diagnosis, treatment and prevention of diseases. Medical images are visual representations of the organs, bones and other structures within the body, used for analysis relating to the observation and treatment of patients. They reveal the internal structure of the body through techniques such as CT, MRI, mammography and nuclear medicine, helping to diagnose disease and provide proper treatment. Owing to advances in technology, the number of medical images of different categories has increased substantially, and large amounts of data are now available from multiple sources. Effective data acquisition, filtering and analysis, and the derivation of useful and timely information from these large repositories, have therefore become a challenge for researchers. Another challenge in processing and mining medical images is their variety. Some of the widely used modalities are X-ray imaging, Computed Tomography (CT), Ultrasound Imaging (US), Magnetic Resonance Imaging (MRI) and Scintigraphy (Anger camera). These images are used by health practitioners throughout the world for analysis, diagnosis and treatment of patients. This variety makes it difficult for researchers to devise general methods for medical image mining and processing.
This study surveys the image mining and processing techniques used on medical images for knowledge discovery, prediction and proper treatment of patients. In this paper, Sect. 2 describes the processing phases of medical image data, including data acquisition and storage, pre-processing, feature extraction, classification, image indexing and image retrieval. It also describes the techniques used in medical image processing, such as rough sets, neural networks, association rule mining [2, 3], classification [3] and the k-nearest neighbour (kNN) classification algorithm [4]. Section 2.6 describes possible directions for future work. The last section provides the conclusions of the paper.

2 Medical Image Processing

Medical imaging is the process of picturing different parts of a living being. Some images show tissue, tissue composition or the characteristics of bones, blood or other bodily fluids. Sometimes special substances called tracers are injected into the body to image physiological characteristics. Medical images of various types are applicable to many areas of the healthcare domain, such as the evaluation of MRI images and the interpretation of CT and X-ray images. The main issue in image processing is to turn the pixel information present in an image into meaningful information. Image processing is the technique of deriving a specific feature from an image or enhancing a particular image. Its main aim is to convert an image into digital form, or acquire a digital image, and enhance it so that specific features can be identified for better analysis and interpretation. Pattern spotting, image compression, image enhancement and image restoration are some of the techniques used for manipulating images. The result of image processing is either a modified image or a report based on image analysis. The objective of image processing is to extract information from digital images. Medical image processing is supported by many medical imaging techniques, such as MRI, mammography, ultrasound and CT. These techniques visualize the internal structure of the body, which provides the required information to health practitioners and helps them to provide better patient care. The huge amount of image data available from multiple sources poses a challenge for researchers in effective data acquisition, filtering, analysis and derivation of meaningful information. Image processing combines many fields of study, such as data mining, machine learning, computer graphics, database systems and artificial intelligence. There are two significant approaches to processing image data: one processes the image data only, and the other processes a combination of images and related alphanumeric data. Medical image processing can be divided into the stages [5, 6] depicted in Fig. 1.

Fig. 1. Image processing stages.

2.1 Image Collection and Storage

There are several ways to store a huge volume of image data. Searching for information and deriving knowledge in an image database differs from doing so in a traditional database, because an image database requires spatial information whereas a traditional database does not. A traditional database holds values that carry their own semantics, but in an image database values must be interpreted with the support of some context. Applying conventional data processing methods directly to images is therefore not appropriate, which led to the rise of data management in image mining. Data management in image mining and processing is divided into two subdivisions [7]: one is the storage of images, and the other is the indexing and retrieval of images. Because of the differences described above, image data cannot be stored simply by following the rules of a traditional database. Many formats have been proposed for storing image data. Some formats store the image metadata in the same file as the image, but these have the disadvantage of using more memory and time. Another approach stores the image metadata in a separate relational database for faster and more efficient image management. Object-oriented databases have also been proposed as a way of standardizing image storage [7].
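The approach of keeping image metadata in a separate relational database can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table name image_metadata, its columns, the helper functions and the file paths are hypothetical and are not taken from any of the surveyed systems.

```python
import sqlite3


def create_metadata_store(conn):
    """Create a table for image metadata; the pixel data stay in image files."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS image_metadata (
            image_id  INTEGER PRIMARY KEY,
            file_path TEXT NOT NULL,  -- where the image file is stored
            modality  TEXT NOT NULL,  -- e.g. 'CT', 'MRI', 'X-ray'
            body_part TEXT,
            width     INTEGER,
            height    INTEGER
        )
        """
    )
    # An index on modality speeds up attribute-based look-ups.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_modality ON image_metadata (modality)"
    )


def add_image(conn, file_path, modality, body_part, width, height):
    """Register one image and return its generated identifier."""
    cursor = conn.execute(
        "INSERT INTO image_metadata "
        "(file_path, modality, body_part, width, height) VALUES (?, ?, ?, ?, ?)",
        (file_path, modality, body_part, width, height),
    )
    return cursor.lastrowid


def find_by_modality(conn, modality):
    """Return the file paths of all images of the given modality."""
    rows = conn.execute(
        "SELECT file_path FROM image_metadata WHERE modality = ? ORDER BY image_id",
        (modality,),
    )
    return [row[0] for row in rows]


# Hypothetical example data held in an in-memory database.
conn = sqlite3.connect(":memory:")
create_metadata_store(conn)
add_image(conn, "scans/brain_001.dcm", "CT", "brain", 512, 512)
add_image(conn, "scans/knee_007.dcm", "MRI", "knee", 256, 256)
add_image(conn, "scans/brain_002.dcm", "CT", "brain", 512, 512)
ct_paths = find_by_modality(conn, "CT")
```

Because only the file path is stored in the table, queries on attributes such as modality never have to read the image files themselves.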

2.2 Image Pre-processing

Pre-processing is an essential step that enhances the quality of an image before the next phase begins [8]. It safeguards the quality of the image data and, consequently, of the overall result. Its main objective is to improve image quality before any technique for object detection or classification is applied. Techniques applied in image pre-processing include wavelet functions, histogram equalization, median filtering and high-pass filtering. Noise removal, or de-noising, is one of the most important steps at this stage. It can be done with many techniques, such as median filtering, Wiener filtering and the Discrete Wavelet Transform (DWT), as well as Gaussian filters, high-pass filters and adaptive median filters. After noise removal, the focus of pre-processing shifts to the recognition or enhancement of objects in the image. Histogram equalization, grey-scale modification, thresholding and Markov random models are used to enhance different types of images so that further techniques can be applied. Biopsy images are enhanced by grey-scale modification, thresholding and interpolation. Color image enhancement is achieved with statistical mixture models such as Dirichlet and Gaussian mixtures, vector directional filters and Markov random models. Contrast enhancement can be achieved with a hybrid filter that combines a simple mean filter and a fuzzy switching median filter. Median filtering, histogram equalization and normalization are used to improve segmentation. Haar wavelet functions can be applied to compress images without loss. CT brain images are pre-processed with the shape prior technique. Wavelet functions can also be applied to convert images into a format suitable for transformation techniques.
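Two of the pre-processing techniques named above, median filtering and histogram equalization, can be illustrated with a short sketch. It assumes a greyscale image held as a list of rows of integer intensities; the function names, the 3x3 window and the border rule (use only the neighbours that exist) are illustrative choices, not prescriptions from the surveyed work.

```python
from statistics import median_low


def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Border pixels use only the neighbours that exist (an assumed border rule).
    median_low keeps the result an actual pixel value for even-sized windows.
    """
    rows, cols = len(image), len(image[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            result[r][c] = median_low(window)
    return result


def equalize_histogram(image, levels=256):
    """Spread the intensities over the full range using the cumulative histogram."""
    total = sum(len(row) for row in image)
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == total:
        # A constant image cannot be stretched; return a copy unchanged.
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    mapping = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[mapping[value] for value in row] for row in image]


# A single bright outlier (impulse noise) in an otherwise flat region.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
denoised = median_filter_3x3(noisy)

# A low-contrast patch whose intensities occupy a narrow range.
low_contrast = [
    [100, 101],
    [102, 103],
]
stretched = equalize_histogram(low_contrast)
```

The median filter removes the isolated outlier without blurring the flat region, and equalization maps the narrow intensity range onto the full 0 to 255 scale.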

The pre-processing techniques used in medical images are listed in Table 1.

Table 1. Techniques used in pre-processing of medical images.

2.3 Feature Extraction

This stage refers to the methods used to analyze objects or images and extract significant information that is representative of the various classes of objects. Feature extraction essentially converts the identified objects into sets of attributes. Features are generally mathematical representations of an image that describe an object in terms of shape, texture or color. Various transformation and feature extraction techniques are applied to recognize patterns, and interpreting these patterns yields knowledge that can then be used in an application. The 2D Gabor feature, association rule mining, the Sobel edge detector, grey-tone spatial dependencies, the fuzzy C-means algorithm and the expectation maximization (EM) algorithm are some of the methods used to extract useful features from medical images. To find features that are independent of each other, decision trees, the naive Bayes classifier, the normalized cut algorithm, Principal Component Analysis (PCA), the Discrete Wavelet Transform (DWT), nonlinear anisotropic diffusion and automatic thresholding are used. Entropy-based discretization, fuzzy clustering and the fuzzy connectedness framework are used to group objects of the same type; the fuzzy connectedness framework does so by finding the connectedness between image elements. The EM algorithm is used to improve the visualization of MRI data and to extract feature boundaries. Markov random fields are used to incorporate contextual information and to segment single and multispectral MR images. A combination of texture, color and edge extraction techniques is used to improve the detection process. Natural clusters in a patient population are identified by unsupervised neural networks. Support vector machines are used to derive nuclear features from segmented nuclei. Texture feature extraction techniques include Haralick’s statistical measures, the grey-level co-occurrence matrix (GLCM) and the 2D wavelet transform. The feature set essential for classification is extracted using texture features based on grey-tone spatial dependencies, Gabor features and transforms, the Sobel edge detector and histogram equalization.
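As an illustration of texture features based on grey-tone spatial dependencies, the sketch below builds a grey-level co-occurrence matrix and derives four statistics from it. It assumes an image already quantized to a small number of grey levels and a symmetric matrix for a single pixel offset; the function names are hypothetical, energy is taken here as the sum of squared entries (some libraries use its square root), and homogeneity as the inverse difference moment.

```python
import math


def glcm(image, levels, dr=0, dc=1):
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    return [[value / total for value in row] for row in counts]


def texture_features(p):
    """Contrast, energy, homogeneity and entropy of a normalized GLCM."""
    contrast = energy = homogeneity = entropy = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            contrast += (i - j) ** 2 * value
            energy += value * value
            homogeneity += value / (1 + (i - j) ** 2)
            if value > 0:
                entropy -= value * math.log2(value)
    return {
        "contrast": contrast,
        "energy": energy,
        "homogeneity": homogeneity,
        "entropy": entropy,
    }


# A small image with four grey levels (0 to 3), using horizontal neighbours.
sample = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(sample, levels=4)
features = texture_features(matrix)
```

The resulting dictionary is a compact feature vector that a classifier can use in place of the raw pixels.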

Table 2 depicts the various techniques used for feature extraction with emphasis on medical images.

Table 2. Feature extraction techniques used in medical images

2.4 Image Classification

Classification means the categorization of objects present in an image. Supervised classification assigns the detected objects to predefined categories; its methods include support vector machines, rule-based classification, decision trees and neural networks. Unsupervised classification, or image clustering, groups objects according to some common factors; hierarchical clustering, fuzzy clustering and nearest neighbour clustering are examples. Supervised classification is also called parametric or hard classification, whereas unsupervised classification is also called non-parametric or soft classification [20]. Many different techniques are used in image classification. Association rule mining can be used to classify digital mammograms into two classes, to divide CT images of the brain into three groups (benign, malignant and normal), and to separate the corpus callosum region from the rest of a brain image. Decision tree algorithms are used for mammography classification, for pre-processing CT brain images and for building a model to segment brain images. Retrieval and classification of different medical images can be done with text-based indexing and classification systems, texture correlograms, support vector machines, self-organizing maps and wavelet transformation. Classification is also done with neural networks or support vector machines, for example to classify digital chest X-ray images into two groups, abnormal and normal. Texture-based image classification categorizes medical images by energy, homogeneity, entropy and contrast. Fuzzy C-means (FCM) may be used to group images into clusters and so reduce the time needed to search for an image. Multilayer perceptron models are used to classify images on different criteria. Breast mammograms are classified and cancerous tissue is detected with genetic algorithms, the branch and bound algorithm and the grey-level co-occurrence matrix. The C4.5 decision tree is used to classify image samples as healthy or unhealthy. K-nearest neighbour (kNN) combined with a genetic algorithm improves classification in image databases. Self-organizing maps and wavelet transformation can likewise be used to categorize medical images.
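Supervised classification of feature vectors can be illustrated with a plain k-nearest neighbour classifier. The sketch below is a minimal version without the genetic-algorithm refinement mentioned above; the function name, the two-dimensional feature vectors (imagined here as contrast and energy values) and the labels are made-up example data.

```python
import math
from collections import Counter


def knn_classify(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (contrast, energy) feature vectors with known labels.
train_features = [
    (0.10, 0.80), (0.15, 0.75), (0.20, 0.70),  # labelled normal
    (0.90, 0.20), (0.85, 0.25), (0.80, 0.30),  # labelled abnormal
]
train_labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

prediction = knn_classify(train_features, train_labels, (0.82, 0.28))
```

Choosing an odd k for a two-class problem avoids tied votes; in practice k and the distance measure would be tuned on validation data.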

Table 3 lists the image classification techniques, with emphasis on medical images.

Table 3. Image classification techniques in medical images

2.5 Image Indexing and Retrieval

A suitable indexing scheme is required for efficient retrieval of information from any database. Relational databases use the concepts of primary and foreign keys. These concepts are not suitable for the image databases used in image mining, because an image database differs greatly from a relational database. Several indexing techniques have therefore been developed for mining image databases. Common techniques include non-Euclidean similarity measures and the signature file access method for multidimensional databases. The X tree is used to minimize overlap in image indexing and to create the index structure needed in high-dimensional databases. Multidimensional indexing methods can be built with the TV index tree, iMinMax and the R* tree. The SR tree provides an indexing scheme for high-dimensional nearest neighbour queries. The R tree provides the dynamic index structure needed for spatial searching.

Table 4 depicts the various techniques used for image indexing.

Table 4. Study of image indexing techniques in medical images

An image retrieval system is used to browse, search and retrieve images from a digital image repository. Image retrieval methods can be divided into three categories.

  (a) Query by Associative attribute is based on retrieval of images where attributes are stored as metadata.

  (b) Query by Description is based on context associated with images.

  (c) Query by Content is based on visual content such as texture, shape, color etc.

Some applications also apply a hybrid of the above techniques for image retrieval. A hybrid image retrieval algorithm has been developed that combines two text models with visual features [73]. Deep learning architectures can also be used in image retrieval [74]. Fuzzy production rules are used for high-level semantic image retrieval. Clustering techniques based on content-based image retrieval (CBIR) are used to retrieve the prominent features of an image. Perceivable and textual information can be retrieved from images by clustering and association rule mining. Texture-based image retrieval is done with color histograms. Weighted Euclidean distance is used to retrieve color features. Pyramid-structured wavelet functions are used to retrieve images based on shape, color and texture. MRI images can be retrieved with edge histograms and texture spectrum histograms. An unsupervised image segmentation technique based on cellular automata is used for image retrieval. Images with a common pattern in a medical database are retrieved with the image sequence similarity pattern (ISSP) and fuzzy retrieval techniques. Diagnostic cases similar to a given medical image are retrieved with support vector machines.
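Query by content with a histogram descriptor and a weighted Euclidean distance can be sketched as follows. For brevity the example uses a grey-level histogram rather than a color histogram; the function names, the four-bin descriptor, the equal weights and the tiny image database are all illustrative assumptions.

```python
import math


def grey_histogram(image, bins=4, levels=256):
    """Normalized intensity histogram used as a simple content descriptor."""
    hist = [0] * bins
    for row in image:
        for value in row:
            hist[value * bins // levels] += 1
    total = sum(hist)
    return [count / total for count in hist]


def weighted_euclidean(a, b, weights):
    """Euclidean distance in which each dimension has its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve(query_image, database, weights, top_k=2):
    """Return the names of the top_k database images closest to the query."""
    query_descriptor = grey_histogram(query_image)
    ranked = sorted(
        database.items(),
        key=lambda item: weighted_euclidean(
            query_descriptor, grey_histogram(item[1]), weights
        ),
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical 2x2 images standing in for stored scans.
database = {
    "dark_scan": [[10, 20], [30, 40]],
    "bright_scan": [[200, 220], [240, 250]],
    "mixed_scan": [[10, 250], [20, 240]],
}
query = [[15, 25], [35, 45]]
ranking = retrieve(query, database, weights=[1, 1, 1, 1])
```

A real system would compute the descriptors once and store them in an index structure instead of recomputing them for every query.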

Table 5 shows the different image retrieval techniques and their utilities.

Table 5. Image retrieval techniques in medical images.

2.6 Future Directions and Discussion

The main aim of image mining and image processing is to avoid loss of data and to extract meaningful information as and when it is required by patients, healthcare providers and other stakeholders in the medical domain. The size and variety of medical image data are increasing at a very fast pace. Developments in different energy sources, such as magnetic, radio-frequency, optical and nuclear sources, have added many new forms and properties to the variety of medical images. This challenges researchers to develop new techniques for the analysis, storage, mining and retrieval of correct and timely information that supports the welfare of patients. The field of medical image processing is in its infancy and many issues remain unsolved, including:

  • Retrieving information from the fusion of unstructured, heterogeneous data.

  • Indexing compressed images.

  • Extracting features from compressed images.

  • Classifying classes into more detailed, higher-level classes.

  • Improving performance in all phases of the mining process: image acquisition, storage, pre-processing, feature extraction, classification and knowledge derivation.

2.7 Conclusions

This paper presents the various image processing techniques that have been applied to medical image data. The fast growth in the volume of medical images has made image mining techniques necessary for decision support and prediction in healthcare. The paper has discussed the image mining techniques used on medical images to produce significant outcomes. According to this survey, median filtering is the most commonly used technique in image pre-processing, followed by wavelet functions and histogram equalization. In feature extraction, the decision tree model and its variations are the most widely used; other techniques include Gabor features, fuzzy techniques and the GLCM (Gray-Level Co-Occurrence Matrix). A novel fuzzy association rule can be applied to extract common features of CT images from an image transaction database. For image classification, association rule mining, SVM (Support Vector Machine), fuzzy techniques and neural networks are widely used. Proper storage of medical image data can be achieved with object-oriented databases. Image indexing can be done with the X tree, R tree, R+ tree and their variations. One of the main techniques used in image retrieval is CBIR (Content-Based Image Retrieval). The information derived should be presented in a form that medical practitioners and stakeholders can use to predict trends in diseases. This will lead to better diagnosis and research, which will improve patient care.