1 Introduction

Microarray technology has become an indispensable tool that many biologists use to monitor genome-wide expression levels of genes in a given organism. The gene expression level indicates the synthesis of different messenger ribonucleic acid (mRNA) molecules in a cell. Using this gene expression level, it is possible to diagnose diseases, identify tumours, select the best treatment to resist illness and detect mutations [1]. Thus, it is important to develop computational techniques that provide automatic classification of genes for the diagnosis of particular diseases.

In the literature on DNA microarray image processing, different classes of spot have been defined with various objectives, such as assessing the performance of segmentation algorithms or simulating microarray images based on statistical models. Spot classification is traditionally performed visually by an expert. The automatic classification of real spot images as part of the microarray image processing pipeline is not well developed, owing to its complexity, the high level of degradation of these images and the high intensity values they present.

In this work, a new step for the DNA microarray image analysis pipeline is proposed. The main contribution is a new methodology for DNA microarray image processing. The novelty consists in using the whole cell to classify the type of spot. As has been shown, this approach helps the segmentation and subsequent identification of the spots. Moreover, it can be used to develop an adaptive segmentation algorithm that takes the class of spot as input information. The segmentation is then more accurate, and the quantization step could also be enhanced and made lighter.

Another advantage of the proposed method is that it works with many descriptors instead of only a few. The most relevant ones are selected by the well-known sequential forward selection (SFS) algorithm [2], which reduces them to a presumably optimal subset. As a result, the developed classifier is adapted to the specific characteristics of the problem.

Classification is then performed by a neural ensemble with a tree structure, made up of seven multi-layer perceptron (MLP) neural networks. The configuration of each neural network is estimated automatically by an exhaustive evolutive searching process that optimizes the size of the network as a function of the classification error rate. The neural classifier is tested on sub-grids extracted from real microarray DNA images and is shown to achieve high accuracy rates. Considering the complexity of the problem, these results confirm the efficiency of this approach.

Another contribution of this work is a database created from the experiments, considering the six classes of spot defined in [3] and a seventh class called empty spot or absent spot [4]. The database contains 725 samples for training and 336 for testing. It has been made freely available to all researchers.

The paper is organized as follows. The rest of this section establishes the framework of the research and the background. Section 2 presents the definition of the spot classes and details the generation of a new database of microarray images that will be used as a benchmark. Section 3 explains the feature selection process. Section 4 describes the general architecture of the ensemble of classifiers and the methodology used to configure and train each of its neural networks. Section 5 shows and discusses the results obtained with real images. Finally, Section 6 summarizes the conclusions and future work.

1.1 Research framework and background

Learning the control of gene expression is critical for our understanding of the relationship between genotype and phenotype. The need for reliable assessment of transcript abundance in biological samples has driven scientists to develop technologies such as DNA microarray and more recently RNA-Seq to meet this demand [5].

Microarray analysis has become a great source of information for biologists to understand the workings of DNA, which is one of the most complex codes in nature. A DNA microarray is a substrate, with a matrix shape, over which genetic material is deposited, generally following a regular pattern [6]. When the DNA of the samples interacts with the reference genes of the microarray, a hybridization process occurs. In the specialized literature, the specific region where the hybridization process of a particular gene occurs is called a “spot”. After the hybridization process, two images of the whole microarray are generated. They are then combined into a final image in RGB format. The final colour of a spot is a function of the ratio between the intensities of the two dyes (red and green) and, as a result, it indicates the relative abundance of the corresponding gene in the samples [1]. The digital processing of these images aims at measuring the quantity of hybridized material from each sample.

Ideally, all spots are round and have the same diameter, but in practice they vary in size and shape and present artefacts that distort the image [7]; sometimes the intensity of the spot is even lower than that of the background [8]. Previous studies have tried to categorize this spot variability by defining classes of generic models, described by a set of parameters, with different objectives. In [9], the authors identify four classes of spots. In [10, 11], the problem of processing saturated spots is presented; these are spots that register a brightness higher than the detection capacity of the scanner.

An interesting work is presented in [3], where the spot is classified using the information of the whole cell. First, the image of the cell is transformed to polar coordinates, the radial/angular projections are obtained, the granulometric curves are calculated and, finally, statistics are extracted from those projections for categorizing the spots. The authors in [12] also propose the idea of clustering over a full image area in order to accomplish the segmentation of cDNA microarray images.

The task of spot segmentation falls within the category of classification, that is, assigning pixels to spot and non-spot classes. In the case of a segmentation based on classifiers, the class of spot predicts its morphology. Therefore, a pixel can have a higher or lower probability of belonging to a spot according to the correlation between its spatial position, its intensity level and the intensity of its neighbours. In [13], a classification-based segmentation approach for cDNA microarray images is proposed. Pixels are classified into spot, background and noise, a process that directly leads to the final segmentation. Other similar works are shown in [14,15,16]. The paper by Biju and Mythili [17] presents a fuzzy clustering algorithm for cDNA microarray image spot segmentation. In our case, we have used up to seven classes of spot.

Neural networks (NN) are a well-established tool for classification problems. Some examples from the literature in which this computing technique has been applied to microarray classification are the following. Wang et al. [8] propose a method of segmenting microarray images using a series of artificial neural networks based on multi-layer perceptron (MLP) and Kohonen networks. In [18], the authors apply extreme learning machine (ELM)-based classification to microarray data. The goal in that case, however, is not just to predict the class labels but to make clear what led to the results, i.e. the genes involved in a specific disease. Therefore, they focus mainly on the sequence feature selection problem.

A paper by Nanni et al. [19] develops a spot quality control strategy using a random sub-space ensemble of neural networks and a feature selection algorithm. They combine the random sub-space ensemble of Levenberg–Marquardt neural net classifiers and an SVM trained using the features selected by Pudil’s method. They aim at microarray spot quality classification, and thus they work with only two categories: good and bad. In [20], the authors introduce a new approach for classifying DNA microarray data based on artificial neural networks and a dimensionality reduction technique, the artificial bee colony (ABC) algorithm. They use this evolutionary algorithm as an optimization technique for selecting the set of genes, from a DNA microarray, that best describe a particular disease. After that, this information is used to train three types of classifier (multi-layer perceptron (MLP), radial basis function (RBF) and support vector machine (SVM)) for classifying the DNA microarrays associated with this disease. This is quite different from our work, where we first calculate as many features as possible and then apply the SFS algorithm to reduce their number.

To summarize, microarray image segmentation is an important and still challenging problem. Although many microarray image segmentation (clustering) methods have been proposed in the literature, there has been little progress on developing efficient algorithms to segment a microarray image and it is still an open problem [21].

2 Materials

This work uses the definition of classes presented in [3]. It represents the majority of the cases observed in the databases of microarray images. It also includes the “absent spot” class, which consists, in general terms, in the cells that do not contain any spot, and whose intensity corresponds to the microarray background [4]. Examples of these classes are shown in Fig. 1.

Fig. 1 Spot classes

The formal definition of these classes begins with the following function [3]:

$$f:E \to T = \left\{ {t_{\hbox{min} } ,t_{\hbox{min} } + 1, \ldots ,\,t_{\hbox{max} } } \right\}$$
(1)

where f corresponds to the grey intensity function of the whole microarray image, E is a discrete space (\(E \subset {\rm Z}^{2}\)) and T is a sorted set of discrete grey values. For a 16-bit image, \(t_{\hbox{min}} = 0\) and \(t_{\hbox{max}} = 2^{16} - 1 = 65{,}535\). The function f(x) is the value of intensity of the image at the point x = (x, y).

\(Z_{i} \subset E\) is the cell that contains the spot i, defined as the area whose pixels are closer to the centre of this spot than to the centre of any other. Based on this, fi is defined as:

$$f_{i} :Z_{i} \to T$$
(2)

which corresponds to the intensity function of the pixel x, where fi(x) = f(x). That is, fi is a restrictive form of f(x) for the region defined by Zi.
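The cell definition above can be illustrated with a nearest-centre assignment. The following is a minimal sketch, assuming the spot centres are already known and given as (row, column) pairs; the function names `assign_cells` and `restrict` are hypothetical and do not come from the original work.

```python
import numpy as np

def assign_cells(shape, centres):
    """Label every pixel with the index of its nearest spot centre.

    The set of pixels carrying label i plays the role of the cell Z_i.
    `centres` is a sequence of (row, col) pairs (an assumption of this sketch).
    """
    rows, cols = np.indices(shape)
    pixels = np.stack([rows, cols], axis=-1).astype(float)    # (H, W, 2)
    centres = np.asarray(centres, dtype=float)                # (N, 2)
    # Squared Euclidean distance from every pixel to every centre: (H, W, N).
    d2 = ((pixels[:, :, None, :] - centres[None, None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=-1)

def restrict(f, labels, i):
    """Return f_i: the image f restricted to Z_i (pixels outside Z_i are masked)."""
    return np.ma.masked_where(labels != i, f)
```

Ties between two equally distant centres are resolved in favour of the lower index, which is an arbitrary choice of this sketch.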

The generic model of intensity distribution for any spot i is given by the equation:

$$f_{i} \left( x \right) = a_{i} s_{i} \left( {x - x_{i}^{c} } \right) + n_{i} \left( x \right)$$
(3)

where \(s_{i}(y)\) corresponds to the morphological form of the distribution of the spot i considering a cylindrical model, in which \(a_{i}\) represents the height of the cylinder associated with the spot i, \(x_{i}^{c}\) corresponds to the coordinates of the spot centre and \(n_{i}(x)\) is the function that describes the noise present in the image. The noise function has two components, which identify two different sources of noise:

$$n_{i} \left( x \right) = n^{g} \left( x \right) + n_{i}^{l} \left( x \right)$$
(4)

where \(n^{g}(x)\) represents the background signal at x, described by a Gaussian function, and \(n_{i}^{l}(x)\) represents the noise signal of the local background, associated with local aspects such as inhomogeneous lighting and the presence of artefacts.

The morphological function si is built as follows:

$$s_{i} \left( y \right) = r_{i} \left( \theta \right)t_{i} \left( y \right)$$
(5)

where ri(θ) corresponds to a function in polar coordinates that represents the contour of the spot. This defines a closed border:

$$s_{i} \left( y \right) = \left\{ {\begin{array}{*{20}l} {t_{i} \left( y \right):\left\| {x - x_{i}^{c} } \right\| \le r_{i} \left( \theta \right)} \hfill \\ {0:\left\| {x - x_{i}^{c} } \right\| > r_{i} \left( \theta \right)} \hfill \\ \end{array} } \right.$$
(6)

in which ti(y) is a spatial function of the spot intensities (texture).

Finally, according to the particular distribution of the functions ri(θ) and ti(y), seven main classes (topologies) of spots are defined (Fig. 1):

  • Regular spot This type of spot has a circular shape and a homogeneous distribution of intensities. Both the function of the radius, ri(θ), and the global variation of intensities, ai ti(y), are modelled by normal distributions. For the whole microarray, it is considered that the average value of the radius varies uniformly within a small interval. The range of values of the coefficient ai is also represented by a uniform distribution in the range [tmin, tmax].

  • Cracking spot These spots have a cracked appearance, presenting dark regions or lines on their surface. The function of the radius ri(θ) has the same normal distribution as a regular spot. The distribution of intensities is expressed as \(t_{i} \left( y \right) = \tilde{t}_{i} \left( y \right) - \chi_{i} \left( y \right)\), where \(\tilde{t}_{i} \left( y \right)\) follows the same model as a regular spot and χi(y) corresponds to a cracked function whose value is greater than zero if y belongs to the cracked region. The distribution of χi(y), the morphology of the lines (number, length, etc.) and the spatial position are difficult to model, but typically the thickness of the lines is lower than the spot radius.

  • Saturated spot These spots present a uniform level of intensity equal to the maximum value allowed, that is, ai = tmax. The texture function does not present variations (ti(y) = 1) and the contour function of the spot ri(θ) presents the same normal distribution as a regular spot.

  • Doughnut spot These spots have a circular hole in their centre. The distribution of the intensities is a combination of two normal functions: one for the central region, \(t_{i}^{\hbox{low}}(y)\), with a mean value of 0, and another for the peripheral region, \(t_{i}^{\hbox{high}}(y)\), with a mean value of 1. The contour is defined by two functions that have a normal distribution similar to that of the regular spot: one for the contour of the central region, \(r_{i}^{\hbox{in}}(\theta)\), and another for the peripheral region, \(r_{i}^{\hbox{out}}(\theta)\).

  • Egg spot These spots are the reverse of the doughnut spot. The function that represents the intensities of the central region, \(t_{i}^{\hbox{high}}(y)\), has a higher average intensity than the function that represents the intensities of the peripheral region, \(t_{i}^{\hbox{low}}(y)\).

  • Fragmented spot These spots present degenerated or irregular borders, with a significant standard deviation δr in relation to the mean. This type of spot presents, in addition, a smaller area than a typical spot. The function of intensities ti(y) is modelled as a normal distribution.

  • Empty spot The cell does not have any spot, so that the intensity function fi(x) corresponds to the function of intensities of the microarray background. Following the Angulo model, in this case ri(θ) should be equal to 0 for all θ.
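To make the generic model of Eq. (3) more concrete, the sketch below synthesizes three of the classes (regular, doughnut and empty) on a 21 × 21 cell. It is a deliberately simplified illustration, not the simulation procedure of [3]: the contour is a circle of constant radius, the texture is flat (ti(y) = 1 inside the spot), only the Gaussian background term is included, and the function name `synth_cell` and all default parameter values are assumptions.

```python
import numpy as np

def synth_cell(kind, size=21, radius=7.0, a=20000.0,
               bg_mean=500.0, bg_std=50.0, seed=0):
    """Synthesize f_i(x) = a_i * s_i(x - x_c) + n(x) for a simplified spot.

    Assumptions of this sketch: constant radius, flat texture, Gaussian
    background noise only (no local noise term, no cracks, no fragmentation).
    """
    rng = np.random.default_rng(seed)
    centre = (size - 1) / 2.0
    yy, xx = np.indices((size, size))
    dist = np.hypot(yy - centre, xx - centre)     # ||x - x_c||

    if kind == "regular":
        s = (dist <= radius).astype(float)        # filled disc
    elif kind == "doughnut":
        # Ring: inside the outer contour, outside an inner hole of half the radius.
        s = ((dist <= radius) & (dist > radius / 2.0)).astype(float)
    elif kind == "empty":
        s = np.zeros((size, size))                # r_i(theta) = 0 for all theta
    else:
        raise ValueError(f"unsupported spot class: {kind}")

    noise = rng.normal(bg_mean, bg_std, size=(size, size))
    f = a * s + noise
    # Quantize to the 16-bit range T = {0, ..., 65535}.
    return np.clip(np.rint(f), 0, 65535).astype(np.uint16)
```

For example, `synth_cell("doughnut")` returns a 16-bit cell whose centre pixel lies at background level while the ring is close to the chosen amplitude.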

2.1 Creation of a database of microarray images

The size and structure of the database can be crucial to obtaining good classification results [22]. In this work, the microarray image database consists of a set of greyscale cell images extracted from the original microarray images. They are saved in 16-bit TIFF format. Therefore, each image has \(2^{16}\) intensity levels, and the average size of the cells is 21 × 21 pixels.

The sources of the DNA microarray images are two widely known, freely accessible databases: the Princeton University Microarray Database (PUMAdb) (footnote 1) and the Stanford Microarray Database (footnote 2). The experiments from which the images were extracted are: Mus musculus (id experiment: 58012, 57133, 57129), Acyrthosiphon pisum (id experiment: 101767, 101769, 102673, 102675, 102380), Mycobacterium tuberculosis H37rv (id experiment: 83716), Francisella tularensis (id experiment: 59225), Chlamydomonas reinhardtii (id experiment: 45603) and Arabidopsis thaliana (id experiment: 16673).

The two databases of cell images generated contain 725 images for training and 336 images for testing. That is, a total of 1061 images covering the whole spectrum of spot classes are now available to the scientific community interested in DNA microarray image processing (footnote 3).

The technique used for the gridding is based on the statistical analysis of the one-dimensional projection of the image. This type of algorithm obtains the sum of all intensities over a set of adjacent lines (rows or columns), each result called the projection vector. Then the local extremes (maximum intensities for the signals and minimum for the background) are detected inside the projection vector. These local extremes represent an approximation to the centre of the spots. From these estimations, horizontal and vertical lines are generated, whose intersections indicate the positions where the spots are located in the microarray. The specific implementation used in this paper is based on [23].
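The projection-based gridding idea can be sketched as follows. This is only an illustration of the principle described above, not the implementation of [23]: it uses unsmoothed projections and strict local maxima, so flat plateaus and noisy profiles would need additional handling in practice. The function names are hypothetical.

```python
import numpy as np

def projection_peaks(image, axis):
    """Sum intensities along `axis` and return the strict local maxima.

    axis=0 sums down the rows (one value per column);
    axis=1 sums across the columns (one value per row).
    """
    proj = np.asarray(image, dtype=np.float64).sum(axis=axis)
    inner = np.arange(1, proj.size - 1)
    is_peak = (proj[inner] > proj[inner - 1]) & (proj[inner] > proj[inner + 1])
    return inner[is_peak], proj

def estimate_spot_centres(image):
    """Intersect the row and column peak lines to approximate spot centres."""
    col_peaks, _ = projection_peaks(image, axis=0)
    row_peaks, _ = projection_peaks(image, axis=1)
    return [(int(r), int(c)) for r in row_peaks for c in col_peaks]
```

The local minima of the same projection vectors could be used in the same way to place the grid lines that separate neighbouring cells.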

When the training database was created, one of our objectives was to balance the distribution of the classes. The ground truth for the database was created by an expert who classified the images by comparing them with the classes defined by Angulo. The final percentage of images of each class in the database is the following: regular 24%, cracking 18%, saturated 2%, doughnut 16%, egg 17%, fragmented 15% and empty 8%. The low proportion of saturated spots is due to the relative scarcity of this type of spot in the microarray images. However, as this type of spot is the easiest to classify, this does not affect the performance of the classifier. The total number of cell images chosen for the training database was 725. This dataset proved to be sufficient for the study.

Another goal was the generation of a freely accessible repository of images for the research community (footnote 3). The creation of a database is a laborious and tedious task; therefore, the availability of this database will save researchers a lot of time, boost this line of research and facilitate the uniformity of criteria.

3 Feature selection process

Feature selection is a major problem in microarray spot quality classification methods [19]. The process of feature selection involves extracting a set of descriptors from the cell images, based on their intensities, and then selecting those that optimize the separability of the classes. In our case, this process is repeated for each class independently. However, because of the nature of the problem, a pre-processing of the images is usually required before the extraction of the descriptors [24]. In this work, the pre-processing has been carried out as follows.

3.1 Pre-processing of cell images

In our work, the pre-processing consists of scaling the relative intensity of certain classes of spots. Owing to the wide intensity range of the microarray images, a great number of spots remain invisible when the images are visualized. In order to visualize all the spots, each cell must be transformed, at a local level, to an 8-bit greyscale (0–255), that is, from \(2^{16}\) to \(2^{8}\) intensity levels. The algorithm is applied to each colour channel separately, once the channels have been converted to greyscale with a bit depth of 16 bits.

Only then is the spot morphology revealed, which is critical for assigning the corresponding class during the creation of the database. Therefore, to keep the database generation consistent with the automatic classification process, it was decided, for certain classes of spots, to carry out a transformation to greyscale prior to feature extraction. This applies to the regular, cracking, doughnut, egg and fragmented spot classes. The other classes of spot are better categorized in the original space of intensities, with values between 0 and 65,535, and therefore no transformation is required to extract the features. This applies to the saturated and empty spot classes.
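A possible form of this local rescaling is sketched below. The text does not specify the exact mapping, so a linear min-max stretch per cell is assumed here; the names `to_local_grey8`, `RESCALED_CLASSES` and `prepare_for` are hypothetical.

```python
import numpy as np

# Classes whose sub-classifiers work on the locally rescaled 8-bit cell,
# as stated in the text; saturated and empty keep the 16-bit intensities.
RESCALED_CLASSES = {"regular", "cracking", "doughnut", "egg", "fragmented"}

def to_local_grey8(cell16):
    """Stretch one 16-bit cell to 0..255 using its own minimum and maximum.

    Assumption: a linear min-max mapping computed per cell.
    """
    cell = np.asarray(cell16, dtype=np.float64)
    lo, hi = cell.min(), cell.max()
    if hi == lo:                                  # flat cell: nothing to stretch
        return np.zeros(cell.shape, dtype=np.uint8)
    scaled = (cell - lo) / (hi - lo) * 255.0
    return np.rint(scaled).astype(np.uint8)

def prepare_for(cell16, target_class):
    """Return the representation used when testing for `target_class`."""
    if target_class in RESCALED_CLASSES:
        return to_local_grey8(cell16)
    return np.asarray(cell16)                     # original 0..65535 intensities
```

Because the stretch is computed per cell, a faint spot and a bright spot with the same morphology map to similar 8-bit images, whereas the absolute intensity that characterizes saturated and empty cells is preserved only in the 16-bit representation.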

3.2 Set of features

The problem of optimal feature selection is still open, each method making a specific approximation to solve it [25]. In our case, we have worked with a wide and general framework. A total number of 363 features of intensities were computed for each spot image of the database. These features are grouped into the following categories [26]:

  • Basic features Simple intensity information: the mean, standard deviation, kurtosis and skewness of the intensity in the region, the mean first derivative on the boundary of the region (gradient) and the second derivative (Laplacian) in the region. Additionally, five contrast measurements can be extracted in order to analyse the difference of intensity between object and background. In total, 11 basic intensity features have been taken into account.

  • Statistical textures Texture information extracted from the distribution of the intensity values based on the Haralick approach [27]. They are computed using co-occurrence matrices that represent second-order texture information (the joint probability distribution of intensity pairs of neighbouring pixels in the image), where mean and range—for five different pixel distances in eight directions—of the following variables were measured: (1) angular second moment, (2) contrast, (3) correlation, (4) sum of squares, (5) inverse difference moment, (6) sum average, (7) sum entropy, (8) sum variance, (9) entropy, (10) difference variance, (11) difference entropy, (12, 13) information measures of correlation and (14) maximal correlation coefficient. A total of 2 × 14 × 5 = 140 statistical features have been considered.

  • Local binary patterns Texture information extracted from occurrence histogram of local binary patterns (LBP) computed from the relationship between each pixel intensity value with its eight neighbours. The features are the frequencies of each one of the histogram bins. LBP are very robust in terms of greyscale and rotation variations [28]. Other LBP features such as semantic LBP can be used in order to bring together similar bins. We use 59 uniform LBP features and 31 semantic LBP features, giving a total number of 90 features for this category.

  • Filter banks Texture information extracted from image transformations such as the discrete Fourier transform (magnitude and phase), the discrete cosine transform (DCT) [29] and Gabor features based on 2D Gabor functions, i.e. Gaussian-shaped bandpass filters, with dyadic treatment of the radial spatial frequency range and multiple orientations (usually eight scales and eight orientations). Gabor features are an appropriate choice for tasks requiring simultaneous measurement in both space and frequency domains. Additionally, the maximum, the minimum and the difference between both are computed. We use 16 DCT features, 2 × 16 Fourier features (magnitude and phase) and 8 × 8 + 3 = 67 Gabor features, i.e. 16 + 2 × 16 + 67 = 115 features were extracted using filter banks.

  • Invariant moments Information of shape and intensities based on the Hu moments [30], with a total of 7 features for this category.
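As an illustration of the statistical texture category, the sketch below builds a normalized, symmetric grey-level co-occurrence matrix for one pixel offset and derives two of the fourteen Haralick measures (angular second moment and contrast). It is a minimal example under stated assumptions, not the feature extractor used in this work: the image is assumed to be already quantized to integer levels in the range 0 to `levels` − 1, and the function names are hypothetical.

```python
import numpy as np

def glcm(image, dr, dc, levels):
    """Normalized, symmetric co-occurrence matrix for the offset (dr, dc)."""
    img = np.asarray(image)
    h, w = img.shape
    # Overlapping windows: b is a shifted by (dr, dc).
    r0, r1 = max(0, -dr), min(h, h - dr)
    c0, c1 = max(0, -dc), min(w, w - dc)
    a = img[r0:r1, c0:c1]
    b = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (a.ravel(), b.ravel()), 1.0)   # count each (a, b) pair
    counts += counts.T                               # count both directions
    return counts / counts.sum()

def angular_second_moment(p):
    """Haralick measure (1): sum of squared joint probabilities."""
    return float((p ** 2).sum())

def contrast(p):
    """Haralick measure (2): expected squared grey-level difference."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())
```

Repeating the computation for several offsets at the same distance and taking the mean and the range of each measure over the directions gives the two values per measure and distance that are counted in the statistical texture category.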

3.3 Features selection

Data pre-processing and feature selection enhance the performance of the classifiers [18]. For this reason, after computing the features described above on the whole cell for each image of the database, the most relevant subset of features was selected for each class of spot using the well-known sequential forward selection (SFS) algorithm. This technique carries out a “bottom-up” search strategy that starts from an empty feature subset and adds one feature at a time until a subset of the desired cardinality is obtained. Because the search is greedy, the resulting subset is not guaranteed to be globally optimal. It should be noted that, owing to the large number of initial features (363), an exhaustive search for the optimal subset is not feasible, as it would require analysing 2^363 possible combinations.

In particular, the SFS adds to, or removes from, the subset one feature at a time, namely the one that contributes most, or least, to the correct classification. It is based on an error function that minimizes the number of attributes while optimizing the classification. Finally, the smallest possible set of features that optimizes the classification process is obtained [2].

Specifically, the classification criterion used in the SFS toolbox we employed is called SP100. With this method, the decision line is set so that the sensitivity is 100%; that is, the class of interest is favoured, instead of placing the decision line in the middle of the overlapping region between classes, as is traditionally done.

The error function used by the SFS maximizes the specificity, Sp = TN/(FP + TN), where TN is the number of true negatives and FP is the number of false positives; that is, it minimizes the number of false positives.
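The selection procedure can be sketched as a greedy loop driven by the specificity at 100% sensitivity. The code below is an illustrative approximation, not the toolbox used in this work: the scorer that projects the samples onto the difference of the class means is a hypothetical stand-in for the toolbox classifier, the criterion is evaluated on the same data used for selection, and all function names are assumptions.

```python
import numpy as np

def sp100(scores, labels):
    """Specificity TN / (FP + TN) with the threshold set for 100% sensitivity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    threshold = scores[labels].min()       # every positive scores >= threshold
    negatives = scores[~labels]
    tn = np.count_nonzero(negatives < threshold)
    fp = negatives.size - tn
    return tn / (fp + tn)

def mean_difference_score(X, labels):
    """Hypothetical scorer: project samples onto the difference of class means."""
    w = X[labels].mean(axis=0) - X[~labels].mean(axis=0)
    return X @ w

def sfs(X, labels, n_features):
    """Greedy forward selection: add the feature that maximizes SP100 each step."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        values = [sp100(mean_difference_score(X[:, selected + [j]], labels), labels)
                  for j in remaining]
        best = remaining[int(np.argmax(values))]   # first maximum wins on ties
        selected.append(best)
        remaining.remove(best)
    return selected
```

With 363 candidate features, each step of this loop evaluates at most 363 subsets, which is what makes the greedy search tractable compared with the 2^363 subsets of an exhaustive search.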

The feature selection process was applied to the training database (725 spot images). It is worth remarking that the testing was performed on a different dataset (336 images). Indeed, they are two different databases with different origin. The repository of 725 spots was generated from different microarray images in order to have enough samples of all the classes for the training of the networks. The repository of the test dataset (336 spots) is generated from the two real images shown in Figs. 6 and 8. Each real image has 168 spots.

4 Artificial neural networks ensemble

One of the goals of microarray data analysis is to cluster together genes or samples with similar expression profiles, in order to make meaningful biological inferences about the set of genes or samples. Since traditional classifiers have not reached sufficient sensitivity and specificity, another possible approach is to combine classifiers in ensembles. In this paper, we take advantage of neural networks, which have proved efficient for microarray image processing [20], combining multi-layer perceptrons (MLPs) into a multi-class classifier.

The MLP has been selected for several reasons. The main reason is that a neural network is equivalent to a universal function approximator [31], with the property of being able to separate data that are not linearly separable. In particular, an MLP with a single hidden layer allows the training error to be reduced as much as desired by increasing the number of neurons in the hidden layer. In addition, the MLP training algorithm is well defined and is equivalent to a nonlinear optimization problem without constraints. Therefore, considering the neural network as an optimization problem, we can define cost functions that allow the automatic estimation of network parameters [32], with fast-converging search algorithms [33]. As a result, both the estimation of the configuration parameters and the training of the network can be carried out automatically.

4.1 Structure of the classifier

To address the problem of DNA microarray image classification, a hierarchical classifier is proposed. It is made up of 7 sub-classifiers, each one specialized in detecting a specific class of spot. The tree-like structure of the classifier is justified by the fact that it facilitates the work of the sub-classifiers, which have to discriminate a more specific set of spot classes at each deeper level of the classifier, thus increasing the percentage of hits. This structure is reinforced by the pre-processing applied to the images of some spot classes, as explained in the previous section.

The sequence of sub-classifiers was selected based on expert knowledge and on some experiments. During classification experiments, it was observed that certain kinds of spot (saturated and empty) are better discriminated at the original 16-bit image resolution. This is because what most distinguishes these two classes from the rest is their intensity level on the original scale of \(2^{16}\) levels. This, together with the fact that they are the classes with the highest success rate, suggests classifying these two classes first. For the other classes, the best results are obtained using the 8-bit greyscale. In this way, consistency with the visual classification performed by the human expert on the training set is maintained. In fact, the expert performs the same processing before the classification in order to visualize the morphology of the spots, which would otherwise remain largely invisible.

Therefore, based on the analysis of the problem and the obtained results, the classifier has been structured in three levels. In the first level, a sub-classifier determines whether the spot belongs to the empty class. If this is true, the classification process ends; otherwise, the image is passed to the second level. At this level, the image is processed by a sub-classifier that determines whether the spot belongs to the saturated class. If this condition is met, the classification process ends; otherwise, the image goes to the third level. At this last level, the image is processed by five sub-classifiers in a parallel way. Each one of them determines the level of membership of the spot to a specific class. In a competitive decision framework, the sub-classifier which brings the highest score assigns the class to the spot. The five sub-classifiers of this level correspond to regular, cracking, doughnut, egg and fragmented classes. Figure 2 illustrates the architecture of the classifier.

Fig. 2 Architecture of the neural classifier

Each sub-classifier is implemented by a multi-layer perceptron (MLP) neural network, with a linear activation function in the output layer and a sigmoid function in the hidden layer, as this architecture corresponds to a universal function approximator [31]. This structure has been widely applied to classification in different fields [32, 35]. Theoretically, it is possible to reduce the classification error as much as desired by increasing the number of neurons in the hidden layer.

Every neural network of the classifier has as many inputs as features selected by the SFS algorithm. In addition, each network has a single output, which will be close to 1 if the input belongs to the corresponding class and close to 0 otherwise. If the levels of membership for several classes are very close to each other, the algorithm still selects the highest. A spot with similar membership values for different classes presents mixed characteristics, which could lead to the definition of new classes of spots.
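The three-level decision logic can be summarized in a few lines. In this sketch the sub-classifiers are passed in as plain callables that map a cell image to a membership score, so that each one can apply its own feature subset internally; the function name `classify_spot`, the callable interface and the 0.5 acceptance threshold for the first two levels are assumptions, since the text only states that the output is close to 1 or close to 0.

```python
def classify_spot(cell16, cell8, empty_net, saturated_net, level3_nets,
                  threshold=0.5):
    """Cascade: empty -> saturated -> competition among the remaining classes.

    cell16: cell in the original 16-bit intensities (levels 1 and 2).
    cell8:  locally rescaled 8-bit cell (level 3).
    level3_nets: dict mapping a class name to its scoring callable.
    """
    # Level 1: is the cell empty?
    if empty_net(cell16) >= threshold:
        return "empty"
    # Level 2: is the spot saturated?
    if saturated_net(cell16) >= threshold:
        return "saturated"
    # Level 3: the sub-classifier with the highest membership score wins,
    # even when several scores are close to each other.
    scores = {name: net(cell8) for name, net in level3_nets.items()}
    return max(scores, key=scores.get)
```

A usage example with dummy scorers: `classify_spot(c16, c8, lambda c: 0.1, lambda c: 0.2, {"regular": lambda c: 0.7, "egg": lambda c: 0.4})` returns `"regular"`.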

4.2 Neural networks optimization algorithm

In order to obtain neural networks with good generalization ability, the training process is performed while controlling overfitting, using the smallest possible number of neurons in the hidden layer. To determine the number of neurons that gives an accurate classification for each neural network, while keeping the network as simple as possible, an iterative search procedure is adopted. The details of this procedure are the following:

  • It works with a set of Ni spot samples per class, where i is an integer between 1 and 7 that represents the class.

  • The iterative process starts with an initial configuration of one neuron in the hidden layer and gradually increases the size of this layer by adding one neuron each iteration. A maximum number of neurons, M, is set as the limit for this process. In this paper, M was set to 25.

  • Each iteration generates a predefined number of networks, K, with the same number of neurons in the hidden layer, but with different weights values assigned randomly to each of them. In this paper, K was set to 1000.

  • Each one of these networks is trained independently. To train and test the network performance, the samples are randomly chosen. For the training, validation and testing sets, 70%, 15% and 15% of the total samples were selected, respectively.

  • The “repeated random sub-sampling validation” (random cross-validation) strategy is used to end the training. Over the epochs, the classification error on the training set decreases gradually, and training stops when the classification error on the validation set starts to increase. This strategy also helps to avoid overfitting of the network.

  • If the training does not stop by the previous criterion, a maximum number of training epochs, E, is used as a limit. In this paper, E was set to 1000.

  • After an iteration ends, the network with the lowest error rate is selected from the K networks generated. This error rate is defined as the average of the squared errors of the network over the three sets (training, validation and testing), calculated after its training process has finished.

  • After the M iterations, there are M selected networks that represent the best one of each iteration. From all of these, the one with the lowest error rate is chosen as the final classifier.
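The search procedure listed above can be sketched as two nested loops. This is only an outline under stated assumptions: train_once is a hypothetical caller-supplied function that trains one randomly initialized network (with early stopping and the epoch limit E) and returns its error rate; the toy stand-in below replaces real training so that the sketch can be run.

```python
import random

def search_hidden_size(train_once, max_neurons=25, nets_per_size=1000, seed=0):
    """Return (error, hidden_size, model) for the best network found.

    train_once(hidden_size, rng) is a hypothetical interface: it trains one
    randomly initialized network and returns (model, error_rate).
    """
    rng = random.Random(seed)
    best_per_size = []
    for hidden in range(1, max_neurons + 1):        # M iterations
        candidates = []
        for _ in range(nets_per_size):              # K networks per size
            model, err = train_once(hidden, rng)
            candidates.append((err, hidden, model))
        # Keep the best of the K networks generated in this iteration.
        best_per_size.append(min(candidates, key=lambda c: c[0]))
    # Final classifier: the best of the M per-iteration winners.
    return min(best_per_size, key=lambda c: c[0])

# Toy stand-in for training: the error is lowest near 6 neurons, plus noise.
def toy_train(hidden, rng):
    return None, abs(hidden - 6) * 0.01 + rng.random() * 0.001

best_err, best_hidden, _ = search_hidden_size(
    toy_train, max_neurons=10, nets_per_size=20)
```

With M = 25 and K = 1000 as in the paper, the procedure trains 25,000 networks per class, which is why the toy call uses much smaller values.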

4.3 Training

The maximum values of the parameters used for training the neural network have been selected empirically, by trial and error, using the previous experience of the authors and information found in the literature. They were deliberately enlarged to cover the largest possible number of cases.

Each network is independently trained as a binary classifier with the Bayesian regularization backpropagation algorithm [33]. If the training procedure is understood as an optimization process with nonlinear constraints, then this algorithm has the following characteristics:

  • Cost function F = βE_D + αE_W, where E_D is the sum of the squared errors, E_W is the sum of the squares of the network weights, and α and β are the parameters of the objective function. The parameters α and β are computed automatically according to a procedure described in [33].

  • The search method is the Levenberg–Marquardt algorithm [34], which allows fast convergence of the classification error.
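As a small worked example of the cost function above, the following sketch evaluates F for given errors and weights. Here α and β are passed as fixed values purely for illustration; in the algorithm of [33] they are re-estimated automatically during training, which this sketch does not attempt to reproduce.

```python
def regularized_cost(errors, weights, alpha, beta):
    """Evaluate F = beta * E_D + alpha * E_W.

    E_D is the sum of squared errors and E_W the sum of squared weights.
    alpha and beta are fixed here (an assumption made for illustration).
    """
    e_d = sum(e * e for e in errors)
    e_w = sum(w * w for w in weights)
    return beta * e_d + alpha * e_w

# E_D = 0.5 and E_W = 5.0, so F = 1.0 * 0.5 + 0.01 * 5.0
F = regularized_cost(errors=[0.5, -0.5], weights=[1.0, 2.0],
                     alpha=0.01, beta=1.0)
```

A larger α penalizes large weights more strongly, which favours smoother network responses and therefore limits overfitting.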

5 Results and discussion

First of all, regarding the feature selection process, a total of 363 greyscale intensity features were computed for each cell image of the training database (725 images). In particular, the Balu toolbox was used to compute the descriptors (see footnote 4). For the regular class, 57 features were selected; for the cracking class, 30 features; for the saturated class, one feature; for the doughnut class, 30 features; for the egg class, 16 features; for the fragmented class, 31 features; and, finally, for the empty class, one feature. The types of features selected for each class after applying the SFS algorithm, and the corresponding success rate (Sp value, last row), are presented in Table 1. The classification accuracy is obtained at the end of the selection process for the best set of features selected by the SFS algorithm. It is given by the value of the Sp function (between 0 and 1), as defined in Sect. 3.3. The best value corresponds to Sp = 1 (FP = 0), and the worst occurs when there are many false positives.

Table 1 Description of the types of features selected for each class and the success rate given by the SFS algorithm

From these results (Table 1), it can be deduced that the Hu and Haralick features are not selected for any class, and the features that have been selected to classify the regular, cracking, doughnut, egg and fragmented classes are mainly the local binary patterns. In detail, the percentage of LBP-based characteristics selected for each of the classes is as follows: regular, 51%; cracking, 43%; doughnut, 47%; egg, 44% and fragmented, 32%.

One of the peculiarities of this type of descriptor that makes it especially robust is that it compares relative intensities between pixels, giving consistent results under different grey levels. This is especially useful for the above-mentioned spots, which are principally defined by their morphology but present a wide range of intensity levels within the same class.
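The robustness to grey level can be illustrated with a minimal sketch of the basic 3x3 local binary pattern operator. This is an assumption made for illustration: the LBP variant actually computed by the Balu toolbox may differ (for example in radius, number of neighbours or histogram mapping), and the pixel values are invented.

```python
def lbp_code(patch):
    """8-bit local binary pattern of the centre pixel of a 3x3 patch."""
    centre = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        # Only the sign of the comparison with the centre is kept.
        if patch[r][c] >= centre:
            code |= 1 << bit
    return code

dim = [[10, 20, 30],
       [40, 25, 15],
       [5, 50, 22]]
# Same structure at a much higher grey level (monotonic gain and offset).
bright = [[100 * v + 3000 for v in row] for row in dim]
```

Both patches yield the same code because only the ordering of each neighbour relative to the centre is encoded, not the absolute intensities.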

The saturated and empty spots lie at opposite ends of the intensity range, which explains why a single feature is enough to differentiate them.

In particular, for the empty spot the selected feature is the standard deviation of the image intensities. For the saturated spot, the selected feature is the standard deviation of the x and y axis profiles with respect to the centre of gravity of the image.

These results may mean that the discarded features do not provide significant information for the classification of the spots. They may be redundant, whereas the selected features capture the nature of the problem more effectively.

Once the relevant features have been selected, the iterative algorithm that finds the best network (minimizing the number of neurons in the hidden layer) for each spot class is applied. Figure 3 shows how the error rate of each neural classifier evolves with the number of hidden-layer neurons during this optimization process. Tests covered a range of 1 to 25 neurons in the hidden layer, which corresponds to the number of bars in the figures. Each bar represents the error rate of the best network selected for each configuration. From these 25 selected networks, the one with the lowest error rate was chosen as the classifier for each spot class.

Fig. 3 Evolution of the error rate of the best networks and the number of neurons of the hidden layer

Table 2 details the best number of neurons of the hidden layer of these classifiers. Figure 4 shows their configuration, with the number of neurons of the input, hidden and output layers. As expected, the results show a direct relation between the complexity of the class and the number of neurons in the hidden layer.

Table 2 Number of neurons of the hidden layer for each class

Fig. 4 Representation of the neural classifier of each class of spot

5.1 Results of the classification process

Table 3 shows the performance of the classifier selected for each spot class in terms of the percentage of hits and misses in the classification during training. The hit rate is the sum of the true positives (TP) and the true negatives (TN) given by each network; the error rate is the sum of the false positives (FP) and the false negatives (FN). For the outputs, a threshold value of 0.5 was used. If the output of the network is greater than or equal to the threshold, the spot is assigned to that class; otherwise, it is assigned to the other class. These rates are obtained for the three datasets used (training, validation and testing) and correspond to the best selected networks.
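The thresholding rule and the hit and error rates defined above can be sketched as follows. The function names and the sample outputs are illustrative and are not taken from Table 3.

```python
def confusion_counts(outputs, targets, threshold=0.5):
    """Count TP, FP, TN and FN for one binary class network."""
    tp = fp = tn = fn = 0
    for y, t in zip(outputs, targets):
        predicted = y >= threshold      # ">=" assigns the spot to the class
        if predicted and t:
            tp += 1
        elif predicted and not t:
            fp += 1
        elif not predicted and not t:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def hit_and_error_rate(tp, fp, tn, fn):
    """Return the hit rate and the error rate as percentages."""
    total = tp + fp + tn + fn
    return 100.0 * (tp + tn) / total, 100.0 * (fp + fn) / total

# Illustrative network outputs and ground-truth labels (1 = in the class).
counts = confusion_counts([0.9, 0.5, 0.2, 0.7, 0.4], [1, 1, 0, 0, 1])
hit, err = hit_and_error_rate(*counts)
```

Note that an output exactly equal to 0.5 counts as a positive, in line with the "greater than or equal to" rule.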

Table 3 Results of each classifier, true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), for the training, validation and testing sets

Precision and recall values were calculated for the test set (Table 3). The results show that for new data (not used for training), the precision of the classifier is high: it reaches 1 for some of the classes and is greater than 0.9 in all cases. Similarly good results are obtained for recall.
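For reference, precision and recall follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below uses invented counts, not the values of Table 3, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall(tp, fp, fn):
    """Standard precision and recall from confusion counts."""
    # Returning 0.0 for an empty denominator is an assumed convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts only.
p, r = precision_recall(tp=18, fp=1, fn=2)
```

A precision of 1 therefore means that the network produced no false positives for that class on the test set.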

The whole classifier (the ensemble of the neural networks) was applied to the 725 images of the training set. The output was compared to the ground truth, and statistics of hits and errors were calculated. Table 4 summarizes these results. Of the 725 images of the training dataset, only three were erroneously classified, giving a global hit percentage of 99.59%.
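The global hit percentage follows directly from the two figures quoted above, as this short check shows (the function name is illustrative).

```python
def global_hit_percentage(total_images, misclassified):
    """Percentage of correctly classified images, rounded to two decimals."""
    return round(100.0 * (total_images - misclassified) / total_images, 2)

# 722 of the 725 training images are classified correctly.
training_hit = global_hit_percentage(total_images=725, misclassified=3)
```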

Table 4 Summary of the performance of the classifier

Furthermore, to test the effectiveness of the classifier, two sub-grids that had not been used for training were selected. After the gridding process, individual cells were extracted and two different testing databases were generated. Each image of these databases was assigned to its corresponding class by a human expert. Each database contains 168 cell images, giving a total of 336 images for this testing dataset.

The classifier was tested with each testing database independently, and its hits and errors were registered. Figures 5 and 6 illustrate the images associated with the first series of testing, while Figs. 7 and 8 show the images associated with the second testing database.

Fig. 5 Sub-grid of the first series of testing (a, b), with the target classes (c) and the output of the neural classifier (d). Spot colour code: regular red square, doughnut yellow square, cracking green square, egg light blue square, saturated blue square, fragmented fuchsia square and empty white border square (colour figure online)

Fig. 6 Hits (blue squares) and errors (red squares) in the first series of testing (colour figure online)

Fig. 7 Sub-grid of the second series of testing, target classes (a) and results of the neural classifier (b). Spot colour code: regular red square, doughnut yellow square, cracking green square, egg light blue square, saturated blue square, fragmented fuchsia square and empty white border square (colour figure online)

Fig. 8 Hits (blue squares) and errors (red squares) in the second series of testing (colour figure online)

In both testing databases, the classifier obtained a high hit rate. The results were 95.8 and 91.1% success for test sets 1 and 2, respectively.

In this first series of experiments, Fig. 5a shows the selected sub-grid. For clarity, this image has been scaled from the original 16 bits to 8 bits (256 grey levels) at local level (sub-grid), so that spots that would otherwise be invisible are shown. Likewise, Fig. 5c shows the original image with the pixel intensities scaled to 8 bits at individual cell level, so that all the spots are visible.
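A local rescaling of this kind can be sketched as a min-max stretch over the region of interest (a sub-grid or a single cell). The exact mapping used by the authors is not specified, so the linear stretch below is an assumption, and the pixel values are invented.

```python
def scale_to_8bit(cell):
    """Linearly map a 2-D list of 16-bit intensities to the range 0..255.

    The minimum and maximum are taken over the given region only, so a
    faint spot uses the full 8-bit range (assumed linear mapping).
    """
    flat = [v for row in cell for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                        # flat region: nothing to stretch
        return [[0 for _ in row] for row in cell]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in cell]

# A faint cell: almost black when displayed over the full 0..65535 range.
faint_cell = [[120, 132], [156, 180]]
scaled = scale_to_8bit(faint_cell)
```

Scaling each cell separately makes faint spots visible, at the cost of losing the relative brightness between cells.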

Figure 5b shows how the gridding algorithm successfully generates the array containing the individual cells where the spots are confined. The gridding appears on the image of the sub-grid.

The class to which the spot belongs is represented by the colour of the cell border (Fig. 5c). The colour code is as follows: red for the “regular” spot class, yellow for “doughnut”, green for “cracking”, light blue for “egg”, blue for “saturated”, fuchsia for “fragmented” and, finally, a white border for the “empty” spot class. These are the target classes to be identified by the classifier. The “doughnut” spot class prevails in this first series of experiments, followed by the “regular” and “empty” classes. Indeed, the distribution of these target classes in this first grid, in decreasing order of frequency, is the following: doughnut 81.6%, regular 11.3% and empty 7.1%.

It can also be pointed out that the distribution of the minority classes is not random, but tends to form clusters within the image. Specifically, the cells of the “empty” class are mostly concentrated in the last row of the grid, suggesting that these cells were intentionally left empty as part of the experiments. However, some of them are also present in other rows, where other spots would be expected.

In the image, the empty cells show two different textures. Empty cells with granular texture only contain microarray background signal. The other type of empty cell looks mostly black due to the presence of noise. That is, noise intensity is greater than the background signal. Therefore, when applying the change of scale, the background is displayed with a uniform intensity, and only the noise signal is highlighted. Figure 5d shows how the classifier has successfully detected both types of empty cells. Most errors found correspond to “doughnut” spots wrongly classified as “regular” or “cracking”.

The results of the classification are shown in Fig. 6, where 95.2% of the spots were correctly classified for this test set (blue squares).

Two misclassified doughnut spots (red squares) lie in the central part of the image. This could be due to their irregular and granular appearance, associated with spots of very low intensity. A third doughnut spot, in the row below and wrongly classified as regular, is better defined, but the contrast between its inner and outer regions is weak. Three more doughnut spots, located at the top of the image, share very thin borders with irregular intensity levels. In addition, their central areas do not contrast sharply with the outer rings. Finally, two “regular” spots, one at the bottom of the image and the other near the centre, were misclassified as “doughnut”. In these cases, the spots have a granular centre (low intensity) and some higher intensity pixels at the border, but without defining a crisp ring.

The same analysis was applied to the results of the classifier on test set 2. Again, the gridded image was obtained from the original one. Figure 7a, b shows the target class of each cell and the output of the classifier. The distribution of the target classes in this grid, in decreasing order of frequency, is the following: regular 90.4%, empty 4.8%, doughnut 3%, cracking 1.2% and fragmented 0.6%.

Unlike in the previous test, the majority class is now the “regular” one, followed by the “empty” and “doughnut” classes, with only a few cases of “fragmented” and “cracking” spots. Again, the minority classes tend to appear in clusters, and the cells of the “empty” class lie mainly in the last row, which supports the assumption that they were placed that way during the experiments.

Figure 8 shows the final results for test set 2, where the hits are represented by blue squares and the errors by red squares. Most of the errors correspond to “regular” spots classified as “cracking” ones. This result may be due to the granular aspect associated with low intensity spots; in addition, some of them are noisy. At the top of the image, there is a “fragmented” spot whose fragmentation is so slight that it has been classified as a regular one. Also at the top of Fig. 8, there is a “cracking” spot that is otherwise quite “regular” but is crossed right in the middle by a well-defined dark line, which leads the classifier to assign it mistakenly to the “doughnut” class. In the second row from the bottom of Fig. 8, there is a “regular” spot classified as “doughnut”. This spot shows a very thin ring at its border. In the same row, there is a “regular” spot classified as “egg”. This spot shows a small positive gradient in its intensities, from its border towards its centre.

Fig. 9 Classification performance of the test set with 1000 partitions

As a general conclusion, the difference in performance of the classifier on the two testing series can be explained by the prevalence of different classes of spot in each of the sub-grids. In the sub-grid of the first series, the most abundant spot class is the doughnut one (Fig. 5), for which the classifier showed a very high hit rate during the training process. In the sub-grid of the second testing set (Fig. 7), the most abundant spot class is the regular one, for which the classifier showed a slightly lower performance during training. However, Fig. 8 shows that only very few spots were wrongly classified.

5.2 Evaluation of the classification robustness

For generalization purposes in the classification stage, we present an analysis that shows the robustness of the network models. In particular, a random cross-validation analysis is carried out to show that the performance of the models is independent of the set of data used for the generation of the neural networks.

Once the optimum number of neurons in the hidden layer is found, 1000 network models are estimated considering different sets for training (70%), validation (15%) and test (15%), each time. The validation set is used to stop the network training, while the test set is used to measure the classification performance. The sets are selected by randomly sampling the training dataset.
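The repeated random partitioning described above can be sketched as follows. The way the authors rounded the 70/15/15% proportions is not stated, so truncating the training and validation sizes and giving the remainder to the test set is an assumption, as is the use of all 725 training images as the sample pool.

```python
import random

def random_splits(n_samples, n_repeats, fractions=(0.70, 0.15), seed=0):
    """Yield (train, validation, test) index lists for each random partition.

    fractions gives the training and validation shares; the test set takes
    the remaining samples (assumed rounding scheme).
    """
    rng = random.Random(seed)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                # a fresh random partition each time
        yield idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

splits = list(random_splits(n_samples=725, n_repeats=1000))
train, val, test = splits[0]
```

Training one network per partition and collecting the test-set errors gives the mean and standard deviation that summarize how sensitive the classifier is to the choice of data.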

Figure 9 shows the error percentage over the test set for the seven networks (1 = regular, 2 = cracking, 3 = saturated, 4 = doughnut, 5 = egg, 6 = fragmented and 7 = empty). In all the networks, the average error and the standard deviation are very small (below 0.4), with the third network (saturated) performing best. This indicates that the descriptors are appropriate for our purpose.

6 Conclusions and future works

This work provides a pipeline for the processing of DNA microarray images. A new computing method for classifying the spots into morphology-derived classes is proposed. The classification is performed without previous segmentation, after the gridding process. The high classification accuracy obtained when tested on sub-grids extracted from real DNA microarray images demonstrates the effectiveness of this novel approach. A main conclusion is that this classifier can be used to improve the segmentation process of DNA microarray images.

One of the main contributions is that we perform the classification of the spots into morphology-derived classes in order to assist the segmentation procedure that is traditionally performed after the gridding process. A new approach to the classification of spots is presented, applying the idea of using the information of the whole cell, without segmentation.

Besides, instead of computing a reduced number of descriptors and showing their discriminant value for the classification, we compute a large number of descriptors that are then reduced to a presumably optimal subset using the sequential forward selection algorithm [2].

Another contribution is that, based on the expert knowledge of the DNA microarray images classification, an ensemble of neural networks has been designed. This supervised neural classifier has a tree-like structure made up of seven MLPs. Each branch of the tree corresponds to a neural network specialized in the detection of a specific class. Besides, the configuration of each network has been optimized using an iterative algorithm that minimizes the classification error. Every MLP has been independently configured and trained.

The performance of the competitive classifier is validated with real microarray DNA images, where the final sub-classifiers compete for spot allocation to one class or to another. The neural classifier predicts the spot class with a very high degree of reliability.

The pre-processing of the images, using different scales of grey intensities depending on the class of spot to be detected, and the extraction of multiple features from each individual cell, later reduced to a supposedly optimal subset by the sequential forward selection algorithm, have helped to improve the performance of the classifier.

Another useful contribution of this work is the generation of two databases of cell images: 725 microarray images for training and 336 images for testing. That is, a total of 1061 images covering the whole spectrum of spot classes are now available to the scientific community interested in DNA microarray image processing (see footnote 5).

The very good results of this approach encourage further work. The classification errors show that the separation between classes is not always well defined, and that there are spots with characteristics of more than one class. This suggests considering a fuzzy approach, dealing with degrees of membership of different classes at the same time. Special attention must be paid to the effect of noise in the images, mainly for certain spot classes, such as the cracking one.

As the analysis of a microarray experiment could lead to the quantification of thousands of genes, another possible future research line is to consider how to improve and lighten this quantification by developing adaptive segmentation algorithms [36].

Although it is not one of the goals of this paper, the performance of the proposed method could be evaluated against other techniques as future work.