1 Introduction

Malaria is a serious infectious disease caused by parasites of the genus Plasmodium, blood parasites that the female Anopheles mosquito injects into the human body. According to the World Health Organization’s annual malaria report 2013 [1], malaria takes the life of a child every 45 min. Plasmodium attacks the red blood cells (RBCs). Parasitemia, the quantitative measurement of the parasites in the blood, is used to assess the severity of malaria [2]. For this purpose, visual quantification through light microscopy is still the most commonly practiced method because it is widely available and inexpensive [3, 4]. Microscopic examination of malaria involves two types of slides: thick and thin blood smears. Thick blood smears are mainly used to test for the presence or absence of Plasmodium in the blood. Thin blood smears are used for detailed examination, such as quantification of parasitemia, species identification and life-cycle classification. According to the WHO recommendations in [3] and the revised version of 2004 [5], the thin blood smear must be examined under 70–100 windows during microscopic diagnosis of malaria, and the number of infected RBCs is counted among 100 RBCs in each window. Physicians frequently request the thin blood smear test in severe stages of malaria; its visual quantification under the microscope has proven laborious and time-consuming, and the results are often erroneous because of the massive number of ongoing examinations [4].

The rest of this paper is arranged as follows. Section 2 discusses the research background and current challenges, Sect. 3 presents the proposed methodology, Sect. 4 reports the analysis and discussion of the experimental results, and Sect. 5 concludes the paper.

2 Research background

This section summarizes the categories of the tools and techniques presented in the literature for the task of quantification of malaria parasitemia and its grading. For detailed study of the existing tools and techniques, interested readers are referred to the survey reported in [2].

A considerable number of studies have addressed automatic quantification of malaria parasitemia. The majority of them tried to resolve problems such as image luminance, low contrast, poor illumination and out-of-focus images in the preprocessing step [37–40]. However, most of these problems have been resolved by the advent of high-quality imaging tools. The well-known preprocessing techniques employed by the majority of the studies are histogram equalization (HE) [14–17], brightness preserving dynamic HE (BPDHE) [18], smallest univalue segment assimilating nucleus (SUSAN) [19], and image smoothing with a median filter combined with edge preservation through the Laplacian [11, 20]; the authors of [21–23] proceeded in the same way but used unsharp masking for edge preservation. The present study smooths the image with a median filter of kernel size 3 × 3, because a larger kernel would remove parasites, particularly those in their initial stages, and uses unsharp masking to preserve the edges of red blood cells and parasites. These two methods were selected on the basis of their positive results in experiments on 74 images of the standard dataset obtained from [24].

Further, according to the literature survey, the methodologies previously adopted for automatic malaria diagnosis or parasitemia estimation can be broadly divided into deductive and inductive approaches [41, 42]. The deductive approach is a top-down strategy that starts with foreground and background separation, followed by red blood cell segmentation; finally, the parasites are studied. In contrast, the inductive approach is bottom-up, i.e., the deductive approach in reverse [43–45]. Both approaches suffer from their dependence on RBC morphology. However, the inductive approach is preferable because it leaves more freedom to select morphology-independent factors. The color, intensity, shape, size, area, radius and all other morphology-related factors of RBCs vary widely between patients. In the same way, we cannot depend on the morphology of parasites, except for color. Moreover, the literature reveals another serious problem, that of occluded RBCs. A few studies addressed this problem on the same morphology-dependent grounds, while the majority ignored it altogether.

The deductive approach adopted by [19] and [7] is seriously affected by the presence of densely occluded RBCs. The study in [19] also depends on area granulometry to estimate the RBC size (which is constant only when the RBC is healthy), and it does not specify how occlusions of RBCs are handled. The authors of [25, 26] trained SVM and PCA for classification, while for features extraction they also relied on morphology-dependent factors, i.e., area and radius. In addition, the study in [25] depends on a bimodal histogram. Neither of these studies has a clear approach to the occlusion of RBCs. The studies in [6], [27] and [8] depend on the circularity of RBCs (detected through the circular Hough transform). The circularity of RBCs is very sensitive and can be disturbed by even slight pressure on the slide during preparation. For features extraction, a fixed area, radius and edges suit normal RBCs, but these features frequently change because of malaria and other diseases [13]. Nevertheless, they are adopted by the majority of research studies, such as [7, 10, 11]. Moreover, the authors of [7] counted the number of infected RBCs on the basis of the number of parasites, which is not acceptable in medical practice: as stated in [13], an infected RBC is counted as one regardless of the number of parasites in it. The occluded RBCs problem is addressed in [10] through the method developed in [28], but the accuracy of this method degrades for densely occluded RBCs. Segmenting RBCs with a nucleic approach is problematic because RBCs have no nuclei, and the studies concerned treat the parasites as nuclei. Such studies will be seriously affected when the RBCs are actually nucleated, such as when the RBC life span is near its end or the RBCs are highly matured. The nucleic approach is followed for the segmentation of RBCs by several studies reported in [29, 30] and [31]. Segmentation based on chromatin dots offers no guarantee that the maximum and minimum intensity levels on which it relies will be the same in all images, and in addition, these studies are highly susceptible to noise. The studies in [32, 33] addressed the segmentation of the parasites on the same grounds. Furthermore, a single RBC may contain noisy chromatin dots; because experts do not consider single dots to be parasites, false results will be reported and the accuracy will be at risk [46].

2.1 Research challenges

The automatic tools and techniques introduced previously provide better solutions for the mentioned problems, but they mostly rely on factors that depend on RBC morphology [2]. For example, the circularity of RBCs is not universal, yet the majority of previous studies, reported in [6–8], considered RBCs to be round or elliptical in shape and red in color. Size, area and other fixed geometrical factors of RBCs are also risk factors, since they hold only in normal situations [9], and they have been used in [7, 10, 11] as well. A slight deviation from the models proposed in these studies abruptly reduces the accuracy and efficiency and may even produce no response in some cases. In addition, occluded RBCs are a serious issue that has not been properly addressed in the past. The term occlusion covers both clumping and overlapping RBCs [12]. To clump means to glue: clumped RBCs are glued to each other in the form of long chains, which is an indication of iron deficiency (common in malaria) in the blood. Overlapped RBCs result from inappropriate slide preparation. Occluded RBCs affect the accuracy of the malaria parasitemia estimate [12]. Malaria parasitemia is the percentage ratio of infected RBCs to all RBCs present on the slide [13].

$${\text{\% MP}} = \frac{\text{iRs}}{\text{aRs}} \times 100$$
(1)

where iRs and aRs represent the number of infected RBCs and the number of all RBCs in a single window, respectively.
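As a minimal illustration of Eq. (1), the following Python sketch computes the percentage parasitemia of one window. The function name and the example counts are hypothetical and are not taken from the study.

```python
def parasitemia_percent(infected_rbcs: int, all_rbcs: int) -> float:
    """Eq. (1): percentage of infected RBCs among all RBCs in one window."""
    if all_rbcs <= 0:
        raise ValueError("all_rbcs must be positive")
    if not 0 <= infected_rbcs <= all_rbcs:
        raise ValueError("infected_rbcs must lie between 0 and all_rbcs")
    return 100.0 * infected_rbcs / all_rbcs


# Hypothetical window: 7 infected RBCs among 100 counted RBCs.
print(parasitemia_percent(7, 100))
```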

To cope with all these challenges, we propose a methodology that improves accuracy and efficiency by relying on factors that are independent of RBC morphology.

3 Proposed methodology

The proposed methodology is divided into three sections: Features extraction, splitting occluded RBCs and malaria parasitemia grading. Pictorial description of the proposed methodology is shown in Fig. 1.

Fig. 1

Framework of the proposed technique to determine the degree of malaria parasitemia in Giemsa-stained thin blood smears

3.1 Features extraction

Thin blood smear images are difficult to process because the properties available for feature selection vary from patient to patient and from slide to slide. The aim of this study is to find an efficient procedure for selecting the most informative feature descriptors, which can be found easily in all slides of malaria parasitemia, and then to retain these descriptors as a solid foundation on which to determine the malaria parasitemia. The literature study and discussions with field experts revealed that selecting suitable feature descriptors based on color information plays a vital role in segmenting Giemsa-stained parasites and overcomes issues related to segmentation accuracy. The criterion for selecting a suitable color-based feature descriptor is that the color that is less widely distributed in the image is the eligible feature color. In this regard, the most suitable probabilistic approach is a Gaussian mixture model fitted with expectation maximization to determine the mean, weight and covariance of the colors distributed in the image, as presented in Eq. (2):

$$\left\{ {W_{k} ,\mu_{k} ,CV_{k} } \right\},\forall k_{\text{copts}} \in {\text{Color}}$$
(2)

where \(\{W_{k}, \mu_{k}, CV_{k}\}\) are the weight, mean and covariance matrix of the kth color component. The color components are determined with the Gaussian mixture model by assigning each pixel through the normal distribution probability, as given in Eq. (3):

$$P\left( {k |f_{x} } \right) = \frac{{W_{k} N(f_{x} |\mu_{k} ,CV_{k} )}}{{\sum_{k} W_{k} N(f_{x} |\mu_{k} ,CV_{k} )}}$$
(3)

where k is the color value in a group vector, \(W_{k}\) are the weights, which satisfy \(\sum\nolimits_{k = 1}^{K} W_{k} = 1\), and \(f_{x}\) \((W_{1}, \ldots, W_{K}, f_{1}, \ldots, f_{K})\).
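The posterior of Eq. (3) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the pixel feature is a scalar intensity instead of a color vector, each component has a scalar variance instead of a covariance matrix, and the mixture parameters are supplied by hand instead of being fitted with expectation maximization. All names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional normal distribution N(x | mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Eq. (3): posterior P(k | f_x) of every mixture component k for feature x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("feature is too far from every component (underflow)")
    return [j / total for j in joint]


# Hypothetical two-component mixture evaluated for a pixel intensity of 0.2.
print(responsibilities(0.2, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]))
```

The returned values sum to one, so each pixel is softly assigned to the color components rather than to a single one.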

Next, the spatial variance is calculated from horizontal and vertical variances of the kth color components, which are presented in Eqs. (4) and (5), respectively.

$$V_{v}(k) = \frac{1}{{\left| Y \right|_{k} }}\sum\nolimits_{y} {P(k|f_{x} )|y_{v} - M_{v} (k)|^{2} }$$
(4)

where \(M_{v} (k) = \frac{1}{{\left| Y \right|_{k} }}\sum\nolimits_{y} {P(k|f_{x} )y_{v} }\)

$$V_{h} (k) = \frac{1}{{|X|_{k} }}\sum\nolimits_{x} {P(k|f_{x} )|x_{h} - M_{h} (k)|^{2} }$$
(5)

where \(M_{h} (k) = \frac{1}{{|X|_{k} }}\sum\nolimits_{x} {P(k|f_{x} )x_{h} }\); \(y_{v}\) and \(x_{h}\) are the y-coordinate and x-coordinate of the pixel x, while \(|Y|_{k}\) and \(|X|_{k}\) are given as \(|Y|_{k} = \sum\nolimits_{y} {P(k|f_{x} )}\) and \(|X|_{k} = \sum\nolimits_{x} {P(k|f_{x} )}\), respectively.

The total variance of a color component k is given as:

$$V\left( k \right) = V_{v} \left( k \right) + V_{h} \left( k \right).$$
(6)

Further, we normalized V(k) to the range [0, 1] as,

$$V\left( k \right) = \frac{{\left( {V\left( k \right) - \min_{k} V\left( k \right)} \right)}}{{\left( {\max_{k} V\left( k \right) - \min_{k} V\left( k \right)} \right)}}.$$
(7)

Thus, the weighted sum of the color spatial-distribution feature \(F_{s} (x,f)\) is defined as:

$$F_{s} \left( {x,f} \right) \propto \sum P\left( {k|f_{x} } \right) \cdot \left( {1 - V\left( k \right)} \right).$$
(8)

The weighted feature color is also normalized to the range [0, 1].
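Equations (4)–(8) can be illustrated with the following Python sketch. It assumes that the posterior maps P(k|f_x) are already available as nested lists indexed `post[k][y][x]` and that every component has non-zero total probability; the final normalization of the feature to [0, 1] is omitted. The function names and the tiny 3 × 3 example are hypothetical.

```python
def spatial_variances(post):
    """Eqs. (4)-(7): normalized spatial variance V(k) of every color component.

    post[k][y][x] holds P(k | f_x) for the pixel at row y, column x.
    """
    totals = []
    for p in post:
        mass = sum(sum(row) for row in p)  # |X|_k = |Y|_k
        m_h = sum(v * x for row in p for x, v in enumerate(row)) / mass
        m_v = sum(v * y for y, row in enumerate(p) for v in row) / mass
        v_h = sum(v * (x - m_h) ** 2 for row in p for x, v in enumerate(row)) / mass
        v_v = sum(v * (y - m_v) ** 2 for y, row in enumerate(p) for v in row) / mass
        totals.append(v_h + v_v)  # Eq. (6)
    lo, hi = min(totals), max(totals)
    span = hi - lo
    return [(t - lo) / span if span > 0 else 0.0 for t in totals]  # Eq. (7)


def spatial_feature(post, variances):
    """Eq. (8): per-pixel sum over k of P(k | f_x) * (1 - V(k))."""
    height, width = len(post[0]), len(post[0][0])
    return [
        [sum(post[k][y][x] * (1.0 - variances[k]) for k in range(len(post)))
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 3 x 3 image: component 0 occupies only the centre pixel,
# component 1 occupies every other pixel.
compact = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
spread = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
variances = spatial_variances([compact, spread])
feature = spatial_feature([compact, spread], variances)
print(variances, feature[1][1], feature[0][0])
```

In this toy case the compact component receives the lowest variance, so the centre pixel obtains the highest feature value, which matches the criterion that the less widely distributed color is the eligible feature color.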

Results with the proposed technique are presented for visual inspection and compared with the ground truth images marked by medical experts as shown in Fig. 2. The verification has been made by another panel of medical experts from Saidu Medical College Swat, KPK, Pakistan. The segmented features are slightly dilated for clear visibility.

Fig. 2

Parasite segmentation with the proposed technique (slight dilation is applied for clear visibility). The first two columns (left) consist of original input image and parasites segmented images with the proposed technique, while the last column (right) contains ground truth data marked by medical experts

Because even a parasite in its initial stages, when it has the form of threads, spans an area of at least 50 pixels (checked empirically on more than 45 of the 74 images), smaller areas are identified as noise and removed from the image. After segmentation of the parasites, both the original image and the resulting parasite image are converted to binary form for further processing.
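The area-based noise removal described above can be sketched as follows. The 50-pixel threshold comes from the text; the use of 8-connectivity and the function name are assumptions made for this illustration.

```python
from collections import deque


def remove_small_regions(mask, min_area=50):
    """Keep only connected foreground regions with at least min_area pixels.

    mask is a binary image given as a list of rows; 8-connectivity is assumed.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [[0] * width for _ in range(height)]
    for start_y in range(height):
        for start_x in range(width):
            if not mask[start_y][start_x] or seen[start_y][start_x]:
                continue
            # Breadth-first search collects one connected region.
            seen[start_y][start_x] = True
            queue = deque([(start_y, start_x)])
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) >= min_area:
                for y, x in region:
                    cleaned[y][x] = 1
    return cleaned
```

A region of exactly 50 pixels is kept here, since the text states that a parasite spans at least 50 pixels.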

3.2 Occluded red blood cells splitting

The precise grading of malaria parasitemia depends on the accurate quantification of RBCs (infected and non-infected). The accuracy of this quantification suffers mainly from occlusions (clumps and overlaps of RBCs). The occlusion-splitting process needs to be designed so that it saves processing time while remaining highly accurate. To ensure efficiency, we therefore perform two preprocessing steps: checking for the presence of occluded RBCs and separating the occluded RBCs from the single RBCs in the image.

3.2.1 Checking for occluded RBCs

We check for the presence of occluded RBCs twice, through a median area check and a median elongation check. First, we find the convex hulls of all the RBCs present in the current window through Eq. (9). We then find the areas and elongations of the convex hulls through Eqs. (10) and (11), respectively. Using each of these two measures, we compute a normalized variance among all the RBCs with Eq. (12). Through experimentation, we found that occluded RBCs exist if the variance is higher than 0.2 in the case of area and higher than 0.5 in the case of elongation; otherwise, they do not.

$$\left\{ {\sum\limits_{i = 1}^{|X|} {\alpha_{i} x_{i} } \;\Big|\;(\forall i:\alpha_{i} \ge 0) \wedge \sum\limits_{i = 1}^{|X|} {\alpha_{i} } = 1} \right\}$$
(9)

where X is a finite set of points, \(x_{i}\) is a point of X and \(\alpha_{i}\) is the weight assigned to \(x_{i}\); the weights must sum to 1, i.e., they are normalized.

$${\text{Area}}_{RBC} = {\text{No. of pixels}}$$
(10)

where the number of pixels is counted over the convex hull object of each RBC.

$${\text{Elongation}}_{RBC} = \frac{{L_{RBC} }}{{B_{RBC} }}$$
(11)

where L RBC is the major axis and B RBC is the minor axis of each convex hull (RBCs).

$$\sigma^{2} = \frac{{\sum {(X - \mu )^{2} } }}{N}$$
(12)

where X represents the area in one case and the elongation in the other, μ is the mean of X and N is the number of terms in the distribution.
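The occlusion check can be sketched in Python as follows. The text does not state how the variance is normalized or how the two checks are combined, so this sketch assumes that the values are divided by their median before Eq. (12) is applied and that both thresholds (0.2 for area, 0.5 for elongation) must be exceeded. The function names are hypothetical.

```python
from statistics import median


def normalized_variance(values):
    """Eq. (12) applied to values divided by their median (assumed normalization)."""
    med = median(values)
    scaled = [v / med for v in values]
    mean = sum(scaled) / len(scaled)
    return sum((s - mean) ** 2 for s in scaled) / len(scaled)


def has_occlusions(areas, elongations, area_threshold=0.2, elongation_threshold=0.5):
    """Report occlusions when both variances exceed the thresholds given in the text."""
    return (normalized_variance(areas) > area_threshold
            and normalized_variance(elongations) > elongation_threshold)


# Hypothetical convex-hull measurements: three single cells and one long clump.
print(has_occlusions([100, 100, 100, 400], [1.0, 1.0, 1.0, 4.0]))
```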

3.2.2 Separation of single and occluded RBCs

Once it is decided that occluded RBCs exist, the next step is to separate them from the single RBCs. For the separation, we again apply the double check based on Eqs. (9)–(12). Among the many measures of central tendency we chose the median, because it is the best measure when the data values are irregular and include both small and large values. We divide the area of every RBC convex hull by the median area; results near or equal to 1 are considered single RBCs and are included in the single-RBCs mask, whereas results greater than 1 are considered multi-RBCs and are included in the multi-RBCs mask. Then, we pass the single-RBCs mask into the pixel IDX_list of the input image and obtain the image of single RBCs. In the same way, we pass the multi-RBCs mask to obtain the image of occluded RBCs. We then perform the second check similarly, using elongation instead of area. The separation process is depicted in Fig. 3.
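A minimal Python sketch of the area-based separation is given below. The text only says that ratios "near or equal to 1" indicate single RBCs, so the tolerance of 0.5 used here is an assumed value, and the function name is hypothetical.

```python
from statistics import median


def split_single_and_occluded(areas, tolerance=0.5):
    """Split region indices by the ratio of each area to the median area.

    Ratios up to 1 + tolerance are treated as single RBCs; the tolerance is
    an assumed value, not one reported in the study.
    """
    med = median(areas)
    single, occluded = [], []
    for index, area in enumerate(areas):
        (single if area / med <= 1.0 + tolerance else occluded).append(index)
    return single, occluded


# Hypothetical convex-hull areas in pixels; the last region is a clump.
print(split_single_and_occluded([95, 100, 105, 310]))
```

The same function can be applied to the elongation values for the second check.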

Fig. 3

Separation process: a input original image, b binary image of the original, c single RBCs and d separated occluded RBCs

3.2.3 Splitting the occluded RBCs

After the separation of single and occluded RBCs, the image of occluded RBCs is split into single cleaved RBCs. To split the occluded RBCs, we first compute the distance transform of the occluded-RBCs image and then find its local maxima. From the local maxima we find the centered maxima and take them as the center points of circles, which we draw with the mid-point circle drawing algorithm. Each drawn circle is mapped onto the occlusion, and through slight erosion we separate the single RBC from the occlusion. The process is repeated for every central maximum, and the resulting separated single RBCs are collected in a separate image, which is the output image. The whole process is depicted in Fig. 4.
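Two building blocks of this step, the distance transform and the mid-point circle drawing algorithm, can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not taken from the study: a two-pass city-block distance is used because the text does not name the metric, only the single global peak is used as a circle center, and the merging of local maxima and the erosion step are left out. All names are hypothetical.

```python
def cityblock_distance(mask):
    """Two-pass city-block distance of every foreground pixel to the background."""
    height, width = len(mask), len(mask[0])
    far = height + width
    dist = [[far if mask[y][x] else 0 for x in range(width)] for y in range(height)]
    for y in range(height):  # forward pass: top and left neighbours
        for x in range(width):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    for y in range(height - 1, -1, -1):  # backward pass: bottom and right neighbours
        for x in range(width - 1, -1, -1):
            if dist[y][x]:
                if y < height - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < width - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def peak(dist):
    """Return the (row, column) of the first pixel holding the largest distance."""
    best = max(max(row) for row in dist)
    for y, row in enumerate(dist):
        for x, value in enumerate(row):
            if value == best:
                return y, x


def midpoint_circle(cy, cx, radius):
    """Mid-point circle algorithm: set of (row, column) points on the perimeter."""
    points = set()
    x, y, err = radius, 0, 1 - radius
    while x >= y:
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            points.add((cy + py, cx + px))
        y += 1
        if err < 0:
            err += 2 * y + 1
        else:
            x -= 1
            err += 2 * (y - x) + 1
    return points


# Hypothetical 5 x 5 mask with a 3 x 3 foreground block in the middle.
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
dist = cityblock_distance(mask)
cy, cx = peak(dist)
circle = midpoint_circle(cy, cx, dist[cy][cx])
```

Using the distance value at the peak as the radius ties the size of each circle to the local thickness of the occlusion rather than to a fixed RBC size.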

Fig. 4

Overall process after separation of occluded RBCs from single RBCs: a image of occluded RBCs, b distance transform of the image in a, c local maxima of the occluded RBCs, d centroids of the occluded RBCs used for drawing the circles, e mapping of the drawn circles on the initial points of the boundaries of the occluded RBCs and f final mapped RBCs, cleaved into their constituent cells

The proposed technique for splitting occluded RBCs has shown good results when applied to different images. The basic concept is taken from the watershed transform; however, watershed suffers from over- and under-segmentation in occlusions, and for occlusions of more than four RBCs it requires much more processing time than the proposed technique. Further experimental results are depicted in Fig. 5.

Fig. 5

Occluded RBCs splitting through the proposed technique. a, c and e are original images, while b, d and f are the results obtained by drawing the circles at the centroid positions found through the local maxima of the distance transform and then mapping the circles onto the occlusion to obtain the actual number of cleaved RBCs

3.3 Malaria parasitemia grading

For malaria parasitemia grading, we have performed the following steps.

3.3.1 Imposition of segmented parasites

The parasites segmented in the first step are imposed on the single RBCs after the occluded RBCs have been split into single RBCs; if no occlusions exist, this step directly follows the segmentation of parasites. The imposition is needed to identify the infected RBCs and to count them in order to estimate the percentage malaria parasitemia. The imposition process is simply the addition of two binary images, one containing the single RBCs and the other containing the segmented parasites, which have opposite signs so that the effects of noise and other artifacts cancel. The results are presented for visual inspection in Fig. 6.

Fig. 6

Parasite imposition process with proposed technique on images having occluded RBCs. a, d Present original images, b, e have cleaved occluded RBCs and imposition of parasites on them and c, f present the imposition of parasites on single RBCs

3.3.2 Identifying infected RBCs

Once all the RBCs are separated and single, the infected RBCs must be identified for counting. We use one specific property of infected RBCs: we consider the outer boundaries of all the RBCs and encircle in green those that are infected, i.e., those whose parent (outer) boundary has a child boundary. RBCs with no child boundary are considered non-infected. According to the medical literature, an RBC is considered infected with malaria if Plasmodium is present in it, and an infected RBC containing many Plasmodium parasites counts as one infected RBC. The visual results of this phase are shown in Figs. 7 and 8.
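The counting rule (one infected RBC regardless of the number of parasites inside it) can be illustrated with the Python sketch below. Note that it swaps in a simpler technique than the parent/child boundary test described above: it assumes that the RBCs have already been labeled with integer identifiers and marks an RBC as infected when any parasite pixel overlaps it. The names and the toy data are hypothetical.

```python
def infected_labels(rbc_labels, parasite_mask):
    """Return the set of RBC labels that contain at least one parasite pixel.

    rbc_labels: rows of integers (0 = background, 1..n = RBC identifiers).
    parasite_mask: rows of 0/1 values of the same size.
    """
    infected = set()
    for label_row, parasite_row in zip(rbc_labels, parasite_mask):
        for label, parasite in zip(label_row, parasite_row):
            if label and parasite:
                infected.add(label)  # a set counts each RBC only once
    return infected


# Hypothetical window with two RBCs; RBC 1 carries two separate parasites.
labels = [[1, 1, 0, 2, 2],
          [1, 1, 0, 2, 2],
          [1, 1, 0, 2, 2]]
parasites = [[1, 0, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [0, 1, 0, 0, 0]]
infected = infected_labels(labels, parasites)
print(len(infected), "infected RBC(s)")
```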

Fig. 7

Parasite imposition on images having no occluded RBCs. a, c Input images, while b, d are the results after imposition of parasites

Fig. 8

Identification of infected RBCs. a, c Present original images, while b, d present infected RBCs highlighted with the red boundaries

3.3.3 Segmentation of infected RBCs

To segment the infected RBCs, we follow the same concept as in the identification step. We take an empty binary image of the same size as the image in which all RBCs (infected and non-infected) are present. Then, we mark with 1’s the areas that were identified as infected in the image containing all RBCs. Adding this marked image to the image containing both infected and non-infected RBCs results in an image of the infected RBCs. The whole process is visualized in Fig. 9, while the results are presented in Fig. 10.

Fig. 9

Process of infected RBCs segmentation. a Original binary image, b empty image with the areas of the infected RBCs highlighted, c infected RBCs obtained through the proposed technique and d all non-infected RBCs

Fig. 10

Segmentation process of the infected RBCs in slide images having occluded RBCs. a, f Represent the input images to this module while b presents the infected RBCs in the cleaved RBCs. d, g Present infected RBCs existed in the single RBCs, c, f are the non-infected RBCs in the cleaved RBCs. e, h Present the non-infected RBCs in the single RBCs

3.3.4 Counting infected and non-infected RBCs

Following segmentation, counting infected and non-infected RBCs is a simple task. For automatic counting, we used the MATLAB built-in function ‘bwlabel’. The RBC segmentation and counting results are shown in Figs. 11 and 12 and Table 1.

Fig. 11

Segmentation process of the infected RBCs in slide images having all single RBCs. a, d Represent the original input images, b, e present the infected RBCs while c, f present the non-infected RBCs

Fig. 12

Counting process results. a, d, g Original input images, b, e, h are images labeled as infected RBCs, while images c, f, i are labeled as non-infected RBCs

Table 1 Complete statistics after examination of single thin blood smear image

3.3.5 Estimation of percentage malaria parasitemia

Malaria parasitemia is the percentage ratio of infected RBCs to all RBCs present in each window of a slide. According to the WHO recommendations [2, 3], the percentage malaria parasitemia must be estimated on the basis of observing 80–100 windows, each with 100 RBCs. Given the total counts of infected and non-infected RBCs, the percentage malaria parasitemia can be estimated with Eq. (1).

3.3.6 Malaria parasitemia grading

According to [34] and the studies in [35, 36], the percentage of malaria parasitemia should be examined in 100–200 windows and can be assigned to one of the grades or levels listed in Table 2.

Table 2 Percentage of malaria parasitemia grading.

Finally, malaria parasitemia is graded to one of the levels listed in Table 2. For testing purposes, we assumed 40000 RBCs per window and estimated the results on this assumption for each image, because each image is a single window.

4 Results analysis and discussion

We performed a quantitative analysis to validate the effectiveness of the proposed framework.

4.1 Ground truth data preparation

The images obtained from DPDx [16] were printed as forms and distributed among three pathologists. Each form contains a single thin blood smear image together with its manually estimated statistics and the marking of the parasites in the image. These forms were verified by another panel of three medical experts. The data were collected in the Department of Pathology, Saidu Medical College, Saidu Sharif Swat, KPK, Pakistan.

4.2 Inter-rater agreement

The collected data are first checked for inter-rater reliability through Fleiss’ Kappa, a variation of Cohen’s Kappa (which handles two raters) for more than two raters, computed with Eq. (13).

$$\upkappa = \frac{{P^{{\prime }} - P_{e}^{{\prime }} }}{{1 - P_{e}^{{\prime }} }}$$
(13)

where \(P^{{\prime }} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {P_{i} }\) and \(P_{e}^{{\prime }} = \sum\nolimits_{j = 1}^{k} {p_{j}^{2} }\); N is the total number of subjects, k is the number of categories, and i = 1, 2, …, N and j = 1, 2, …, k index the subjects and categories, respectively. The Fleiss’ Kappa calculated for the collected data is \(\upkappa = 0.96\), which indicates strongly reliable data.
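For readers who want to reproduce the agreement check, Eq. (13) can be sketched in Python as follows. The sketch uses the standard definition of Fleiss' Kappa; the function name and the rating table are hypothetical and are not the data collected in this study.

```python
def fleiss_kappa(counts):
    """Fleiss' Kappa, Eq. (13).

    counts[i][j] is the number of raters who assigned subject i to category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # Agreement among the raters for each subject, then averaged.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Share of all ratings that fell into each category.
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical table: four subjects, three raters, two categories.
print(fleiss_kappa([[3, 0], [0, 3], [2, 1], [0, 3]]))
```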

4.3 Quantitative evaluation of the proposed occluded RBCs splitting technique

We first check the relationship between the red blood cell counts obtained automatically (after occlusion splitting) and manually (by the experts) through Pearson’s correlation coefficient. The relationship between the two variables is shown in Fig. 13. For the same purpose, we also computed the confusion matrix-based precision, recall and F-measure with Eqs. (14), (15) and (16) from the confusion matrix in Table 3.

$${\text{Precision}} = \frac{{T_{p} }}{{T_{p} + F_{p} }}$$
(14)
$${\text{Recall}} = \frac{{T_{p} }}{{T_{p} + F_{n} }}$$
(15)
$${\text{F-measure}} = 2\times \frac{{{\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}$$
(16)

where \(T_{p}\) = correctly counted as red blood cells, \(T_{n}\) = correctly counted as non-red blood cells, \(F_{p}\) = incorrectly counted as red blood cells and \(F_{n}\) = incorrectly counted as non-red blood cells.
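Equations (14)–(16) translate directly into code. The Python sketch below uses hypothetical confusion-matrix counts, not the values of Table 3, and assumes that at least one true positive exists so that no denominator is zero.

```python
def precision_recall_f(tp, fp, fn):
    """Eqs. (14)-(16): precision, recall and F-measure from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
print(precision_recall_f(90, 10, 30))
```

Because the F-measure is the harmonic mean of precision and recall, it always lies between the two and closer to the smaller one.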

Fig. 13

Correlation between manually and automatically (by splitting occluded RBCs via distance transform and circle drawing) counted RBCs
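Pearson's correlation coefficient between the manual and automatic counts can be computed as in the following Python sketch. The function name and the example counts are hypothetical and are not the data plotted in Fig. 13.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical RBC counts for five images: manual versus automatic.
manual = [98, 120, 75, 143, 110]
automatic = [97, 122, 74, 140, 111]
print(pearson_r(manual, automatic))
```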

Table 3 Confusion matrix

The achieved precision, recall and F-measure by counting the RBCs after splitting the occluded RBCs with the proposed technique are 0.973766, 0.989544 and 0.985951, respectively.

4.4 Statistical analysis of the overall framework

In the same way, we evaluated the overall percentage malaria parasitemia estimation through Pearson’s correlation coefficient to find the relationship between the manually and automatically estimated percentage malaria parasitemia, depicted in Fig. 14. Confusion matrix-based sensitivity and specificity are determined with Eqs. (17) and (18). Sensitivity measures how well the proposed technique correctly accepts infected RBCs as infected, and specificity measures how well it correctly rejects non-infected RBCs as non-infected.

$${\text{Sensitivity}} = \frac{{T_{p} }}{{T_{p} + F_{n} }}.$$
(17)
$${\text{Specificity}} = \frac{{T_{n} }}{{T_{n} + F_{p} }}.$$
(18)

Using Eqs. (17) and (18), the sensitivity achieved by the proposed framework is 0.98013, while the specificity is 0.9711.

Fig. 14

Correlation between automatic and manual malaria parasitemia estimation

4.5 Comparison of the proposed framework with other techniques

The proposed framework is compared with other techniques on the same grounds, i.e., on the same image dataset and number of images. The literature on automatic malaria parasitemia estimation and grading is rich, but most experiments were carried out on in vitro slides from the authors’ own datasets. The results are compared in Table 4.

Table 4 Performance comparison of automatic malaria parasitemia

5 Conclusion

We developed a framework for grading malaria parasitemia in digital images of thin blood smears on grounds that are closer to universality, and improved the efficiency. The color of the parasites is the only unique property that is the same in all thin blood smear digital images. The approach suffers from noise, but we addressed this by canceling its effect completely in the binary images. The increase in accuracy is also due to splitting the occluded RBCs properly and on morphology-independent grounds. The efficiency is increased by accomplishing the sub-steps of the framework in suitable ways, mostly on binary images, to save processing time. The overall framework is organized carefully because the study deals with the most important entity, i.e., health.