1 Introduction

The importance of detecting chromosome abnormalities in human genetic disease is widely recognized. It makes the chromosome analysis a specialized discipline with widespread applications both in research and clinical practice. A normal human cell has 22 pairs of chromosomes, autosomes of classes 1–22, and 2 sex chromosomes, in the form of XX or XY [3]. In the four stages of the cell cycle (prophase, metaphase, anaphase, and telophase), only in late prophase or metaphase, the chromosomal structures are visible under a light microscope after Giemsa staining [6]. One of the objectives of the chromosome analysis is the creation of karyotype, which is a layout of chromosome images organized by decreasing size in pairs. Figure 1 shows a sample of the metaphase image and its ordered karyotype from a publically available database [21, 22]. The manual process of karyotype, performed by the cytogeneticist in the genetic labs, involves the cutting of chromosome image from a photograph of a cell, taken from a microscope. The chromosomes are arranged into their appropriate places on the layout according to their visual classification by the cytogeneticist. The overall process is highly tedious and time-consuming [15].

Fig. 1
figure 1

a A metaphase cell spread, b a karyotype of the metaphase cell chromosome

Automated Karyotyping Systems (AKS) allow countless clinical advantages such as interactive and graphical environment, faster examination of the samples, allowing quality printing, being self-explanatory, better interpretation of the image, and archival of data for future analysis [19]. For past three decades, substantial research has been carried out to develop the AKS. However building a completely automated system with no human interference is still a challenging problem. One of the major hindrances in automation is the difficulty in segmentation and classification of the clusters of touching and overlapping chromosomes [10]. Chromosomes are nonrigid in nature, and their shape variability is a natural phenomenon. They can take any form by bending in different directions and angles, and thus forming numerous touching and overlapping clusters in a metaphase cell. The separation and classification of such clusters require human interference and expert knowledge of the cytogeneticist.

The segmentation step in the AKS fails to identify each chromosome as a single object and instead presents a number of clusters, thus requiring special efforts to separate touching and overlapping chromosomes [1]. Automatic separation of overlapping and touching chromosome is important for analysis of metaphase images [6]. Agam and Dinstein [1] and Lerner et al. [16] have used shape and banding evidence for resolving clusters, but both are restricted to “touching or slightly overlapping” configurations. Popescu et al. [23] proposed a method of analyzing the boundary and axis. Charters and Grahman [4] demonstrated combined use of trainable shape models and classification evidence on synthesized overlaps of X- and T-shape. Shunren et al. [25] proposed two intelligent chromosome incision algorithms based on the counter characteristics and the Fourier transform for resolving overlaps, whereas Grisan et al. [8] used a space variant threshold scheme to address the problem. However, in the latter scheme, the chromosomes involved in the overlap must bisect each other. Jahani et al. [10] presented an approach based on morphological operators for identification of any cluster of the overlapping and touching chromosomes. It fails when two chromosomes touch end to end. A computational geometry-based approach, proposed by Srisang et al. [26], has limited success due to the inability of the algorithm to find the center of overlapping areas.

In mid 1990s, multicolor fluorescence in situ hybridization (MFISH), a multispectral combinatorial labeling technique, was developed. It is used to stain each chromosome and is proved to be extremely useful in cytogenetics. Karvelis et al. [12], Choi et al. [5], and Schwartzkopf et al. [24] have used maximum likelihood-based methods. Though MFISH technology is a boon for the AKS, it has inherent limitations as reported by Lee et al. [14], producing erroneous interpretations. The huge cost involved in the hybridization process restricts its routine usage in genetics laboratories. A wide variety of MFISH databases is therefore not publically available for analysis and experimentation. Moreover, processing five images corresponding to each dye increase the computational complexity by five times as compared to gray-scale imaging.

There exist a few studies [16, 23] which are limited to testing on clusters of two overlapping chromosomes, whereas clusters of multiple overlapping chromosomes are more frequent in metaphase images. Though the same evidences could possibly be used for larger clusters, the complexity, the number of trainable shape models, the training process, the hypothesis of combining shape and banding evidences, analysis and the optimization strategies increase drastically and are not yet fully explored. Thus, resolving the touches and overlaps in a metaphase image is still an open issue and a major hindrance to the development of the AKS.

This paper proposes a novel approach to automatically extricate the overlapping chromosomes in a cluster of single and multiple overlaps from gray-scale images. Figure 2 presents the overview of the proposed approach. Figure 2a shows an overlapping chromosome cluster. A desired cut-point is the one which lies on the boundary of the chromosome cluster and connects the overlapped region with the non-overlapping segments of the chromosomes. Every overlap in a cluster has four cut-points as highlighted with dark circles in Fig. 2b. The lines perpendicular to the medial axis of the chromosome as seen in Fig. 2c depicts the presence of the bands (intensity variations) along the chromosomes. The overlapped region formed by joining the identified cut-points is adhered to the respective non-overlapping segments of the chromosomes using the banding information on the overlapped region. The overlapping chromosomes are thus disconnected at the cut-points lying on the boundary of the cluster to accomplish its extrication. Figure 2d details the block diagram of the proposed approach.

Fig. 2
figure 2

a A cluster of two overlapping chromosomes indicating the overlapped region and the non-overlapping segments of the chromosomes, b the desired cut-points connecting the overlapped region of the cluster with the non-overlapping segments, c the overlapped region recovered by joining the identified cut-points using the banding information in that region, and d block diagram of the proposed algorithm

The proposed algorithm applies heuristics to extricate the overlapping chromosomes and is envisioned in two parts as follows:

  • Identification of the number of overlaps and detection of the cut-points on the overlapped region: The number of overlaps and the respective cut-points are detected by computing Delaunay Triangulation (DT) and restricting it to the boundary pixels using Constrained Delaunay Triangulation (CDT) [2]. The computation leads to the formation of triangles on the cluster. The vertices of the triangles, with relatively larger areas, facilitate the identification of the cut-points.

  • Separation of the overlapping chromosomes: The chromosome segments and the overlapped region are disentangled using the detected cut-points. Respective segments of chromosomes are adhered using the banding information.

Most of the earlier reported algorithms are limited to the separation of single overlaps. The issue of resolving multiple overlaps has comparatively received less attention. The proposed algorithm is able to efficiently disentangle the clusters with single and multiple overlaps and thus overcomes the limitations of earlier methods.

2 Methods

The focus of this study is to propose a reliable method for the automated separation of the chromosomes in a cluster. It is assumed that the segmentation of the cluster was already carried out, e.g., by the algorithm reported in [17]. The various steps in the proposed algorithm include detection of pixels on the boundary of the cluster, finding the exact cut-points using DT, and utilizing them along with the banding information to separate the overlapping chromosomes. DT is used to automatically detect the cut-points on the boundary of the cluster.

DT for a set of points P in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [2, 20]. The DT is not unique, but all solutions satisfy the Delaunay property. DT always results in a plane graph. No two edges in the embedding cross each other [2]. Moreover, the triangulation always connects points to their nearest neighbor [20, 27]. DT is computed on the boundary pixels making them the vertices of the triangles, which are formed along the entire region of the cluster. It is observed that the edges of a few triangles cut across the boundary of the cluster, which is undesirable. The aim of the proposed approach is to detect the cut-points, which are present on the boundary of the overlapped region. It is thus necessary to restrict the triangulation within the boundary of the cluster. This is achieved using CDT by the removal of vertex connections that are not fully enclosed within the prefecture of the chromosome cluster. The resulting DT is constrained by the boundary and enables to form triangulations within the non-convex polygon in this case is the chromosome cluster. Figure 3a shows the formation of the DT for one of the test samples in the database. The DT is further constrained within the boundary of the cluster using CDT. The triangles formed outside the boundary of the cluster are eliminated as depicted in Fig. 3b. Figure 3b also demonstrates the possibility of formation of a cavity or a hole in the clusters owing to the non-rigid nature of the chromosomes. A few samples from the database are depicted in Fig. 3c. The results of computation of CDT on some clusters in the data set generated for the experimentation of the proposed work are demonstrated in Fig. 3d. The traingles formed on the overlapped region and the cut-points [Please refer Fig. 3(e)] are highlighted in red color.

Fig. 3
figure 3

a DT of the pixels on the boundary of the overlapping chromosome cluster, b corresponding CDT of the cluster indicating the formation of a cavity, c various samples from the database, d computation of CDT on the various samples, and e the identified cut-points on the overlapped region using the proposed algorithm. The triangles formed on the overlapped region and the cut-points are highlighted in red color

The proposed heuristic approach is based on the observations made on the overlapping region of the cluster after the computation of the DT. These observations are made by studying the database prepared for this experimentation and are expected to be applicable mostly for other images too. The observations are as follows:

  • The DT of the pixels on the boundary of the cluster generates two triangles on every overlapped region.

  • The areas of the two triangles, formed on the overlapped region, with cut-points as the vertices are relatively larger than the areas of other triangles formed by non-cut-points.

  • The two triangles formed on an overlapped region either have a common edge or they are located very close to each other.

  • The vertices of the two triangles formed on the overlapped region are the required cut-points for the separation of chromosome cluster.

The properties of the DT justify these observations. The cut-points are actually the non-collinear boundary points on the overlapped region. They therefore become the vertices of the triangles formed at the overlap. The cut-points are also the nearest boundary pixels on the overlapped region. As the DT always connects a point with its nearest neighbor, the triangles formed by the cut-points do not enclose any other boundary pixel. Moreover, since the overlapped region of the cluster is the one with relatively large area than other regions on the chromosome segments, the triangles formed on this area have more area than other triangles. The triangles at overlapped regions shown in Fig. 3d conform and validate the observations stated above. In this figure, they are highlighted in red color. The proposed algorithm aims to detect these triangles of relatively larger area and finds their vertices to automatically locate the cut-points (refer to Fig. 3e). The identified cut-points further aid the separation of the chromosomes in the cluster.

2.1 Identification of the number of overlaps and detection of the cut-points on the overlapped region

A cluster of chromosomes may have multiple overlaps. To locate the appropriate cut-points on every overlapped region, it is essential to initially identify the total number of overlaps in the cluster. The first part of the algorithm computes the number of the overlaps in the cluster and further detects the exact cut-points on the overlapped region. It involves pre-processing of the image, detection of pixels on the boundary of the cluster, generation of the CDT, and then detecting the cut-points. Pre-processing involves conversion of image from RGB to gray level (L) using standard luminosity method. The weighted sum of R, G, and B components is computed using the following equation [9]:

$$L = 0.2989R + 0.5870G + 0.1140B.$$
(1)

The threshold value was computed using standard Otsu’s algorithm [18]. The boundary of the segmented cluster is identified and followed by smoothing operation in order to reduce the sensitivity of the local variation occurring in the boundary of the cluster. Smoothing is implemented using a standard Moore–Neighbor tracing algorithm modified by Jacob’s criteria [7].

2.1.1 Identification of number of overlaps

The DT of the pixels on the boundary is generated using the randomized incremental approach [2]. The constraints under which the DT has to be formed must be defined for the generation of CDT. The defined constraints depend on the nature of the overlap. A cluster is comprised of chromosomes, overlapping in various unpredictable styles, sometimes leading to the formation of a cavity as shown in Fig. 3b. In case of defining constraints for such clusters, both outer and inner profiles of the cluster are computed. Outer profile contains the pixel coordinates on the boundary of the overlapped chromosomes, and inner profile contains the pixel coordinates on the boundary of the cavity. The constraints for the computation of the CDT are defined using the outer and inner profile, whereas only the outer profile information suffices to define the constraints in the rest of the clusters, where a cavity is not formed.

Let B be the set of pixels on the boundary of the cluster.

$$B = \left\{ { \, b_{1} ,\;b_{2} ,\;b_{3} \ldots b_{x} } \right\}$$

Every cut-point belongs to this set. Let P be the set of possible cut-points.

$$P = \left\{ { \, p_{1} ,\;p_{2} ,\;p_{3} \ldots p_{y} } \right\}\quad P \subseteq B$$

The computation of the CDT on B leads to the formation of triangles restricted to the boundary of the cluster. Let T be the set of triangles generated after the computation of the CDT, where all the vertices of any triangle in T are the elements of B.

$$T = \left\{ { \, t_{1} ,\;t_{2} ,\;t_{3} \ldots t_{z} } \right\}$$

Let A(t i ) represent the area of the corresponding triangle t i in T, where i = 1, 2, 3…z. The pairs of the triangles in close proximity and with relatively larger areas are possible candidates, which lie on the overlapped region of the cluster. Thus, to compute the exact number of overlaps in a cluster, it is initially essential to identify the potential triangles in T which have the corresponding area above a certain threshold value. Figure 4a shows the CDT computed on clusters of overlapping chromosomes with 1, 2, and 3 overlaps. The plot of the triangles versus their respective areas is shown in Fig. 4b, where the triangles are arranged in descending order of their areas. Irrespective of the number of the overlaps a cluster has, a threshold value of 100 pixels is empirically computed to extract T s , a set of m triangles with areas containing more than 100 pixels. The number of triangles to be extracted is directly proportional to the number of overlaps a cluster has.

Fig. 4
figure 4

a The CDT of a clusters with 1, 2, and 3 overlapping chromosomes, b a plot of the triangles (T) generated after computation of CDT versus the area, (A) of the triangles. The value of m indicates the number of triangles with area above 100 pixels

As expected, Fig. 4b illustrates the increase in the value of m as the number of overlaps in the cluster increases from 1 to 3 (m signifies the number of triangles whose area is above 100). These triangles in T s have a greater likelihood of lying on the overlapped region of the cluster.

$$T_{s} = \left\{ { \, t_{1} ,\quad t_{2} \ldots t_{m} } \right\}\quad T_{s} \subseteq T\quad {\text{and}}\quad m < z$$
$$A\left( {t_{i} } \right) > 100\quad {\text{and}}\quad i = 1,\;2 \ldots m$$

Every overlap has a pair of triangles from T s lying on it. These triangles are either very closely placed or share a common edge. Thus, to identify the number of overlaps, it is necessary to pair the closely located triangles in T s . Let d i,j represents the Euclidean distance between the centroid c i and c j of the triangles T s . The distance d i,j is used to derive the pairs of nearest triangles in T s . A pair of triangles with relatively larger area characterizes an overlap. The two triangles formed in an overlapped region either have a common edge or they are located very close to each other. This may also be observed in Fig. 3. The minimum among all distances of pairs of centroids identifies a pair of triangle which is very closely located. It, however, does not always verify if the paired triangles form a quadrilateral as a result. So, an additional constraint on checking the proximity of these edges, based on calculating the vicinity of the edges, is also applied. Once a pair is found, it is declared as an overlapping candidate and excluded from the set of candidates. The process is repeated till the set is exhausted. The triangle which does not satisfy above two criteria is also excluded from the set. The number of pairs, satisfying above criteria, necessarily represents the exact number of overlaps in a cluster.

2.1.2 Detection of the cut-points on the overlapped region

Having identified the number of overlaps in a cluster, it is further required to find the exact cut-points on every overlap. The vertices of the corresponding pair of triangles, identified on every overlapped region, are used to detect the cut-points. The vertices of the closest triangles are the required cut-points. If the triangles share a common edge and thus have two common vertices, four cut-points are accordingly identified. In some cases, two triangles may not have a common edge but would instead be very closely located on the overlap. Figure 5a demonstrates such a cluster wherein six vertices of the paired triangles, t v and t w , are obtained. In such cases, the midpoint of the line segment joining the vertices that are lying on the proximal edges of two paired triangles represents a single vertex. Even though the separation of the overlapped region in the cluster can be accomplished using the six vertices, only four vertices are derived out of these six. This is because a quadrilateral formed by four vertices eases the task of finding the axis of the chromosome, as will be discussed in the next section. Figure 5b shows the formation of the quadrilateral, pqrs, where the pairs v 2w 3 and v 1w 2 are replaced by s and q, respectively. Vertices s and q are the midpoints of the line segments joining v 2w 3 and v 1w 2. Thus, four cut-points are derived from six vertices.

Fig. 5
figure 5

a Triangles, t v and t w, located close to each other on the overlapped region, b the closer vertices, v 2w 3, and v 1w 3, are replaced by s and q, respectively, to form a quadrilateral pqrs on the overlap, c disentangled overlapped region pqrs, d scan line along edges sp, rq, e scan line along edges sr, pq, f plot of intensity variation (I) versus the scan line (S) along sp and rq, and g plot of intensity variation (I) versus the scan line (S) along sr and pq

Figure 3e demonstrates the results of the proposed algorithm to identify the cut-points in a variety of sample images, which are comprised of clusters with single and multiple overlaps. The algorithm identifies the cut-points efficiently even in the clusters, where the overlaps are very close to each other, and even when one of the chromosomes involved in the overlap does not have its sizeable parts lying on the either side of the other chromosome in the cluster.

2.2 Separation of the overlapping chromosomes

To disentangle the overlapping cluster, it is necessary to initially separate the non-overlapping chromosome segments from the overlap. The non-overlapping chromosome segments are separated along the edges of the quadrilateral formed on the overlapped region, with the cut-points as vertices. Appropriate cut-lines (edges of the quadrilateral), connecting these points, detected on the overlap are determined by tracing the boundary of the cluster in clockwise or anticlockwise order. The cut-line separates the non-overlapping chromosome segments and the overlapped region.

Every chromosome has a particular sequence of higher and lower intensity values, indicating the presence of light and dark bands on it. These bands, present on the body of chromosomes, are always perpendicular to the medial axis (long axis) of the chromosome. This orientation of the bands, with respect to the axis of the chromosome, is always retained irrespective of any shape variability or any nature of the overlapping chromosome. If a and b are the two chromosomes, such that b overlaps on a, then b is the topmost chromosome in the cluster. The bands present on the overlapped region are perpendicular to the axis of the chromosome b, and the overlapped region belongs to chromosome b (refer to Fig. 2c). Thus, the bands on the overlapped region are oriented in line with the topmost chromosome of the cluster. So, in the proposed algorithm, the banding evidence at the overlapped region is used to aid a decision about which chromosome the overlapped region belongs to.

Multiple chromosomes may overlap in any directions making varying degrees with each other. To find the chromosome to which the overlapped region belongs to, the variations of the bands on the overlapped region need to be examined. It is therefore necessary to observe a proper direction (scan line) along the axis of the chromosome. An overlapped region, extracted using the proposed algorithm along with the scan lines, is shown in Fig. 5(c). Though the bands on the chromosomes are ideally expected to be continuous, there is large possibility of having some discontinuities and a loss of banding information in real images. Figure 5c demonstrates this possibility, where one of the vertical bands near cut-line (edge-pq) is slightly discontinuous. Only one scan line passing through this discontinuity would fail to identify the presence of a dark band. So, to accommodate the inherent limitations of the microscopic imaging and staining in real metaphase samples, the change in the intensities on the overlapped region is examined along three scan lines in every direction. Figure 5d and e describes a simple midpoint algorithm used to identify the direction of the scan line. Two sets of non-incident edges of the quadrilateral are considered independently. For each set, multiple segments parallel to one of the opposite edges, (sp and pq), of the quadrilateral are considered. Finally, the line joining the midpoints of the multiple segments defines the scan line. To compute the band pattern on the overlapped region, subsequent scan lines parallel to the computed scan line are also considered. The variations in intensity values along the first scan line in the horizontal and vertical directions are shown in Fig. 5f, g, respectively. The plot in Fig. 5f has significant peaks and valleys (variations in intensity values) indicating the existence of band in the horizontal direction. A vector, storing the intensity values along every scan line in each direction, is formed. The covariance matrix between two populations of vectors resulting from the set of scan lines in each direction is computed. The overlapped region belongs to the chromosome lying along the direction with the maximum variance. Finally, using the geometrical and morphological characteristics, all the overlapping areas in the cluster are adhered with respective non-overlapping chromosome segments.

2.3 The proposed algorithm

The algorithm to extricate the cluster of overlapping chromosomes involves pre-processing of the image followed by detection of the cut-points by computation of DT and CDT. It finally assigns the overlapped region to the appropriate chromosome segments, leading to successful disentanglement of the chromosomes.

Algorithm 1: Extrication of overlapping chromosomes

Input: C = Cluster of overlapped chromosomes

Output: Disentangled chromosomes of C

begin

1. Pre-processing:

C is binarized based on the threshold value computed using Otsu’s algorithm. Compute the pixels (P = p1, p2…pn) in the boundary of C. Smooth the boundary of C using Moor’s algorithm.

2. Detect cut-points:

• Compute the DT of P.

• Based on boundary constraint, the CDT of P is calculated.

• Determine m triangles (R i , i = 0, 1,…m) obtained by the CDT whose area satisfy the threshold (Tu) condition.

• Compute the centroid (CR i ) of each triangle in Ri and measure the distance (DC) between CR i and CR j for all i, j = 1,…t and i ≠ j. Determine the distance (DE) between edge e i and edge e j for all i, j = 1,…m such that ei and ej belong to different triangles.

• Form different groups (G k ) of two triangles based on the minimum DC and DE. Choose the pair one after another in order of their proximities.

• Total number η denote the number of overlaps in the cluster.

• For each group Gk (l ≤ k ≤ η), check whether corresponding triangles T t ^{G k } and T t ^{G k } share a common edge. If they do not, merge two nearest edges of T t ^{G k } and T t ^{G k } by averaging respective vertex coordinates.

• Trace the edges of quadrilateral (Q) formed by T t ^{G k } and T t ^{G k } in clockwise or anti-clockwise order.

• The four vertices of Q represent the cut-points and the area of Q determines the overlapped chromosome region (R).

3. Extricate Chromosomes:

• For each G k , compute the similarity of the band pattern between the overlapped region and the chromosomes using the coherence relation.

• Reconstruct a chromosome by assigning R to an appropriate chromosome, and the overlapped regions in other chromosomes remain empty.

End

The proposed algorithm extricates the cluster of overlapping chromosomes by identifying the cut-points and further assigning the overlapped region to the appropriate chromosome. The topmost chromosome of every overlap is thus completely extricated, and the overlapped part of the lower chromosomes is permanently lost, which usually is the case even in manual process of separation of the chromosome cluster.

3 Results

The proposed algorithm has been tested on a standard PC (Intel core 2 quad CPU, 3.0 GHz, 4 Gb RAM) in windows environment using Matlab 7.12.0 (R2011a). Performance of the algorithm is examined using variety of synthesized and actual clusters from publically available databases and private genetic labs. The data set used for the validation of the proposed algorithm includes altogether 60 cases exhibiting varying degrees and complexities of overlaps. It includes images from LK1 data set [13], which has chromosomes of lower quality than other classic database. They were simulated (manually) to generate 40 overlaps. For creating synthetic images graphics editing software, Adobe Photoshop CS4 was used to randomly overlap the chromosomes and form clusters of single and multiple chromosome overlaps. The image editing features of the software enable the creation of overlaps with varying styles. Simulated images generated from LK1 database reflect high degrees of overlaps (up to 6) and contain variation in the nature of overlap. As the overlapping is generated synthetically from LK1 database of real chromosomes, it is expected that background noise on the chromosome section should have similar behavior.

Ten images numbered: 2, 10, 12, 22, 44, 53, 60, 72, 74, and 114 in the folder originals from the data set described in [17, 18] were selected. These metaphase images have clusters with single, two, and three overlaps with varying degrees and shapes. Many images in the database have repeated occurrences of similar types and nature of overlaps. A few representative cases were therefore selected. A few images also have brightness saturation at the overlapped region thereby leading to complete loss in the banding information. Such images were excluded from experimentation. In practice, these types of images are also not considered in the process of manual karyotyping. Manual karyotyping involves a process of finding best metaphase images. Ten real images obtained from Denanath Mangeshkar Genetic Lab (Denanth Mangeshkar Hospital and Research Center, Erandawne, Pune 411 004, India) are also included in the database. Figure 6 demonstrates the intermediate results obtained during the testing of the proposed algorithm using the clusters with single and multiple overlaps. The results were validated using the ordered karyotype of LK1 database as the ground truth and also by the expert in the genetic lab. Figure 6a showcases some of the samples from the database. Figure 6b depicts the results of CDT computation on the samples to identify the cut-points illustrated in Fig. 6c. Figure 6d finally presents the results of disentangling the chromosome cluster and assigning the overlapped region to the appropriate chromosomes.

Fig. 6
figure 6

a Test samples with single and multiple overlaps, b computation of CDT on the test samples, c the cut-points identified using the proposed algorithm, and d the extricated chromosomes from the respective clusters

Figure 7 demonstrates a few cases wherein the observations made for characterizing the overlaps in Sect. 2 were violated, and the proposed algorithm fails to extricate the overlapping chromosomes. The centromere is the narrowest part of the chromosomes. The overlapping chromosomes may have their centromere lying above each other. In such cases, the area of the overlapped region is comparatively less. The triangles formed on such an overlap, after the computation of CDT, may not necessarily be larger than the rest of the triangles formed on the wider and the bulky region of the chromosomes. Figure 7a demonstrates this possibility of overlap, where the centromere of chromosomes A and chromosomes B lies on chromosome C and shows the corresponding CDT on the sample. With the proposed algorithm for the identification of the number of overlaps, triangles a, b, c, d, e, and f are extracted as the candidates belonging to the overlapped region. The triangle f gets eliminated as it does not lie in the close proximity of other potential triangles. Moreover, the triangles b and c get paired as the outcome of applying minimum centroid distance and proximal edge criteria. This leads to the identification of wrong cut-points, and erroneous separation of the chromosome segments.

Fig. 7
figure 7

Erroneous cut-points detected by the proposed algorithm in few test samples with a overlapping centromeres (chromosomes A and B overlapping on chromosome C), b partially overlapping chromosome, and c three chromosomes forming a single overlap

Chromosomes may also partially overlap on each other where the cut-points to be identified are less than four. It happens in one of the test samples (refer to Fig. 7b). Two chromosomes partially overlap each other, and it is required to detect only two cut-points for the extrication of the overlapping chromosomes. The proposed algorithm fails to detect the desired cut-points in such cases. The results of computing the CDT and wrong cut-points detected by the proposed algorithm are illustrated. In some cases, two or more chromosomes may fully overlap themselves. The observations discussed in Sect. 2 are violated. The algorithm would fail to resolve the overlap. However, manual methods involving a trained cytogenetists are also likely to fail in disentangling such fully overlapped clusters.

Practically, multiple chromosomes in a cluster may partly cover each other to form only one single overlap, and such an overlapped region may have multiple (>4) chromosome segments attached to it. Figure 7c demonstrates such a possibility. The cut-points are correctly identified. Triangles a and b are eliminated as they do not lie in the close proximity of other potential triangles lying on the overlapped region. The algorithm, however, fails to separate the partially overlapping segments of the chromosomes.

Threshold of 100 pixels is empirically computed considering the data sets used for this study. It may vary with other data sets. The method of identifying the number of overlaps depends on the resolution of the image. Despite the correct detection of the cut-point in some clusters, the algorithm leads to inaccurate separation of the chromosome segments. In those cases, pattern of the bands on the overlap was not prominent enough to aid the decision-making about which chromosome the overlapped region belongs to. The proposed algorithm has successfully identified the exact number of overlaps, cut-points, and extricates the overlapping chromosome cluster in rest of the samples from the data set used in experimentation.

The results of the proposed algorithm applied on a set of clusters with single and multiple overlaps of varying degrees of complexities are summarized in Table 1, where each row describes the success rate for the clusters with specific composition. The accuracy of the proposed algorithm for resolving 1, 2, 3, and ≥4 overlaps is computed considering the total number of images in the respective category. Accuracy of detecting correct cut-points for each group is defined as the ratio of the number of images with the correctly identified cut-points to the total number of images in the respective group (A cc = N cc/N img). Similarly, accuracy of correctly extricating the chromosomes is expressed as the ratio of the number of images with the correct separation of the chromosomes to the total number of images in the respective group (A cs = N cs/N img).

Table 1 Results of the proposed algorithm

The algorithm has successfully detected the cut-points with an average accuracy of 93.33 %. Moreover, the algorithm can disentangle the chromosomes in the clusters having 1–6 overlaps with an accuracy of 88.33 %. The extrication of the chromosomes by assigning the overlapped region to the appropriate chromosome segment becomes challenging in the samples with 5 and 6 overlaps because of the multiple cavities formed and number of chromosomes involved. Correct identification of the cut-points in the overlapping cluster does not ensure its successful disentanglement. Erroneous identification of the cut-points, however, guarantees the wrong separation of the chromosome cluster, thus restricting A cs below A cc. The results achieved using the proposed approach were presented to an expert in a genetic Lab (Denanth Mangeshkar Hospital and Research Center, Erandawne, Pune 411004, India; Birth Right Clinic, Yashokamal, Nr Ayurved Rasashala, Karve Rd, Deccan Gymkhana, Pune 411004) for its validation and found to alleviate the cytogeneticists’ manual process of chromosome separation.

4 Discussion

The proposed technique addresses one of the most challenging predicaments in automated karyotyping, the extrication of overlapping chromosomes from the metaphase image. It applies heuristics and exploits the properties of the DT to automatically identify the cut-points and uses the banding information to extricate the overlapping cluster. The efficiency and the robustness of the algorithm are tested using a variety of clusters with varying number of overlapping chromosomes. The proposed DT-based algorithm outperforms in identifying the correct cut-points in most of the critical cases of the overlapping chromosomes and efficiently decomposes the cluster.

Table 2 compares the results of the proposed approach with the ones reported in the literature. It must be emphasized that the reported methods are all tested on independent data sets. Moreover, some of the studies report the overall accuracy of their method computed on the entire database [8, 23, 25, 26], whereas others represent the accuracy for clusters of varying number or types of overlaps [1, 4]. Not all the studies explicitly mention their index to calculate accuracy. The reported accuracy in [8] is the fraction of overlaps correctly resolved with respect to manually identified overlaps, whereas studies reported in [1, 4] present the success rate for the clusters of same types and sizes. A direct comparison may not necessarily justify their effectiveness and efficiency. It may be noted that performances of these algorithms as shown in Table 2 are merely indicative.

Table 2 Comparison of the proposed algorithm with other results reported in the literature

It is worth noting that most of the algorithms [6, 12, 16] previously reported are limited to the separation of two-chromosome clusters, whereas the probability of having multiple overlaps is higher in the real metaphase images and is considered in the proposed approach. Grisan et al. [8] achieves an accuracy of 90 % for disentangling the cluster of 5 chromosomes. The algorithm, however, fails when one of the chromosomes involved does not have a sizeable part of itself on both sides of the overlap site. Chromosomes being nonrigid bodies with high degree of variability may overlap in any fashion. The requirement that overlapping chromosomes should bisect each other imposes a serious constraint on the algorithm. The proposed algorithm identifies the cut-points efficiently in the clusters with multiple overlaps, where the overlaps are very close to each other and even when the chromosomes, involved in the overlap, do not bisect each other. It thus overcomes the limitation of the earlier reported approach in [8].

The proposed algorithm has efficiently identified the cut-points in most of the critical cases and successfully extricated the cluster using the identified cut-points. Moreover, the data set generated included manually simulated as well as real images of clusters in metaphase images. The algorithm achieved good results even in the cases of multiple overlaps and also when the chromosomes, in the overlapping cluster, did not have its sizeable part lying on the either sides of other chromosomes.