1 Introduction

In eukaryotes, genetic information is encoded and packaged into a set of chromosomes. For example, the human genome is organized into 23 pairs of chromosomes. The first 22 pairs are homologous autosomes, which are common to both females and males. The last pair, the sex chromosomes, is the only pair that can be nonhomologous (two X in females, one X and one Y in males) [2]. Chromosomes are typically stained with fluorescent dyes at the metaphase of mitosis, when cells undergo nuclear division and their chromosomes are highly organized and condensed. To display the 46 chromosomes of a human cell unambiguously, researchers rearrange them in numerical order to form a karyotype [15], from which cytogeneticists can detect abnormalities associated with inherited defects. An original microphotograph and its corresponding karyotype are shown in Fig. 1a and b, respectively.

In medicine, chromosome abnormalities [18, 22, 25] are important evidence in the diagnosis of genetic disorders such as Down syndrome [29] and Williams syndrome [36], as well as cancer [20, 37]. Some chromosome abnormalities, including numerical and structural abnormalities, are distinct and visible in metaphase cells. Therefore, analyzing karyotypes obtained by segmenting metaphase cell images plays a key role in cytogenetics and cancer studies [20, 37]. Accurate segmentation is crucial to karyotyping, which involves organizing and ordering pairs of homologous chromosomes according to their banding patterns and other features. Nonrigid shapes, overlapping chromosomes, stain debris and other noise make chromosome segmentation difficult [23]. Because karyotyping generally requires expert judgment, it is inevitably time-consuming and expensive [34].

Fig. 1

a An example of a real microphotograph; b an example of a karyotype

Convolutional neural networks (CNNs) are a class of feed-forward artificial neural networks that have proven effective in image analysis. The success of CNNs has advanced computer vision, significantly improving performance on many tasks, such as label prediction [1], neural style transfer [6], object detection [8, 28] and semantic image segmentation [8]. In semantic segmentation, U-Net [30] and SegNet [3], originally developed for biomedical image segmentation and scene understanding, respectively, are two efficient and practical CNN architectures that have achieved breakthrough results and show advantages in end-to-end applications.

In this research, we focus on the automatic segmentation of overlapping chromosomes. Most semantic segmentation applications emphasize detecting object regions or boundaries, e.g. Fast/Faster/Mask R-CNN [7, 11, 28] and pyramid CNN [24]. In recent years, several works have concentrated on the occlusion problem in various application scenarios [4, 16, 17]. In medical image segmentation, Kowal et al. [16] combine a CNN with a seeded watershed algorithm to segment aggregated cell nuclei. BANet [4] handles occlusion by assigning occluded pixels to the correct object in panoptic segmentation tasks. OCFusion [17] addresses occlusion by adding an extra "occlusion head" to the Mask R-CNN architecture for classifying pixels. However, these works still focus on separating boundaries rather than predicting the hidden regions of overlapped objects. This study attempts to predict not only the non-overlapping regions but also the stacking order and the occluded regions of the underlying chromosomes.

This study makes two key contributions. First, we propose a new neural network architecture called Compact Seg-UNet, a hybrid of U-Net [30] and SegNet [3], to predict non-overlapping and overlapping segments separately. Second, to evaluate the segmentation performance on overlapping chromosomes, we modify the dataset construction method used in [12] to generate more realistic images with opaque overlapping regions.

The rest of this paper is organized as follows. Section 2 reviews the U-Net and SegNet architectures and the chromosome segmentation methods available in the literature. Section 3 first describes the dataset preparation and then introduces the architectures of Seg-UNet and Compact Seg-UNet. Section 4 presents detailed experimental results. Finally, conclusions and future work are given in Sect. 5.

2 Related Work

2.1 Architecture of U-Net and SegNet

Before deep CNNs became popular in computer vision, researchers commonly worked on object recognition and edge detection. Deep learning methods achieve significant improvements not only in edge detection but also in pixel-wise semantic segmentation. Unlike Fully Convolutional Networks [19], U-Net and SegNet [3, 30] are both designed as encoder-decoder architectures. The encoder path extracts and integrates the internal features of images, while the decoder restores spatial detail so that the output has the same size as the input image.

Fig. 2

Architecture of U-Net [30]

U-Net was originally applied to biomedical image segmentation [30]. It has a distinctive U-shaped architecture consisting of a contracting path and an expansive path (Fig. 2). The contracting path contains 4 blocks, each consisting of two convolutional layers, each followed by a rectified linear unit (ReLU) activation, and one max-pooling layer for downsampling. In the expansive path, upsampling is performed by transposed convolutions, and each upsampled output is concatenated with the corresponding high-resolution feature map of the contracting path. Thus, high-resolution features are propagated from the encoder to the decoder. In [30], U-Net is used for cell tracking and segmentation in biomedical images; on one of the ISBI cell tracking challenge datasets it achieves an intersection over union (IOU) score of 77.5%, compared with 46% for the second-best method.
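To make the skip-connection bookkeeping concrete, the following minimal sketch traces feature-map sizes through a U-Net-style network for a \(96\times 96\) input. It assumes 'same'-padded convolutions (the original U-Net uses unpadded ones, so its maps shrink slightly), a base width of 64 channels that doubles at each level, and transposed convolutions that halve the channel count; the function name `unet_shapes` is hypothetical.

```python
def unet_shapes(size=96, base=64, depth=4):
    """Trace (stage, channels, side length) through a U-Net-style network."""
    trace, skips = [], []
    ch, hw = base, size
    # Contracting path: two padded convs keep the size, max-pooling halves it.
    for level in range(depth):
        trace.append((f"enc{level}", ch, hw))
        skips.append((ch, hw))
        hw //= 2
        ch *= 2
    trace.append(("bottleneck", ch, hw))
    # Expansive path: upsample (double size, halve channels), then concatenate.
    for level in reversed(range(depth)):
        skip_ch, skip_hw = skips[level]
        hw *= 2
        ch //= 2
        assert hw == skip_hw, "upsampled map must match the skip map"
        trace.append((f"concat{level}", ch + skip_ch, hw))
        # Two convs would then reduce the concatenated channels back to `ch`.
    return trace


for stage, channels, side in unet_shapes():
    print(f"{stage:>10}: {channels:4d} x {side} x {side}")
```

The trace also shows why the input side length should be divisible by \(2^{depth}\) (16 here): otherwise the upsampled maps would not align with the skip maps.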

Fig. 3

Architecture of SegNet [3]

As a semantic segmentation architecture, SegNet was originally applied to road and indoor scene understanding, which also requires the network to detect different objects, e.g., pedestrians, vehicles, doors and office chairs [3]. SegNet consists of an encoder network (contracting path), a decoder network (expansive path) and a pixel-wise classifier in its final layer (Fig. 3). The encoder network is identical to the first 13 convolutional layers of the VGG16 network [33], with each convolutional layer followed by batch normalization and ReLU. Unlike U-Net, which upsamples with transposed convolutions, SegNet upsamples using the pooling indices stored in the corresponding layers of the encoder path.
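SegNet's index-based upsampling can be illustrated on a single \(4\times 4\) channel. The sketch below, in plain Python with hypothetical function names, records the position of each maximum during \(2\times 2\) max-pooling and later writes each pooled value back to that position, leaving all other positions at zero; in SegNet the resulting sparse map is then densified by trainable convolutions. In PyTorch, the same pair of operations is available as `torch.nn.MaxPool2d(2, stride=2, return_indices=True)` and `torch.nn.MaxUnpool2d(2, stride=2)`.

```python
def max_pool_with_indices(x):
    """2x2 max-pooling with stride 2 on a 2D list; also returns flat argmax positions."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h - 1, 2):
        p_row, i_row = [], []
        for j in range(0, w - 1, 2):
            window = [(x[i + di][j + dj], (i + di) * w + (j + dj))
                      for di in (0, 1) for dj in (0, 1)]
            value, flat_idx = max(window, key=lambda t: t[0])
            p_row.append(value)
            i_row.append(flat_idx)
        pooled.append(p_row)
        indices.append(i_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Place each pooled value back at its recorded position; all others are 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for p_row, i_row in zip(pooled, indices):
        for value, flat_idx in zip(p_row, i_row):
            r, c = divmod(flat_idx, out_w)
            out[r][c] = value
    return out


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 7, 2],
     [6, 2, 3, 1]]
pooled, idx = max_pool_with_indices(x)
print(pooled)                       # [[4, 5], [6, 7]]
print(max_unpool(pooled, idx, 4, 4))
# [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```

Because only the indices (not full feature maps) must be stored, this upsampling is memory-efficient compared with keeping encoder activations.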

2.2 Previous Work on Chromosome Overlapping Segmentation

Several methods and algorithms have been proposed for the automatic segmentation of overlapping chromosomes in metaphase images. Karvelis et al. [14] use the watershed transform, which decomposes images into watershed regions and gradient paths, and then merge adjacent regions to form chromosome areas. Furthermore, a hybrid of fuzzy C-means and the watershed algorithm has been proposed to detect overlapping regions [21]. Methods that detect cut-points have also been studied. Ranjan et al. [27] propose a method to detect pale paths in chromosome images, which finds an optimal number of cut-points with minimal grayscale intensity using self-adaptive search windows. In [26], Delaunay triangulation is used to identify the number of overlapping chromosomes by detecting the optimal cut-points. Most of these geometric analyses can detect and segment overlapping chromosomes, but they perform poorly when chromosomes are merely touching or only partially overlapping. In practice, these methods may also require considerable, time-consuming human intervention.

Although CNNs have been developed for over 20 years, they have seldom been applied to chromosome image analysis. In 2017, [32] and [9] designed pipelines for automating chromosome segmentation and classification. However, their chromosome segmentation is carried out manually through crowdsourcing, which differs from our goal of automatically segmenting overlapping chromosomes. Hu et al. [12] propose a simplified U-Net (abbreviated as Sim U-Net) for the automatic segmentation of overlapping chromosome pairs, which retains only the first two downsampling blocks and the corresponding upsampling blocks of the regular U-Net (Fig. 2); its maximum channel size is 256. Trained on randomly overlapped chromosome pairs, it is evaluated quantitatively on the pixel-level segmentation accuracy of overlapping and non-overlapping regions. To improve performance, Saleh et al. [31] combine a medium-size U-Net (abbreviated as Med U-Net) with test time augmentation (TTA). Med U-Net retains the first three downsampling blocks and the corresponding upsampling blocks, so its bottleneck layers have 512 channels, more than Sim U-Net (256) and fewer than the original U-Net (1024). In their experiments, Med U-Net achieves IOU scores of 90.63–99.94%, significantly better than the range of 78.93–99.93% obtained when reproducing [12]. TTA is generally applied after training to improve performance on test sets; we do not employ it in this study because our focus is to compare the efficacy of different architectures.
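The depth differences among these U-Net variants can be summarized with simple arithmetic. The sketch below assumes that each variant starts at 64 channels and doubles the width at every downsampling step, as in the original U-Net; under that assumption it reproduces the bottleneck widths quoted above and also shows the bottleneck's spatial size for a \(96\times 96\) input. The helper names are hypothetical.

```python
def unet_widths(depth, base=64):
    """Channel widths of the encoder levels followed by the bottleneck."""
    return [base * 2 ** level for level in range(depth + 1)]


def bottleneck_side(size, depth):
    """Side length of the bottleneck after `depth` 2x2 poolings with stride 2."""
    return size // 2 ** depth


# Number of retained downsampling blocks per variant, as described in the text.
variants = {"Sim U-Net": 2, "Med U-Net": 3, "U-Net": 4}
for name, depth in variants.items():
    widths = unet_widths(depth)
    side = bottleneck_side(96, depth)
    print(f"{name:<9} widths={widths} bottleneck={widths[-1]}ch at {side}x{side}")
```

Each extra level doubles the bottleneck width while quartering its spatial area, which is why depth strongly affects both capacity and parameter count.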

3 Proposed Method

In this section, we provide a detailed description of our approach, organized as follows: (a) construction of the datasets, (b) pre-processing of the data, (c) the architecture of Compact Seg-UNet and (d) evaluation metrics.

3.1 Datasets

Two datasets are used in this research. Dataset 1 comprises 13434 images of overlapping chromosome pairs. It is built from 46 individual chromosomes extracted from two microphotographs, one containing DAPI-stained human metaphase chromosomes and the other Cy3-labeled telomeres. The areas of the chromosomes are calculated and sorted, and 12 chromosomes are picked by selecting every fourth chromosome from the size-ordered sequence (Lines 1 to 2 of Algorithm 1). They are combined into \(\binom{12}{2} = 66\) chromosome pairs (Line 3 of Algorithm 1). With random rotation, 13434 overlapping chromosome images are generated (Dataset 1) (Line 4 of Algorithm 1). Dataset 1 is also used in Hu et al. [12]. Figure 4a gives four examples of the 13434 generated images. Their ground truth is illustrated in Fig. 4b, where orange and green regions correspond to the non-overlapping regions of the underlying and top chromosomes, respectively, and blue regions correspond to the overlapping region.
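The selection and pairing steps can be sketched as follows. The chromosome areas here are random placeholders, and sorting in ascending order and starting from the first chromosome are assumptions; the point is that taking every fourth of 46 size-ordered chromosomes yields 12, which form \(\binom{12}{2} = 66\) unordered pairs. The random rotations that produce the final images are not reproduced, and the function name is hypothetical.

```python
import itertools
import math
import random


def select_every_kth_by_area(areas, k=4):
    """Indices of every k-th chromosome after sorting by area (ascending, assumed)."""
    order = sorted(range(len(areas)), key=lambda i: areas[i])
    return order[::k]


# Placeholder areas for the 46 extracted chromosomes (not real measurements).
rng = random.Random(0)
areas = [rng.randint(200, 2000) for _ in range(46)]

chosen = select_every_kth_by_area(areas)
pairs = list(itertools.combinations(chosen, 2))
print(len(chosen), len(pairs), math.comb(12, 2))  # 12 66 66
```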

Fig. 4

a Four examples of generated overlapping chromosomes; b their corresponding ground truth (orange and green colors are used to distinguish between two chromosomes; blue color is used to represent the overlapping region)

A shortcoming of Dataset 1 is that, during image generation, the greyscale values of superimposed chromosomes are summed pixel-wise, which makes the overlapping region brighter (Fig. 5a). As a result, the non-overlapping regions of the top and underlying chromosomes are indistinguishable, while the brighter overlapping region is trivially easy to recognize. This does not happen with physical objects, including chromosomes (Fig. 5c). To generate more realistic overlapping chromosomes, we modify the image generation method so that only the pixel values of the top chromosome are retained in overlapping regions. A new dataset (Dataset 2) is then constructed in which the top chromosomes are opaque with respect to the underlying ones (Fig. 5b) (Line 5 of Algorithm 1). The two datasets contain the same overlapping images in the same order and differ only in the pixel values of the overlapping regions. Datasets 1 and 2 are available at https://github.com/SifanSong/Chr_overlapping_datasets.
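The difference between the two generation rules can be sketched on tiny hand-made images. In the sketch below (plain Python, hypothetical function name), zero marks background; the summation branch mimics Dataset 1 and the replacement branch mimics Dataset 2. Clipping the sum to 255 is our assumption for 8-bit images, and the label values 0 to 3 (background, underlying only, top only, overlap) follow the order of the one-hot images in Fig. 6v to viii.

```python
def overlay(bottom, top, opaque):
    """Superimpose `top` on `bottom` (equal-sized greyscale images, 0 = background).

    opaque=False: pixel-wise summation, clipped to 255 (Dataset 1 style).
    opaque=True:  the top chromosome hides the one beneath (Dataset 2 style).
    Returns (composite image, label map) with labels
    0 = background, 1 = underlying only, 2 = top only, 3 = overlap.
    """
    composite, labels = [], []
    for b_row, t_row in zip(bottom, top):
        c_row, l_row = [], []
        for b, t in zip(b_row, t_row):
            if b and t:
                c_row.append(t if opaque else min(b + t, 255))
                l_row.append(3)
            elif t:
                c_row.append(t)
                l_row.append(2)
            elif b:
                c_row.append(b)
                l_row.append(1)
            else:
                c_row.append(0)
                l_row.append(0)
        composite.append(c_row)
        labels.append(l_row)
    return composite, labels


bottom = [[0, 90, 90],
          [0, 90, 90],
          [0, 0, 0]]
top = [[0, 0, 0],
       [120, 120, 0],
       [0, 0, 0]]

summed, labels = overlay(bottom, top, opaque=False)
occluded, _ = overlay(bottom, top, opaque=True)
print(summed[1])    # [120, 210, 90]  -> brighter overlap (Dataset 1)
print(occluded[1])  # [120, 120, 90]  -> top chromosome wins (Dataset 2)
print(labels)       # [[0, 1, 1], [2, 3, 1], [0, 0, 0]]
```

In the summed image the overlap is the brightest pixel, which a network can detect without learning chromosome texture; in the opaque image it looks like any other top-chromosome pixel.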

Fig. 5

a The overlapping region in Dataset 1 is lighter than the non-overlapping areas; b the modified overlapping method makes Dataset 2 appear more similar to real images; c in real microphotographs, opaque overlaps are more common than translucent ones

3.2 Pre-processing

The ground truth of Datasets 1 and 2 (Fig. 6-GT) is first converted into four one-hot images, one per target region, so that the accuracy on each region can be assessed. The one-hot images are then denoised using a label correction method [12] to eliminate mislabeled pixels (Fig. 6i to iv). After denoising, Fig. 6v shows the background label, Fig. 6vi and vii show the smoothed non-overlapping regions of the underlying and top chromosomes, respectively, and Fig. 6viii shows the overlapping region. Pixels of the target region are labeled 1 (black in Fig. 6) and all other pixels 0 (white). Before being fed into the CNNs, the backgrounds of all overlapping chromosome images and one-hot images are extended to \(96\times 96\) pixels (Lines 6 to 10 of Algorithm 1). The pre-processing steps are identical in all experiments in this study.
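The one-hot conversion and background padding (but not the label correction of [12]) can be sketched as below. Two details are assumptions: images are centred when padded, and padding is applied to the label map before the one-hot split so that the padded border correctly counts as background. Function names are hypothetical.

```python
def pad_to(image, size=96, fill=0):
    """Extend the background around a 2D list so it becomes size x size (centred)."""
    h, w = len(image), len(image[0])
    if h > size or w > size:
        raise ValueError("image is larger than the target size")
    top, left = (size - h) // 2, (size - w) // 2
    out = [[fill] * size for _ in range(size)]
    for r, row in enumerate(image):
        out[top + r][left:left + w] = row
    return out


def to_one_hot(labels, num_classes=4):
    """One binary image per class: 1 where the pixel belongs to that class."""
    return [[[1 if v == k else 0 for v in row] for row in labels]
            for k in range(num_classes)]


# Label map: 0 background, 1 underlying only, 2 top only, 3 overlap.
labels = [[0, 1, 1],
          [2, 3, 1]]
one_hot = to_one_hot(pad_to(labels, size=96, fill=0))
print([sum(map(sum, channel)) for channel in one_hot])  # [9211, 3, 1, 1]
```

Padding the one-hot images directly with zeros would leave the border unlabeled in every channel, so the order of the two steps matters.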

Fig. 6

(GT) An example of ground truth labels; (i) to (iv) four one-hot images without denoising; (v) to (viii) four one-hot images after denoising

Algorithm 1 Construction and pre-processing of Datasets 1 and 2

3.3 Architecture of Compact Seg-UNet

Seg-UNet is a hybrid convolutional neural network combining the main characteristics of U-Net [30] and SegNet [3]. It consists of an encoder path and a decoder path (Fig. 7). SegNet is used as the framework, that is, each convolutional block contains a convolutional layer, batch normalization and a ReLU activation. In the encoder path, \(2\times 2\) max-pooling with stride 2 is used for downsampling, and the pooling indices are saved. The decoder path is almost symmetrical to the encoder and uses the saved pooling indices to perform upsampling (Fig. 8): each value is placed at the position of the maximum recorded during max-pooling, which doubles the spatial size of the feature maps [3]. To deliver low-level features, we concatenate each upsampled result with the encoder feature maps of the same spatial size (Fig. 8). These skip connections reduce the loss of features during upsampling at different scales and preserve more contour details [5, 30].
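One Seg-UNet-style decoder step, index-guided unpooling followed by channel concatenation with the encoder maps, can be sketched in plain Python as below. Feature maps are lists of 2D channels with even side lengths (an assumption), the function names are hypothetical, and the convolutions that would normally sit between pooling and unpooling are omitted so that the data flow stays visible.

```python
def max_pool_2x2(channel):
    """2x2 / stride-2 max-pooling; returns pooled values and flat argmax indices."""
    h, w = len(channel), len(channel[0])
    values, indices = [], []
    for i in range(0, h, 2):
        v_row, i_row = [], []
        for j in range(0, w, 2):
            window = [(channel[i + a][j + b], (i + a) * w + (j + b))
                      for a in (0, 1) for b in (0, 1)]
            value, flat = max(window, key=lambda t: t[0])
            v_row.append(value)
            i_row.append(flat)
        values.append(v_row)
        indices.append(i_row)
    return values, indices


def max_unpool_2x2(values, indices, h, w):
    """Write each value back to its recorded position; other positions stay 0."""
    out = [[0] * w for _ in range(h)]
    for v_row, i_row in zip(values, indices):
        for value, flat in zip(v_row, i_row):
            out[flat // w][flat % w] = value
    return out


def decoder_step(decoder_maps, saved_indices, encoder_maps):
    """Unpool every decoder channel, then concatenate the same-size encoder maps."""
    h, w = len(encoder_maps[0]), len(encoder_maps[0][0])
    upsampled = [max_unpool_2x2(d, k, h, w)
                 for d, k in zip(decoder_maps, saved_indices)]
    return upsampled + encoder_maps  # concatenation along the channel axis


# Two encoder channels of size 4x4 with distinct positive values.
encoder = [[[4 * r + c + 1 for c in range(4)] for r in range(4)],
           [[16 - (4 * r + c) for c in range(4)] for r in range(4)]]
pooled = [max_pool_2x2(ch) for ch in encoder]
decoder_in = [values for values, _ in pooled]  # stands in for deeper layers
saved = [idx for _, idx in pooled]

merged = decoder_step(decoder_in, saved, encoder)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 4 4 4
```

In this sketch, where decoder and encoder have the same width, the concatenation doubles the channel count, and the following convolutional block sees both the sparse upsampled maps and the dense encoder detail.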

Fig. 7

The architecture of Seg-UNet and its compact version

Fig. 8

Upsampling of Compact Seg-UNet

The spatially narrow but channel-wise thick bottleneck layers introduce a large number of trainable parameters into Seg-UNet, which can lead to overfitting on small datasets [38]. To avoid this pitfall, we reduce the depth of Seg-UNet by removing these layers (the grey box in Fig. 7). We call this smaller architecture Compact Seg-UNet.
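A rough parameter count shows why deep, wide layers dominate a network's size. The sketch below counts the trainable parameters of a \(3\times 3\) convolution with bias followed by batch normalization (scale and shift), matching the convolutional block described above; the channel widths (64 for a shallow block, 512 to 1024 for a bottleneck) are hypothetical and not taken from Fig. 7.

```python
def conv_bn_params(c_in, c_out, k=3):
    """Trainable parameters of a k x k convolution (with bias) plus batch norm."""
    conv = k * k * c_in * c_out + c_out  # weights + biases
    bn = 2 * c_out                       # BN scale (gamma) and shift (beta)
    return conv + bn


# Hypothetical widths, for illustration only.
shallow_block = conv_bn_params(1, 64) + conv_bn_params(64, 64)
bottleneck = conv_bn_params(512, 1024) + conv_bn_params(1024, 1024)

print(shallow_block)                      # 37824
print(bottleneck)                         # 14161920
print(round(bottleneck / shallow_block))  # 374
```

Because the count grows with the product of input and output channels, removing the widest layers removes a disproportionate share of the parameters.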

3.4 Evaluation Metrics

To evaluate segmentation performance at the pixel level, we first use three metrics, IOU, precision and recall:

$$\begin{aligned} IOU= & {} \frac{{{Intersection}}}{{{Union}}} \times 100\% = \frac{TP}{TP+FP+FN} \times 100\% \end{aligned}$$
(1)
$$\begin{aligned} Precision= & {} \frac{TP}{TP+FP} \times 100\% \end{aligned}$$
(2)
$$\begin{aligned} Recall= & {} \frac{TP}{TP+FN} \times 100\% \end{aligned}$$
(3)

where Intersection equals TP (true positives), the pixels of the target region that are predicted correctly; FP (false positives) are pixels wrongly predicted as belonging to the target region; FN (false negatives) are pixels of the target region that are not predicted; and Union is the sum of TP, FP and FN. We compute IOU, precision and recall for every one-hot image. To illustrate these quantities, we overlay a ground-truth one-hot image and the corresponding predicted one (Fig. 9), where black, blue and yellow represent TP, FP and FN, respectively. We further use the F1 score, which combines precision and recall into a single balanced measure:

$$\begin{aligned} F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} \end{aligned}$$
(4)

where Precision and Recall are previously defined in (2) and (3) respectively.
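Equations (1) to (4) can be computed directly from a pair of binary one-hot images. The following sketch counts TP, FP and FN pixel by pixel; returning IOU, precision and recall in percent and F1 as a fraction mirrors the figures quoted in Sect. 4, while the handling of empty denominators is our own convention. The function name is hypothetical.

```python
def segmentation_scores(gt, pred):
    """Pixel-wise IOU, precision, recall (in %) and F1 (fraction) for binary images."""
    tp = fp = fn = 0
    for g_row, p_row in zip(gt, pred):
        for g, p in zip(g_row, p_row):
            if g == 1 and p == 1:
                tp += 1
            elif g == 0 and p == 1:
                fp += 1
            elif g == 1 and p == 0:
                fn += 1
    # Convention for empty denominators (an assumption, not from the paper).
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"IOU": 100 * iou, "Precision": 100 * precision,
            "Recall": 100 * recall, "F1": f1}


gt = [[1, 1, 0],
      [1, 0, 0]]
pred = [[1, 0, 0],
        [1, 1, 0]]
print(segmentation_scores(gt, pred))
# TP = 2, FP = 1, FN = 1 -> IOU 50%, precision = recall = 66.7%, F1 = 0.667
```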

Fig. 9

An illustration of True Positives (black color), False Positives (blue color), and False Negatives (yellow color). (Color figure online)

Table 1 Detailed hyperparameters of all experiments

4 Experiments and Results

4.1 Implementation Details

Because the one-hot images contain only pixel values of 0 and 1, we use a sigmoid as the final activation function and binary cross-entropy as the loss function. Our experiments are implemented in PyTorch with Python 3.7.5 and run on an NVIDIA RTX 2080Ti GPU and an Intel(R) Core(TM) i9-9900K CPU. All hyperparameters are listed in Table 1; for a fair comparison, the same configuration is applied to every experiment and dataset. We randomly choose 1500 images as the test set. The remaining images are shuffled with a fixed random seed \(= 152\) and split in a 4:1 ratio (9548:2386 images) for training and validation (batch size \(= 8\)). We perform 5-fold cross-validation so that the results do not depend on a single split. Training uses the Adam optimizer with an initial learning rate lr of 0.00002 and an early stopping strategy. Validation performance is checked once per epoch, and the model weights with the best validation performance (the lowest validation loss) are saved. If the best validation performance has not improved for 3 consecutive epochs, lr is multiplied by a decay rate of 0.5. If it has not improved for 9 consecutive epochs, training is terminated to avoid overfitting.
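The learning-rate and early-stopping rules can be replayed on a sequence of validation losses, which helps check how they interact. The sketch below is a plain-Python interpretation, assuming that lr is halved each time another 3 epochs pass without a new best loss and that training stops once 9 such epochs accumulate; the helper name and the loss values are hypothetical. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.5` provides similar halving, but its patience counting and improvement threshold differ slightly, so an exact correspondence is an assumption.

```python
def replay_schedule(val_losses, lr=2e-5, decay=0.5, lr_patience=3, stop_patience=9):
    """Replay LR halving and early stopping on per-epoch validation losses.

    Returns (best_epoch, last_epoch, lr_per_epoch); the best epoch is where
    the model weights would be saved.
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")
    best_loss, best_epoch, since_best = float("inf"), 0, 0
    lr_per_epoch = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, since_best = loss, epoch, 0  # save weights
        else:
            since_best += 1
            if since_best % lr_patience == 0:
                lr *= decay  # another lr_patience epochs without improvement
        lr_per_epoch.append(lr)
        if since_best >= stop_patience:
            break  # early stopping
    return best_epoch, epoch, lr_per_epoch


losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.34, 0.35, 0.36,
          0.37, 0.38, 0.39, 0.40, 0.41, 0.42, 0.43, 0.44]
best, last, lrs = replay_schedule(losses)
print(f"best epoch {best}, stopped at epoch {last}, final lr {lrs[-1]:.3g}")
# best epoch 7, stopped at epoch 16, final lr 1.25e-06
```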

Table 2 IOU scores (mean ± 2 standard deviations) of the one-hot images (5-fold cross-validation). The best results are highlighted in bold
Table 3 Average scores (± 2 standard deviations) of the experiments (5-fold cross-validation). The best results are highlighted in bold

4.2 Results

To assess the segmentation performance on overlapping chromosomes, we apply four CNN architectures (Sim U-Net, Med U-Net, Seg-UNet and Compact Seg-UNet) to the two datasets. Since Sim U-Net has the shallowest architecture of the four, we use its results as the baseline. As shown in Tables 2 and 3, Experiments A to D are conducted on Dataset 1 and Experiments E to H on Dataset 2. Specifically, Experiments A and E use Sim U-Net, B and F use Med U-Net, C and G use Seg-UNet, and D and H use Compact Seg-UNet. Table 2 lists the architectures, datasets and four IOU scores (IOU_0 to IOU_3), which correspond to the one-hot images in Fig. 6v to viii, respectively. To further evaluate these architectures, we report GPU memory usage (GMU) and the average IOU, precision, recall and F1 scores (Table 3). All results in Tables 2 and 3 are averages ± 2 standard deviations (2std.) over the test sets of the 5-fold cross-validation.

Fig. 10

a-c Examples of average losses of the training, validation and test sets, respectively, versus epochs, where D1 and D2 represent Dataset 1 and Dataset 2; d zoomed-in plot of the Seg-UNet and Compact Seg-UNet experiments on the test sets

Fig. 11

a-c Examples of average IOU scores of the training, validation and test sets, respectively, versus epochs, where D1 and D2 represent Dataset 1 and Dataset 2; d zoomed-in plot of the Seg-UNet and Compact Seg-UNet experiments on the test sets

For both the experiments on Dataset 1 (A to D) and those on Dataset 2 (E to H), Table 2 shows that all IOU scores of Seg-UNet and Compact Seg-UNet are significantly better than those of Sim U-Net and Med U-Net, since their confidence intervals (average scores \(\pm 2std.\)) do not overlap with those of Sim U-Net and Med U-Net. Their 2std. values are also much lower than those of Sim U-Net and Med U-Net. The segmentation performance of Med U-Net is generally better than that of Sim U-Net, which indicates that Sim U-Net is too shallow to perform well in this study and that a deeper architecture (Med U-Net) improves the capacity to learn deep features. In Table 2, the comparisons of Seg-UNet and Compact Seg-UNet (Experiments C vs. D and G vs. H) show very similar IOU_0 to IOU_3 results, but the results of Seg-UNet may be affected by overfitting due to its more complex architecture. The results of Compact Seg-UNet are slightly better than those of Seg-UNet: Compact Seg-UNet achieves the best IOU scores, in the range of 88.51%(\(\pm 0.56\))-99.97%(\(\pm 0.00\)) on Dataset 1 and 82.49%(\(\pm 0.74\))-99.97%(\(\pm 0.00\)) on Dataset 2. In particular, Experiments D and H show that Compact Seg-UNet is more robust than Seg-UNet, as indicated by its lower 2std. values. Although the IOU_3 scores on Dataset 2 are lower than those on Dataset 1, this difference exposes a flaw of Dataset 1: as described above, the directly summed pixel values produce unrealistically bright overlapping regions, which cannot be observed in the real world. These bright regions are easy for neural networks to detect, yet they do not reveal which chromosome lies underneath.
Comparing Experiments A to D with E to H, this flaw is reflected in higher IOU_3 scores and a poorer ability to distinguish the non-overlapping regions of the underlying chromosomes (IOU_1). Although Experiments C and D on Dataset 1 show higher average scores in both Tables 2 and 3, we recommend using the results on Dataset 2 as benchmarks in future research and evaluations, since underlying chromosomes are not visible through the top ones under a microscope and CNNs should learn to predict the extent of the overlap.

In Table 3, the average IOU is the mean of IOU_0 to IOU_3 in Table 2. The other average ± 2std. results (precision, recall and F1) of the four CNN architectures follow the same pattern as the IOU scores. Although Sim U-Net and Med U-Net require only 1443 and 1609 MiB of GMU, respectively, their segmentation results are significantly worse than those of the two Seg-UNet variants and may not satisfy the requirements of further research on the automatic segmentation of overlapping chromosomes. Compact Seg-UNet (Experiments D and H) also achieves the best performance in overlapping chromosome segmentation (an average IOU of \(93.44\%\pm 0.26\) and an F1 score of \(0.9596\pm 0.0814\) on Dataset 2), while its GMU is reduced from 2455 MiB (Seg-UNet) to 2251 MiB. These results indicate that removing the bottleneck layers of Seg-UNet not only reduces the training load but also improves segmentation performance in this study.

To further examine the behavior during training, we plot the average losses (Fig. 10) and average IOU scores (Fig. 11) of the training, validation and test sets versus epochs. The results of the first cross-validation fold are shown as examples. The curves of Sim U-Net and Med U-Net are clearly separated from those of Seg-UNet and Compact Seg-UNet. Figures 10a and 11a show that continued overfitting on the training set is halted by our early stopping strategy once the validation performance stops improving (Figs. 10b, 11b). Figures 10c and 11c show the average scores on the test sets, where both Seg-UNet and Compact Seg-UNet significantly improve the average results and need fewer epochs before early stopping. Figures 10d and 11d are zoomed-in plots of the Seg-UNet and Compact Seg-UNet curves. The experiments on the two datasets show similar and highly consistent trends, which highlight the superiority of Compact Seg-UNet and Seg-UNet over Sim U-Net and Med U-Net in this task.

5 Conclusions and Future Work

In this work, we propose a novel deep neural network architecture, Compact Seg-UNet, to segment images of overlapping chromosomes. Because several low-resolution layers are removed, Compact Seg-UNet requires less GPU memory. The scarcity of real-world overlapping chromosome pairs motivates the construction of a dataset of generated overlapping chromosomes. To evaluate the proposed method, we compare Compact Seg-UNet with Sim U-Net, Med U-Net and Seg-UNet. In terms of IOU, precision, recall and F1 score, the proposed Compact Seg-UNet outperforms the other architectures.

In the public dataset (Dataset 1), the overlapping regions are brighter because the greyscale values of the two chromosomes are summed pixel-wise. Such a feature is not only unrealistic but can also be recognized directly by neural networks. We therefore propose a modified approach that generates images with opaque overlapping regions, which are more common in the real world (Dataset 2). On Dataset 2, our proposed Compact Seg-UNet achieves the best average IOU score (93.44% ± 0.26) and the highest average F1 score (0.9596\(\pm 0.0814\)), outperforming previous work by large margins.

Once the CNNs have learned the internal textures of chromosomes, the trained models in this study can predict the shape, size and occluded overlapping region of the chromosomes. This differs from most semantic segmentation applications, which focus merely on detecting object regions or boundaries, e.g. Fast/Faster/Mask R-CNN and pyramid CNN.

This research is a first step towards the segmentation of chromosomes with higher degrees of overlap. It shows that Compact Seg-UNet can segment overlapping chromosomes and predict their nonrigid shapes. Compact Seg-UNet is designed to alleviate overfitting; it not only improves segmentation performance but also reduces computational cost. In future research, besides customizing the size of CNN architectures, methods such as dropout [35], weight initialization [10] and stochastic weight averaging [13] may be integrated into our architecture to further improve generalization and robustness.

Despite the encouraging results obtained with Compact Seg-UNet in this study, Datasets 1 and 2 are artificially generated from only 12 individual chromosomes. This is the main weakness limiting the use of the models to segment overlapping chromosomes in more complicated scenarios. To achieve robust segmentation on more realistic images of overlapping chromosomes, we plan to augment the training sets with real-world data and varied chromosome shapes, covering a wide variety of overlapping conditions and chromosome individuals. In the next step of this research, we also plan to label real overlapping chromosome images and use them to train the proposed network to improve its robustness, and to test fine-tuned models on real chromosome images to assess their efficacy.