1 Introduction

Automatic image colorization examines the problem of how to add realistic colors to grayscale images without any user intervention. It has useful applications such as colorizing old photographs or movies, assisting artists, visual effects, and color recovery. On the other hand, colorization is a severely ill-posed problem: in order to colorize an image effectively, the algorithm or the user must have sufficient information about the scene's semantic composition.

As pointed out in [16], image colorization is also a good model for a large number of applications in which we want to take an arbitrary image and predict values or distributions at each pixel, exploiting information only from the input image itself. This is a very common task in the image processing and pattern recognition community.

To date, deep learning techniques have shown impressive results on both high-level and low-level vision problems, including image classification [1], removing phantom objects from point clouds [2], pedestrian detection [3], face detection [4], handwritten character classification [5], and photo adjustment [6]. In recent years, deep learning based approaches have also appeared to address the colorization problem.

Main contributions. Image colorization algorithms can be divided into three classes: scribble-based, example-based, and learning-based. In this paper, we show a possible solution that utilizes the advantages of example-based and learning-based approaches. Unlike previous methods, we perform colorization using a deep architecture and a reference image.

Paper organization. This paper is organized as follows. In Sect. 2, related and previous works are reviewed, with a primary focus on learning-based approaches. We describe our algorithm in Sect. 3. Section 4 presents experimental results and analysis. Conclusions are drawn in Sect. 5.

2 Related Works

Image colorization has been studied intensively since the 1970s. Broadly speaking, the existing algorithms can be divided into three groups: scribble-based, example-based, and learning-based approaches. In this section, we mainly concentrate on reviewing learning-based approaches.

Scribble-based approaches interpolate colors in the grayscale image based on color scribbles produced by a user or an artist. Levin et al. [7] presented an interactive colorization method which can be applied to still images as well as video sequences. The user places color scribbles on the image, and these scribbles are propagated to the remaining pixels of the image. Huang et al. [8] further improved this algorithm in order to reduce color bleeding at image edges. Yatziv et al. [9] developed the algorithm of Levin et al. [7] in another direction: the user can provide overlapping color scribbles. Furthermore, a distance metric was proposed to measure the distance between a pixel and the color scribbles, and combination weights for each scribble were determined based on the measured distance.

Example-based approaches require two images: these algorithms transfer color information from a colorful reference image to a grayscale target image. Reinhard et al. [10] applied simple statistical analysis to impose one image's color characteristics on another. Welsh et al. [11] utilized pixel intensity values and neighborhood statistics to match the pixels of the reference image with the pixels of the grayscale target image. In contrast, Irony et al. [12] first determine for each pixel which example segment it should learn its color from. This is carried out by applying a supervised classification algorithm that considers the low-level feature space of each pixel neighborhood. Each color assignment is then treated as a color micro-scribble serving as input to the algorithm of Levin et al. [7]. Charpiat et al. [13] predicted the expected variation of color at each pixel, thus defining a non-uniform spatial coherency criterion; graph cuts were then applied to maximize the probability of the whole colored image at the global level. Gupta et al. [14] extracted features from the target and reference images at the resolution of superpixels. Based on different kinds of features, the superpixels of the reference image were matched with the superpixels of the target image, and the color information was transferred to the centers of the target superpixels in the form of micro color-scribbles, which were then propagated through the target image.

Fig. 1. The architecture of the proposed method. The Input CNN and the Reference CNN have the same structure. First, only the Reference CNN is trained; then the Input CNN and the Reference CNN are trained simultaneously. Information is transmitted from the Input CNN to the Reference CNN and vice versa by element-wise addition between certain convolutional blocks.

Learning-based approaches model the variables of the image colorization process by applying different machine learning techniques. Bugeau and Ta [15] introduced a patch-based image colorization algorithm that takes square patches around each pixel. Patch descriptors of luminance features were extracted in order to train a model, and a color prediction model with a general distance selection strategy was proposed. Deshpande et al. [16] colorize an image by optimizing a linear system that considers local predictions of color, spatial consistency, and consistency with an overall histogram. Cheng et al. [17] introduced a fully automatic method based on a deep neural network trained on hand-crafted features: three levels of features were extracted from each pixel of the training images, namely raw grayscale values, DAISY features [18], and high-level semantic features.

In recent years, Convolutional Neural Network based approaches have appeared to tackle the colorization problem. Iizuka et al. [19] elaborated a colorization method that jointly extracts global and local features from an image and then merges them. In [20], the authors proposed a fully automatic algorithm based on VGG-16 [21] and a two-stage Convolutional Neural Network that provides a richer representation by adding semantic information from a preceding layer; furthermore, they proposed Quaternion Structural Similarity [22] for quality evaluation. Zhang et al. [23] trained a Convolutional Neural Network to map from a grayscale input to a distribution over quantized color values; this algorithm was evaluated with the help of human participants, who were asked to distinguish between colorized and ground-truth images. In [24], the authors introduced a patch-based colorization model using two different loss functions in a vectorized Convolutional Neural Network framework: during colorization, patches are extracted from the image and colorized independently, and guided image filtering [25] is applied as postprocessing. Larsson et al. [26] processed a grayscale image through the VGG-16 [21] architecture and obtained hypercolumns [27] as feature vectors; the system learns to predict hue and chroma distributions for each pixel from its hypercolumn. Deshpande et al. [28] proposed a conditional model for predicting multiple colorizations, where the low-dimensional embedding of color fields was learned by a Variational Autoencoder. Similarly, Cao et al. [29] also worked with a conditional model, but a Conditional Generative Adversarial Network was utilized to model the distribution of real-world colors. Limmer and Lensch [30] proposed a method for transferring the RGB color spectrum to near-infrared images using deep multi-scale convolutional neural networks, where the mapping between near-infrared and RGB images is learned from data.

3 Our Approach

The objective of our framework is to combine example-based and learning-based approaches in order to produce more realistic and plausible colors. To capitalize on the advantages of both example-based and learning-based methods, we propose a novel architecture, shown in Fig. 1. Our architecture consists of two parallel CNNs, called the Input CNN and the Reference CNN, which have the same structure. In the following, this structure is first described and then the co-operation of the two networks is discussed.

We reimplemented the algorithm of [23] using the Keras [31] deep learning library. This algorithm has some appealing properties. First of all, the authors elaborated a class rebalancing method because the distribution of ab values in natural images is biased towards low ab values. Second, colorization is treated as multinomial classification instead of regression: the ab output space is quantized into bins with grid size 10, and only the Q = 313 bins that lie in gamut are kept. For all details, we refer to [23].
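
The quantization step can be illustrated with a short script. The sketch below is our own illustration, not the code of [23]: it builds ab bin centers on a grid of size 10 and keeps those that survive a Lab to sRGB to Lab round trip at a fixed lightness. The grid range, the probed lightness value, and the tolerance are assumptions, so the number of retained bins may differ slightly from Q = 313.

import numpy as np
from skimage.color import lab2rgb, rgb2lab

def build_ab_bins(grid=10, L=50.0, tol=2.0):
    # Candidate bin centers covering the ab range (assumed here to be [-110, 110]).
    centers = np.arange(-110, 110 + grid, grid, dtype=float)
    bins = []
    for a in centers:
        for b in centers:
            lab = np.array([[[L, a, b]]])
            # lab2rgb clips to the sRGB cube, so an out-of-gamut color
            # does not survive the round trip back to Lab.
            back = rgb2lab(lab2rgb(lab))
            if np.linalg.norm(back[0, 0, 1:] - np.array([a, b])) < tol:
                bins.append((a, b))
    return np.array(bins)

ab_bins = build_ab_bins()   # roughly Q = 313 in-gamut bin centers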

We used the SUN database [32] to compile our training database. We denote a reference image by R and an input image by I. Formally, our database can be defined as \(\mathcal {L}=\{(I_i, R_i)|i=1, ..., N\}\), where N is the number of image pairs and reference image \(R_i\) is semantically similar to input image \(I_i\). This is why we opted to utilize the SUN database [32], since this dataset contains images grouped by their semantic information. Figure 2 shows the empirical distribution of pixels in ab space gathered from our database, while Fig. 3 illustrates the empirical and smoothed empirical distributions of ab pairs in the quantized space. These curves were determined and applied in the training process based on the algorithm of [23].
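
The pairing of semantically similar images can be sketched with a small helper. The snippet below is only a hypothetical illustration: it assumes the SUN images are stored in one folder per scene category and samples input/reference pairs from within the same category; the directory layout, file extensions, and sampling rule are our assumptions, not a specification of the actual preprocessing.

import os
import random

def build_pairs(root, pairs_per_category=50, seed=0):
    """Sample (input, reference) image pairs from folders of one SUN category each."""
    rng = random.Random(seed)
    pairs = []
    for category in sorted(os.listdir(root)):
        folder = os.path.join(root, category)
        images = [os.path.join(folder, f) for f in os.listdir(folder)
                  if f.lower().endswith(('.jpg', '.png'))]
        if len(images) < 2:
            continue
        for _ in range(pairs_per_category):
            target, reference = rng.sample(images, 2)  # same category, similar semantics
            pairs.append((target, reference))
    return pairs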

Fig. 2. Empirical probability distribution of ab values in our database, shown in log scale. The horizontal axis represents the b values and the vertical axis represents the a values. The green dots denote the quantized ab value pairs. (Color figure online)

Fig. 3. Empirical (blue curve) and smoothed empirical (red curve) distributions of ab pairs in the quantized space of our database. (Color figure online)

Fig. 4. Colorized results. The first image is the reference image, the second is the grayscale input, the third is our colorized result, the fourth is the result of [23], and the fifth is the ground-truth image. Digital watermarks in the lower right corners were embedded by the application of [23] (available at http://demos.algorithmia.com/colorize-photos). (Color figure online)

Fig. 5. Further colorized results produced by the proposed method.

First, we train only the Reference CNN using only the \(R_i\)'s from our database. We utilize the ADAM optimizer [33] and early stopping [34] with the following parameters: \(\alpha =0.0001\), \(\beta _1=0.9\), \(\beta _2=0.999\), \(d=0.0\), and \(\varepsilon =10^{-8}\), where \(\alpha \) is the learning rate, \(\varepsilon \) is the fuzz factor, and d is the learning rate decay over each update. Then the Input CNN and the Reference CNN are trained simultaneously using the whole database \(\mathcal {L}=\{(I_i, R_i)|i=1, ..., N\}\). As mentioned above, the Input CNN and the Reference CNN have the same structure. Information is transmitted from the Input CNN to the Reference CNN and vice versa by element-wise addition between certain convolutional blocks (see Fig. 1). The image pairs \((I_i, R_i)_{i=1}^N\) are given to the inputs of the two CNNs. The values of the third convolutional block in the Reference CNN are added element-wise to those in the Input CNN. Next, the values of the fourth convolutional block in the Input CNN are added to those in the Reference CNN. This process is repeated up to the second-to-last convolutional block. In this stage, we also applied the ADAM optimizer and early stopping with the above-mentioned parameters. In this way, the color information of the reference image is exploited to facilitate the color prediction for the input image, while information from the input image helps to identify the areas which hold essential information about the color scheme of the scene. The proposed framework was trained on 60,000 image pairs of the SUN database.
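
For illustration, the following Keras-style sketch shows the two-branch arrangement: two structurally identical CNNs whose activations are fused by element-wise addition at selected convolutional blocks, trained with the ADAM settings listed above. The input resolution, the number of blocks and filters, the blocks chosen for fusion, the early-stopping patience, and the layer names are illustrative assumptions rather than the exact configuration of our trained model.

from tensorflow.keras import Input, Model, layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

Q = 313  # number of quantized in-gamut ab bins

def conv_block(x, filters, name):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu', name=name + '_c1')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu', name=name + '_c2')(x)
    return x

inp = Input((256, 256, 1), name='input_L')        # lightness channel of the input image
ref = Input((256, 256, 1), name='reference_L')    # lightness channel of the reference image

xi = conv_block(inp, 64, 'in_b1')
xr = conv_block(ref, 64, 'ref_b1')
xi = conv_block(xi, 128, 'in_b2')
xr = conv_block(xr, 128, 'ref_b2')

# Third block: information flows from the Reference CNN to the Input CNN.
xi = conv_block(xi, 256, 'in_b3')
xr = conv_block(xr, 256, 'ref_b3')
xi = layers.Add(name='ref_to_in_3')([xi, xr])

# Fourth block: information flows from the Input CNN to the Reference CNN.
xi = conv_block(xi, 256, 'in_b4')
xr = conv_block(xr, 256, 'ref_b4')
xr = layers.Add(name='in_to_ref_4')([xr, xi])

# Each branch predicts a distribution over the Q quantized ab bins per pixel.
zi = layers.Conv2D(Q, 1, activation='softmax', name='in_ab_dist')(xi)
zr = layers.Conv2D(Q, 1, activation='softmax', name='ref_ab_dist')(xr)

model = Model([inp, ref], [zi, zr])
opt = Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)  # patience assumed
# Placeholder loss; training actually uses the rebalanced cross-entropy of Eq. (1).
model.compile(optimizer=opt, loss='categorical_crossentropy')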

As pointed out in many papers [20, 23, 24, 26], the Euclidean loss function is not an optimal choice because it results in the so-called averaging problem: the system produces grayish, sepia-toned outputs. That is why we use a cross-entropy-like loss function to compare the prediction \(\hat{\mathbf{Z }}\in [0,1]^{H\times W\times Q}\) against the ground truth \(\mathbf Z \in [0,1]^{H\times W\times Q}\):

$$\begin{aligned} L(\hat{\mathbf{Z }}, \mathbf Z ) = -\sum _{h=1,w=1}^{H,W} v(\mathbf Z _{h,w}) \sum _{q=1}^{Q} \mathbf Z _{h,w,q} \cdot \log (\hat{\mathbf{Z }}_{h,w,q}), \end{aligned}$$
(1)

where \(Q=313\) is the number of quantized ab values (see Fig. 2), \(v(\cdot )\) is a weighting term used to rebalance the loss based on color-class rarity, and H and W denote the height and width of the training images. The weighting term \(v(\cdot )\) is obtained from the smoothed empirical distribution of ab pairs in the quantized space (see Fig. 3). For all details of the weighting term, we refer to [23].
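
A possible TensorFlow realization of Eq. (1) is sketched below. It assumes that the ground truth Z has been soft-encoded over the Q bins and that bin_weights contains the per-bin rebalancing factors derived from the smoothed empirical distribution of Fig. 3; the exact soft-encoding and lookup of \(v(\cdot )\) follow [23] only in spirit.

import tensorflow as tf

def rebalanced_cross_entropy(z_true, z_pred, bin_weights, eps=1e-8):
    """z_true, z_pred: (batch, H, W, Q) tensors; bin_weights: (Q,) rebalancing factors."""
    # v(Z_{h,w}) is evaluated at the most probable bin of the ground-truth encoding.
    v = tf.gather(bin_weights, tf.argmax(z_true, axis=-1))               # (batch, H, W)
    # Inner sum of Eq. (1): cross-entropy between the encodings at each pixel.
    ce = -tf.reduce_sum(z_true * tf.math.log(z_pred + eps), axis=-1)     # (batch, H, W)
    # Outer sum over all spatial positions gives the per-image loss.
    return tf.reduce_sum(v * ce, axis=[1, 2])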

Fig. 6. Comparison with state-of-the-art example-based colorization algorithms.

Fig. 7. Peak Signal-to-Noise Ratio (PSNR) distribution. It can be seen that the proposed method can improve the colorization accuracy.

4 Experimental Results

Figure 4 presents several colorization results obtained by our proposed method together with the inputs, the ground-truth color images, and the reference images. Figure 4 also shows the results of [23], which were obtained using their web application (available at http://demos.algorithmia.com/colorize-photos); note that the digital watermarks in the lower right corners were embedded by this application. From this qualitative comparison, we can see that our method is able to reduce visible artifacts, especially for detailed scenes and objects with large color variance (e.g., buildings). The color filling is nearly flawless, and the number of false edges near object boundaries is reduced. Figure 5 shows further results of our method.

Figure 6 shows a comparison with major state-of-the-art example-based colorization algorithms [11,12,13,14]. It can be seen that our method produces more realistic and plausible colors than most of these algorithms.

Figure 7 presents the Peak Signal-to-Noise Ratio (PSNR) distributions of our method, Cheng et al. [17], and Deshpande et al. [16], measured on 1500 test images from the SUN database [32]. Note that, for this experiment, we reimplemented the method of [17] using the Keras deep learning library [31]; we applied a 33-dimensional semantic feature vector for [17] and trained the corresponding deep neural network architecture using the ADAM optimizer [33] and the images of the SUN database. Besides, we used the source code (available at http://vision.cs.illinois.edu/projects/lscolor) provided by Deshpande et al. [16]. Figure 7 illustrates that the proposed method improves colorization accuracy, as it outperforms these two state-of-the-art algorithms.
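
PSNR was computed in the standard way; the short sketch below shows the measurement we have in mind, with file handling omitted and the peak value assumed to be 255 for 8-bit RGB images.

import numpy as np

def psnr(ground_truth, colorized, peak=255.0):
    """Peak Signal-to-Noise Ratio between two 8-bit RGB images of equal size."""
    mse = np.mean((ground_truth.astype(np.float64) - colorized.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# The PSNR distribution of Fig. 7 is the histogram of these values over the 1500 test images.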

Unfortunately, there is no widely used quality metric which clearly indicates the quality of a colorization, and methodical quality evaluation by showing colorized images to human observers is slow, expensive, and subjective. Empirically, we have found that Quaternion Structural Similarity (QSSIM) [22] gives a good basis for quantitative evaluation: it is a theoretically well-founded measure which has been accepted by the colorimetry research community as a potential quality score. We have measured the QSSIM distribution on 1500 test images from the SUN database. Figure 8 presents the QSSIM distributions of our method, Cheng et al. [17], and Deshpande et al. [16]; it can be seen that the proposed method outperforms the two other state-of-the-art algorithms. A higher QSSIM value indicates better image quality. This experiment was based on the source code (available at http://www.ee.bgu.ac.il/~kolaman/QSSIM) provided by Kolaman et al. [22].

Fig. 8. Quaternion Structural Similarity (QSSIM) distribution. It can be seen that the proposed method can improve the colorization quality. A higher QSSIM value indicates better image quality.

5 Conclusion

In this paper, we have introduced a novel framework which capitalizes on the advantages of example-based and learning-based colorization approaches. Specifically, we have shown a possible solution that exchanges information between two CNNs in order to help the Input CNN predict colors for the input image. To this end, we first train a Reference CNN which facilitates the identification of the specific color scheme of the input scene. We have shown that the semantic enhancement capability of a deep CNN can be incorporated into a colorization scheme to obtain an effective image analysis and interpretation framework. QSSIM has proved to be a suitable measure for evaluating colorization quality. There are many directions for further research. First, it is worth generalizing the proposed method to input images of arbitrary size. Another direction would be automating the search for a suitable reference image for a given input image.