1 Introduction

Image super-resolution has attracted considerable attention and progress in recent years. Despite these achievements, no unique solution exists, in particular for high magnification ratios. The per-pixel losses used by existing approaches do not properly capture perceptual differences between output and input images [1, 2]; thus, for high upscaling factors (i.e., scale factor 4 or more), it is difficult to recover the high-frequency details in the images. Generative adversarial networks (GANs), proposed by Goodfellow et al. [3], combine deep learning and generative modeling and are known to produce realistic samples from a latent space in a simple manner. In their original setting, they pit two neural networks against each other in a minimax game: a generator G is trained to produce fake samples from a noise space, whereas a discriminator learns to distinguish fake (generated) samples from real (true data) samples. Since the advent of GANs, many works have applied them to computer vision tasks, in particular to simulating complex data distributions such as images, videos and texts [4,5,6]. However, GANs suffer from limited perceptual quality and are extremely difficult to train. These limits restrict their applicability, and recent attempts address them by jointly supporting the data distribution and using hierarchical models, in contrast to the original GAN, which is a direct model.

In this paper, we propose a novel model that generalizes the GAN framework to multiple generators and discriminators, stabilizing the training process as well as improving sample diversity. Borrowing from GMAN [7, 8], we propose to employ two generators with two discriminators, based on an image-to-image model. The proposed architecture, termed DualGAN, is shown in Fig. 1. As in the regular GAN, the objective of the generators is to increase the mistakes of the discriminator. Moreover, unlike other GAN variants that use multiple generators in their structure, our model trains both generators simultaneously, and the data distribution is obtained from the mixture of their induced distributions. Multiple generators, however, may fall into the trivial solution in which all generators attempt to generate similar sample images. We address this problem by designing two discriminators in our architecture: one determines whether samples are real or fake, while the other acts as a classifier that identifies which generator produced the wrong samples. We show that our model effectively learns complex data distributions, generates realistic-looking samples and significantly improves image quality even at the highest scale factor \(\times 8\). The main contributions of this paper are: (i) a new generative adversarial variant that trains a pair of generators and discriminators while enforcing a better Jensen–Shannon (JS) divergence among the generators; (ii) an objective function that minimizes the JS divergence between the mixture of generated distributions and the real data distribution by using a feature matching loss; and (iii) a comprehensive evaluation on real-world datasets demonstrating the effectiveness of our model with respect to other GAN variants.
The paper, including this introduction, consists of six main sections. In the next section, we review and summarize GAN-based models. Sections 3 and 4 present the proposed architecture with the mixture generator-discriminator extension to the GAN framework. Section 5 contains the experimental results and related discussion. Finally, Sect. 6 concludes the paper. It is worth mentioning that, throughout the paper, we use the following notation for the sake of brevity: \(I_{\text{LR}}\) (low-resolution image) and \(I_{\text{SR}}\) (super-resolution image).

Fig. 1

DualGAN consists of a pair of generators and discriminators: \(G_{1}, G_{2}\) and \(D_{1}, D_{2}\). The generative model, with a pair of generators, is trained to generate realistic artificial images; the discriminative model, with a pair of discriminators, determines whether an image is real or fake and also identifies the generator that produced each wrong sample. We use the weight-sharing constraint for all layers of the generative models, \(g_{1}\) and \(g_{2}\), and for the last layer of the discriminative models, \(d_{1}\) and \(d_{2}\). The weight-sharing constraint allows the proposed model to learn a joint data distribution of images and also keeps the number of model parameters at an optimal level

2 Related works

Generative models have grown drastically in the last few years, and substantial methods have been proposed to address the image super-resolution problem [9,10,11,12,13]. The main concept of the generative adversarial network (GAN) [3] is an adversarial game between two networks: a discriminator network D and a generator network G. The generator draws synthetic images from noise input, while the discriminator receives both real and synthetic samples and determines whether each input is fake (generated by the generator) or a real image.

Moreover, GAN training alternately optimizes the generator and discriminator using stochastic gradient-based learning. However, GAN training suffers from major problems such as mode collapse, implementation difficulties and unstable results [4, 12, 14]. In the standard GAN, there is no way to control what is generated, since no side information is provided to the generator. To address this, [15] proposed a method that conditions the generator so that the generated image can be steered toward a desired target. Most GAN-based methods follow the same structure, using one generator and one discriminator with minor variations. Some of the most notable GAN variants in this category are InfoGAN [16], DCGAN [13], WGAN [17], ImprovedGAN [14] and DGAN [8]; these methods are straightforward to design and implement. Recent attempts to improve GAN results and resolve the training issues train additional generators and discriminators. D2GAN [18] uses two discriminators in its architecture to find a rational distribution across the data by minimizing the KL (Kullback–Leibler) and reverse KL divergences. Another framework, proposed by Durugkar et al. [7], uses several discriminators to improve generator learning. Recently, Arora et al. [19] proposed the MIX + GAN approach, which trains several generators and discriminators with different parameters; however, it is computationally expensive to train due to the lack of parameter sharing, and there is no mechanism to enforce divergence between the generators. Tolstikhin et al. [20] proposed AdaGAN, a GAN variant that introduces a robust reweighting scheme for preparing the training data. Another model we follow is MAD-GAN, proposed by Ghosh et al. [21], which trains multiple generators with a multi-class discriminator; its discriminator objective is designed to push the multiple generators toward generating diverse modes. Reed et al. [5] proposed a GAN-based method that generates \(64^{2}\) images but can barely produce intense object details; accordingly, StackGAN [6] stacks two GANs to improve on [5] by generating \(256^{2}\) images. SS-GAN [22] comprises two GANs: one generates a surface normal map, and the other takes the generated samples and noise z as input and produces an output image. In [23], the authors present LR-GAN, which learns to generate image foreground and background using different generators and a single discriminator; they experimentally show that separating the generation of foreground and background content produces sharper images. Other researchers argue that, instead of using different generators that perform separate tasks, a model can use multiple generators with similar structures, where each generator refines the result of the previous one and the last generator produces the final result. With this strategy, the model can share weights and parameters among the generators, which helps smooth the training process. LAPGAN [24] uses multiple generators to generate images from coarse to fine using a Laplacian pyramid [25].
In LAPGAN, the generators share the same structure; each takes a noise vector as input and outputs a generated image, and they differ only in their input and output dimensions. Although existing GAN-based image super-resolution techniques have achieved notable progress, some problems remain unsolved, such as training instability and high-resolution generation [8, 26]. Motivated by this, we introduce a new GAN variant that exploits the potential advantages of multiple generators and discriminators. The motivation of the proposed model is to jointly produce multiple samples, which increases the chance of sharing more details across the model distributions. Multiple generators focus on completing the missing details needed to produce higher-resolution images, while multiple discriminators allow the model to accurately classify the generated samples and stabilize training in the best possible way. In addition, training difficulty decreases under the proposed methodology.

3 Dual generative adversarial network

The standard GAN involves two networks that are trained simultaneously: a generative network G and a discriminative network D. Let X and Z be the true variable and the latent variable, respectively. The generative model uses the data distribution \(p_{G}(x, z) = p_{G}(z)\,p_{G}(x\,|\,z)\) to generate samples. Given generators \(\{G_{i}, i \in S\}\), where each \(G_{i}\) is a function, a generator defines a distribution \(D_{G_{i}}\) by drawing h from a Gaussian distribution and applying \(G_{i}\) to obtain \(x = G_{i}(h)\). Similarly, for the discriminator model \(\{D_{j}, j \in \acute{S}\}\), each \(D_{j}\) is a function mapping into [0, 1]. Training the discriminator pushes its output toward 1 when x comes from the true distribution \(D_{\text{true}}\) and toward 0 when x comes from the distribution \(D_{G_{i}}\). The GAN framework with one G and one D is jointly trained as [3]:

$$\max_{g}\,\min_{d}\; V(d,g) = \mathbb{E}_{x\sim D_{\text{true}}}\bigl[-\log d(x)\bigr] + \mathbb{E}_{z\sim p_{z}}\bigl[-\log\bigl(1-d\bigl(g(z)\bigr)\bigr)\bigr]$$
(1)

In practice, the above objective is optimized by alternating gradient updates:

$$\theta_{d}^{t+1} = \theta_{d}^{t} - \alpha^{t}\,\nabla_{\theta_{d}} V\bigl(d^{t},g^{t}\bigr), \qquad \theta_{g}^{t+1} = \theta_{g}^{t} + \alpha^{t}\,\nabla_{\theta_{g}} V\bigl(d^{t+1},g^{t}\bigr)$$
(2)

where \(\theta_{d}\) and \(\theta_{g}\) are the discriminator and generator parameters, \(\alpha\) is the learning rate and t is the iteration index. The proposed DualGAN is illustrated in Fig. 1; our contribution on the generative side is to use a mixture of several distributions available in the training space, instead of a single one. The proposed model consists of two generators \(G_{1}, G_{2}\) and two discriminators \(D_{1}, D_{2}\), where one discriminator acts as a multi-class classifier. A high-resolution image \(I_{\text{HR}} \in [0,1]^{\omega \times h \times c}\) is downsampled to a low-resolution image \(I_{\text{LR}} = \hat{d}(I_{\text{HR}})\), where \(\omega\), h and c denote the width, height and number of color channels.
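
To make the alternating updates of Eq. 2 concrete, the following is a minimal sketch of one training iteration in TensorFlow (the framework used in Sect. 4.1). The model names, latent size and hyperparameters are illustrative assumptions, not the exact DualGAN networks:

```python
import tensorflow as tf

# Minimal sketch of the alternating gradient updates of Eq. 2. `generator`
# and `discriminator` are placeholder Keras models; the discriminator is
# assumed to end in a sigmoid, so binary cross-entropy gives the -log terms.
bce = tf.keras.losses.BinaryCrossentropy()
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

def train_step(generator, discriminator, real_images, z_dim=100):
    z = tf.random.normal([tf.shape(real_images)[0], z_dim])

    # Discriminator step: descend V(d, g) with the generator fixed.
    with tf.GradientTape() as tape:
        fake = generator(z, training=True)
        d_real = discriminator(real_images, training=True)
        d_fake = discriminator(fake, training=True)
        d_loss = (bce(tf.ones_like(d_real), d_real)       # -log d(x)
                  + bce(tf.zeros_like(d_fake), d_fake))   # -log(1 - d(g(z)))
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Generator step: ascend V, here via the non-saturating -log d(g(z)) form.
    with tf.GradientTape() as tape:
        fake = generator(z, training=True)
        d_fake = discriminator(fake, training=True)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))
    return d_loss, g_loss
```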

3.1 Generative models

Let \(X_{1}, X_{2}\) be the images generated by \(G_{1}\) and \(G_{2}\) from the distributions \(x_{1} \sim p_{x_{1}}\) and \(x_{2} \sim p_{x_{2}}\), respectively. We denote the distributions of the generators as \(p_{g_{1}}, p_{g_{2}}\); both generators are multilayer perceptrons [27]:

$$g_{1}(z) = g_{1}^{m}\Bigl(g_{1}^{m-1}\bigl(\ldots g_{1}^{2}\bigl(g_{1}^{1}(z)\bigr)\bigr)\Bigr) \quad\text{and}\quad g_{2}(z) = g_{2}^{n}\Bigl(g_{2}^{n-1}\bigl(\ldots g_{2}^{2}\bigl(g_{2}^{1}(z)\bigr)\bigr)\Bigr);$$
(3)

where m and n denote the numbers of layers in the two generators \(g_{1}, g_{2}\); m may or may not equal n. Each generator has its own distribution, and together the two generators induce a mixture distribution, which we denote \(P_{\text{MG}}\), with mixture coefficients \(\pi = [\pi_{1}, \pi_{2}]\). The objective of the generators is to minimize the JS divergence between the mixture of generated distributions and the true data distribution while maximizing the JS divergence between the two generators. The generative models gradually decode information from more abstract to more complex details; note that this learning process runs opposite to the discriminator. In this process \(\theta_{g_{1}} = \theta_{g_{2}}\), meaning we force the generators to have identical structures and share their weights, whereas in the discriminative network only the last layers of the discriminators share weights. In effect, the generators use the shared high-level representation to fool the discriminator. Salimans et al. [14] proposed a semi-supervised classification approach using a GAN, termed SSL-GAN, in which the discriminator is a multi-class classifier and GAN convergence is improved by optimizing the generator with a feature matching loss. Inspired by that work [14], we use a feature matching loss to train the mixture data distribution in the generators.

$$L_{F} = \min_{\theta_{g_{1}},\,\theta_{g_{2}}} \Bigl\| \mathbb{E}_{x\sim p_{\text{data}}(x)}\bigl[f(x)\bigr] - \mathbb{E}_{z\sim p(z)}\bigl[f\bigl(G(z;\theta_{G})\bigr)\bigr] \Bigr\|_{2}^{2}$$
(4)

The feature matching loss allows the generators to shape the mixture data distribution so that, on the one hand, its support does not overlap with high-density areas of the real data, while, on the other, it stays close to the data distribution [28]. Experimentally, we observed that when the generative model is trained with the feature matching loss of Eq. 4, it generates samples from the mixture distribution that fall onto the data manifold and shows an impressive ability to produce high-quality samples.
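
As an illustration, a feature matching loss in the spirit of Eq. 4 can be computed as below; the function `f`, exposing an intermediate discriminator activation, is an assumed helper rather than code from the paper:

```python
import tensorflow as tf

def feature_matching_loss(f, real_images, fake_images):
    # Feature matching in the spirit of Eq. 4 and Salimans et al. [14]:
    # match the first moments of an intermediate discriminator activation
    # f(x) on real versus generated batches (squared L2 distance).
    mu_real = tf.reduce_mean(f(real_images), axis=0)
    mu_fake = tf.reduce_mean(f(fake_images), axis=0)
    return tf.reduce_sum(tf.square(mu_real - mu_fake))
```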

3.2 Discriminative model

Let \(d_{1}, d_{2}\) be the discriminators of our proposed model, where \(d_{1}\) determines whether samples are real or fake and \(d_{2}\) acts as a classifier that identifies which generator produced each sample. The two discriminators are defined as:

$$d_{1}(x_{1}) = d_{1}^{t}\Bigl(d_{1}^{t-1}\bigl(\ldots d_{1}^{2}\bigl(d_{1}^{1}(x_{1})\bigr)\bigr)\Bigr) \quad\text{and}\quad d_{2}(x_{2}) = d_{2}^{q}\Bigl(d_{2}^{q-1}\bigl(\ldots d_{2}^{2}\bigl(d_{2}^{1}(x_{2})\bigr)\bigr)\Bigr),$$
(5)

where t and q are the numbers of layers in discriminators \(d_{1}\) and \(d_{2}\). \(d_{1}\) maps the input to a probability score and labels the output as a fake or real sample. In the next step, the output is passed to \(d_{2}\) in order to find the related generator and return the wrong samples to it. We force both discriminators to have the same layers in their architecture to prevent the mode collapse problem, which is achieved by sharing the weights of the last layers: \(\theta_{d_{1}} = \theta_{d_{2}}\). Moreover, this weight-sharing helps reduce the number of parameters in the discriminative models. The proposed framework is therefore formulated as \(\max_{G_{1},G_{2}} \min_{D_{1},D_{2}} V(d_{1},d_{2},g_{1},g_{2})\), with shared weights \(\theta_{g_{1}} = \theta_{g_{2}}\) for the generators and, similarly, \(\theta_{d_{1}}^{l} = \theta_{d_{2}}^{l}\) for the last layers of the discriminators; the function V(.) is then:

$$\begin{aligned} V &= \mathbb{E}_{x_{1}\sim p_{x_{1}}}\bigl[-\log d_{1}(x_{1})\bigr] + \mathbb{E}_{z\sim p_{z}}\bigl[-\log\bigl(1-d_{1}(g_{1}(z))\bigr)\bigr] \\ &\quad + \mathbb{E}_{x_{2}\sim p_{x_{2}}}\bigl[-\log d_{2}(x_{2})\bigr] + \mathbb{E}_{z\sim p_{z}}\bigl[-\log\bigl(1-d_{2}(g_{2}(z))\bigr)\bigr] \end{aligned}$$
(6)

The generative model G, with its two generators, synthesizes images from a mixture distribution to confuse the discriminative model. Accordingly, the discriminative model D receives input from G and from the real data distribution, tries to classify each sample as coming from the training distribution or the generated distribution, and also identifies the generator that produced each wrong image. The collaboration between the generators in the generative model and the discriminators in the discriminative model is based on the weight-sharing constraint. Our proposed model is trained by backpropagation [3] with alternating gradient update steps [14].
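
To illustrate how the weight-sharing constraints of Sects. 3.1 and 3.2 can be realized, the sketch below ties parameters by reusing the same Keras layer objects; the depths and layer sizes are illustrative, not those of Tables 1 and 2:

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 100
shared_g_layers = [               # shared by g1 and g2: theta_g1 = theta_g2
    layers.Dense(4 * 4 * 256),
    layers.Reshape((4, 4, 256)),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
]

def make_generator():
    z = layers.Input(shape=(latent_dim,))
    x = z
    for layer in shared_g_layers:   # identical weights in every layer
        x = layer(x)
    return tf.keras.Model(z, x)

g1, g2 = make_generator(), make_generator()   # g1 and g2 share all parameters

shared_last = layers.Dense(1, activation="sigmoid")  # tied last layer of d1, d2

def make_discriminator():
    img = layers.Input(shape=(16, 16, 3))     # matches the toy generator output
    x = layers.Conv2D(64, 4, strides=2, padding="same")(img)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    return tf.keras.Model(img, shared_last(x))  # theta^l_d1 = theta^l_d2

d1, d2 = make_discriminator(), make_discriminator()
```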

4 Model training

Learning the proposed model relies on samples drawn from the joint data distributions. Weight-sharing constraints are an important factor in our contribution: they enable the networks to exploit their common information and improve training performance. Moreover, the weight-sharing constraint allows the model to minimize the number of parameters and reduces the complexity toward that of the original GAN. In our proposed model, the generators are deep convolutional neural networks that share weights in all layers except the input layer, which maps the noise z to the first hidden-layer activation h. On the other side, the discriminators also employ convolutional neural networks and share parameters only in the last layer, as described in Sect. 3.2. The generators use a sequence of upsampling layers, which lets us add more details to generate a high-resolution image, whereas only downsampling blocks are used in the discriminators. For generators \(G_{1}, G_{2}\) with mixture weights \(\pi\) (generator indices drawn from \(\text{Multi}(\pi)\)), the optimal discriminators \(\hat{D}_{1}, \hat{D}_{2}\) satisfy the following equations:

$$\hat{D}_{1} = \alpha \times \frac{p_{\text{true data}}(x)}{p_{G}(\text{Mix})}; \qquad \hat{D}_{2} = \beta \times \frac{p_{G}(\text{Mix})}{\sum_{j=1}^{2}\pi_{j}\,p_{G_{j}}(x)}; \qquad \text{where}\;\; p_{G}(\text{Mix}) = \sum_{j=1}^{2}\pi_{j}\,p_{G_{j}}(x)$$
(7)

In fact, \(\hat{D}_{2}\) can be seen as a generalization of \(\hat{D}_{1}\) that assigns the wrong samples to their corresponding generators. Based on these observations, we reformulate the objective function for the generative model as:

$${\text{LS}}_{G} = \mathbb{E}_{x\sim p_{\text{true data}}}\left[\log \frac{p_{\text{true data}}(x)}{p_{G_{1}}(x)+p_{G_{2}}(x)}\right] + \mathbb{E}_{x\sim p_{G}(x)}\left[\log \frac{p_{G}(x)}{p_{G_{1}}(x)+p_{G_{2}}(x)}\right] - \beta\sum_{k=1}^{2}\pi_{k}\,\mathbb{E}_{x\sim p_{G_{k}}(x)}\left[\log \frac{\pi_{k}\,p_{G_{k}}(x)}{\sum_{j=1}^{2}\pi_{j}\,p_{G_{j}}(x)}\right]$$
(8)

As in the original GAN, the objective of the generators is to minimize the JS divergence between the data distributions while maximizing it between the generators [14]. We verify the maximal loss by setting the derivatives with respect to the discriminators to zero, which yields \(\frac{\alpha\,p_{\text{true data}}(x)}{D_{1}} - p_{G}(x) = 0\) and \(\frac{\beta\,p_{G}(x)}{D_{2}} - p_{\text{true data}}(x) = 0\). Note that discriminator \(d_{1}\) takes input from G and determines whether the samples are fake or real; the fake samples are then taken as input by discriminator \(d_{2}\), which indicates the generator that produced them. The first discriminator is binary valued, whereas the second acts as a multi-class classifier whose arity depends on the number of generators (in this paper, both discriminators are binary valued, since our model has only two generators).
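
For illustration, sampling from the mixture \(p_{G}(\text{Mix})\) of Eq. 7 can be sketched as follows: a generator index is first drawn from \(\text{Multi}(\pi)\) for each sample, then noise is pushed through the chosen generator. The uniform \(\pi\) here is an assumption for the example:

```python
import tensorflow as tf

def sample_mixture(g1, g2, batch_size, pi=(0.5, 0.5), z_dim=100):
    # Draw samples from p_G(Mix) = pi_1 p_{G1}(x) + pi_2 p_{G2}(x) (Eq. 7):
    # choose a generator index k ~ Multi(pi) per sample, then push noise
    # through the chosen generator. The uniform pi is illustrative.
    z = tf.random.normal([batch_size, z_dim])
    logits = tf.math.log(tf.constant([list(pi)], dtype=tf.float32))
    k = tf.random.categorical(logits, batch_size)[0]   # shape: (batch_size,)
    x1, x2 = g1(z, training=False), g2(z, training=False)
    mask = tf.reshape(tf.equal(k, 0), [-1, 1, 1, 1])   # broadcast over H, W, C
    return tf.where(mask, x1, x2)                      # per-sample choice
```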

4.1 Implementation details

In the proposed model, we follow [27] in designing the generators and use fractionally strided convolutional (FL-CONV) layers instead of standard CONV layers. Each FL-CONV layer is followed by batch normalization and a parametric rectified linear unit (PReLU) [29], except the output layer, which is trained with the feature matching loss (Eq. 4) in order to generate the desired range of pixel values. The discriminators, in contrast, are based on standard convolutional layers (CONV), except the last layers, which are fully connected (FC). We observed that the leaky rectified linear unit (LReLU) [30] works better than ReLU, especially for the diverse samples produced by multiple generators. We also applied batch normalization in every layer except the output layer of the discriminators, which uses sigmoid units. The generators consist of six fractionally strided convolutional layers, while the discriminators have six convolutional layers plus two fully connected layers. The generators and discriminators are parameterized by \(\vartheta_{G}\) and \(\vartheta_{D}\), respectively; the input layer of generator \(G_{k}\) is parameterized by the mapping \(f_{\vartheta_{G}}(z)\), which maps the sampled noise z to the first hidden-layer activation h. TensorFlow [31] is used to implement our model, with the Adam optimizer [32], learning rate 0.0002 and momentum 0.5; weights are initialized from an isotropic Gaussian \(\mathcal{N}(0, 0.01)\) with zero biases. The details of the networks are given in Tables 1 and 2. All experiments were run on a system with an Intel i7-6850K CPU, 64 GB RAM and an NVIDIA GeForce GTX 1080 Ti GPU under Ubuntu 16.04.
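
The following sketch mirrors the layer recipe described above (six FL-CONV blocks with BN + PReLU in the generator; six CONV blocks with BN + LReLU plus two FC layers in the discriminator). The filter counts and kernel sizes are assumptions; the exact values are those listed in Tables 1 and 2:

```python
import tensorflow as tf
from tensorflow.keras import layers, initializers

w_init = initializers.RandomNormal(mean=0.0, stddev=0.01)  # N(0, 0.01), zero biases

def build_generator(z_dim=100):
    z = layers.Input(shape=(z_dim,))
    x = layers.Dense(4 * 4 * 512, kernel_initializer=w_init)(z)
    x = layers.Reshape((4, 4, 512))(x)
    # Five fractionally strided (FL-CONV) blocks with BN + PReLU ...
    for filters in (256, 128, 64, 32, 16):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same",
                                   kernel_initializer=w_init)(x)
        x = layers.BatchNormalization()(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)
    # ... plus an output FL-CONV layer without BN/PReLU (six in total).
    out = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                 activation="tanh", kernel_initializer=w_init)(x)
    return tf.keras.Model(z, out)

def build_discriminator(img_shape=(256, 256, 3)):
    img = layers.Input(shape=img_shape)
    x = img
    # Six CONV blocks with BN + LReLU, then two FC layers, sigmoid output.
    for filters in (16, 32, 64, 128, 256, 512):
        x = layers.Conv2D(filters, 4, strides=2, padding="same",
                          kernel_initializer=w_init)(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, kernel_initializer=w_init)(x)
    x = layers.LeakyReLU(0.2)(x)
    out = layers.Dense(1, activation="sigmoid", kernel_initializer=w_init)(x)
    return tf.keras.Model(img, out)

# Optimizer as in the text: Adam with learning rate 0.0002 and momentum 0.5.
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
```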

Table 1 Designed generative model in DualGAN. FL-CONV denotes the fractionally strided convolutional layer; BN is batch normalization, and PReLU is the parametric rectified linear unit
Table 2 Designed discriminative model in DualGAN. CONV denotes the convolutional layer; BN is batch normalization, and LReLU is the leaky rectified linear unit

5 Experimental evaluation

We conduct a series of experiments to evaluate the proposed model and compare it with related approaches. Our aim is to visualize and evaluate the learning behavior of the model with two generators and to demonstrate its stability and efficacy on different datasets. The experiments are conducted on three widely used datasets: BSD-100, DIV2K and CIFAR-10. Results on these datasets show that our model generates more faithful and more diverse samples than the baselines. We compare the proposed DualGAN against baselines drawn from CNN-based methods, such as SRCNN [11], VDSR [33] and LapSRN [34], and several well-known GAN variants, including DCGAN [13], ProGAN [35], BEGAN [36], GOGAN [37], Unrolled GAN [38], GMAN [7], MAGAN [39], ACGAN [15], COGAN [27], D2GAN [18] and InfoGAN [16].

To re-implement the baselines, we followed their source code with the same settings as ours. From the results, we observe that the CNN-based methods, despite preserving sharp edges, produce blurry textures, while the perceptual quality of the GAN-based methods is better, as they can improve the high-frequency details. For quantitative evaluation, we use two well-known image quality metrics: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [26]. The results are given in Tables 3, 4 and 5 and Figs. 2, 3, 4, 5 and 6.
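
For reference, both metrics can be computed with TensorFlow's built-in image ops; a minimal sketch, assuming float images scaled to [0, 1]:

```python
import tensorflow as tf

def psnr_ssim(sr, hr):
    # PSNR/SSIM as reported in Tables 3-5, via TensorFlow's built-in image
    # metrics; `sr` and `hr` are float tensors in [0, 1] of shape (H, W, C).
    return (float(tf.image.psnr(sr, hr, max_val=1.0)),
            float(tf.image.ssim(sr, hr, max_val=1.0)))
```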

Table 3 Average PSNR/SSIM for BSD-100 dataset
Table 4 Average PSNR/SSIM for CIFAR-10 dataset
Table 5 Average PSNR/SSIM for DIV2K dataset
Fig. 2

Visual comparison of SR results at scaling factor 8. The top images are the ground truth. The baselines are D2GAN [18], ProGAN [35], UR-DGAN [38], DCGAN [13], CoGAN [27], InfoGAN [16], BEGAN [36], GMAN [7], ACGAN [15], GoGAN [37], Johnson et al. [40], MAGAN [39] and our DualGAN. D2GAN and SFT-GAN generate richer visual textures than the other methods, while our model yields the best results overall. (Zoom in for best view)

Fig. 3

Visual comparison for 4× SR based on different GAN structures

Fig. 4

Comparison of PSNR and SSIM values on the CIFAR-10 dataset using three different network structures: DCGAN [13], D2GAN [18] and our proposed DualGAN. The results are evaluated at 4×

Fig. 5

(Left) Convergence of different methods for 4× super-resolution. We set the size of input images to 128 × 128 for all methods and evaluate the results on the CIFAR-10 dataset. The baselines are SRCNN [11], FSRCNN [45], VDSR [33], DRNN [42], SFT-GAN [44], RDN [46], SRDenseNet [41], SCN [10], MemNet [47], LapSRN [34], GAN [3], SRGAN [4], ProGAN [35], DCGAN [13], GP-GAN [43], InfoGAN [16], Johnson et al. [40], D2GAN [18] and the proposed DualGAN. GAN-based methods are shown in red, CNN-based methods in blue. (Right) The trade-off between runtime and upscaling factor

Fig. 6

Image quality improvement compared with other techniques at 4× SR. The first image in both samples is the ground truth; next is the proposed DualGAN. The baselines are DCGAN [13], Johnson et al. [40], LapSRN [34], MemNet [47], the original GAN [3], SRGAN [4] and ProGAN [35]. The results show that our model generates sharper images than the other baselines, whereas LapSRN performs worst and cannot recover the high-frequency details properly. ProGAN and DCGAN produce the second sharpest and cleanest images

5.1 Results and comparisons

To demonstrate the effectiveness of our model, we perform extensive qualitative and quantitative evaluations. We train our model with different scaling factors between low- and high-resolution images: \(4\times\), \(6\times\) and \(8\times\). We used the source code of the various algorithms to evaluate their runtime on the same machine used to implement our model. Figure 2 gives an overview of twelve methods, including the current prominent GAN and CNN works, in terms of PSNR on the DIV2K dataset, which is well suited for visual comparison since it contains images with sharp edges and textured regions. From the results, we observe that the GAN-based methods perform well on edge reconstruction but suffer from blurry regions; even the state-of-the-art D2GAN [18], GoGAN [37] and DCGAN [13] do not provide clean and sharp details at high scaling factors. The proposed model, in contrast, produces sharper edges and exhibits acceptable results at high scaling factors. The second best results are those of ACGAN [15] and BEGAN [36], while the worst visual results are those of DCGAN [13]. Note that the results in Fig. 2 are evaluated at the 8× scaling factor. Similarly, we show a visual comparison of GAN variants at 4× in Fig. 3; it is clearly observed that our method accurately reconstructs fine lines and grid patterns.

Next, we report the quantitative results at the 4× and 8× factors in Tables 3, 4 and 5. We compare our model to several GAN- and CNN-based models, such as SRDenseNet [41], VDSR [33], LapSRN [34], DRNN [42], DCGAN [13], GP-GAN [43], D2GAN [18] and SRGAN [4], on three datasets: BSD-100, CIFAR-10 and DIV2K, with PSNR and SSIM as evaluation metrics. Our model performs favorably against the current approaches and achieves results comparable to GMAN [7] and StackGAN [6]. On the BSD-100 dataset, the best results at the 8× scaling factor belong to GP-GAN, D2GAN and ours. On the CIFAR-10 dataset, the best results at 4× correspond to GP-GAN and DCGAN, while at 8× D2GAN performs better than the other baselines. Similarly, the results on the DIV2K dataset show that at 4× SRGAN and SRDenseNet perform better than the other baselines in terms of PSNR and SSIM, while at 8× only SRGAN gives a pleasing result. In sum, the GAN-based methods outpace the CNN-based methods, so we conclude that GAN-based methods are well suited to image super-resolution.

In addition, to validate the effectiveness of our model compared with other approaches, we plot the convergence curves in terms of PSNR and SSIM on the CIFAR-10 dataset in Fig. 4. The results show that our model requires fewer iterations to achieve good results and performs more robustly than DCGAN and D2GAN. The state-of-the-art D2GAN [18], by contrast, does not provide stable performance and needs more iterations to reach performance comparable to DCGAN [13].

Execution time: we evaluate the trade-off between runtime and PSNR performance on the CIFAR-10 dataset for different scaling factors; the results are plotted in Fig. 5, measured on the same machine used to test our model. Figure 5a shows PSNR versus runtime at the 4× scaling factor, with CNN-based methods drawn in blue and GAN-based methods in red for clarity. Figure 5b shows PSNR performance across different scaling factors. From the results, we observe that our model is faster than the existing methods while maintaining performance competitive with SFT-GAN [44], GMAN [7] and InfoGAN [16].

Quality of the generated images: a further experiment compares the quality of the images generated by our model against state-of-the-art methods in Fig. 6. We selected two well-suited images from the DIV2K dataset for visual comparison, since they contain both sharp and smooth edges. The results show that the proposed model clearly outperforms the others and correctly reconstructs the fine structures, grid patterns and dark spots in the image backgrounds. These experiments support our claims regarding the performance of the proposed DualGAN model.

6 Conclusion

In this paper, we propose DualGAN, a simple and effective framework for fast and accurate image super-resolution. The proposed model extends the GAN framework with two generators and two discriminators: the generators use a mixture data distribution to generate realistic images, and the discriminators are designed to accurately classify the inputs and to identify the generator that produced each wrong sample. We demonstrated the effectiveness of the proposed model in comparison with other GAN variants; our model is not only simple to implement but also delivers superior results. Using multiple generators with a mixture data distribution optimizes the networks and helps smooth the training process. The main aspects of this work are balancing the network with a pair of generators and discriminators, proposing the mixture data distribution and training the generators with a feature matching loss, which reduces the network parameters and speeds up training. With this methodology, we believe the results are more stable and efficient than those of other popular generative models. As a future direction, we would like to estimate the number of generators and discriminators needed for a particular dataset.