Potential of generative adversarial net algorithms in image and video processing applications– a survey

Sharma, Akanksha; Jindal, Neeru; Rana, P. S.

doi:10.1007/s11042-020-09308-4

Published: 24 July 2020

Volume 79, pages 27407–27437, (2020)

Multimedia Tools and Applications

Akanksha Sharma¹,
Neeru Jindal¹ &
P. S. Rana²

Abstract

Generative Adversarial Networks (GANs) have gained eminence in a very short period, as they can learn deep data distributions through a competitive process between two networks. GANs can synthesize images/videos from latent noise by minimizing an adversarial cost function. The cost function is a deciding factor in GAN training and is therefore often modified to yield better performance. To date, numerous new GAN models have been proposed, with cost functions changed to suit the application. The main objective of this paper is to present the gist of major GAN publications and developments in the image and video field. Several publications were selected after a thorough literature survey. Trends in GAN research publications, the basics, a literature survey, databases, and performance evaluation parameters are presented under one umbrella.

1 Introduction

Generative Adversarial Networks have become extremely popular in the fields of artificial intelligence and deep learning. Along with the development of new models, numerous applications of GANs have also been proposed. The spectrum of task-specific applications is wide, ranging from jellyfish swarm detection to testing of gear safety in vehicles. This popularity is reflected in the number of research publications in the field, as shown in Fig. 1. Although other generative models such as variational autoencoders are available, GANs have many advantages: they can handle sharp estimated density functions, eliminate deterministic bias, generate desired samples, and are compatible with neural architectures. In the computer vision field especially, GANs enjoy success in many applications, e.g. image generation, image-to-image translation, image super-resolution, and shadow GANs [35, 36, 101]. Because of the vast amount of information available on GANs, this paper narrows its focus to GANs in the field of image/video processing. An image and a video are inherently related: an image is a motionless picture that does not change with time, whereas a video has a third dimension, time, and comprises moving images, as shown in Fig. 2.

This paper provides a critical study of the major developments in image and video processing that utilize generative adversarial networks. Despite their vast range of applications, the main purpose of GANs is to act as a target density estimator. GANs can implicitly learn the distribution of the original set and generate samples from the estimated distribution. GANs are efficient in image synthesis because explicitly learning or estimating a data distribution in high dimensions is otherwise a tedious task that requires the construction of likelihood functions. Currently, there is an enormous demand for image/video algorithms using GANs, as the demand for artificially generated images in deep learning simulations is increasing, as noted by Wang et al. [36] and Gonog [35]. Figure 3 shows the number of publications per year, from 2014 to April 2020, on image/video processing using generative adversarial networks. Papers were identified through a manual online search covering journals, conferences, reviews, and other sources, using the keywords “image processing using generative adversarial networks” and “video processing using generative adversarial networks” separately on the Semantic Scholar search engine [41, 44, 45, 46]. Semantic Scholar is an AI-backed search engine for articles that highlights important and significant papers [25, 27, 32, 34].

There has been substantial progress in image/video applications using GANs due to their advantages and disadvantages [47, 89, 90]. To summarize, the contributions of the paper are:

To provide a vital understanding of the different image and video processing GANs, their architectures, cost functions, and models that exist or are being developed to drive today’s world.
To present, under one umbrella, quantitative and qualitative measures of GANs with their pros and cons, for scenarios where one is interested in measuring and comparing the quality of GANs.
To give an extensive demonstration of real-world image/video applications of GANs across human activities, concluding with the future challenges of GANs.

1.1 Basic building blocks and architecture of GAN

A Generative Adversarial Network consists of a generator (G) and a discriminator (D), as shown in Fig. 4. The generator produces fake data by taking noise vectors as input. This fake data is given to the discriminator together with the training data. The discriminator works as a simple classifier that labels each input sample as original or fake. It does so by assigning probabilities to the samples [51, 62, 64, 65]: a real sample has a probability of ‘1’, whereas a forged sample has a probability of ‘0’. The information is backpropagated to the generator network in the form of a gradient. This helps the generator learn the features of the training dataset, so that it in turn generates sample images that match the statistical distribution of the original data.

In the next step, the discriminator accepts both the original data and the generated fake samples. It then decides whether the images are fake or not, and this learning process repeats. The discriminator estimates the ratio of densities and passes it to the generator in the form of a gradient. The features are learned cooperatively, oscillating between the two blocks, the generator and its discriminator. At the start of the min-max game, winning is an effortless task for the discriminator. But as training continues, the game becomes more challenging for both players because the generator starts converging towards the original data [78, 80, 96, 98, 108].
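The statement that the discriminator estimates a ratio of densities can be illustrated with a small sketch. It relies on the standard result that, for a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_model(x)), so the ratio can be recovered as D*(x) / (1 - D*(x)). The two Gaussian densities and all function names below are illustrative assumptions, not part of the survey.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a stand-in distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def optimal_discriminator(x, p_data, p_model):
    """D*(x) = p_data(x) / (p_data(x) + p_model(x)) for a fixed generator."""
    pd, pm = p_data(x), p_model(x)
    return pd / (pd + pm)

def density_ratio_from_d(d):
    """Recover p_data(x) / p_model(x) from a discriminator output d in (0, 1)."""
    return d / (1.0 - d)

# Hypothetical toy densities: "real" data around 0, generated data around 1.
p_data = lambda x: gaussian_pdf(x, 0.0, 1.0)
p_model = lambda x: gaussian_pdf(x, 1.0, 1.0)

d_mid = optimal_discriminator(0.5, p_data, p_model)   # point equidistant from both means
ratio_at_zero = density_ratio_from_d(optimal_discriminator(0.0, p_data, p_model))
```

At the midpoint between the two means the discriminator cannot tell the distributions apart, so its output is one half; elsewhere the recovered ratio matches the true ratio of the two densities.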

1.2 Input, training and cost functions

1.2.1 Input noise

Several approaches are used to provide inputs to the generator network. The inputs can be provided at any layer in the model. The noise z can be divided into two vectors: the first noise vector can be provided at the first layer, while the second can be provided at the last layer. Another approach is to provide multiple random noise vectors to the inner hidden layers of the generator network.

1.2.2 Training

The training steps for a GAN are summarized in Fig. 5. A GAN is a structured probabilistic model. It consists of latent variables ‘z’ (noise) and observed variables ‘x’ (original distribution). The G function (the generator) takes ‘z’ (the noise vectors) as its input and uses θ^(G) as parameters, while the D function (the discriminator) takes ‘x’ (data samples from the fake and original distributions) as input and uses θ^(D) as parameters.

The solution to an ordinary optimization problem is a minimum, i.e. a point in parameter space where all neighboring points have an equal or higher cost. The game between the discriminator and the generator is instead solved by a Nash equilibrium: a point that is a local minimum of J^(D) with respect to θ^(D) and a local minimum of J^(G) with respect to θ^(G). By training the discriminator, an estimate of p_data(x)/p_model(x) is obtained for each point in the x domain. This ratio enables the computation of the vast majority of divergences and their gradients. GANs rely on supervised learning to approximate this ratio. Simultaneous stochastic gradient descent is applied at each training step. Two mini-batches are selected, one from ‘x’ and the other from ‘z’. Any gradient-based optimization algorithm can then be used to perform the two gradient steps simultaneously. The most commonly used optimization algorithm is Adam.
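The training procedure described above can be sketched on a deliberately tiny problem. The example below is a minimal illustration under stated assumptions, not the setup of any surveyed paper: the data are 1-D samples from N(4, 1), the generator is G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), gradients are written by hand, and plain SGD is used instead of Adam for brevity. All names and hyper-parameters are hypothetical.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, batch=32, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # theta_G: G(z) = a*z + b
    w, c = 0.0, 0.0   # theta_D: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Two mini-batches: one of data samples x, one of noise samples z.
        xs = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        gw = gc = ga = gb = 0.0
        for x, z in zip(xs, zs):
            g = a * z + b
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            # Gradient of J_D = -log D(x) - log(1 - D(G(z))) w.r.t. (w, c).
            gw += (d_real - 1.0) * x + d_fake * g
            gc += (d_real - 1.0) + d_fake
            # Gradient of the heuristic J_G = -log D(G(z)) w.r.t. (a, b).
            dg = (d_fake - 1.0) * w
            ga += dg * z
            gb += dg
        # Simultaneous update: both gradients were computed from the same parameters.
        w -= lr * gw / batch
        c -= lr * gc / batch
        a -= lr * ga / batch
        b -= lr * gb / batch
    return (a, b), (w, c)
```

Calling `train_toy_gan()` returns the final generator and discriminator parameters. With a linear discriminator the generator can only be pushed toward the data mean, which is enough to show the alternating pressure between the two players; convergence is not guaranteed in general.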

1.2.3 Role of cost functions

The cost functions of both D and G play a crucial role in the training of the GAN model. The discriminator cost function is defined so that it maximizes the probability of correctly classifying a sample as original or counterfeit, while the generator cost function maximizes the probability that the generated sample is classified as real. The vanilla GAN model explored three approaches for the generator cost function, namely the min-max approach, the heuristic approach, and the maximum likelihood approach. Newer GAN models have explored different mechanisms to calculate the distance between the original and generated distributions. The original GAN paper used KL divergence and Jensen-Shannon divergence [42]. Changing the cost function leads to different training behaviour and outcomes, hence researchers have explored several alternatives such as the Earth-Mover's distance [4], the χ² divergence [89], etc. Cost functions of major GAN variants are summarized in Table 1.
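The three generator cost functions mentioned above can be written per sample as follows. This is a sketch based on the commonly stated forms (constant factors omitted), where `d_fake` stands for D(G(z)); the function names are illustrative assumptions. The two gradient helpers show why the heuristic cost is often preferred: when the discriminator confidently rejects a sample, the min-max gradient with respect to the discriminator logit nearly vanishes, while the heuristic gradient does not.

```python
import math

def discriminator_cost(d_real, d_fake):
    """J_D = -log D(x) - log(1 - D(G(z))) for one real and one fake sample."""
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_cost_minimax(d_fake):
    """Min-max (zero-sum) generator cost: log(1 - D(G(z)))."""
    return math.log(1.0 - d_fake)

def generator_cost_heuristic(d_fake):
    """Heuristic (non-saturating) generator cost: -log D(G(z))."""
    return -math.log(d_fake)

def generator_cost_max_likelihood(d_fake):
    """Maximum likelihood cost: -exp(logit(D(G(z)))) = -D / (1 - D)."""
    return -d_fake / (1.0 - d_fake)

def grad_minimax_wrt_logit(d_fake):
    """d/ds log(1 - sigmoid(s)) = -sigmoid(s), with d_fake = sigmoid(s)."""
    return -d_fake

def grad_heuristic_wrt_logit(d_fake):
    """d/ds -log(sigmoid(s)) = sigmoid(s) - 1, with d_fake = sigmoid(s)."""
    return d_fake - 1.0
```

For example, with `d_fake = 0.01` the min-max gradient has magnitude 0.01 while the heuristic gradient has magnitude 0.99, so the generator still receives a useful learning signal early in training.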

Table 1 Summary of major Discriminator cost functions

2 Literature survey

Goodfellow et al. [42] first proposed the GAN, with the objective of an algorithm that efficiently replicates the data distribution of the original data using maximum likelihood approximation and Jensen-Shannon divergence. In practice, the two blocks, G and D, were two convolutional neural networks; the former performed the task of generation, while the latter classified samples as fake or original by assigning them probabilities between 0 and 1.

Soon after, however, many researchers independently discovered that GAN models lack training stability and suffer from mode collapse [122, 123, 133]. The DCGAN [107] model was the first to offer a simple solution: instead of using the sigmoid activation function, it used the ReLU function. DCGAN also combined the already popular deep convolutional nets [2] with the GAN model to further improve image quality.

As generated image quality improved, more emphasis was laid on conditional image generation. Building on the initial GAN model, Mirza et al. [93] came up with a conditional GAN (CGAN) model that generates samples based on class labels. CGAN can also learn multi-modal data distributions and can be used to generate descriptive tags that are not part of the training labels. Taking inspiration from CGAN, InfoGAN [20] focused on learning disentangled representations in an unsupervised manner. It works on the principle of maximizing mutual information by optimizing a lower bound on it.

As ReLU activation proved insufficient to stabilize training, researchers explored different cost functions and divergence functions. Denton et al. [30] took inspiration from the Laplacian pyramid framework and implemented it using a cascade of convolutional networks. The Laplacian pyramid first generates images with rough texture and appearance, and finer details are added as the pyramid progresses. LSGAN, introduced by Mao et al. [89, 90], used a new loss function for its GAN model, called the least-squares loss function. Minimizing it is a special case of minimizing the Pearson χ² divergence. It also pulls generated samples toward the decision boundary, and thus closer to the original data samples.
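The least-squares loss can be sketched as follows, using the a-b-c label coding of the LSGAN formulation: `a` is the label for fake data, `b` the label for real data, and `c` the value the generator wants the discriminator to output for fake data. The 0/1 coding used as default here is one common choice and is an assumption of this sketch; as recalled from the LSGAN paper, the link to the Pearson χ² divergence holds for codings with b - c = 1 and b - a = 2.

```python
def lsgan_discriminator_loss(d_real, d_fake, a=0.0, b=1.0):
    """0.5 * E[(D(x) - b)^2] + 0.5 * E[(D(G(z)) - a)^2] over two batches of outputs."""
    real_term = sum((d - b) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - a) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * real_term + 0.5 * fake_term

def lsgan_generator_loss(d_fake, c=1.0):
    """0.5 * E[(D(G(z)) - c)^2]: quadratic penalty on distance from the target c."""
    return 0.5 * sum((d - c) ** 2 for d in d_fake) / len(d_fake)
```

Unlike a sigmoid cross-entropy loss, the quadratic penalty also grows for fake samples that the discriminator already rates far beyond the target value (for instance an output of 3.0 gives a generator loss of 2.0), which is what moves generated samples back toward the decision boundary.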

A major landmark was achieved by Salimans et al. [113], who proposed several techniques to combat training instability, non-convergence, and mode collapse. The training issues faced by researchers are summarized in Table 2. Major GAN models and their details are summarized in Table 3.

Table 2 Training Challenges in Generative Adversarial Networks

Table 3 Summary of Major GAN Models along with their datasets, performance parameters, and contributions

WGAN [4] was a pioneering GAN model: it not only used the Wasserstein metric (also called the earth mover's or EM distance) but also introduced the concept of weight clipping. It continuously estimates the EM distance, which is useful for debugging and hyper-parameter searches. The improved version of WGAN [47] provided optimization in the form of a gradient penalty. The authors found that weight clipping leads to difficulty in optimization: because the K-Lipschitz constraint was enforced by clipping, WGAN models leaned towards learning extremely simple functions and ignored higher moments. This issue was solved by using a gradient penalty to enforce Lipschitz continuity, where K is the Lipschitz constant. Other changes were the removal of batch normalization from the discriminator model and the use of a two-sided penalty to keep the gradient norm close to 1. Improved WGAN proved to be much more stable for training GANs.
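The two ways of enforcing the Lipschitz constraint can be contrasted in a short sketch. It assumes a scalar critic on 1-D inputs and approximates the input gradient by a central finite difference, so it only illustrates the formulas; real implementations use automatic differentiation on image tensors. The defaults `c=0.01` and `lam=10.0` follow commonly quoted values from the WGAN and improved WGAN papers, and all function names are hypothetical.

```python
import random

def critic_loss(critic, real, fake):
    """Critic objective to minimize: E[f(fake)] - E[f(real)] (negative EM estimate)."""
    return sum(map(critic, fake)) / len(fake) - sum(map(critic, real)) / len(real)

def clip_weights(weights, c=0.01):
    """WGAN weight clipping: force every critic weight into [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

def gradient_penalty(critic, real, fake, lam=10.0, rng=None, h=1e-5):
    """Two-sided penalty lam * E[(|df/dx_hat| - 1)^2] on random interpolates x_hat."""
    rng = rng or random.Random(0)
    total = 0.0
    for x, x_tilde in zip(real, fake):
        eps = rng.random()
        x_hat = eps * x + (1.0 - eps) * x_tilde
        # Central difference stands in for autograd in this 1-D toy setting.
        grad = (critic(x_hat + h) - critic(x_hat - h)) / (2.0 * h)
        total += (abs(grad) - 1.0) ** 2
    return lam * total / len(real)
```

A critic with slope 1 incurs almost no penalty, while a critic with slope 3 is penalized by about 10 * (3 - 1)^2 = 40, which pushes the gradient norm back toward 1 without restricting the weights to a small box.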

WGANs paved the way for more GAN models to utilize the EM distance. Berthelot et al. [11] proposed BEGAN, an auto-encoder based GAN trained with a loss derived from the EM distance. It is an equilibrium-enforcing method that provides a new convergence measure, based on proportional control theory, together with a robust GAN architecture. Another model that used a GAN alongside an auto-encoder was EBGAN [161], where the discriminator is used as an energy function that assigns low energy to regions close to the original data distribution and high energy elsewhere.
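The equilibrium mechanism of BEGAN can be sketched in a few lines. Here `loss_real` and `loss_fake` stand for the auto-encoder reconstruction losses of a real and a generated batch, `k` is the control variable, and `gamma` and `lambda_k` are illustrative hyper-parameter values; the function name and return layout are assumptions of this sketch.

```python
def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """One BEGAN bookkeeping step.

    Returns the discriminator loss, the generator loss, the updated control
    variable k (kept in [0, 1]) and the global convergence measure.
    """
    d_loss = loss_real - k * loss_fake
    g_loss = loss_fake
    # Proportional control: move k so that loss_fake tracks gamma * loss_real.
    balance = gamma * loss_real - loss_fake
    k_next = min(1.0, max(0.0, k + lambda_k * balance))
    convergence = loss_real + abs(balance)
    return d_loss, g_loss, k_next, convergence
```

When the generated batch is reconstructed too well relative to the target ratio, `k` increases and the discriminator puts more weight on pushing generated samples away; the convergence measure gives a single number that can be monitored during training.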

The improved WGAN model heavily influenced researchers to come up with new normalization and penalty methods. One such model was proposed by Kodali et al. [63], who studied GAN training as regret minimization and used a new gradient penalty method called Deep Regret Analytic GAN (DRAGAN). Miyato et al. [94] suggested spectral normalization, a weight normalization technique that is regarded as easier to include in existing models than weight normalization, weight clipping, and gradient penalty.
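Spectral normalization divides a weight matrix by its largest singular value, which is usually estimated by power iteration. The sketch below works on plain nested lists and runs many iterations from scratch for clarity; practical implementations typically keep the vector `u` between training steps and run a single iteration per step. All helper names are illustrative.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def spectral_norm(w, iterations=50):
    """Estimate the largest singular value of w by power iteration."""
    wt = transpose(w)
    u = normalize([1.0] * len(w))
    for _ in range(iterations):
        v = normalize(matvec(wt, u))
        u = normalize(matvec(w, v))
    return sum(a * b for a, b in zip(u, matvec(w, v)))

def spectral_normalize(w, iterations=50):
    """Return w / sigma(w), whose spectral norm is approximately 1."""
    sigma = spectral_norm(w, iterations)
    return [[x / sigma for x in row] for row in w]
```

For the diagonal matrix `[[3.0, 0.0], [0.0, 1.0]]` the estimate converges to 3, and the normalized matrix has spectral norm 1, which bounds the Lipschitz constant of that linear layer.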

Another improvement was proposed by Alexia Jolicoeur-Martineau [57] in the form of the relativistic GAN (RGAN) and the relativistic average GAN (RaGAN). The author argued that the probability of real data being real should decrease as the generator becomes more efficient. Experimental results showed that RaGANs with gradient penalty outperformed WGAN-GP and reduced the time needed to reach state-of-the-art results by a factor of four. RaGAN even produced better-quality pictures from a very small set (n = 2011), which was not possible with LSGAN.
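The relativistic average idea can be sketched with the standard cross-entropy loss: instead of scoring a sample in isolation, the discriminator scores how much more realistic it is than the average sample of the opposite type. Here `c_real` and `c_fake` are lists of raw (pre-sigmoid) critic outputs; the function names are assumptions of this sketch.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def mean(values):
    return sum(values) / len(values)

def ra_discriminator_loss(c_real, c_fake):
    """Real samples should score above the average fake, and fakes below the average real."""
    mean_real, mean_fake = mean(c_real), mean(c_fake)
    real_term = mean([-math.log(sigmoid(c - mean_fake)) for c in c_real])
    fake_term = mean([-math.log(1.0 - sigmoid(c - mean_real)) for c in c_fake])
    return real_term + fake_term

def ra_generator_loss(c_real, c_fake):
    """Same loss with the roles of real and fake swapped."""
    return ra_discriminator_loss(c_fake, c_real)
```

Because the generator loss contains the real samples as well, improving the fake scores necessarily lowers the relative score of real data, which is the behaviour argued for above.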

3 Current trends

The basic objective of generative adversarial networks is image generation. However, GANs have been molded to serve a plethora of applications in several fields. Owing to their flexible nature, numerous GAN models have been combined with other techniques such as attention [158], relativism [57], etc. to perform specific tasks. In most cases, GANs have outperformed existing models. In the present work, the emphasis is on applications of GANs in image/video processing, which are discussed below [138, 147].

3.1 Image processing

GANs are basically synthetic image generation algorithms. Over the years, they have been used to perform several tasks in the field of image processing.

3.1.1 Image generation

As image generation is the primary task of GANs, the major models are summarized in Table 4. The remaining applications are discussed in the following sections.

Table 4 Summary of Major GAN Models used for Image Translation

3.1.2 Inpainting

Image inpainting fills in plausible content in images that are incomplete, damaged, or missing information, without any prior knowledge. Among the most popular GANs for inpainting are PatchGAN and its derivatives. Demir et al. [28] proposed PGGAN, which combines two earlier designs, namely the global GAN (GGAN) and PatchGAN. PGGAN shares its layers with both of these networks but later bifurcates to produce two adversarial losses that feed the generator network. Yu et al. [148] presented a new inpainting method using gated convolutions, which take masks and inputs and provide a dynamic feature selection mechanism. The authors also introduced a new GAN model called SNPatchGAN, named for its spectral-normalized discriminator. The method can eliminate undesirable or distracting objects, improve image layouts, clean up watermarks, and generate new content as well. Yu et al. [149] proposed another GAN model that is capable of utilizing surrounding image features for better predictions. It can handle multiple holes of varying size at arbitrary locations. The above-mentioned GAN models are used for 2D image inpainting. Wang et al. [132] proposed a 3D GAN to carry out 3D shape inpainting, combining a 3D encoder-decoder GAN with an LRCN. The 3D models are processed in the form of 2D shapes. While the GAN framework captures the global 3D structure, the LRCN produces the finer details.

3.1.3 Image translation

Image translation requires more effort because the context has to be preserved during translation. Numerous algorithms have been deployed to achieve near-perfect translation. Inspired by CGAN, pix2pix is capable of generating images from label information, reconstructing objects from edge information, filling and coloring images, etc. It has become popular among researchers and has been applied to different types of data, as it can adapt its loss to the task at hand. It produces extraordinary results, especially for translation tasks that involve highly structured graphical outputs.

Image-to-image translation is implemented by learning a mapping between images from a training set of image pairs. Zhu et al. [164] presented a novel algorithm to translate images when paired samples are not available. Because this type of mapping is under-constrained, it was coupled with an inverse mapping, and a cycle consistency loss was introduced. This GAN is known as CycleGAN because of its cycle consistency loss. It also performed object transfiguration, photo enhancement, season and style transfer, etc. Although compelling results were obtained, they were not uniform across all cases and applications. It succeeds when color and texture changes are required. However, in tasks that require geometric changes, the performance was subpar and left scope for further improvement. Another popular GAN model for cross-domain style transfer is DiscoGAN [60]. It addressed the task of discovering cross-domain relations from unpaired data. Using the discovered relations, DiscoGAN successfully transferred style from one domain to the other. It preserved key attributes, such as the identity and orientation of a face. However, it does not handle mixed modalities well. It works without explicit pair labels and learns to relate datasets from diverse domains.
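The cycle consistency idea can be sketched with toy mappings: translating an image to the other domain and back should return the original image, and the L1 distance between the two is the penalty. The mappings `g`, `f_good`, and `f_bad` below are hypothetical stand-ins for trained generators, and images are represented as flat lists of pixel values.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(g, f, batch_x, batch_y):
    """E_x[|F(G(x)) - x|] + E_y[|G(F(y)) - y|] for mappings G: X -> Y and F: Y -> X."""
    forward = sum(l1_distance(f(g(x)), x) for x in batch_x) / len(batch_x)
    backward = sum(l1_distance(g(f(y)), y) for y in batch_y) / len(batch_y)
    return forward + backward

# Toy mappings: G doubles the brightness; a good inverse F halves it again.
g = lambda img: [2.0 * p for p in img]
f_good = lambda img: [0.5 * p for p in img]
f_bad = lambda img: [p for p in img]      # ignores G, so cycles do not close

loss_good = cycle_consistency_loss(g, f_good, [[1.0, 2.0]], [[4.0, 6.0]])
loss_bad = cycle_consistency_loss(g, f_bad, [[1.0, 2.0]], [[4.0, 6.0]])
```

In the full model this term is added to the two adversarial losses, so each generator must both fool its discriminator and remain invertible by the other generator.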

3.1.4 Super resolution

Super-resolution is used to enhance the visual quality and details of an image; major models are summarized in Table 5. Ledig et al. [66] addressed the problem of recovering fine texture details when super-resolving at large upscaling factors, proposing SRGAN. The authors use a perceptual loss, which is a blend of content loss and adversarial loss. The content loss plays a critical part in super-resolution, as it takes into account perceptual similarity rather than pixel similarity. Wang et al. [129] made significant improvements to SRGAN and introduced Enhanced SRGAN (ESRGAN). The authors changed both the loss functions and the network architecture of SRGAN. They adopted the technique used in RaGAN to obtain a relativistic discriminator. In updating the network architecture, they used the Residual-in-Residual Dense Block as the building block. To achieve better brightness consistency and texture recovery, ESRGAN improves the perceptual loss by extracting features before activation. This provided improved visual quality with more realistic textures.

Table 5 Summary of major GAN models used for Image Resolution Improvement

3.1.5 Segmentation

Segmentation is a common task in image processing, and multiple algorithms have been put to use for it. Ehsani et al. [105] took a new leap with SeGAN, which aims to complete the appearance of occluded objects in scenes. To complete this task, knowing “which pixels to paint?” and “what color to paint them?” is crucial. These two questions lead to the segmentation of the invisible parts, followed by their generation. SeGAN optimizes both tasks jointly. SeGAN is trained on synthetic photorealistic images and can reliably segment natural images. It is also capable of depth layering using the occluder-occluded relationships. A summarized comparison of typically used image GANs is given in Table 6.

Table 6 Advantages, Disadvantages and features of typical GANs [17]

3.1.6 Real-world image applications

Real-world image applications that deploy generative adversarial networks are growing rapidly. A GAN can generate realistic image samples from a random latent vector z. There is no need to know the real data distribution or to assume particular mathematical conditions. These are the main reasons that GANs are used in academia, engineering, and almost every other field. Several researchers have presented prominent applications of GANs in real-world image processing [2, 3, 77, 167].

High-resolution human face images can be generated from low-resolution images by up-sampling and using a trained model to infer photo-realistic details. Bulat et al. [15] proposed real-world face super-resolution because many existing methods fail to generate good results when applied to real-world low-resolution and low-quality face images. To solve this problem, a two-stage process was presented. First, a High-to-Low Generative Adversarial Network (GAN) is trained to degrade and downsample high-resolution images, requiring only unpaired high- and low-resolution images during training. The output of this network is then used to train a Low-to-High GAN for image super-resolution, this time using paired low- and high-resolution images. Figure 6 shows the simulation results of face super-resolution for low-resolution faces in the real world.

Realistic pedestrian images in real scenes were generated using a GAN model in 2018 [102], and the augmented data was then used to train a CNN-based pedestrian detector. A Generative Adversarial Network (GAN) with multiple discriminators was used in order to generate realistic pedestrians and learn the background simultaneously. To handle the different sizes of pedestrians, a Spatial Pyramid Pooling layer was used in the discriminator. Cross-dataset simulations were also performed, i.e. training the model on one dataset and testing it on another. It was concluded that PS-GAN generates good pedestrian images and improves the performance of CNN-based detectors. Figure 7 shows the PS-GAN model used to generate realistic pedestrian images.

WaterGAN [73] is a generative adversarial network that generates realistic underwater images from in-air images; in addition, the color of monocular underwater images is corrected using an unsupervised pipeline, as shown in Fig. 8. High-resolution images can be captured using autonomous cameras and remotely operated vehicles to map the seafloor. With WaterGAN, a large training dataset was generated of raw underwater images together with their true-color in-air counterparts. It served as input to a novel network for the color correction of underwater images.

3.2 Video processing

GANs have also been utilized in video processing. Some of their major applications are discussed in the following sections. The applications are also summarized, and important publications are listed, in Table 8.

3.2.1 Frame generation and prediction

Intermediate frame generation is an important task. Earlier methods that performed this task using interpolation yielded lower-quality videos with extreme blurriness. Keyframe generation is an important research area, as it is used in slow-motion filming, video compression, forensic analysis, etc. Wen et al. [134] used two concatenated GANs for frame generation. One GAN captured the motion while the other generated the frame details. The technique used three losses, namely adversarial loss, normalized product correlation loss, and gradient difference loss.

For both video generation and recognition tasks, a model of how frames transform is needed. However, because objects and frames can change in a large number of ways, modeling the dynamics is a challenging task. Vondrick et al. [124] presented a video GAN with a spatiotemporal convolutional architecture that separates the scene’s foreground from its background. Simulation results show that this model generates tiny videos at full frame rate. It can also be used to predict plausible futures of still images. Moreover, visualizations and experiments show that the model learns suitable features with minimal supervision. Mathieu et al. [92] trained a CNN to produce future frames given an input sequence. Three feature learning strategies were also given to address the blurry predictions obtained with the Mean Squared Error (MSE) loss. The UCF101 dataset was used in this work to compare the predictions. Tulyakov et al. [121] suggested MoCoGAN, the Motion and Content decomposed Generative Adversarial Network, for video generation. It generates a video by mapping a series of random vectors to a series of video frames. Each random vector has a content part and a motion part. The content part remains fixed, and the motion part is modeled as a stochastic process. Image and video discriminators are used in the adversarial learning scheme. Experimental results on various datasets, with comparisons to existing approaches, verify the efficacy of the framework.

3.2.2 Video De-blurring

Video de-blurring is a difficult task, as it requires modeling both the spatial domain, which comprises the image plane, and the temporal domain, which comprises the neighboring frames of the video. Another challenge is restoring sharp images when only a pixel-wise error is minimized. To combat these issues, Zhang et al. [160] suggested applying 3D convolutions in the spatio-temporal domain using DBLRNet for video de-blurring. To address the problem of generating sharp images, DBLRNet was used as the generator in a GAN architecture. In addition to the regular adversarial loss, a content loss was also used. This GAN was named de-blurring GAN and was tested on a benchmark dataset, achieving better results.

Shen et al. [115, 116] presented a triple-branch encoder-decoder architecture. Two branches were learned for sharpening foreground (FG) and background (BG) details, and the last branch produces globally harmonious results. The model was further endowed with a supervised, human-aware attention mechanism trained in an end-to-end fashion. A soft mask is used to encode FG human information, and it drives the FG and BG decoder branches to attend to their respective domains. To further advance human-aware deblurring, the HIDE dataset of blurry and sharp image pairs was also proposed. HIDE covers a large number of scenes, motion patterns, and background scenes.

3.2.3 Haze removal

Pang et al. [104] proposed the Haze Removal GAN (HRGAN) for removing haze from videos in order to provide better visual quality in surveillance videos and their analysis. It contains a specialized generator and discriminator network. The generator estimates transmission maps, haze-free images, and atmospheric light all at once. A modified cost function was also proposed that covers both the adversarial loss created by the discriminator and a pixel-wise loss. The authors demonstrated the superiority of HRGAN over state-of-the-art de-hazing algorithms.

3.2.4 De-identification

GANs have also been used to hide information in videos. Brkic et al. [14] used GANs for person de-identification, which means removing non-biometric features and replacing them with ones generated via GANs. The non-biometric features include clothing colors, hairstyles, etc. In some cases, even the faces can be replaced. Such tools are very helpful when a person’s information is to be concealed, as simple methods like video blurring do not offer much safety. The de-identified videos are also immune to re-identification attacks.

3.2.5 Video super-resolution

Lucas et al. [83] made a first-of-its-kind attempt to use GANs for video super-resolution. They built a new GAN model called VSRResFeatGAN, whose generator is named VSRResNet. The generator is pre-trained with the mean square loss, which leads to better quantitative performance than other VSR models. This work also used a new evaluation metric, the PercepDist metric, to examine the perceptual quality of videos more accurately than the formerly used SSIM metric.

Chen et al. [21] suggested in 2017 a video super-resolution framework using a GAN with temporal information fusion. It was observed that the temporal information of consecutive video frames contributes to the task of video super-resolution. The contribution of this work was twofold. First, a new generator architecture was suggested with various mechanisms for temporal fusion of information, including early fusion, slow fusion, and 3D convolution, which were also compared with one another as shown in Fig. 9. Second, a discriminator architecture based on SRGAN was applied with an adaptive training routine to train the GAN and to ease the burden of hand-tuning the large number of model parameters. Figure 7 shows video frames obtained using temporal fusion, bicubic interpolation, and per-frame SRGAN.

Several GANs are used nowadays for video super-resolution, chosen according to the requirements of the application. To differentiate between them, PSNR is selected as a common parameter, as given in Table 7.
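For reference, PSNR can be computed from the mean squared error between a reference frame and a reconstructed frame as 10 * log10(MAX^2 / MSE). The sketch below assumes frames given as flat lists of pixel intensities with an 8-bit peak value of 255; the function name and the flat-list representation are illustrative choices. For a video, a common convention is to average the per-frame values.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf          # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)

def video_psnr(reference_frames, estimated_frames, max_value=255.0):
    """Average of per-frame PSNR values over a sequence of frames."""
    scores = [psnr(r, e, max_value) for r, e in zip(reference_frames, estimated_frames)]
    return sum(scores) / len(scores)
```

Higher values indicate a reconstruction closer to the reference in the pixel-wise sense, which is why PSNR is convenient as a common parameter but does not by itself capture perceptual quality.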

Table 7 Difference in super-resolution Video GANs based on evaluation parameters

3.2.6 Real-world video applications

Extending from image processing to video is quite difficult: the three dimensions of videos, memory limitations, and training stability all introduce challenges. In 2019 [53], video enhancement with a divide-and-conquer approach based on adversarial learning was suggested, which divides and merges perception-based, frequency-based, and dimension-based problems. The photo enhancement process was decomposed into multiple sub-problems, which were solved from the bottom up. At the top level, a perception-based division was presented to learn the additive and multiplicative components required to convert a low-quality image/video into a high-quality one. At the intermediate level, a frequency-based division using a generative adversarial network was used to supervise photo enhancement. The lower level used a dimension-based division to enable the GAN model to better estimate the distribution distance on multiple data and to train the GAN model.

Video generation was divided into frame generation and sequence generation using GANs [53], which made the task easier to solve. For this, a two-step training scheme was used: first, a generator was trained on static frames, as shown in Fig. 10. Afterward, to generate natural-looking scenes, a recurrent model was trained on top of the previously trained frame generator. Both training steps used adversarial training. To avoid training instabilities while using GANs, an approach with multiple discriminators was used.

Chu et al. [24] proposed a temporally self-supervised algorithm with adversarial learning to obtain temporally coherent solutions without loss of spatial detail. A Ping-Pong loss was suggested for better long-term temporal consistency. It effectively removes artifacts in recurrent networks without suppressing detailed features. Moreover, the progressive growing of Generative Adversarial Networks for higher-resolution video generation, proposed in 2018 [1], is shown in Fig. 11. In particular, videos of low resolution and short duration were generated first, and the resolution and duration were then gradually increased by adding new spatio-temporal convolutional layers. The progressive model learns spatio-temporal information and generates higher-resolution video. Table 8 presents major publications of Generative Adversarial Networks in various fields.
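
The idea behind a Ping-Pong style loss can be sketched in a few lines of Python. The sketch assumes that the loss is the L2 distance between the outputs a recurrent generator produces for the same frame on a forward pass and on the subsequent backward pass over the reversed sequence. The blending generator `toy_recurrent_generator` is a hypothetical stand-in, not the network of [24].

```python
import numpy as np

def toy_recurrent_generator(frame, previous_output, carry=0.5):
    """Hypothetical recurrent generator: blends the current input frame
    with the previous output (stand-in for a real network)."""
    return (1.0 - carry) * frame + carry * previous_output

def ping_pong_loss(frames, generator):
    """Sum of L2 distances between forward-pass and backward-pass outputs."""
    n = len(frames)
    # Forward pass over frames 0 .. n-1.
    forward = []
    prev = np.zeros_like(frames[0])
    for t in range(n):
        prev = generator(frames[t], prev)
        forward.append(prev)
    # Backward pass continues the recurrence over frames n-2 .. 0.
    backward = {}
    for t in range(n - 2, -1, -1):
        prev = generator(frames[t], prev)
        backward[t] = prev
    # Outputs for the same frame should agree in both directions.
    return float(sum(np.linalg.norm(forward[t] - backward[t])
                     for t in range(n - 1)))

rng = np.random.default_rng(0)
sequence = [rng.random((8, 8)) for _ in range(5)]
print("ping-pong loss:", ping_pong_loss(sequence, toy_recurrent_generator))
```

A generator whose output drifts as the recurrence unrolls yields a larger value, so minimizing this term penalizes artifacts that accumulate over time.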

Table 8 Major Publications of Generative Adversarial Networks in various fields


4 Performance evaluation

With a plethora of GAN models available for different applications, finding a suitable metric for evaluating GANs and comparing generative models remains a challenge. There is still no globally accepted benchmark metric for evaluating the performance of a GAN architecture in all aspects. There are, however, some essential and desirable characteristics of GAN evaluation metrics. An effective GAN evaluation metric should possess the following characteristics.

It should favor models that generate high-fidelity samples, i.e., it should be able to distinguish generated samples from real ones.
It should be sensitive to over-fitting of the model.
It should have well-defined boundary values.
It must be sensitive to image distortions and transformations.
It should be able to match human perceptual judgments and rankings of models.
It must have low computational complexity.

GAN performance is measured using qualitative and quantitative measures.

Qualitative evaluation is a visual check by humans that covers naturalness, shapes, perspective and structure. It is the most common way to evaluate GANs, as summarized in Table 9.

Table 9 Summary of Qualitative Performance Evaluation Metrics used in image GANs [13, 31]


While qualitative measures help to inspect models, they also have some drawbacks. Firstly, image quality evaluation with human vision is cumbersome and biased [76]. It is difficult to replicate and does not reflect the capacity of the model. Secondly, the high variance among human inspectors makes it necessary to average over a large number of subjects [29]. Thirdly, visual inspection fails to reveal whether a model drops modes; in fact, mode dropping tends to improve visual sample quality. Hence quantitative measures play an important part in GAN evaluation, as summarized in Table 10 [13]. It is important to note that most evaluation schemes do not examine disentanglement in the latent space. The table also gives relative ratings in terms of high, moderate and low. “-” means that the value is missing or unknown and requires further research. “*” indicates that multiple scores are available within that group of measures.

Table 10 Summary of Quantitative Performance Evaluation Metrics used in GANs [13, 31, 36]


4.1 Pros and cons

Based on the above analysis, the advantages and inherent limitations of the most significant evaluation metrics can be summarized, along with the conditions under which they produce meaningful results. Some metrics make it possible to study over-fitting, perform model selection and compare GAN models without resorting to human evaluation of selected samples. There is no particular standard for selecting the best score. Different scores evaluate different aspects of the image generation process, and it is unlikely that a single score can measure all of them. However, some measures are more widely used than others. For over-fitting, nearest-neighbor visualization and rapid categorization are the most commonly used quality measures. Overall, finding a measure that evaluates diversity and visual fidelity simultaneously is an open problem. Diversity means that all modes are covered, whereas visual fidelity implies that generated samples should have a high likelihood. Parzen window estimation is an example of a likelihood-based measure that favors trivial models. Moreover, it largely fails to estimate the true likelihood in high-dimensional spaces. IS and FID are the two most widely used scores; both rely on pre-trained deep networks to compare original and generated samples. IS correlates with the quality and diversity of generated images. However, it evaluates Pg only as an image generation model rather than in terms of its similarity to Pr. It may therefore encourage GAN models to learn sharp images instead of Pr. Some evaluation methods, such as MS-SSIM, try to evaluate the diversity of generated images. Such measures can detect mode collapse as well as how well a generator captures the true data distribution. MMD can consistently distinguish generated or noise images from real images. Both have low computational complexity. Given these advantages, MMD is still recommended even though it is biased.
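
To make the MMD measure discussed above concrete, the following Python sketch computes the biased estimate of the squared MMD between two sample sets. It assumes a Gaussian kernel with a fixed bandwidth and operates on generic feature vectors. The function names, the bandwidth and the synthetic data are illustrative assumptions; in practice the samples would typically be features extracted from real and generated images.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())

# Synthetic stand-ins (assumption) for real and generated feature vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(300, 2))
good_fake = rng.normal(size=(300, 2))           # same distribution as real
bad_fake = rng.normal(loc=2.0, size=(300, 2))   # shifted distribution
print("MMD^2 real vs good:", mmd2_biased(real, good_fake))
print("MMD^2 real vs bad: ", mmd2_biased(real, bad_fake))
```

The estimate is close to zero when the two sample sets come from the same distribution and grows as the distributions move apart. The choice of kernel and bandwidth affects the sensitivity of the score.
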
Figure 12 shows a comparison of typical GANs in terms of F1, precision and recall, where the best F1 score for a fixed model is selected while the budget is varied. It is observed that even for this simple task, GAN models struggle to achieve a high F1 score. Analogous plots of precision and recall for various thresholds are also given.
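
For reference, the F1 score is the harmonic mean of precision and recall. The short Python sketch below computes it and selects the best value over several runs. The list of (precision, recall) pairs is purely illustrative and is not data from Fig. 12.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def best_f1(runs):
    """Best F1 over an iterable of (precision, recall) pairs."""
    return max(f1_score(p, r) for p, r in runs)

# Illustrative values only (assumption), e.g. one pair per training run.
runs = [(0.9, 0.1), (0.6, 0.5), (0.4, 0.8)]
print("best F1:", best_f1(runs))
```

Because the harmonic mean is dominated by the smaller of the two values, a model with high precision but low recall (or the reverse) still receives a low F1 score.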

5 Future challenges

GANs have become extremely popular in a very short time, making the world of GANs a magnum opus in itself. They are robust algorithms that can be adapted to suit different requirements. Owing to this ability, numerous application-specific GANs have been developed in diverse fields. In the future, more such applications can be developed and combined with newer models and mechanisms like attention and relativism.

In the future, GANs can be explored in the following potential directions:
Creating infographics from text
Generating website designs
Compressing data
Drug discovery and development
Generating music
Researchers can look for a more stable distance metric for the GAN loss function.
As GANs lack a universal performance metric, there is an urgent need to develop one.
Among the newest deep learning mechanisms are the attention mechanism [154] and relativism. It will be interesting to see a combination of both, along with reinforcement learning [6].
The best aspect of GANs is that the general models can easily be molded into new variants or application-specific variants in miscellaneous fields such as audio processing, biological sciences, medical science and astronomy.
Mode collapse: An open research area is the tendency of the system to collapse, also called mode collapse. The system is a combination of two interacting entities, and feedback and input from each are pivotal to the survival of the other. In this situation, they might go on forever, or the failure of one might cause overall failure.
Equilibrium: The system is designed to minimize the cost functions of the generator and the discriminator individually, but it is not designed to reach a Nash equilibrium, so training can continue indefinitely without converging. Hence, the convergence of the system is not guaranteed, and algorithms that help a system reach equilibrium are another open research area.
Repetition: GANs often fall into a trap in which, once an image is accepted by the discriminator (i.e., manages to fool it), the generator either creates the same image again and again or outputs only minor variations of it. This limits the diversity of the output and introduces inflexibility in the system.
Evaluation metrics: As of now, there is a lack of proper evaluation metrics for GANs. The ones in use include the nearest-neighbor algorithm, which measures the distance between the produced image/sample and the training set.
Training instability: In GANs, the optimal solution is a saddle point rather than a local minimum, which makes the optimization unstable.
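
The equilibrium and instability items above can be illustrated with a toy two-player game. The Python sketch below applies simultaneous gradient descent and ascent to f(x, y) = x * y, whose only equilibrium is the saddle point (0, 0). This bilinear game is a commonly used simplification and is an assumption here, not a GAN objective; the function name and step size are illustrative.

```python
import math

def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent on x and ascent on y for f(x, y) = x * y.

    Returns the distance from the equilibrium (0, 0) after every step.
    """
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x              # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
        distances.append(math.hypot(x, y))
    return distances

dist = simultaneous_gda(1.0, 1.0)
print(f"distance from equilibrium: start {dist[0]:.3f}, end {dist[-1]:.3f}")
```

Each update multiplies the distance from the saddle point by sqrt(1 + lr^2), so the iterates spiral outward instead of converging, even though each player follows its own gradient. This is one simple way to see why minimizing each cost function separately does not guarantee reaching an equilibrium.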

6 Conclusion

In this paper, image/video processing GANs have been thoroughly reviewed. Publication trends, development, various new variants of GAN and applications have been discussed in depth in an attempt to bring all relevant information under one umbrella. The recent rush among the scientific community to utilize GANs in core as well as interdisciplinary fields is proof that GANs have a very promising future. Their ability to deal with missing and unlabelled data is also distinctive. They have obvious advantages over other generative models. However, they have also introduced some new problems. Their training instability, mode collapse and non-convergence have led to the development of various new models. However, in a paper by Google Brain, the authors concluded after a thorough examination that none of the new variants could be shown to have a consistent advantage over the originally proposed vanilla GAN algorithm. The conclusions can be summarized as follows:

It can be concluded that GANs have a plethora of applications; however, there is still a need for an algorithm that provides stability in GAN training. Many doubts remain due to the unavailability of mathematical proofs in support of the GAN mechanism. Some authors have tried to investigate the underlying cause of training failures and mode collapse, but the results are more speculation than solution. In terms of stable training, the loss function plays an important role, and various loss functions have been proposed to address this. Having reviewed different types of loss functions, it is observed that spectral normalization generalizes well, i.e., it can be applied to every GAN, is easy to implement and has a very low computational cost.
Despite numerous old and new performance evaluation metrics, GANs lack a universal metric that keeps all the parameters in view. This paper has given the strengths and limitations of the quantitative and qualitative performance measures used for evaluating GANs. The current favorites among researchers, IS and FID, both fail to capture all the information required when analyzing GANs. Finally, some directions for future research on evaluation measures have been identified. There should be comparative empirical and analytical studies of evaluation measures in the future. A code repository for evaluation measures should be made available on trusted websites for researchers.
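
As a rough illustration of why spectral normalization is considered cheap and easy to implement, the Python sketch below estimates the largest singular value of a weight matrix by power iteration and divides the matrix by it. It assumes a plain, non-zero 2-D weight matrix and runs many iterations from a random start; practical implementations typically run a single iteration per training step and reuse the vector between steps. The function names are illustrative.

```python
import numpy as np

def spectral_norm_estimate(weight, n_iters=50, seed=0):
    """Estimate the largest singular value of a 2-D matrix by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        v = weight.T @ u
        v /= np.linalg.norm(v)
        u = weight @ v
        u /= np.linalg.norm(u)
    return float(u @ weight @ v)

def spectrally_normalize(weight, n_iters=50):
    """Rescale the matrix so that its largest singular value is about 1."""
    return weight / spectral_norm_estimate(weight, n_iters)

w = np.random.default_rng(1).normal(size=(6, 4))
w_sn = spectrally_normalize(w)
print("largest singular value before:", np.linalg.norm(w, 2))
print("largest singular value after: ", np.linalg.norm(w_sn, 2))
```

The cost per iteration is two matrix-vector products, which is small compared with a forward and backward pass through the layer.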

Finally, it is concluded that there are many prospects for future research and applications across many fields.

References

Acharya D (2018) Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs. arXiv, 1–22
Agnese J, Herrera J, Tao H, Zhu X (2020) A survey and taxonomy of adversarial neural networks for text-to-image synthesis. Data Min Knowl Disc. https://doi.org/10.1002/widm.1345
Alqahtani H, Kavakli-Thorne M, Kumar G (2019) Applications of generative adversarial networks (GANs): an updated review. Archives Computat Methods Eng 1–28. https://doi.org/10.1007/s11831-019-09388-y
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875
Arora S, Zhang Y (2017) Do GANs actually learn the distribution? An empirical study. arXiv preprint arXiv:1706.08224
Azar MG, Munos R, Kappen HJ (2013) Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Mach Learn 91(3):325–349
Bansal A, Ma S, Ramanan D, Sheikh Y (2018) Recycle-GAN: Unsupervised video retargeting. In Proceedings of the European Conference on Computer Vision, pp 119–135
BenTaieb A, Hamarneh G (2018) Adversarial stain transfer for histopathology image analysis. IEEE Trans Med Imaging 37(3):792–802
Berg T, Liu J, Woo Lee S, Alexander ML, Jacobs DW, Belhumeur PN (2014) Birdsnap: Large-scale fine-grained visual categorization of birds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2011–2018
Bermudez JD, Happ PN, Feitosa RQ, Oliveira DA (2019) Synthesis of multispectral optical images from SAR/optical multitemporal data using conditional generative adversarial networks. IEEE Geosci Remote Sens Lett 16:1220–1224
Berthelot D, Schumm T, Metz L (2017) BeGAN: boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717
Bhattacharjee P, Das S (2017) Temporal coherency based criteria for predicting video frames using deep multi-stage generative adversarial networks. In Advances in Neural Information Processing Systems, pp 4268–4277
Borji A (2018) Pros and cons of GAN evaluation measures. Comput Vis Image Underst
Brkić K, Hrkać T, Kalafatić Z, Sikirić I (2017) Face hairstyle and clothing colour de-identification in video sequences. IET Signal Process 11(9):1062–1068
Bulat A, Yang J, Tzimiropoulos G (2018) To learn image superresolution, use a GAN to learn how to do image degradation first. ECCV, 185–200
Cao J, Hu Y, Yu B, He R, Sun Z (2019) 3D aided duet GANs for multi-view face image synthesis. IEEE Trans Inform Forensics Secur 14:2028–2042
Cao Y, Jia LL, Chen YX, Lin N, Yang C, Zhang B, Liu Z, Li X, Dai H (2019) Recent advances of generative adversarial networks in computer vision. IEEE Access 7:14985–15006
Chang W, Yang G, Yu J, Liang Z (2018) Real-time segmentation of various insulators using generative adversarial networks. IET Comput Vis 12(5):596–602
Che T, Li Y, Jacob AP, Bengio Y, Li W (2016) Mode regularized generative adversarial networks. arXiv preprint arXiv:1612.02136
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp 2172–2180
Chen K Y, Lu C Y, Xing Y (2017) Video super-resolution using temporal fusion generative adversarial network
Chen X, Yu J, Kong S, Wu Z, Fang X, Wen L (2019) Towards real-time advancement of underwater visual quality with GAN. IEEE Trans Ind Electron
Choi Y, Choi M, Kim M, Ha JW, Kim S, Choo J (2018) StarGAN: Unified Generative Adversarial Networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 8789–8797
Chu M, Xie Y, Mayer J, Leal-Taixe L, Thuerey N (2018) Learning Temporal Coherence via Self-supervision for GAN-based Video generation. arXiv, 1–22
Coates A, Ng A, Lee H (2011) An analysis of single-layer Networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 215–223
Costa P, Galdran A, Meyer MI, Niemeijer M, Abràmoff M, Mendonça AM, Campilho A (2018) End-to-end adversarial retinal image synthesis. IEEE Trans Med Imaging 37(3):781–791
Cousins S, Shawe-Taylor J (2017) High-probability minimax probability machines. Mach Learn 106(6):863–886
Demir U, Unal G (2018) Patch-based image inpainting with generative adversarial networks. arXiv preprint arXiv:1803.07422
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database
Denton EL, Chintala S, Fergus R (2015) Deep generative image models using a laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems, pp 1486–1494
Dinh L, Sohl-Dickstein J, Bengio S (2016) Density estimation using real nvp. arXiv preprint arXiv:1605.08803
Donahue J, Krähenbühl P, Darrell T (2016) Adversarial feature learning. arXiv preprint arXiv:1605.09782
Dong J, Yin R, Sun X, Li Q, Yang Y, Qin X (2019) Inpainting of remote sensing SST images with deep convolutional generative adversarial network. IEEE Geosci Remote Sens Lett 16(2):173–177
Fahlman SE, Hinton GE, Sejnowski TJ (1983) Massively parallel architectures for AI: NETL, Thistle, and Boltzmann machines. In National Conference on Artificial Intelligence
Frey BJ, Hinton GE, Dayan P (1996) Does the wake-sleep algorithm produce good density estimators? In Advances in neural information processing systems, pp 661–667
Frey BJ (1998) Graphical models for machine learning and digital communication. MIT Press
Gao Y, Liu Y, Wang Y, Shi Z, Yu J (2019) A universal intensity standardization method based on a many-to-one weak-paired cycle generative adversarial network for magnetic resonance images. IEEE Trans Med Imaging 38:2059–2069
Ge H, Yao Y, Chen Z, Sun L (2018) Unsupervised transformation network based on GANs for target-domain oriented image translation. IEEE Access 6:61342–61350
Ghamisi P, Yokoya N (2018) Img2dsm: height simulation from single imagery using conditional generative adversarial net. IEEE Geosci Remote Sens Lett 15(5):794–798
Gong M, Niu X, Zhang P, Li Z (2017) Generative Adversarial Networks for change detection in multispectral imagery. IEEE Geosci Remote Sens Lett 14(12):2310–2314
Goodfellow IJ, Bulatov Y, Ibarz J, Arnoud S, Shet V (2013) Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp 2672–2680
Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012) A kernel two-sample test. J Mach Learn Res 13:723–773
Griffin G, Holub A, Perona P (2007) Caltech-101 object category dataset
Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset
Grzegorczyk M (2016) A non-homogeneous dynamic Bayesian network with a hidden Markov model dependency structure among the temporal data points. Mach Learn 102(2):155–207
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of wasserstein GANs. In Advances in Neural Information Processing Systems, pp 5767–5777
Gurumurthy S, Kiran Sarvadevabhatla R, Venkatesh Babu R (2017) DeliGAN: Generative Adversarial Networks for diverse and limited data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 166–174
He Y, Zhang J, Shan H, Wang L (2019) Multi-task GANs for view-specific feature learning in gait recognition. IEEE Trans Inform Forensics Secur 14(1):102–113
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp 6626–6637
Hinton GE, Sejnowski TJ (1986) Learning and relearning in Boltzmann machines. Parallel Distributed Processing: Explorations in the Microstructure of Cognition 1:282–317
Hu B, Tang Y, Chang EI, Fan Y, Lai M, Xu Y (2017) Unsupervised learning for cell-level visual representation in histopathology images with generative adversarial networks. arXiv preprint arXiv:1711.11317
Huang Z, Paudel DP, Li G, Wu J, Timofte R, Gool LV (2019) Divide-and-Conquer Adversarial Learning for High-resolution Image and Video enhancement. arXiv, 1–17
Huo Y, Xu Z, Bao S, Bermudez C, Moon H, Parvathaneni P, Landman BA (2018) Splenomegaly segmentation on multi-modal MRI using deep convolutional networks. IEEE Trans Med Imaging
Im DJ, Kim CD, Jiang H, Memisevic R (2016) Generating images with recurrent adversarial networks. arXiv preprint arXiv:1602.05110
Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional Adversarial Networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1125–1134
Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from standard GAN. arXiv preprint arXiv:1807.00734
Juefei-Xu F, Boddeti VN, Savvides M (2017) GANg of GANs: generative adversarial networks with maximum margin ranking. arXiv preprint arXiv:1704.04865
Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196
Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp 1857–1865. JMLR.org
Kim D, Jang HU, Mun SM, Choi S, Lee HK (2018) Median filtered image restoration and anti-forensics using adversarial networks. IEEE Signal Process Lett 25(2):278–282
Kingma DP, Salimans T, Jozefowicz R, Chen X, Sutskever I, Welling M (2016) Improving variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems. arXiv:1606.04934
Kodali N, Abernethy J, Hays J, Kira Z (2017) On convergence and stability of GANs. arXiv preprint arXiv:1705.07215
Krizhevsky A, Hinton G (2010) Convolutional deep belief networks on cifar-10. Unpublished manuscript 40(7)
LeCun Y (1998) The MNIST database of handwritten digits. https://yann.lecun.com/exdb/mnist/
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Shi W (2017) Photo-realistic single image super-resolution using a Generative Adversarial Network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z (2017) Photo-realistic single image super-resolution using a generative adversarial network. CVPR
Lee HJ, Kim ST, Lee H, Ro YM (2019) Lightweight and effective facial landmark detection using adversarial learning with face geometric map generative network. IEEE Trans Circuits Syst Video Technol
Lehmann EL, Romano JP (2006) Testing statistical hypotheses. Springer, Science and Business Media
Li Y, Shen L (2018) cC-GAN: a robust transfer-learning framework for HEp-2 specimen image segmentation. IEEE Access 6:14048–14058
Li J, Liu S, He H, Li L (2018) A novel framework for gear safety factor prediction. IEEE Trans Industrial Informatics
Li H, Li G, Lin L, Yu H, Yu Y (2018) Context-aware semantic inpainting. IEEE transactions on cybernetics
Li J, Skinner KA, Eustice RM, Johnson-Roberson M (2018) WaterGAN: unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robotics and Automation Letters 3(1):387–394
Li J, He H, Li L, Chen G (2019) A novel generative model with bounded-GAN for reliability classification of gear safety. IEEE Trans Ind Electron 66(11):8772–8781
Li P, Prieto L, Mery D, Flynn PJ (2019) On low-resolution face recognition in the wild: comparisons and new techniques. IEEE Trans Inform Forensics Secur 14:2000–2012
Liang X, Lee L, Dai W, Xing EP (2017) Dual motion GAN for future-flow embedded video prediction. In proceedings of the IEEE international conference on computer vision, pp 1744–1752
Liao K, Lin C, Zhao Y, Gabbouj M (2020) DR-GAN: automatic radial distortion rectification using conditional GAN in real-time. IEEE Trans Circuits Syst Video Technol 30(3):725–733
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Cham, pp 740–755
Lin D, Fu K, Wang Y, Xu G, Sun X (2017) MARTA GANs: unsupervised representation learning for remote sensing image classification. IEEE Geosci Remote Sens Lett 14(11):2092–2096
Liu Z, Luo P, Wang X, Tang X (2018) Large-scale celebfaces attributes (celeba) dataset
Lopez-Tapia S, Lucas A, Molina R, Katsaggelos A K (2018) A single video super-resolution GAN for multiple Downsampling operators based on pseudo-inverse image formation models
Lopez-Tapia S, Lucas A, Molina R, Katsaggelos A K (2019) GAN-based video super-resolution with direct regularized inversion of the low-resolution formation model, ICIP conference
Lucas A, Lopez-Tapia S, Molina R, Katsaggelos AK (2019) Generative adversarial networks and perceptual losses for video super-resolution. IEEE Trans Image Process 28:3312–3327
Lucas A, Lopez-Tapia S, Molina R, Katsaggelos AK (2019) Generative adversarial networks and perceptual losses for video super-resolution. IEEE Trans Image Process 28(7):3312–3327
Lucic M, Kurach K, Michalski M, Gelly S, Bousquet O (2018) Are GANs created equal? A large-scale study. In Advances in neural information processing systems, pp 698–707
Lucic M, Kurach K, Michalski M, Bousquet O, Gelly S (2018) Are GANs created equal? A large-scale study. NeurIPS 1–10
Ma S, Fu J, Wen Chen C, Mei T (2018) DA-GAN: Instance-level image translation by deep attention Generative Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5657–5666
Ma D, Tang P, Zhao L (2019) SiftingGAN: generating and sifting labeled samples to improve the remote sensing image scene classification baseline in vitro. IEEE Geosci Remote Sens Lett 16:1046–1050
Mao X, Li Q, Xie H, Lau RY, Wang Z, Paul Smolley S (2017) Least squares generative adversarial networks. In proceedings of the IEEE international conference on computer vision, pp 2794–2802
Mao X, Li Q, Xie H, Lau RYK, Wang Z, Smolley SP (2018) On the effectiveness of least squares generative adversarial networks. IEEE Trans Pattern Anal Mach Intell 41:2947–2960. https://doi.org/10.1109/TPAMI.2018.2872043
Mardani M, Gong E, Cheng JY, Vasanawala SS, Zaharchuk G, Xing L, Pauly JM (2019) Deep generative adversarial neural networks for compressive sensing MRI. IEEE Trans Med Imaging 38(1):167–179
Mathieu M, Couprie C, LeCun Y (2016) Deep multi-scale video prediction beyond mean square error. ICLR, 1–14
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957
Nie D, Trullo R, Lian J, Wang L, Petitjean C, Ruan S, Shen D (2018) Medical image synthesis with deep convolutional adversarial networks. IEEE Trans Biomed Eng 65(12):2720–2730
Nilsback ME, Zisserman A (2008) Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pp 722–729. IEEE
Niu X, Gong M, Zhan T, Yang Y (2019) A conditional adversarial network for change detection in heterogeneous images. IEEE Geosci Remote Sens Lett 16(1):45–49
Odena A, Olah C, Shlens J (2017) Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp 2642–2651. JMLR.org
Ohnishi K, Yamamoto S, Ushiku Y, Harada T (2018) Hierarchical video generation from orthogonal information: optical flow and texture. In Thirty-Second AAAI Conference on Artificial Intelligence
Oliveira DA, Ferreira RS, Silva R, Brazil EV (2018) Interpolating seismic data with conditional generative adversarial networks. IEEE Geosci Remote Sens Lett 99:1–5
Oord AVD, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kavukcuoglu K (2016) Wavenet: a generative model for raw audio. arXiv preprint arXiv:1609.03499
Ouyang X, Cheng Y, Jiang Y, Li C L, Zhou P (2018) Pedestrian-synthesis-GAN: generating pedestrian data in real scene and beyond. arXiv, 1–22
Pan Y, Qiu Z, Yao T, Li H, Mei T (2017) To create what you tell: generating videos from captions. In proceedings of the 25th ACM international conference on multimedia, pp 1789–1798. ACM
Pang Y, Xie J, Li X (2018) Visual haze removal by a unified generative adversarial network. IEEE Trans Circuits Syst Video Technol
Pascual S, Bonafonte A, Serrà J (2017) SEGAN: speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452
Quan TM, Nguyen-Duc T, Jeong WK (2018) Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss. IEEE Trans Med Imaging 37(6):1488–1497
Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
Rezende DJ, Mohamed S (2015) Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770
Richardson E, Weiss Y (2018) On GANs and gmms. In Advances in Neural Information Processing Systems, pp 5852–5863
Royer A, Bousmalis K, Gouws S, Bertsch F, Mosseri I, Cole F, Murphy K (2017) XGAN: unsupervised image-to-image translation for many-to-many mappings. arXiv preprint arXiv:1711.05139
Saito Y, Takamichi S, Saruwatari H (2018) Statistical parametric speech synthesis incorporating generative adversarial networks. IEEE/ACM Trans Audio Speech Language Process 26(1):84–96
Sakkos D, Ho ES, Shum HP (2019) Illumination-aware multi-task GANs for foreground segmentation. IEEE Access
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. In Advances in neural information processing systems, pp 2234–2242
Shan H, Zhang Y, Yang Q, Kruger U, Kalra MK, Sun L, Wang G (2018) 3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network. IEEE Trans Med Imaging 37(6):1522–1534
Shen Z, Lai WS, Xu T, Kautz J, Yang MH (2018) Deep Semantic Face Deblurring. CVPR
Shen Z, Wang W, Lu X, Shen J, Ling H, Xu T, Shao L (2019) Human-Aware Motion Deblurring, ICCV
Shi Y, Li Q, Zhu XX (2018) Building footprint generation using improved generative adversarial networks. IEEE Geosci Remote Sens Lett
Snell J, Ridgeway K, Liao R, Roads BD, Mozer MC, Zemel RS (2017) Learning to generate images with perceptual similarity metrics. In 2017 IEEE International Conference on Image Processing, pp 4277–4281. IEEE
Theis L, Oord AVD, Bethge M (2015) A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844
Tuan YL, Lee HY (2019) Improving conditional sequence generative adversarial networks by stepwise evaluation. IEEE/ACM Trans Audio Speech Language Process 27(4):788–798
Tulyakov S, Liu MY, Yang X, Kautz J (2018) MocoGAN: Decomposing motion and content for video generation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1526–1535
Van Horn G, Branson S, Farrell R, Haber S, Barry J, Ipeirotis P, Belongie S (2015) Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 595–604
Van Horn G, Mac Aodha O, Song Y, Shepard A, Adam H, Perona P, Belongie S (2017) The inaturalist challenge 2017 dataset. arXiv preprint arXiv:1707.06642 1(2)
Vondrick C, Pirsiavash H, Torralba A (2016) Generating videos with scene dynamics. In Advances In Neural Information Processing Systems, pp 613–621
Walker J, Marino K, Gupta A, Hebert M (2017) The pose knows: video forecasting by generating pose futures. In proceedings of the IEEE international conference on computer vision, pp 3332–3341
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Wang Y, Zhang L, van de Weijer J (2016) Ensembles of generative adversarial networks. arXiv preprint arXiv:1612.00991
Wang W, Huang Q, You S, Yang C, Neumann U (2017) Shape inpainting using 3D generative adversarial network and recurrent convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pp 2298–2306
Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, Loy CC (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In European Conference on Computer Vision. Springer, Cham, pp 63–79
Wang Y, Zhou L, Yu B, Wang L, Zu C, Lalush DS, Shen D (2018) 3D auto-context-based locality adaptive multi-modality GANs for PET synthesis. IEEE Trans Med Imaging
Wang Z, Chen Z, Wu F (2018) Thermal to visible facial image translation using generative adversarial networks. IEEE Signal Process Lett 25(8):1161–1165
Welinder P, Branson S, Mita T, Wah C, Schroff F, Belongie S, Perona P (2010) Caltech-UCSD birds 200
Wen S, Liu W, Yang Y, Huang T, Zeng Z (2018) Generating realistic videos from keyframes with concatenated GANs. IEEE Trans Circuits Syst Video Technol
Wolterink JM, Leiner T, Viergever MA, Išgum I (2017) Generative adversarial networks for noise reduction in low-dose CT. IEEE Trans Med Imaging 36(12):2536–2545
Wu W, Qi H, Rong Z, Liu L, Su H (2018) Scribble-supervised segmentation of aerial building footprints using adversarial learning. IEEE Access 6:58898–58911
Xiang S, Li H (2017) On the effects of batch and weight normalization in generative adversarial networks. arXiv preprint arXiv:1704.03971
Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747
Xiong W, Luo W, Ma L, Liu W, Luo J (2018) Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2364–2373
Xu C, Ren J, Zhang D, Zhang Y, Qin Z, Ren K (2019) GANobfuscator: mitigating information leakage under GAN via differential privacy. IEEE Trans Inform Forensics Secur 14:2358–2371
Xuan Q, Chen Z, Liu Y, Huang H, Bao G, Zhang D (2018) Multi-view generative adversarial network and its application in pearl classification. IEEE Trans Ind Electron 66(10):8244–8252
Yang J, Kannan A, Batra D, Parikh D (2017) LR-GAN: layered recursive generative adversarial networks for image generation. arXiv preprint arXiv:1703.01560
Yang G, Yu S, Dong H, Slabaugh G, Dragotti PL, Ye X, Firmin D (2018) DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Trans Med Imaging 37(6):1310–1321
Yang Q, Yan P, Zhang Y, Yu H, Shi Y, Mou X, Wang G (2018) Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans Med Imaging 37(6):1348–1357
Yang Y, Zhou J, Ai J, Bin Y, Hanjalic A, Shen HT, Ji Y (2018) Video captioning by adversarial LSTM. IEEE Trans Image Process 27(11):5600–5611
Yi Z, Zhang H, Tan P, Gong M (2017) DualGAN: unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, pp 2849–2857
Yu F, Seff A, Zhang Y, Song S, Funkhouser T, Xiao J (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365
Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589
Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5505–5514
Yu B, Zhou L, Wang L, Shi Y, Fripp J, Bourgeat P (2019) Ea-GANs: edge-aware generative adversarial networks for cross-modality MR image synthesis. IEEE Transactions on Medical Imaging
Yuan Y, Tian C, Lu X (2018) Auxiliary loss multimodal GRU model in audio-visual speech recognition. IEEE Access 6:5573–5583
Zeng Y, Lu H, Borji A (2017) Statistics of deep generated images. arXiv preprint arXiv:1708.02688
Zhan Y, Hu D, Wang Y, Yu X (2018) Semisupervised hyperspectral image classification based on generative adversarial networks. IEEE Geosci Remote Sens Lett 15(2):212–216
Zhang L, Liu P, Gulla J (2019) Artificial Intell Mach Learning. https://doi.org/10.1007/s10994-018-05777-9
Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2017) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp 5907–5915
Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas D (2017) StackGAN++: realistic image synthesis with stacked generative adversarial networks. arXiv preprint arXiv:1710.10916
Zhang M, Gong M, Mao Y, Li J, Wu Y (2018) Unsupervised feature extraction in Hyperspectral images based on Wasserstein generative adversarial network. IEEE Trans Geosci Remote Sens
Zhang H, Goodfellow I, Metaxas D, Odena A (2018) Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318
Zhang Z, Song Y, Qi H (2018) Decoupled learning for conditional adversarial networks. In 2018 IEEE Winter Conference on Applications of Computer Vision, pp 700–708. IEEE
Zhang K, Luo W, Zhong Y, Ma L, Liu W, Li H (2019) Adversarial spatio-temporal learning for video deblurring. IEEE Trans Image Process 28(1):291–301
Zhao J, Mathieu M, LeCun Y (2016) Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126
Zhou Z, Cai H, Rong S, Song Y, Ren K, Zhang W, Wang J (2017) Activation maximization generative adversarial nets. arXiv preprint arXiv:1703.02000
Zhu JY, Zhang R, Pathak D, Darrell T, Efros AA, Wang O, Shechtman E (2017) Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pp 465–476
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp 2223–2232
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV
Zhu L, Chen Y, Ghamisi P, Benediktsson JA (2018) Generative adversarial networks for hyperspectral image classification. IEEE Trans Geosci Remote Sens 56(9):5046–5063
Zhu X, Zhang L, Zhang L, Liu X, Shen Y, Zhao S (2020) GAN-based image super-resolution with a novel quality loss. Math Probl Eng 2020:1–12. https://doi.org/10.1155/2020/5217429

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Thapar Institute of Engineering and Technology, Patiala, India
Akanksha Sharma & Neeru Jindal
Department of Computer Science Engineering, Thapar Institute of Engineering and Technology, Patiala, India
P. S. Rana

Corresponding author

Correspondence to Neeru Jindal.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sharma, A., Jindal, N. & Rana, P.S. Potential of generative adversarial net algorithms in image and video processing applications– a survey. Multimed Tools Appl 79, 27407–27437 (2020). https://doi.org/10.1007/s11042-020-09308-4

Received: 13 July 2019
Revised: 16 April 2020
Accepted: 09 July 2020
Published: 24 July 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11042-020-09308-4
