1 Introduction

Most medical practitioners now rely on computer-aided imaging to diagnose disorders. Low-resolution images, however, make several disorders difficult to diagnose. In this chapter, a deep convolutional network is used to create artificial images together with their segmented counterparts, and the synthesized images have a high resolution.

Generative adversarial networks (GANs) are neural networks that learn in an unsupervised manner, and they offer a new framework for working with medical images. GANs are quickly becoming a state-of-the-art foundation because they achieve improved performance in a number of medical applications. This chapter describes the technical characteristics of common GAN approaches used in the medical imaging domain.

2 What is a GAN (Generative Adversarial Network)?

Generative adversarial networks (GANs) are a deep learning method that allows computers to synthesize new, artificial data from collections of existing data. In particular, a GAN can produce high-quality data with little to no labeling through competition between its generator and discriminator networks [1, 2].

A GAN consists of two competing neural network models. The first, the generator, creates samples from a noise vector: a low-dimensional random vector, typically of between 50 and 512 dimensions, that is sampled from a normal or uniform distribution and drawn afresh for each sample during training. The noise vector introduces randomness into the generator network and allows it to produce a diverse set of outputs. By providing different random vectors as input to the generator, we can generate a wide range of new data. Because it helps ensure that the generated outputs are diverse and not just copies of the training data, the noise vector is an important factor in the success of GANs.

The second model, the discriminator, is given samples from both the generator and the training data [3]. The generator is trained to produce images that closely resemble real data, while the discriminator is trained to distinguish generated data from real data. The generator and discriminator compete against one another until an equilibrium is reached, at which point the network is considered trained.
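The sampling of noise vectors described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `sample_noise`, the default of 100 dimensions, and the use of a standard normal distribution are assumptions chosen for the example, not settings reported in this chapter.

```python
import random

def sample_noise(batch_size, latent_dim=100, seed=None):
    """Return `batch_size` noise vectors, each with `latent_dim` entries
    drawn independently from a standard normal distribution."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            for _ in range(batch_size)]

# A fresh batch is drawn for every training step; each vector
# would be mapped by the generator to a different synthetic sample.
z_batch = sample_noise(batch_size=4, latent_dim=100, seed=0)
print(len(z_batch), len(z_batch[0]))  # 4 100
```

Because every vector is drawn independently, two calls without a fixed seed give different inputs, which is what lets the generator produce varied outputs.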

2.1 Overview of GAN Structure

GANs pit two neural networks against each other in order to learn the probability distribution of a dataset. A GAN contains two neural networks:

  • Generator, G.

  • Discriminator, D.

The generator network seeks to create artificial images that appear realistic. It accepts a random vector as input (say, a 100-dimensional array of numbers drawn from a Gaussian distribution) and, once trained, produces a realistic image that appears to belong to the training set.

The discriminator network, on the other hand, tries to determine whether an image is fake (i.e., created by the generator) or real (i.e., taken directly from the training data). These steps are repeated many times, so that the generator and the discriminator each get better at their respective roles with every iteration. Fig. 1 illustrates the process.

Fig. 1
A block flow diagram of the GAN. The flow begins with latent random noise, followed by a generator, fake samples, discriminator model, real or fake, and loss. Real images, followed by samples point to the discriminator model.

An illustration of a generative adversarial network (GAN)

2.2 Mathematical Equation

The discriminator examines generated images and real images (i.e., training samples) separately and decides whether each input image is real or fake. Its output D(x) represents the probability that the input x is real. The discriminator is trained in the same manner as a deep network classifier: we want D(x) = 1 if the input image is real and D(x) = 0 if it is a generated image.

Through this process the discriminator learns the features that make an image look realistic. The generator, on the other hand, should produce images that the discriminator accepts as real, that is, D(G(z)) = 1. This target value is backpropagated through the discriminator to the generator, which trains the generator to produce images closer to what the discriminator recognizes as real.

The generator becomes stronger at producing realistic images that the discriminator cannot tell apart from actual ones as the training goes on. The discriminator also grows stronger at picking up even the smallest variations between the two sorts of images. The generator eventually creates visuals that are similar to real images as the two models converge.

The following formula can be used to mathematically explain it [4]:

$$ \underset{\mathrm{G}}{\min}\,\underset{\mathrm{D}}{\max}\,\mathrm{V}\left(\mathrm{D},\mathrm{G}\right)={\mathrm{E}}_{x\sim {p}_{\mathrm{data}}(x)}\left[\log \mathrm{D}(x)\right]+{\mathrm{E}}_{z\sim {p}_z(z)}\left[\log \left(1-\mathrm{D}\left(\mathrm{G}(z)\right)\right)\right] $$
(1)

where,

  • x = real data,

  • z = noise vector,

  • G(z;θ_g) = the generator network, which maps a noise vector to a synthetic data point; θ_g denotes its parameters.

  • D(x;θ_d) = the discriminator network, which takes a data point as input and outputs a scalar probability that the input is real rather than generated; θ_d denotes its parameters.

  • p_data(x) = The probability distribution of the actual data.

  • p_z(z) = The probability distribution function of the noise vector z.
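As a worked consequence of Eq. (1), the inner maximization over D can be solved pointwise. The sketch below follows the standard analysis of the GAN objective and introduces one symbol that is an assumption of this example and does not appear above: p_g(x), the distribution of the generated samples G(z). Rewriting both expectations as integrals over x gives

$$ \mathrm{V}\left(\mathrm{D},\mathrm{G}\right)=\int_x\left[{p}_{\mathrm{data}}(x)\log \mathrm{D}(x)+{p}_g(x)\log \left(1-\mathrm{D}(x)\right)\right]\,dx $$

For fixed positive numbers a and b, the function a log y + b log(1 − y) reaches its maximum on (0, 1) at y = a/(a + b), so for a fixed generator the best discriminator is

$$ {\mathrm{D}}^{\ast }(x)=\frac{{p}_{\mathrm{data}}(x)}{{p}_{\mathrm{data}}(x)+{p}_g(x)} $$

When the generator matches the data distribution exactly (p_g = p_data), this gives D*(x) = 1/2 everywhere and

$$ \mathrm{V}\left({\mathrm{D}}^{\ast },\mathrm{G}\right)=\log \frac{1}{2}+\log \frac{1}{2}=-\log 4 $$

which corresponds to a discriminator that can no longer tell real from generated samples.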

In this minimax game the discriminator tries to maximize V(D, G), while the generator tries to minimize it. The discriminator’s aim is to raise the likelihood that real images are classified as real and generated images as fake, which improves its ability to tell the two apart; the generator’s aim is to produce images for which the discriminator fails. The objective follows from the cross-entropy loss widely used in deep learning, −Σ p log q, where p is the true label and q is the predicted probability. For real images the label is p = 1, which leaves the term log D(x). For generated images the label is inverted (one minus the label), which leaves the term log(1 − D(G(z))).
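To connect the objective with the cross-entropy loss, the following minimal sketch estimates V(D, G) from a handful of discriminator outputs and checks that the usual discriminator loss (binary cross-entropy with label 1 for real and 0 for generated samples) is its negative. The function names and the example probabilities are hypothetical and chosen purely for illustration.

```python
import math

def binary_cross_entropy(p, q, eps=1e-12):
    """Cross-entropy between a true label p (0 or 1) and a predicted probability q."""
    q = min(max(q, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))

def value_function(d_real, d_fake):
    """Sample estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples.
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy with label 1 for real and label 0 for generated samples."""
    real = sum(binary_cross_entropy(1.0, d) for d in d_real) / len(d_real)
    fake = sum(binary_cross_entropy(0.0, d) for d in d_fake) / len(d_fake)
    return real + fake

# Illustrative outputs of a discriminator that is doing fairly well.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
print(value_function(d_real, d_fake))
print(discriminator_loss(d_real, d_fake))  # same magnitude, opposite sign
```

Minimizing this cross-entropy loss is therefore the same as maximizing V for the discriminator, while the generator pushes V in the opposite direction by raising D(G(z)).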

2.3 Major Applications of GAN

GANs can be used in a wide range of applications wherever new, plausible data is required. In particular, they are used to produce new images and videos.

  • Image Generation: GANs are commonly used for generating realistic images. For example, they can be used to generate realistic-looking faces, landscapes, or even artwork.

  • Style Transfer: GANs have the potential to facilitate the transfer of style from one image to another, thereby enabling the creation of an entirely new image that incorporates the content of one image and the style of another.

  • Data Augmentation: GANs can be used for generating new data from existing data, which can be useful for training machine learning models with limited datasets.

  • Disease Diagnosis and Prediction: GANs can be used for identifying patterns in medical data and predicting the likelihood of a patient developing a certain disease.

  • Medical Image Analysis: GANs can be used for generating synthetic medical images, such as CT or MRI scans, which can be used for training machine learning models. GANs can also be used for image segmentation, enhancing the quality of medical images, and reducing image noise.

  • Medical Data Augmentation: GANs have the potential to generate synthetic medical data, thereby serving as a means of augmenting limited datasets and enhancing the precision of machine learning models.

Overall, GANs have the potential to revolutionize the field of medicine, by improving the accuracy of disease diagnosis, speeding up drug discovery, and enabling personalized treatment plans.

3 Self-supervised Generative Adversarial Learning

We first define the term “self-supervised learning” and then discuss how it enhances GANs. Of the two prominent families, supervised and unsupervised learning, self-supervised learning is closer to unsupervised learning. Self-supervised (SS) learning is an effective method for learning representations from unlabeled data [5]. A self-supervised learning algorithm learns from the data itself, with no labeled examples, so it must identify patterns within the dataset in order to learn from it [4].

With the help of pseudo-labels [5], self-supervised approaches enable the classifier to learn better feature representations [6]. Specifically, these methods train the model to recognize a geometric transformation that has been applied to the input image, and in doing so the model learns image features.

There are several approaches to implementing self-supervised learning. One is to train a neural network to capture the attributes of the data and then use that network to predict labels for new data. The structure of the data can likewise be learned with a convolutional neural network (CNN), which can then be used to make predictions on new data.

In some situations self-supervised learning is superior to supervised learning. For example, a CNN trained only through self-supervised learning can, in some cases, classify images more accurately than a CNN trained only through supervised learning. This is because a CNN trained only through supervised learning is limited by the labeled training set available to it, whereas a CNN trained through self-supervised learning can learn the data’s structure from scratch, which improves its ability to generalize to new data [6, 7].
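One common instance of such a geometric pretext task is rotation prediction: each image is rotated by 0, 90, 180, or 270 degrees, and the rotation index serves as a pseudo-label that costs nothing to produce. The sketch below assumes this particular transformation and represents an image as a plain list of pixel rows; both choices are illustrative and are not taken from this chapter.

```python
def rotate90(image):
    """Rotate a 2-D list of pixel rows by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_examples(image):
    """Return (rotated_image, pseudo_label) pairs for 0, 90, 180, and 270 degrees.

    The pseudo-label is the number of quarter turns applied, so it is
    known without any manual annotation.
    """
    examples = []
    current = [list(row) for row in image]  # copy so the input is not modified
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples

tiny_image = [[1, 2],
              [3, 4]]
for rotated, label in make_rotation_examples(tiny_image):
    print(label, rotated)
```

A classifier trained to predict the label from the rotated image has to pick up on shape and orientation cues, which is how the pretext task yields useful image features.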

4 Conditional and Unconditional GANs

We now link the issues with training GANs to self-supervised learning. In principle, GANs are a type of unsupervised generative modelling: you simply input data and let the model generate synthetic data from it. Many modern GANs, however, are conditional GANs [8], which turn the generative modelling problem into a supervised learning task that needs labeled data. To make generative modelling easier, conditional GANs incorporate class labels into both the generator and the discriminator [9].

Unconditional GANs, by contrast, eliminate the need for class labels in generative modelling. This chapter demonstrates how self-supervised learning tasks can do away with labeled data when using GANs.
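The difference between the two settings can be made concrete by looking at what the generator receives as input. The sketch below assumes the simplest conditioning scheme, in which a one-hot class label is concatenated to the noise vector; other conditioning mechanisms exist, and the helper names and sizes here are hypothetical.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def unconditional_generator_input(z):
    """An unconditional generator sees only the noise vector."""
    return list(z)

def conditional_generator_input(z, label, num_classes):
    """A conditional generator sees the noise vector plus the class label,
    so labeled data is needed during training."""
    return list(z) + one_hot(label, num_classes)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]  # small latent size for readability

print(len(unconditional_generator_input(z)))                             # 8
print(len(conditional_generator_input(z, label=2, num_classes=3)))       # 11
```

The extra label entries are exactly what the unconditional setting avoids, which is why it can be trained on unannotated images.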

5 Thermal Imaging Systems

The surface skin temperature [10] can be measured using thermal imaging devices. These systems might contain a temperature reference source in addition to an infrared thermal camera [11, 12].

Thermal imaging devices can typically measure a subject’s surface skin temperature reliably without direct contact with the subject under evaluation [13]. This gives thermal imaging systems [14] an advantage over other temperature measurement techniques, which require closer proximity or touch (Fig. 2).

Fig. 2
A photograph of a thermal imaging device that captures a woman seated in the front. The maximum surface temperature measured is 35.4 degrees Celsius approximately.

Proper setup of a thermal imaging system for assessing people individually

5.1 Why are Thermal Imaging Devices Beneficial?

There are various advantages of using thermal imaging systems/cameras, which are listed below:

  1. 100% non-invasive: Thermal imaging devices do not require the evaluator to be close to the subject being assessed.

  2. Speed and accuracy: Unlike traditional forehead or oral thermometers, which require close proximity or physical contact with the subject under evaluation, thermal imaging devices can monitor surface skin temperature more rapidly and precisely.

  3. Flexible and cost-effective diagnostic approach: Thermal cameras are readily available on the market at a reasonable price, and the same equipment records both thermal and geometric data.

6 Need for Data Augmentation in GANs

Generating annotated medical imaging data is a challenging and costly task. The creation of deep learning models that can be generalized necessitates the acquisition of substantial amounts of data. Standard data augmentation is a commonly employed technique aimed at enhancing the generalizability of machine learning models. Generative adversarial networks offer a novel method for data augmentation [6, 7].
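For reference, standard (non-generative) data augmentation amounts to applying simple label-preserving transformations to each image. The sketch below shows two such transformations on an image stored as a list of pixel rows. The choice of a horizontal flip and a one-pixel shift is an assumption for illustration; whether a flip is appropriate for left and right knee images would need to be judged for the task at hand.

```python
def horizontal_flip(image):
    """Mirror each pixel row left to right."""
    return [row[::-1] for row in image]

def translate_right(image, shift, fill=0):
    """Shift the image `shift` pixels to the right, padding the left edge with `fill`."""
    width = len(image[0])
    shift = max(0, min(shift, width))  # keep the shift within the image width
    return [[fill] * shift + list(row[:width - shift]) for row in image]

def augment(image):
    """Return the original image together with two augmented variants."""
    return [image, horizontal_flip(image), translate_right(image, 1)]

sample = [[1, 2, 3],
          [4, 5, 6]]
for variant in augment(sample):
    print(variant)
```

Such transformations only rearrange existing pixels, whereas a GAN can synthesize new images, which is the motivation for GAN-based augmentation.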

Insufficient data during GAN training often leads to discriminator overfitting [15], which in turn causes the training process to diverge. Our approach uses an adaptive discriminator augmentation technique that effectively stabilizes training when data availability is limited [16, 17]. This approach is applicable when training from scratch and does not require modifications to either the loss functions or the network architectures. Unlabeled data is of significant value for improving the effectiveness of deep learning, and GANs are a powerful class of neural networks capable of generating lifelike new images from unannotated source images [15, 18]. GANs have previously been used to augment data, including the creation of supplementary training images for classification and the enhancement of synthetic images [19].

Data augmentation with GANs has been shown to counter overfitting and underfitting [2], boosting model accuracy and decreasing model loss, and hence improving the generalizability of the model [20] (Fig. 3).
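The adaptive part of discriminator augmentation can be sketched as a small feedback loop: an indicator of discriminator overfitting is measured during training, and the probability p of augmenting each image shown to the discriminator is nudged up or down accordingly. The code below is a simplified illustration loosely modelled on that idea. The specific indicator (mean sign of the discriminator's raw outputs on real training images), the target value of 0.6, and the step size are assumptions for this example, not settings reported in this chapter.

```python
def overfitting_heuristic(d_real_logits):
    """Mean sign of the discriminator's raw outputs on real training images.

    Values near 1 mean the discriminator is very confident on the training
    set, which is treated here as a sign of overfitting.
    """
    signs = [1.0 if v > 0 else -1.0 if v < 0 else 0.0 for v in d_real_logits]
    return sum(signs) / len(signs)

def update_augment_probability(p, r_t, target=0.6, step=0.01):
    """Raise p when the heuristic exceeds the target, lower it otherwise,
    and keep the result inside [0, 1]."""
    if r_t > target:
        p += step
    elif r_t < target:
        p -= step
    return min(1.0, max(0.0, p))

# Illustrative run: the discriminator grows more confident over three checks.
p = 0.0
for logits in ([0.2, -0.1, 0.3, -0.4], [1.5, 0.9, 2.1, -0.2], [3.0, 2.5, 4.1, 1.7]):
    r_t = overfitting_heuristic(logits)
    p = update_augment_probability(p, r_t)
    print(round(r_t, 2), round(p, 2))
```

Only the augmentation strength changes in this loop, which is consistent with the statement that neither the loss functions nor the network architectures need to be modified.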

Fig. 3
A set of 5 thermal images of knees. Image a depicts a thermal imaging of a knee osteoarthritis patient. Images b to e are GAN-generated images.

Thermal image of knee osteoarthritis patient (a) and its augmented GAN-generated images (b–e)

7 Improved Medical Image Generation via Self-supervised Learning

In the domain of deep learning, it is customary to utilize extensive labeled datasets to effectively train a deep neural network. Various self-supervised learning techniques have been suggested as a means to acquire universal visual characteristics in an automated manner, thereby circumventing the laborious and time-intensive process of manually annotating vast quantities of data. Self-supervised generative adversarial neural networks, also known as unconditional GANs, are utilized for the purpose of generating synthetic thermal images.

Deep CNNs are widely used in computer vision because of their remarkable ability to extract features from visual data. Applications include, but are not limited to, image classification, semantic and instance segmentation, object recognition, and image captioning. The performance of deep learning models depends strongly on the amount of training data, since models can grow in size and complexity as more training data is added.

8 Methods

Although comprehensive public databases of color images exist for diverse objects, comparable databases of thermal images are scarce: they are either unavailable or cover only a restricted set of object categories. Synthesizing thermal images is therefore valuable, because obtaining real data is difficult and expensive. Gathering and annotating extensive datasets comprising millions of images is arduous, costly, and time-intensive [21].

It is widely acknowledged that GANs are remarkably effective at generating diverse images from existing images and stochastic noise. Unconditional GANs can now generate images with a high degree of realism, diversity, and quality.

8.1 Training Dataset

The selection of an appropriate unlabeled dataset is an essential part of transfer learning via self-supervised pre-training.

Our training dataset [22] consists of images of the knee areas [23] of the human body captured with a FLIR thermal camera. In diagnosing arthritis, thermography is frequently used to examine joints deep within the body that are challenging to evaluate with a standard X-ray [19, 24]. All thermal images are 312 KB in size, with dimensions of 320 × 240 pixels [25] (Fig. 4).

Fig. 4
A framework of the generator and discriminator models. A. Random noise is followed by convolution 2 D transpose and convolution 2 D and fake sample generated. B. The real and data instances are followed by the discriminator, which leads to loss. It ends with the dense layer.

Generator and discriminator models used in this technique
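When a generator is built from transposed convolution layers, it helps to check that the layer settings actually reach the target image size. The sketch below uses the standard output-size formula for a transposed convolution without dilation. The starting feature-map size (15 × 20), the four layers, and the kernel, stride, and padding values are all assumptions chosen so that the output matches the 320 × 240 pixel images; the chapter does not state the actual layer configuration.

```python
def conv_transpose_output_size(size, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def upsampling_path(start, layers, kernel=4, stride=2, padding=1):
    """Sizes along one axis after each of `layers` transposed convolutions."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_output_size(sizes[-1], kernel, stride, padding))
    return sizes

# Hypothetical generator: a 15 x 20 feature map doubled four times.
print(upsampling_path(15, 4))  # [15, 30, 60, 120, 240]  (image height)
print(upsampling_path(20, 4))  # [20, 40, 80, 160, 320]  (image width)
```

With kernel 4, stride 2, and padding 1 each layer exactly doubles the spatial size, so the choice of starting size alone determines the final resolution.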

8.2 Results and Conclusion

Thermal imaging technology [26, 27] is used to diagnose infectious skin conditions and to investigate a wide range of disorders in which alterations in body temperature may indicate inflammation in injured tissues or clinical abnormalities that change blood circulation [23].

The results of the current study indicate that thermal imaging has the potential to serve as a dependable diagnostic modality for detecting measurable patterns in skin temperatures [28]. It has been shown that changes in pain intensity associated with arthritic, repetitive strain, muscular, and circulatory issues can be correlated with temperature variance [29,30,31].

We believe that this non-intrusive method makes it possible to find the earliest clinical features, with high reliability [32].

8.3 GAN Results

The fake images that the GAN generated from the thermal images of the knee dataset, and a comparison of the generator and discriminator losses of the trained GAN architecture, are visualized in Figs. 5 and 6.

Fig. 5
2 sets of thermal photos of knees. A. 5 Thermal images of knees in different views. B. A matrix of real thermal images of the knees. A matrix of the GAN generated fake images of the knees.

(a) Different thermal images of knee pair and its lateral view (right and left); (b) Matrix of input real image and the GAN-generated fake image matrix

Fig. 6
2 simulation graphs of discrimination and adversary losses versus epoch. A. The line peaks high between 3500 and 5000 epochs. B. The line peaks high between 3200 and 4500 epochs. Data are estimated.

Comparison of the generator and discriminator losses of a GAN architecture trained on the knee dataset

8.4 Outlook and Conclusions

This chapter has explored various techniques for producing simulated thermal images using the provided knee dataset. Future research in the field of data augmentation will focus on various topics, including the development of a taxonomy of augmentation methods. To improve the quality of GAN samples, researchers may explore novel combinations of meta-learning and data augmentation techniques, investigate the correlation between data augmentation and classifier architecture, and apply these concepts to diverse data types. Furthermore, the integration of innovative data augmentation techniques can enhance the variety and magnitude of the training dataset, consequently augmenting the efficacy of the GAN model [33].

In our upcoming study, we intend to investigate performance benchmarks for geometric and color space augmentations on numerous datasets from various image recognition tasks. To show how well these augmentations work when little data is available, we will impose size restrictions on these datasets. The properties of the temperature profile associated with a thermal image have not yet been taken into account when creating synthetic thermal images, and investigating them is a possible direction for future work.

The GAN framework has undergone several modifications in various research articles, which use diverse network designs, loss functions, evolutionary techniques, and other methodologies. These studies have led to a significant improvement in the quality of samples generated by GANs. An important avenue for further investigation is to improve the sample quality of GANs further and to assess their efficacy across diverse datasets. To advance the exploration of GAN sample combinatorics, we aim to employ supplementary augmentation techniques, including the transfer of diverse styles onto GAN-generated samples.

Future research in generative models with data augmentation should also consider StyleGAN2, StyleGAN2-ADA, DiffAugment, and variational autoencoders (VAEs). Producing high-resolution outputs from GAN samples remains one of the main challenges, so it will be interesting to explore how these GAN networks can be used to produce high-resolution images.