1 Introduction

Artefacts are undesirable and unintended elements that appear in an image, causing a decline in the overall image quality. These artefacts can emerge in digital images due to various factors within the digital camera’s mechanisms, such as blooming, aliasing, compression, and noise. They can also originate naturally as shadows, fog, haze, smoke, and smog. When such a degraded image is subjected to further processing or input into deep learning or computer vision models, it can diminish the effectiveness of the algorithms [8]. Consequently, the likelihood of errors in tasks like object detection, classification or regression by deep learning models increases, given their high sensitivity to the data on which they were trained [40]. These models might inadvertently learn these undesirable traits, or if such traits are encountered in real-time during the inference phase, they could significantly reduce the models’ efficiency and accuracy [3]. Thus, it is imperative to conduct pre-processing on images, and these anomalies must be eradicated to attain favourable performance by deep learning models.

Shadows represent natural phenomena in which certain areas of an image are darker due to partial or complete obstruction of the light source [1]. Depending on their intensity, shadows can be categorised into three main types: hard shadows, soft shadows, and umbra/penumbra [16]. A hard shadow is formed when the light source is entirely blocked, causing the surface texture to disappear completely. In contrast, a soft shadow emerges when the light source is partially blocked, resulting in a partial disappearance of surface texture. As the two fundamental components of a shadow, the umbra is the darker, inner portion of the shadow, while the penumbra is the lighter region typically found at its boundaries.

The presence of shadows in images can significantly impact the accuracy and effectiveness of various algorithms based on deep learning and computer vision. These shadows can lead to issues such as object merging, object loss, and misinterpretations of remote sensing images. Consequently, the removal of shadows from images becomes a crucial task to enhance visual quality. This improvement facilitates the pragmatic application of these algorithms, including tasks like object detection, object classification, and object tracking with the expectation of high accuracy. Various approaches have been proposed in the literature for eliminating shadows from RGB images. These can be divided into two broad categories, i.e., traditional image processing techniques and deep-learning based models.

Prior to the advent of cutting-edge deep-learning approaches, conventional image processing methods were employed for identifying and eliminating shadows from images [13, 39]. However, these traditional algorithms had limitations, being non-scalable and only effective for specific scenarios [14]. Some of these techniques relied on user input, which was impractical for real-time applications [1]. Over the past few years, deep-learning models have gained prominence due to their exceptional performance on vision-related tasks [36]. Generative Adversarial Networks (GANs) have become the preferred choice for shadow detection and removal. Other approaches involve Convolutional Neural Networks (CNNs), like those in [3] and [8], which focus on shadow removal. Additionally, [42] employs transformer-based concepts, particularly spatial attention maps, for shadow detection and removal.

Deep-learning models excel but are heavily reliant on the training dataset. However, the currently available datasets for shadow detection and removal are, in general, limited in size, containing only a few thousand images and thus representing a small portion of all possible shadow variations. With access to millions of images in the training dataset, deep-learning models have the potential to achieve outstanding results as generic feature extractors [42]. This lack of data compromises the performance of deep learning models. To address this problem to some extent, this study introduces a novel triplet dataset for the purpose of shadow detection and removal. The proposed approach harnesses the capabilities of deep-learning models, specifically GANs, followed by post-processing steps. These steps amalgamate traditional image processing techniques to generate state-of-the-art outcomes.

1.1 Literature review

Accurately detecting shadows and subsequently applying post-processing to the identified regions can play a pivotal role in creating the ultimate shadow-free image. Previously, the primary focus centred on precisely identifying shadows through conventional image processing techniques. A series of effective approaches, as demonstrated in [10] and [19], prove proficient in detecting shadow boundaries within individual RGB images. Leveraging properties extracted from shadow samples, the methodology presented in [21] dynamically formulates a feature space and computes decision parameters. This is followed by a sequence of transformations to generate the shadow mask. In [15], the authors introduce a groundbreaking deep CNN for shadow detection. This model, composed of a 7-layer network architecture, efficiently produces shadow masks. Another recent advancement in this domain is the introduction of an innovative instance single-stage detector [37]. This detector incorporates a bidirectional relation learning module that grasps the interplay between object instances and their corresponding shadow instances. Additionally, the instance single-stage detector employs a deformable maskIoU head to enhance the precision of generated shadow masks.

While these proposed methodologies adeptly generate accurate shadow masks, their limitation lies in their ability only to detect shadows through mask generation. Once shadows are accurately pinpointed, the subsequent step involves reconstructing the areas concealed by the shadow regions. This reconstruction can be accomplished using either traditional image processing techniques or deep-learning-based models.

Conventional image processing methods offer a route to obtaining images devoid of shadows. In the studies conducted by [39] and [14], a manual approach involves drawing three color lines on the image to execute shadow detection and removal. Using these input lines, the algorithm engages in region matching, ultimately generating a shadow-free image. An alternate approach proposed by [4] capitalises on the YCbCr color space. By utilising the luminance Y-channel, this method endeavours to produce a shadow-free image. Likewise, [1] and [13] employ multi-channel thresholding for shadow detection, and utilise shadow-matting techniques for subsequent shadow removal. On the other hand, various pre- and post-processing approaches have also been proposed from which shadow removal may benefit by suppressing unwanted artefacts. For example, an efficient histogram equalisation method is presented in [32] for uniform and non-uniform backgrounds, followed by its improvements in [34] and [33], where object edges and the image's natural structural information are preserved using novel 2D histogram equalisation.

Nonetheless, traditional image processing techniques have their limitations. These methods are not easily adaptable to different situations and tend to yield shadow-free images solely under specific circumstances. Furthermore, some of these traditional techniques rely on user inputs, which pose impractical challenges in real-time scenarios and applications.

In recent years, deep learning CNN models have undergone exponential advancements, offering remarkable outcomes in terms of accuracy and efficiency. An instance is the Background Estimation Document Shadow Removal Network (BEDSR-Net) [20], which presents an approach for identifying and eliminating shadows in document images. This methodology comprises two sub-networks: one estimating background color and generating an attention map, and the other producing the shadow-free image. Another deep learning-based approach, DeshadowNet [26], adeptly detects and removes shadows from RGB images end-to-end. This method employs a multi-scale feature extractor to glean information from the input image, while a multi-context embedding module generates the shadow-free image utilising these extracted features.

Table 1 Comparison of recent shadow removal methods

Within the realm of shadow detection and removal, GANs in [5, 36, 40], and [25] have garnered significant attention. GANs utilise generators to predict shadow-free images and discriminators to ascertain whether the generated image is indeed a shadow-free rendition of the original. A notable instance is Mask-ShadowGAN [9], which employs the concept of cycle GANs to learn from both shadow and shadow-free samples concurrently. Another recent approach by [29] introduces a three-layer CNN architecture for shadow detection, feature extraction, and final shadow-free image generation. Additionally, there are CNN models such as Dual Hierarchical Aggregation Networks (DHAN) [3] and spatial Recurrent Neural Networks (RNNs) with Direction-Aware Spatial Context (DSC) features [8], both designed to predict shadow-free images. Building on the transformer concept, [42] and [35] present a methodology employing transformers followed by an encoder-decoder structure to produce the shadow-free image. The advantages and limitations of the most recent studies are presented in Table 1. Nonetheless, a major limitation of deep-learning models lies in their need for extensive training datasets, often in the millions, to perform optimally. Existing publicly available datasets, however, comprise only a limited number of samples, leading to a narrow variety across the dataset. Moreover, current deep-learning models demonstrate proficiency on monochromatic color images but struggle to yield satisfactory outcomes in the case of multi-color contrast images.

Datasets hold paramount importance for deep learning models, as these models necessitate extensive training data to effectively acquire desired features and perform with efficiency. Publicly accessible datasets for shadow detection and removal can be categorised into four types: unpaired, paired (shadow-detection), paired (shadow-removal), and triplet datasets. An unpaired dataset comprises shadow and shadow-free images taken from diverse scenarios within a single training sample. The USR dataset [9] is an example of an unpaired dataset, encompassing 4,215 samples. Notably complex, this dataset demands intricate and computationally intensive models. Paired datasets for shadow detection involve shadow and shadow-mask samples. This type is used exclusively for shadow detection purposes. SBU [30] and UCF [44] are examples of such datasets, containing 4,727 and 245 training samples, respectively. Paired datasets tailored for shadow removal contain shadow and shadow-free samples, serving the sole purpose of shadow elimination. In this category, SRD [26] consists of 3,088 samples, while both UIUC [7] and LRSS [6] hold fewer than a hundred samples each. The triplet dataset, encompassing shadow, shadow-mask, and shadow-free samples, proves versatile, facilitating both shadow detection and removal tasks. ISTD [36] stands as the sole publicly accessible triplet dataset, housing 1,870 samples. Table 2 lists popular publicly available datasets.

Table 2 Comparison of publicly available datasets

1.2 Our contribution

The presented study aims to bridge the existing research gaps by tackling certain shortcomings in the current methods for shadow detection and removal. The key contributions are as follows:

  • This research proposes the creation of a novel benchmark dataset called Extended ISTD dataset. This dataset comprises 5,352 triplet samples, marking it as the largest dataset intended for shadow detection and removal. It notably incorporates samples featuring dark/hard shadows and multi-color contrast shadows. This strategic inclusion of diverse samples enhances the overall distribution of variability across the dataset.

  • A stacked conditional GAN similar to Stacked Conditional Generative Adversarial Networks (ST-CGAN) [36] has been proposed, featuring modified encoders and decoders in the generator modules. Spatial attention has been utilised in encoders, followed by nearest neighbour upsampling in decoders, to address the boundary irregularities associated with standard GANs.

  • A post-processing stage follows the outcome of the proposed GAN architecture. This stage encompasses a combination of various traditional image processing techniques, such as histogram matching, custom filters, detection and estimation of shadow boundaries. The collective utilisation of these techniques refines the shadow-free images generated by the deep learning model.

2 Methodology

The overall methodology (see Fig. 1) includes dataset preparation, the proposed GAN model, which comprises two stacked GANs for effective shadow removal, and a specially crafted post-processing stage for result enhancement. In general, when the initial shadow image is input into GAN-based systems, the resultant prediction is not a completely shadow-free image. There might be residual lighter shadows, imprecisely preserved shadow boundaries, and instances where dark shadow areas remain unaltered. To address these concerns, two measures are undertaken: the introduction of the Extended ISTD dataset for robust model training and the application of a comprehensive post-processing stage that amalgamates diverse image processing techniques. The proposed dataset, Extended ISTD, exhibits an augmented image distribution by encompassing a variety of multi-color shadow images and intensively dark shadow regions. The model output, which is the shadow-free image predicted by the GANs, undergoes further refinement by harnessing the capabilities of established traditional image processing techniques. These techniques incorporate morphology operations, histogram matching, custom filters, and shadow edge detection. Consequently, the proposed methodology, combining GANs with subsequent traditional image processing techniques as a post-processing step, demonstrates the capacity to yield seamless, error-free shadow-free images.

Numerous deep-learning models have emerged for the purpose of shadow detection and removal. These existing models, though data-driven and proficient in specific contexts like monochromatic or soft shadow images, encounter challenges when dealing with multi-color contrast images and regions of intense darkness. In this context, the proposed approach employs a GAN-based system similar to ST-CGAN as the deep-learning model of choice, with modifications to the generators’ encoders and decoders. The advantage of this selection lies in ST-CGAN’s ability to predict both shadow masks and shadow-free images. The original model is trained on the ISTD dataset [36] with 1,870 triplet images, thereby catering to both shadow detection and removal objectives. Nonetheless, the ISTD dataset is hampered by its limited number of training samples and their skewed distribution, often concentrating on monochromatic shadow images. Consequently, the trained model’s performance on unseen data, encompassing multi-color or darker shadow images, tends to fall short.

Fig. 1

Proposed methodology flow

2.1 Extended ISTD Triplet Dataset

To aid in the assessment of shadow understanding techniques, we propose a new extended version of the ISTD dataset by adding 3,482 triplet samples to the original ISTD dataset [36]. The proposed extended version, i.e., Extended ISTD, comes with 5,352 triplet samples. To the best of our knowledge, this is the first large-scale benchmark dataset that can be used simultaneously for both shadow detection and shadow removal. To increase the variety in the training dataset, the following three steps are adopted, which are discussed below:

Fig. 2

A few samples from the Extended ISTD dataset comprising images with shadow (first row), shadow binary masks (second row) generated using Algorithm 1, and shadow-free images (third row) available with the original SRD dataset

Algorithm 1

Generating shadow mask from shadow and shadow-free image pair.

2.1.1 Extension with shadow and shadow-free images

The SRD dataset, a few samples of which are shown in Fig. 2, is a paired dataset containing 3,088 pairs of shadow and shadow-free images designated for shadow removal tasks. This dataset notably boasts a comparably large sample count when contrasted with other publicly available datasets. However, the SRD dataset lacks shadow masks, making it incompatible with the proposed methodology, which necessitates the inclusion of shadow, shadow-mask, and shadow-free images.

To acquire shadow-mask images for the SRD dataset, Algorithm 1 is proposed, which takes shadow and shadow-free samples from various scenarios and generates shadow masks. Initially, pixel-wise subtraction is conducted between the shadow and shadow-free images. The resultant image’s single-channel component is selected, then subjected to thresholding to generate a binary image. A \(3\times 3\) median filter is applied to the binary image, followed by a morphological closing operation, ultimately producing the desired shadow mask. This collection of triplets, encompassing shadow, shadow-free, and the newly derived shadow-mask images, is subsequently incorporated into the Extended ISTD dataset.
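For illustration, a minimal OpenCV sketch of Algorithm 1 is given below; the threshold value, kernel size, and function name are illustrative choices rather than the exact parameters used in the paper.

```python
import cv2
import numpy as np

def generate_shadow_mask(shadow_img, shadow_free_img, threshold=30):
    """Sketch of Algorithm 1: derive a binary shadow mask from a
    shadow / shadow-free image pair (threshold value is illustrative)."""
    # Pixel-wise (absolute) difference between the two images
    diff = cv2.absdiff(shadow_free_img, shadow_img)
    # Keep a single-channel component of the difference image
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    # Threshold to obtain a binary image
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    # 3x3 median filter to suppress isolated noise pixels
    binary = cv2.medianBlur(binary, 3)
    # Morphological closing to fill small holes inside the mask
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    return mask

# Example usage (paths are placeholders)
# shadow = cv2.imread("shadow.jpg")
# shadow_free = cv2.imread("shadow_free.jpg")
# mask = generate_shadow_mask(shadow, shadow_free)
```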

Fig. 3

A few samples from the Extended ISTD dataset comprising images with shadow (first row), shadow masks (second row) generated using Algorithm 1, and manually generated shadow-free images (third row)

2.1.2 Extension with shadow-only images

Internet-sourced random images (Public Domain Licensed), as shown in Fig. 3, offer a diverse array of shadow scenarios with varying contrasts, illuminations, and background textures. Integrating 394 such vibrant images with hard and soft shadows into the dataset presents the opportunity to significantly augment the variability. However, a key constraint arises in situations where only the shadow image is accessible, mandating the efficient generation of shadow-free and shadow-mask counterparts. The process of generating shadow-free images entails manual selection of the shadow region, followed by background estimation while eliminating the present shadows. For shadow images whose shadow-free counterparts can be adeptly generated, Algorithm 1 is applied to produce the corresponding shadow masks. This collection of triplets, consisting of the shadow image, the manually generated shadow-free image, and the produced shadow mask, is subsequently integrated into the pre-existing dataset.

2.1.3 Data augmentation

This process aims to enhance the variability within the dataset. The existing triplet samples in the dataset are subjected to augmentation, wherein various augmentation techniques such as rotation, flipping, and adjustments to contrast and brightness are randomly applied to the set of triplet images. These augmented images might appear visually similar to the human eye, yet they constitute distinct and singular samples for a CNN model. The triplet samples produced through augmentation are subsequently integrated into the dataset during proposed model training.
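The sketch below illustrates this kind of triplet-consistent augmentation, where the same randomly chosen transform is applied to all three images of a triplet; the transform set and parameter ranges shown are assumptions for illustration, not the exact values used in training.

```python
import random
import numpy as np
import cv2

def augment_triplet(shadow, mask, shadow_free):
    """Apply the same randomly chosen transforms to every image of a triplet
    so that the shadow, mask, and shadow-free samples stay spatially aligned.
    Parameter ranges are illustrative."""
    # Random rotation by a multiple of 90 degrees
    k = random.choice([0, 1, 2, 3])
    shadow, mask, shadow_free = (np.rot90(im, k) for im in (shadow, mask, shadow_free))
    # Random horizontal flip
    if random.random() < 0.5:
        shadow, mask, shadow_free = (np.fliplr(im) for im in (shadow, mask, shadow_free))
    # Random brightness/contrast jitter on the colour images only (mask untouched)
    alpha = random.uniform(0.8, 1.2)   # contrast gain
    beta = random.uniform(-20, 20)     # brightness offset
    shadow = cv2.convertScaleAbs(np.ascontiguousarray(shadow), alpha=alpha, beta=beta)
    shadow_free = cv2.convertScaleAbs(np.ascontiguousarray(shadow_free), alpha=alpha, beta=beta)
    return shadow, np.ascontiguousarray(mask), shadow_free
```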

In summary, the Extended ISTD dataset diverges from the original ISTD dataset in three key aspects: (a) by extending the SRD dataset through the utilisation of shadow-masks (via Algorithm 1) derived from the available shadow and shadow-free image pairs, (b) by integrating publicly accessible internet images containing both hard and soft shadows across various scenarios and lighting conditions; in this process, manual delineation of shadow regions is followed by software-based [11] shadow removal to obtain shadow-free images, which are then utilised in conjunction with Algorithm 1 to generate shadow-masks, and (c) by subsequently applying data augmentation techniques to the pool of 5,352 triplet samples, thereby enriching the dataset’s variability to enhance the training of deep neural networks. It is worth mentioning that the Extended ISTD dataset is an extension of the training set of the original ISTD dataset; the test sets of both datasets remain the same to allow fair comparative analysis of various shadow removal approaches. The original ISTD dataset [36] is available with already segregated training and test sets.

Fig. 4

The provided shadow image (\(S_I\)) is applied to the generator (G1) to generate the corresponding shadow mask (\(S_M\)). This generated shadow mask, along with the actual ground truth shadow mask (\(G_M\)), is then forwarded to the Discriminator (D1) to determine the accuracy of the generated shadow mask. Subsequently, the input shadow image and the produced shadow mask are combined and passed into the generator (G2), which aims to generate the shadow-free image (\(S_F\)). This generated shadow-free image undergoes scrutiny by the discriminator (D2) using ground truth shadow-free image (\(G_F\)), which evaluates whether the image is genuinely devoid of shadows or not

2.2 Proposed architecture

The chosen training strategy of ST-CGAN is grounded in generative adversarial learning, comprising two generators (G1, G2) and two discriminators (D1, D2) as illustrated in Fig. 4. The primary function of generator G1 is to take the shadow image as input and generate the corresponding shadow mask. Discriminator D1’s role is to assess the accuracy of the shadow mask generated by G1. The generated shadow mask from G1 is then combined with the initial shadow image and forwarded to generator G2 which is tasked with predicting the ultimate shadow-free image. Simultaneously, discriminator D2 evaluates whether the predicted shadow-free image is genuinely devoid of shadows or not.

Fig. 5

Proposed generator architecture (G1) to generate the shadow mask using the image with shadow as input. The encoder comprises the first three blocks followed by the bottleneck block (top layer), while the decoder also has three blocks (bottom layer). The mentioned feature sizes appear at the end of each block of the encoder and decoder. The generator G2, as mentioned in Fig. 4, has a similar architecture except for the input layer with the size of \(\text {256}\times \text {256}\times \text {4}\) and the output generating a shadow-free image with the size of \(\text {256}\times \text {256}\times \text {3}\)

In ST-CGAN, the vanilla U-Net architecture [27] is used, which is challenged by notorious checkerboard artefacts [18]. These result in deterioration of the boundary region of shadows, and the output shadow-free image appears with unnatural shadow residue. To address this issue to some extent, we have modified the generators G1 and G2 to include a spatial attention mechanism in the encoders. Moreover, in the decoders, nearest-neighbour 2D upsampling followed by a simple 2D convolution is used instead of a transposed 2D convolution (see Fig. 5). The functional relations governing the proposed generators are given in (1)-(7). Therefore, in both encoders of the U-Net architectures of generators G1 and G2, the first convolution block becomes

$$\begin{aligned} C_{1}(x,y)= & {} \Sigma _{x,y}\Big (\Sigma _{i,j}I_{0}(i,j)W_{1}(x-i,y-j)\Big ), \text {for all { i},\,{ j} in the filter { W}},\end{aligned}$$
(1)
$$\begin{aligned} A_{Q,K}(x,y)= & {} \Phi \Big (QK^T/\sqrt{|K|}\Big ),\end{aligned}$$
(2)
$$\begin{aligned} C_{2}(x,y)= & {} \Sigma _{x,y}\Big (\Sigma _{i,j}A_{Q,K}(i,j)W_{2}(x-i,y-j)\Big ), \text {for all { i},\,{ j} in the filter { W}},\end{aligned}$$
(3)
$$\begin{aligned} Y_{k}(x,y)= & {} \max _{(p,q)\in R(i,j)}C_{2}(p,q), \end{aligned}$$
(4)

where, for pixels \((x, y)\) in the image, \(C_{1}(x,y)\) and \(C_{2}(x,y)\) are two convolution layers, \(I_{0}(x,y)\) is the input image, and \(I_{1,2,3}\) is the output of the preceding encoder block’s maxpool layer. \(A_{Q,K}(x,y)\) is the attention layer with queries Q and keys K each set to \(C_{1}(x,y)\), while \(|\cdot |\) is the matrix dimension size and \(\Phi (\cdot )\) is the softmax function. \(Y_{k}(x,y)\) is the maxpooling layer with a \(k\times k\) pooling window, and \(R(i,j)\) represents the set of indices \((p,q)\) in the input feature map. It is worth mentioning that the rest of the encoder blocks do not house attention layers.
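A minimal Keras sketch of such an attention-equipped encoder block is given below, assuming 3×3 kernels, "same" padding, and ReLU activations (the exact per-block configuration is listed in Table 3); the attention of (2) is computed over flattened spatial positions with Q = K = C1.

```python
import tensorflow as tf
from tensorflow.keras import layers

def spatial_attention(c1):
    """Spatial self-attention of Eq. (2): softmax(Q K^T / sqrt(d)) with
    Q = K = C1, computed over flattened spatial positions."""
    shape = tf.shape(c1)
    b, h, w, ch = shape[0], shape[1], shape[2], shape[3]
    q = tf.reshape(c1, tf.stack([b, h * w, ch]))          # queries (= keys)
    scores = tf.matmul(q, q, transpose_b=True)            # Q K^T
    scores = scores / tf.sqrt(tf.cast(ch, c1.dtype))      # scale by sqrt of feature dim
    weights = tf.nn.softmax(scores, axis=-1)              # softmax Phi(.)
    out = tf.matmul(weights, q)                           # attended features
    return tf.reshape(out, shape)

def attention_encoder_block(x, filters):
    """First encoder block of G1/G2 per Eqs. (1)-(4): convolution, spatial
    attention, convolution, then 2x2 max-pooling. Filter counts, kernel sizes,
    and activations are illustrative."""
    c1 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)   # Eq. (1)
    a = layers.Lambda(spatial_attention)(c1)                               # Eq. (2)
    c2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(a)   # Eq. (3)
    skip = c2                                  # kept for the decoder skip connection
    return layers.MaxPooling2D(pool_size=2)(c2), skip                      # Eq. (4)
```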

Each decoder block is modified as follows:

$$\begin{aligned} I_{u}^{l}(x,y)= & {} C_{l}\Big ([x/s_x],[y/s_y]\Big )^\frown C_{2}^{L-l}(x,y), l = 0, 1,..., L-1,\end{aligned}$$
(5)
$$\begin{aligned} C_{1}(x,y)= & {} \Sigma _{x,y}\Big (\Sigma _{i,j}I_{u}^l(i,j)W_{1}(x-i,y-j)\Big ), \text {for all { i},\,{ j} in the filter { W}},\end{aligned}$$
(6)
$$\begin{aligned} C_{2}(x,y)= & {} \Sigma _{x,y}\Big (\Sigma _{i,j}C_{l+1}(i,j)W_{2}(x-i,y-j)\Big ), \text {for all { i},\,{ j} in the filter { W}}, \end{aligned}$$
(7)

where, \(s_x\) and \(s_y\) are the scale factors in x and y dimensions. In the proposed architecture, \(L=3\) and \((^\frown )\) represents concatenation operation with corresponding second convolution layer of the \((L-l)^{th}\) encoder block via skip connection. \(C_0(x,y)\) represents last bottleneck layer, \(I_{u}^0\) is the upsampling layer of the first (\(l+1\)) decoder block. The architecture of the proposed generators is given in Table 3. It is worth mentioning that the discriminator architecture of ST-CGAN is followed in this study.
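A corresponding Keras sketch of the modified decoder block of (5)-(7) follows, again with illustrative kernel sizes and activations; nearest-neighbour upsampling plus plain convolution replaces the transposed convolution to suppress checkerboard artefacts.

```python
from tensorflow.keras import layers

def nn_upsample_decoder_block(x, skip, filters):
    """Decoder block of Eqs. (5)-(7): nearest-neighbour 2x upsampling,
    concatenation with the matching encoder skip connection, then two plain
    convolutions in place of a transposed convolution. Filter counts and
    kernel sizes are illustrative."""
    up = layers.UpSampling2D(size=2, interpolation="nearest")(x)               # upsampling in Eq. (5)
    merged = layers.Concatenate()([up, skip])                                  # skip concatenation in Eq. (5)
    c1 = layers.Conv2D(filters, 3, padding="same", activation="relu")(merged)  # Eq. (6)
    c2 = layers.Conv2D(filters, 3, padding="same", activation="relu")(c1)      # Eq. (7)
    return c2
```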

Table 3 The architecture of generators (G1/G2) of the proposed GAN
Fig. 6

Visual illustration of Algorithm 2 to generate shadow-free image (\(I_{sha-free}\)) from raw image with shadow (\(I_{sha}\)) and intermediate shadow-free image (\(I_{int-sha-free}\)) which is the output of the proposed GAN architecture

While excelling with monochromatic shadow images, the chosen model falls short in delivering desirable outcomes for multi-color and deep shadow scenarios, as mentioned earlier. To address this limitation, the model undergoes training using the newly introduced Extended ISTD dataset. This dataset is meticulously designed to augment the shadow variation, achieved by deliberately incorporating multi-color shadow samples and dark shadow samples. During the training process, the dataset is partitioned to comprise 80% of samples in the training set and 20% in the validation set.

2.3 Post-processing

The proposed GAN, once trained on the Extended ISTD dataset, exhibits the ability to eliminate shadows from images. However, certain scenarios still result in residual lighter shadows or the absence of preserved shadow boundaries. To enhance image quality and address potential artefacts, the intermediate shadow-free image produced by the deep learning model undergoes a subsequent post-processing phase. This phase capitalises on various combined image processing techniques to generate a refined shadow-free image. The overall process is depicted in Fig. 6. Within this proposed post-processing step, shadow detection holds pivotal significance. The emphasis of the post-processing stage centres on predicting and refining the underlying texture solely within the shadow region. If shadow detection is executed accurately, signifying the efficient generation of a shadow mask, the post-processing step proves effective in generating a shadow-free version of the image. The steps are outlined in Algorithm 2.

Algorithm 2

Proposed post-processing step to refine the intermediate shadow free image generated by the proposed GAN model.

The shadow mask produced by the deep-learning model undergoes refinement through the application of image processing techniques. This post-processing takes the input shadow image (\(I_{sha}\)) and the intermediate shadow-free image (\(I_{int-sha-free}\)) generated by the proposed GAN as inputs, and Algorithm 1 is employed on this pair to generate the shadow mask (\(I_{mask}\)). Concurrently, an inverted shadow mask (\(I_{inv-mask}\)) is created by taking the inverse of the shadow mask (\(I_{mask}\)) generated by Algorithm 1. The shadow mask (\(I_{mask}\)) and its inverted counterpart (\(I_{inv-mask}\)) are subsequently utilised for extracting the shadow region and the shadow-free region.

The intermediate shadow-free image (\(I_{int-sha-free}\)) is subsequently divided into two distinct images: the shadow region (\(I_{sha-reg}\)), comprising solely the pixels belonging to the shadowed area, and the shadow-free region (\(I_{sha-free-reg}\)), encompassing only the pixels within the unshadowed area. The shadow region (\(I_{sha-reg}\)) is derived by subtracting the inverted shadow mask (\(I_{inv-mask}\)) from the intermediate shadow-free image (\(I_{int-sha-free}\)). Similarly, the shadow-free region (\(I_{sha-free-reg}\)) is obtained by subtracting the generated shadow mask (\(I_{mask}\)) from the intermediate shadow-free image (\(I_{int-sha-free}\)) on a pixel-by-pixel basis.
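A short OpenCV sketch of this region-splitting step is shown below; file paths are placeholders, and the mask is assumed to be an 8-bit binary image with shadow pixels set to 255, so that saturated subtraction of the expanded mask zeroes out the unwanted region.

```python
import cv2

# Placeholders: GAN output and the mask produced by Algorithm 1
I_int_sha_free = cv2.imread("intermediate_shadow_free.png")           # I_int-sha-free
I_mask = cv2.imread("shadow_mask.png", cv2.IMREAD_GRAYSCALE)          # I_mask

I_inv_mask = cv2.bitwise_not(I_mask)                                  # I_inv-mask

# Saturated subtraction of the (3-channel) mask keeps one region and
# zeroes the other: pixels under a 255 mask value are clipped to 0.
I_sha_reg = cv2.subtract(I_int_sha_free, cv2.cvtColor(I_inv_mask, cv2.COLOR_GRAY2BGR))
I_sha_free_reg = cv2.subtract(I_int_sha_free, cv2.cvtColor(I_mask, cv2.COLOR_GRAY2BGR))
```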

After the isolation of the shadow region (\(I_{sha-reg}\)) and the shadow-free region (\(I_{sha-free-reg}\)), a channel-wise histogram matching procedure is implemented exclusively on the extracted shadow region (\(I_{sha-reg}\)). Notably, this histogram matching operation is conducted with the extracted shadow-free region (\(I_{sha-free-reg}\)) serving as the reference image.

The filtered image (\(I_{filt}\)) is obtained through pixel-wise weighted averaging between the histogram-matched image (\(I_{hist-match}\)) and the input shadow image (\(I_{sha}\)). In this context, \(n_{1}\) and \(n_{2}\) represent the weights assigned to \(I_{hist-match}\) and \(I_{sha}\), respectively, where the sum of \(n_{1}\) and \(n_{2}\) equals 1.
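The sketch below illustrates the channel-wise histogram matching and the weighted averaging, assuming scikit-image 0.19 or newer (for the channel_axis argument) and reusing I_sha_reg, I_sha_free_reg, and I_sha from the steps above; the weights n1 = 0.7 and n2 = 0.3 are illustrative values only.

```python
import numpy as np
from skimage.exposure import match_histograms

# Channel-wise histogram matching of the shadow region against the
# shadow-free region used as reference
I_hist_match = match_histograms(I_sha_reg, I_sha_free_reg, channel_axis=-1)

# Pixel-wise weighted averaging with the input shadow image
n1, n2 = 0.7, 0.3        # illustrative weights with n1 + n2 = 1
I_filt = np.clip(n1 * I_hist_match.astype(np.float32)
                 + n2 * I_sha.astype(np.float32), 0, 255).astype(np.uint8)
```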

The filtered image (\(I_{filt}\)) is then concatenated with the shadow-free region (\(I_{sha-free-reg}\)) produced with the help of the output of the proposed GAN model to get \(I_{filt-conc}\). This approach ensures that the histogram matching and averaging filters are exclusively employed on the shadow region.

The shadow boundaries (\(I_{sha-bound}\)) are derived through the utilisation of the shadow mask. Using the shadow mask, the detection of shadow edges is facilitated, and the Canny edge detector is employed to identify these edges. This results in a shadow-free image, efficiently generated with preserved shadow boundaries.

The resulting \(I_{sha-free}\) represents the ultimate shadow-free image, refined from the intermediate shadow-free image produced by the proposed GAN. Through comprehensive qualitative and quantitative assessments, it is evident that the presented methodology consistently surpasses other state-of-the-art approaches in terms of shadow removal performance.

3 Results and discussion

3.1 Evaluation metrics

To assess the efficacy of shadow removal, the Root Mean Square Error (RMSE) is calculated between the predicted shadow-free image and the ground truth shadow-free image as given in (8), with n being the total number of pixels in each image. The RMSE quantifies the disparity between the two images by gauging their variations. In essence, a smaller RMSE value signifies a reduced discrepancy between the images, thereby indicating superior algorithm performance.

$$\begin{aligned} \text {RMSE} = \sqrt{\frac{1}{n} \Sigma _{i=1}^{n}{\Big ({ I_{sha-free} - I_{gnd-truth}}\Big )^2}}. \end{aligned}$$
(8)

PSNR in (9) denotes the peak signal-to-noise ratio existing between two images. This ratio serves as an assessment criterion to gauge the likeness between the generated shadow-free image (\(I_{sha-free}\)) and the ground truth shadow-free image (\(I_{gnd-truth}\)).

$$\begin{aligned} \text {PSNR} = 10\text {log}_{10}\Big (\frac{r^2}{\text {MSE}}\Big ). \end{aligned}$$
(9)

In the above equation, r represents the maximum possible pixel value of the image, while MSE signifies the mean square error computed between the generated shadow-free image (\(I_{sha-free}\)) and the ground truth shadow-free image (\(I_{gnd-truth}\)). The measurement unit of PSNR is expressed in decibels (dB).

Another similarity metric i.e., structural similarity index, SSIM, as given in (10), is calculated. It takes into account the perceived change in structural information, luminance, and contrast of the images. When comparing a shadow-free image to a ground truth image, the SSIM metric provides a quantitative measure of how close the refined image is to the ground truth in terms of visual quality.

$$\begin{aligned} \text {SSIM}(X, Y) = \frac{{(2 \mu _X \mu _Y + c_1)(2 \sigma _{XY} + c_2)}}{{(\mu _X^2 + \mu _Y^2 + c_1)(\sigma _X^2 + \sigma _Y^2 + c_2)}}, \end{aligned}$$
(10)

where, \(X\) and \(Y\) are the two images being compared, in our case \(I_{sha-free}\) and \(I_{gnd-truth}\) respectively. \(\mu _X\) and \(\mu _Y\) are the means of \(X\) and \(Y\), \(\sigma _X\) and \(\sigma _Y\) are their standard deviations, \(\sigma _{XY}\) is the covariance between \(X\) and \(Y\), and \(c_1\) and \(c_2\) are constants to stabilise the division with a weak denominator.

The variables \(\mu _X\), \(\mu _Y\), \(\sigma _X\), \(\sigma _Y\), and \(\sigma _{XY}\) are calculated over a sliding window in the images. Typically, a Gaussian weighting function is applied to the window to give higher weights to the central pixels. The constants \(c_1\) and \(c_2\) are used to avoid instability when the denominator is close to zero. They are defined in (11) as:

$$\begin{aligned} c_1 = (k_1L)^2, \quad c_2 = (k_2L)^2, \end{aligned}$$
(11)

where, \(L\) is the dynamic range of pixel values (e.g., 255 per RGB channel for 8-bit images), and \(k_1\) and \(k_2\) are constants used to stabilise the division. They are usually set to small values, such as \(k_1 = 0.01\) and \(k_2 = 0.03\), to ensure numerical stability.

SSIM provides a value between -1 and 1, where 1 indicates perfect similarity.
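For reference, the three metrics of (8)-(10) can be computed as in the sketch below, assuming scikit-image 0.19 or newer (for the channel_axis argument) and 8-bit images (data range 255).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, gt):
    """RMSE, PSNR and SSIM between a predicted shadow-free image and its
    ground truth (both uint8 RGB arrays of identical shape)."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    rmse = np.sqrt(np.mean(diff ** 2))                                       # Eq. (8)
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)                 # Eq. (9)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)  # Eq. (10)
    return rmse, psnr, ssim
```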

3.2 Experimental protocol

The training of the proposed model and subsequent post-processing are conducted on Google Colab Pro graphical processing units (GPUs). The environmental setup for training and validating the proposed methodology involves Python 3.7.14, TensorFlow (Keras), and TorchVision 0.6.1. During training, hyperparameters are meticulously fine-tuned, involving learning rate scheduling and a progressive increase in the number of samples per batch. Learning rate scheduling and the batch-size increments are performed systematically over the course of specific epochs, transitioning from 8 to 16, 32, and eventually 64 samples per batch. Moreover, the Adam solver [17] is used to train the proposed GAN, and weights are initialised with zero mean and unit variance for scratch training on the ISTD dataset. To boost learning capability, on-the-fly data augmentation with flipping and rotation (45\(^{\circ }\), 90\(^{\circ }\), and 180\(^{\circ }\)) is used. The multi-objective loss function balanced by triple scaling factors (\(\lambda \)s) is kept similar to that adopted in [36] for training ST-CGAN. This careful adjustment of hyperparameters aims to prevent the model from converging to local minima, which would otherwise limit the performance of both the generator and the discriminator. For generators G1 and G2, binary cross-entropy and L1 [31] losses, respectively, are used in training. In summary, an optimum parametric setting is ensured for all algorithms, including the proposed one, for the best performance outcome.
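A hedged sketch of this training configuration is shown below; the learning-rate rule, stage boundaries, and the gan/train_data placeholders are illustrative and do not reproduce the exact experimental values.

```python
import tensorflow as tf

# Adam optimiser; the learning rate and beta_1 shown here are illustrative.
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)

def lr_schedule(epoch, lr):
    # Halve the learning rate every 70 epochs (illustrative rule)
    return lr * 0.5 if epoch > 0 and epoch % 70 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)

# Progressive batch-size increase: successive training stages on the same model.
# `gan` is a placeholder for a compiled Keras GAN with a custom train_step, and
# `train_data` for a tf.data.Dataset of Extended ISTD triplets.
stages = [(8, 70), (16, 70), (32, 70), (64, 70)]     # (batch_size, epochs) per stage
for batch_size, epochs in stages:
    gan.fit(train_data.batch(batch_size), epochs=epochs, callbacks=[lr_callback])
```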

The source codes of all approaches in the comparative study are taken from their authors’ repositories, and for fair comparison, the optimal experimental settings are kept the same as suggested by their respective published articles to reproduce the shadow removal results on the ISTD dataset. On the Extended ISTD dataset, all models, including the proposed one, are retrained until convergence using transfer learning from network weights previously trained on the ISTD dataset.

3.3 Quantitative and qualitative comparison

This section delves into a comprehensive comparison of the shadow removal outcomes yielded by the presented methodology against those of other state-of-the-art approaches, including ST-CGAN [36], DHAN [3], SpA-Former [42], LG-SNet [23], DSC [8], and M-SGAN [9].

For the purpose of quantitative comparison, the ISTD test set is chosen, comprising 540 triplet samples. It is worth mentioning that the test set of the ISTD dataset is mutually exclusive with the training and validation sets of the Extended ISTD dataset on which all the competing architectures, including the proposed one, are trained. Table 4 contrasts the RMSE, PSNR, and SSIM of the proposed methodology with those of other deep learning-based models when both training and testing are performed on the original ISTD dataset. Here, ‘S’ denotes the shadow region, ‘NS’ signifies the non-shadow region or shadow-free area, and ‘E’ represents the entire image. The scores for other approaches are extracted from their authors’ published articles and are also reproduced with the same outcomes to keep comparisons fair and validate the experimental protocol followed in this study. This comparison reveals that the proposed methodology surpasses other existing approaches, achieving an RMSE of 6.01 and a PSNR of 29.43 across the entire image. It is evident from Table 4 that the proposed architecture yields favourable scores compared to all other approaches, with better RMSE and PSNR metrics. This advocates the effectiveness of the proposed changes to the standard ST-CGAN architecture. Similarly, Table 5 presents the comparison when all the algorithms are trained on the Extended ISTD dataset while the test set remains the same, i.e., the original ISTD test set. Here, again, the proposed methodology surpasses the rest with state-of-the-art performance, highlighting the benefits of versatile training data, a robust architecture, and effective post-processing.

Table 4 Quantitative comparison for the shadow removal of proposed methodology with other techniques when training is done on ISTD dataset
Table 5 Quantitative comparison for the shadow removal of the proposed methodology with other techniques when training is done on Extended ISTD dataset

Figure 7 showcases the visual juxtaposition of the proposed shadow removal approach against other methodologies using the ISTD test dataset. Meanwhile, Fig. 8 illustrates the outcomes of shadow removal on freely available, randomly selected internet-sourced shadow images apart from ISTD test set. The selected test images encompass scenarios containing dark/hard shadow regions as well as those with multi-color contrast shadow effects. In both representations, all models are trained on Extended ISTD training set. This visual assessment of the generated shadow-free images reveals the clear superiority of the proposed methodology over existing alternatives. The method’s effectiveness is particularly evident in its ability to produce high-quality shadow-free images, even in the presence of intricate backgrounds, challenging dark/hard shadow conditions, and multi-color contrast shadow settings. As a result, the proposed methodology excels in comparison to the established state-of-the-art techniques.

Fig. 7

Shadow removal performance on ISTD test set

Fig. 8

Shadow removal performance on internet-sourced random multi-color contrast shadow images. In this case the ground truth images are not available

Table 6 Performance comparison of the proposed model (trained on Extended ISTD dataset) with and without modified GAN’s generator and post-processing

ST-CGAN is relatively better on soft shadows but leaves residue when hard shadows are present in an image, especially on vibrant color backgrounds. This is attributed to the lack of fully capturing shadow-dependent features in the generator CNNs, where attention to edge details is imperative. On the other hand, DHAN comprises a dual hierarchical aggregation network and shadow-matting GANs. Despite reasonable performance as compared to ST-CGAN and LG-SNet in Fig. 7, such a setup is critically dependent on large data availability and its variability [38]. Moreover, matting GANs are prone to artefacts in synthesised images, especially in regions with complex textures or structures [28] or with dynamic scenes when lighting conditions or color shades change abruptly [24]. These artefacts can degrade the visual quality of the output and may require additional post-processing steps to mitigate. This is consistent with the visual findings in this study. SpA-Former presents a promising approach to shadow removal by leveraging transformer architectures and spatial attention mechanisms. While transformers excel at modelling sequential data, they may struggle to efficiently capture spatial relationships in images. This limitation can affect the model’s ability to understand spatial contexts and dependencies critical for tasks like shadow removal [2]. LG-SNet is a deep learning approach designed specifically for shadow removal tasks [23]. While this architecture has demonstrated effectiveness in certain scenarios, its performance may degrade when faced with diverse shadow types, illumination conditions, and environmental settings that are not adequately represented in the training data [43]. On the other hand, DSC is a method designed to detect shadows in images by using directional information and spatial context features. However, the method may produce unnatural effects or false positives in regions that are not shadows or at the shadow edges (see Fig. 8 for DSC results in the first and third columns), especially when processing images with complex textures, structures, or varying lighting [41]. LG-SNet and DSC yield superior results in the qualitative analysis as compared to ST-CGAN, SpA-Former, and DHAN. The challenges of shadows on diverse backgrounds, lighting variation, complex structure, and abrupt color change require careful attention to acquire robust features and the ability to overcome irregularities and anomalies in the shadow and non-shadow regions of an image. The proposed architectural changes in the ST-CGAN generators using spatial attention and nearest-neighbour 2D upsampling mitigate shadow residue through their attention to edge details. Moreover, such modifications result in model robustness in contrast with architectures (e.g., DHAN and LG-SNet) where extensive data with variation in shadow patterns is required. Furthermore, textural information is favourably preserved by better understanding spatial contexts, a major problem in DSC [8]. Nonetheless, the quantitative and qualitative findings affirm the necessity of effective post-processing to address the limitations inherent in deep learning-based generative techniques. Table 6 presents an ablation study in which the efficacy of the proposed GAN model with modified generators is justified and which also advocates the use of the proposed post-processing approach.
It is evident that Model-A, with a similar architecture to ST-CGAN (see Table 5), also produces similar results. On the other hand, Model-B, with the proposed generator but without the post-processing, attains better results, followed by further enhancement by the final Model-C.

Despite its relatively superior performance, the proposed algorithm still shows minor artefacts at the edges of hard shadows (see Fig. 7, first and second columns, and Fig. 8, last column). The main reason behind this limitation is the unavailability of pixel colour information, which is blocked by the hard shadow, resulting in the failure of the GAN-based system and post-processing to recover it. Although the attention mechanism in the proposed approach helps to mitigate this issue to some extent, the challenge remains valid in general. Moreover, shadow removal operations over a textured surface leave unnatural smoothing. The proposed algorithm depicts fewer visual anomalies in comparison to the other approaches, but careful observation reveals texture deformation over the shadow regions, as shown in Fig. 8, first and second columns.

The Extended ISTD dataset, which is generated using the standard ISTD, SRD, and further internet-sourced images, exhibits diversity in shadow types, lighting conditions, and background complexities, but it also has certain limitations and potential biases. In terms of shadow types, the dataset includes various kinds of shadows, including cast shadows, self-shadows, and object shadows, ensuring a representative sample for training and evaluation. Lighting conditions encompass a range of scenarios, from indoor to outdoor environments, different times of day, and varying weather conditions, providing variability for robust model training. Similarly, background complexities vary from simple backgrounds like plain walls to complex scenes with multiple objects and textures, enabling models to learn to remove shadows effectively in diverse contexts. However, potential biases may exist in these datasets. For instance, there might be an overrepresentation of certain types of shadows or lighting conditions compared to others, leading to imbalances in the training data and potentially affecting model performance on underrepresented scenarios. Additionally, the dataset may lack sufficient variability in terms of cultural or geographic contexts, which could impact model generalisation to real-world applications across different regions or demographics. Furthermore, there are limitations in the quality and resolution of the images in this dataset, which could affect the realism and applicability of trained models in practical settings.

To mitigate these biases and limitations, future efforts could focus on collecting more diverse datasets that include a broader range of shadow types, lighting conditions, and background complexities. Incorporating data augmentation techniques to simulate additional variations and addressing biases through careful curation and sampling strategies can enhance the representativeness and robustness of datasets for shadow removal tasks. Additionally, efforts to ensure inclusivity and diversity in dataset collection processes can help improve the generalisation of trained models to diverse real-world scenarios and populations.

Shadow removal techniques play a crucial role across various real-world scenarios, including digital photography enhancement, surveillance imagery analysis, and autonomous driving systems. In digital photography, shadow removal enhances image quality by reducing distractions and improving overall clarity. Similarly, in surveillance imagery analysis, shadow removal aids in identifying objects and individuals obscured by shadows, thus enhancing security and investigative processes. In the context of autonomous driving systems, accurate shadow removal contributes to improved object detection and scene understanding, ultimately enhancing safety and decision-making algorithms. However, modern deep learning-based approaches, such as GANs and vision transformers, face challenges in handling dynamic lighting conditions and complex backgrounds, which can lead to inaccuracies in shadow removal. Mitigation strategies could involve incorporating additional contextual information, such as scene geometry or temporal data, to improve robustness. The improvements may focus on developing adaptive GAN architectures capable of dynamically adjusting to varying lighting conditions and complex environments, as well as exploring techniques for scalable training on large and diverse datasets to enhance generalisation capabilities. By addressing these challenges, such methods could further advance the effectiveness and applicability of shadow removal techniques in diverse real-world scenarios.

Deep learning algorithms like the ones presented in Tables 4 and 5 can effectively remove shadows from images, leading to enhanced image quality and better visual appeal. This can be particularly useful in applications like photography, surveillance, and autonomous driving, where clear and well-lit images are crucial. Once trained, these models can automate the process of shadow removal, reducing the need for manual intervention and saving time for users. This automation can lead to increased efficiency in various image processing workflows. Nonetheless, such models require large and diverse datasets for training to generalise well. However, obtaining such datasets with a wide variety of shadow types and scenes can be challenging, leading to potential biases in the trained models. Training deep learning models, especially complex ones like GANs, requires significant computational resources, including high-end GPUs and large amounts of memory. This can make deployment and scaling of these models costly and impractical for some applications. This highlights the importance of a comprehensive dataset for tasks like shadow detection and removal, covering the maximum possible patterns of shadows across simple to complex environments. Failure to provide diverse data may result in overfitting by deep architectures, where they memorise the training data instead of learning informative features. This can lead to poor performance on unseen data, especially if the training dataset is limited or not representative of the real-world scenarios the model will encounter. Real-world images often contain various complexities such as different types of shadows, varying lighting conditions, and occlusions. Ensuring that the trained models are robust enough to handle such variations is a significant challenge. In terms of resource utilisation, Table 7 tabulates the number of parameters, training time, and inference time taken by the approaches compared in Tables 4 and 5 on Google Colab Pro. Although different approaches are trained on the Extended ISTD dataset for different numbers of epochs, Table 7 mentions the training time against 280 epochs for a standardised comparison.

Table 7 Comparison of model parameters, training time on Extended ISTD dataset for 280 epochs and inference time per image on Google Colab Pro

Future research can focus on designing more robust and efficient deep learning architectures tailored specifically for shadow removal tasks. This includes exploring novel network structures, attention mechanisms, and fusion strategies to handle various types of shadows and lighting conditions effectively. Investigating and exploring techniques for uncertainty estimation can provide insights into model confidence and help identify potential failure cases. Moreover, developing techniques to enhance the robustness of deep learning models against adversarial attacks and perturbations can improve their reliability in real-world scenarios. This includes adversarial training, robust optimisation methods, and adversarial defence mechanisms tailored for shadow removal tasks. Customising and fine-tuning pre-trained models for specific application domains can improve their performance and efficiency. This involves understanding the unique characteristics and challenges of the target domain and adapting the model accordingly through domain-specific loss functions and training strategies. Enhancing the interpretability and explainability of deep learning models can facilitate their pragmatic adoption. Research can focus on developing methods for visualising model decisions, understanding feature importance, and providing insights into the underlying shadow removal process.

Another important factor that is crucial for the adoption of deep architectures for shadow removal is their processing capability in real-time applications like autonomous vehicles, which require quick and video-based shadow removal procedures. Current state-of-the-art methods are generally complex and computationally expensive. Therefore, achieving promising performance with lightweight and efficient deep architectures will be the focus of attention in upcoming studies. To achieve this, network pruning and model quantisation approaches can be adopted. Network pruning and quantisation present promising avenues for making complex and computationally heavy deep learning algorithms like GANs and transformer-based systems more efficient without sacrificing performance. Pruning involves identifying and removing redundant or insignificant parameters from neural networks, leading to reduced model size and computational requirements while preserving accuracy. Techniques such as magnitude-based pruning, weight pruning, and structured pruning can effectively reduce model complexity without compromising performance. Additionally, quantisation techniques aim to represent network parameters with reduced bit precision, thereby decreasing memory usage and improving inference speed. Methods like uniform quantisation, non-uniform quantisation, and quantisation-aware training enable efficient representation of model parameters while maintaining performance. By employing network pruning and quantisation, researchers can enhance the efficiency of advanced deep learning algorithms for shadow removal tasks, making them more viable for resource-constrained environments without compromising performance. Studies like [22] and [12] provide insights into the effectiveness of these techniques in optimising deep learning models for real-world deployment.

4 Conclusion

This study introduces an innovative approach for the detection and removal of shadows from RGB images, all the while preserving the underlying background texture. To facilitate this objective, a substantial triplet dataset (Extended ISTD) is introduced. This dataset, comprising shadow, shadow-mask, and shadow-free images, serves to enhance the distribution of varying samples across the dataset. This augmentation is achieved by incorporating dark/hard shadow samples and multi-color contrast shadow samples into the training dataset. For robust shadow removal, a stacked GAN architecture with generators equipped with spatial attention and nearest-neighbour upsampling is proposed. Additionally, a post-processing phase is incorporated, harnessing the capabilities of diverse traditional image processing techniques, such as histogram matching, custom filters, and shadow boundary detection and estimation. This post-processing step contributes to refining the intermediate shadow-free image produced by the specially crafted GAN-based deep learning model. Following training on the Extended ISTD dataset and subsequent post-processing, the proposed deep learning model is adept at accurately generating high-quality shadow-free images. With the proposed methodology addressing shadow removal, future efforts will be directed toward extending its capabilities beyond single RGB images to encompass video shadow detection and removal.