ASDN: A Deep Convolutional Network for Arbitrary Scale Image Super-Resolution

Shen, Jialiang; Wang, Yucheng; Zhang, Jian

doi:10.1007/s11036-020-01720-2

Published: 22 February 2021

Volume 26, pages 13–26, (2021)

Mobile Networks and Applications

Abstract

Deep convolutional neural networks have significantly improved the peak signal-to-noise ratio (PSNR) of super-resolution (SR). However, image viewer applications commonly allow users to zoom images to arbitrary magnification scales, and supporting every such scale with existing methods requires training on a large number of scales at tremendous computational cost. To obtain a more computationally efficient model for arbitrary-scale SR, this paper employs a Laplacian pyramid method to reconstruct high-resolution (HR) images at any scale from the high-frequency image details captured in a Laplacian Frequency Representation. For SR at small scales (between 1 and 2), images are constructed by interpolating between a sparse set of precomputed Laplacian pyramid levels. SR at larger scales is computed recursively from small scales, which significantly reduces the computational cost. For a full comparison, fixed-scale and any-scale experiments are conducted on various benchmarks. At fixed scales, the proposed Any-Scale Deep Super-Resolution Network (ASDN) outperforms predefined upsampling methods (e.g., SRCNN, VDSR, DRRN) by about 1 dB in PSNR. At arbitrary scales, ASDN generally outperforms Meta-SR on many scales.

1 Introduction

Deep neural networks have made considerable progress in single image super-resolution (SISR), extracting image priors from datasets and efficiently learning mapping functions from low-resolution (LR) to high-resolution (HR) patches. However, for applications that let users zoom to arbitrary scales (e.g., face image SR [5] and satellite image SR [18]), multi-scale methods that learn an LR-to-HR mapping independently for each of several scales [12, 15, 29] become inefficient. Meta-SR [8] shows that SR at arbitrary decimal scales can be achieved by training a single model with a dynamic meta-upscale module. However, Meta-SR can only generate HR images at the scales it was trained on, so training it on every scale of interest for any-scale SR is computationally impractical.

To reduce the number of training scales needed, we exploit the observation that image patches are self-similar across scales. The self-similarity-based SR method [17] enhances textural content using similar patches found at different scales. Furthermore, image edges are scalable: images at different scales share similar edge information, which is carried by their high-frequency content. To recover the high-frequency information missing from SR images, we propose a Laplacian pyramid based method that interpolates between a sparse set of trained scales. Indeed, the Laplacian filter is an edge detector, and the Laplacian noise term can be used to detect outliers for robust tracking [24]. A Laplacian pyramid structure can therefore highlight the high-frequency image information that is similar across scales. Moreover, the Laplacian pyramid structure has been shown to reduce the training data required for multi-scale SR in MS-LapSRN [14], which obtains 3× HR images from its 4× SR results and predicts 8× HR images by progressively passing images through its 2× SR network. It is therefore feasible to reduce training costs with a Laplacian Pyramid [13] network structure.

Unlike previous Laplacian pyramid networks for multi-scale SR, we seek to train a single model that predicts SR images at any scale. A large upsampling ratio can be decomposed into a product of ratios drawn from a small range. Therefore, given a network for SR at scales in a small range (such as the real-number interval (1,2]), arbitrary larger scales (real numbers greater than 2) can be reached by recursion. Inspired by the classical Laplacian pyramid method [3], which reconstructs HR images by restoring the residual images between two Laplacian pyramid levels, we introduce a Laplacian Frequency Representation to learn the mapping function for SR at scales in the small range (1,2]. Our algorithm represents the HR image at any continuous decimal scale in this range using the two neighboring Laplacian pyramid levels. For SR at large decimal ratios, we progressively upscale the coarse HR images and recursively pass them through the network multiple times, each time with a small decimal ratio in the range, to gradually refine the HR images.

In this paper, we propose the Any-Scale Deep Super-Resolution Network (ASDN), which is based on a multi-scale parallel reconstruction architecture. All reconstruction branches share a Feature Mapping Branch (FMB), and each predicts one Laplacian pyramid level through its own Image Reconstruction Branch (IRB). Our network requires relatively little training data and computation, yet it effectively generates SR results at any scale.

We present extensive comparisons at both fixed integer scales and arbitrary decimal scales on commonly used benchmarks, and report results for ASDN and the fine-tuned ASDN (FSDN) alongside existing multi-scale SR methods. Without training on samples of the specific test scales, ASDN outperforms all other predefined upsampling methods and even some single upsampling models. FSDN achieves state-of-the-art performance for fixed-scale SR, comparing favorably with existing methods. For any-scale SR, we retrain several previous network structures [12, 15, 29] with our any-scale SR method for comparison. ASDN is effective for SR at any desired scale and achieves state-of-the-art performance at scales within the small range (1,2].

In summary, our work provides the following contributions:

(1) Laplacian Frequency Representation: We propose a Laplacian frequency representation mechanism to reconstruct SR images at small scales, i.e., scales varying continuously between 1 and 2. Each HR image is a weighted interpolation of its two neighboring Laplacian pyramid levels, which greatly reduces the number of training scales needed to learn SR at continuous scales.
(2) Recursive Deployment: We introduce Recursive Deployment to generate HR images at larger upsampling ratios, based on our finding that HR images at larger scales can be obtained by gradual upsampling and recursive deployment with small ratios. This extends any-scale SR from small scales to larger ones without requiring additional training scales.
(3) Any-Scale Deep SR Network: We propose an Any-Scale Deep Super-Resolution Network (ASDN) that generates HR images at any scale with one unified network, providing large computational savings over directly applying existing CNN-based multi-scale methods to any-scale applications.

2 Related works

2.1 Image super-resolution using CNN

Image super-resolution has evolved greatly over the past decades, and numerous image SR methods [12, 15, 29] have been proposed to improve reconstruction performance. With rapid advances in computing hardware, CNN-based SR methods have achieved state-of-the-art results by optimizing an end-to-end network to learn the LR-to-HR mapping function. Dong et al. [4] first introduced convolutional layers into image SR and showed that they are effective for the task. However, their network has only three layers, and deeper versions of it did not yield better results. He et al. [10] solved this problem with residual skip connections inside the network, which help gradients flow through deeper models. Later, dense connections [23] were shown to accelerate network convergence by reusing features across layers. RDN [29] and DBDN [25] embed densely connected convolutional networks into image SR to further improve reconstruction accuracy. Attention modules were then adopted in SR to help the network focus on learning high-frequency features. Liu et al. [16] introduced spatial attention to mask out the locations of high-frequency components in HR images, and RCAN [28] replaced plain feature layers with residual channel attention layers that adaptively rescale channel-wise features, reducing unnecessary computation on abundant low-frequency features. However, these methods mainly focus on multi-scale SR at a few integer scales (e.g., 2×, 3×, and 4×). In this paper, we propose to reconstruct SR at any scale using only a few training scales, which significantly reduces the computational cost.

Any-scale SR models have seldom been investigated. Recently, Meta-SR [8] proposed a meta-upscale module for arbitrary-scale SR that dynamically magnifies images by decimal scales; it is trained and tested with 40 different scales at a stride of 0.1. However, Meta-SR [8] provides neither a systematic approach nor experimental results for scales outside its 40 trained scales. In other words, when trained with only these 40 scales, Meta-SR cannot handle SR at arbitrary, unseen decimal scales. Conversely, training Meta-SR on an enormous number of scales to approximate full any-scale SR could take a very long time to converge, which is impractical. Unlike these methods, which are trained on every scale of interest, we propose a novel network, ASDN, for SR at any potential scale, built on our any-scale SR method of Laplacian Frequency Representation and Recursive Deployment.

2.2 Laplacian pyramid structure

The Laplacian Pyramid [3] restores HR images by preserving residual image information. As shown in Fig. 1a, the decomposition step first stores the residual information R₁, R₀ that is lost as the image is downscaled. The stored residuals R₁, R₀ are then added back to the low-resolution images I₁, I₀ to reconstruct the original HR images H₁, H₀.
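
The decompose/reconstruct cycle of Fig. 1a can be illustrated with a toy one-dimensional sketch. It assumes simple pair-averaging for downscaling and sample repetition for upscaling in place of the Gaussian filtering of the classical method [3], and the function names are hypothetical. Whatever filters are chosen, storing the residual $R = H - \mathrm{up}(I)$ makes the reconstruction exact.

```python
# Toy 1-D sketch of Laplacian pyramid decomposition/reconstruction (Fig. 1a).
# Assumption: pair-averaging (down) and sample repetition (up) stand in for the
# Gaussian blur-and-decimate filters of the classical method [3].


def downscale(signal):
    """Halve the length by averaging adjacent pairs (length must be even)."""
    return [(signal[k] + signal[k + 1]) / 2 for k in range(0, len(signal), 2)]


def upscale(signal):
    """Double the length by repeating each sample."""
    return [v for v in signal for _ in range(2)]


def decompose(hr, levels):
    """Return the coarsest image and residuals R_j, finest first.

    The input length must be divisible by 2 ** levels.
    """
    residuals = []
    current = hr
    for _ in range(levels):
        low = downscale(current)
        # Residual = detail lost when the image is downscaled and re-upscaled.
        residuals.append([h - u for h, u in zip(current, upscale(low))])
        current = low
    return current, residuals


def reconstruct(coarse, residuals):
    """Add the stored residuals back, starting from the coarsest level."""
    current = coarse
    for residual in reversed(residuals):
        current = [u + r for u, r in zip(upscale(current), residual)]
    return current


hr = [1.0, 3.0, 2.0, 8.0, 5.0, 5.0, 0.0, 4.0]
coarse, residuals = decompose(hr, levels=2)
print(coarse)                                # [3.5, 3.5]
print(reconstruct(coarse, residuals) == hr)  # True
```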

With the development of deep learning, many models have adopted the Laplacian Pyramid as their main mapping framework, constructing progressive upsampling networks for image SR. For example, LapSRN [13] (Fig. 1b) is a multi-phase network in which each phase learns the residual information with convolutional layers. LapSRN progressively reconstructs pyramid levels spaced by a factor of 2, producing 2×, 4×, and 8× SR, respectively. MS-LapSRN [14] is the parameter-sharing version of LapSRN: it shares network parameters across pyramid levels and demonstrates the efficiency of recursive deployment. However, these models are designed to predict SR at large scale factors rather than at continuous scales.

In this paper, we present a Laplacian Frequency Representation to reconstruct SR results at continuous scales. In our design, adjacent pyramid levels are 0.1 apart in scale and are placed in parallel at the end of the network. Following the Laplacian pyramid [3], in which the lost high-frequency information can be represented by two neighboring pyramid levels, the high-frequency information of an HR image is predicted as the weighted interpolation of the two Laplacian pyramid levels neighboring the test scale. The Laplacian Frequency Representation requires fewer SR training samples to generate HR images across a continuous range of ratios, which reduces training data storage and shortens optimization, accelerating network convergence.

3 Any-scale image super-resolution

In this section, we present the mathematical background of the any-scale SR method, comprising Laplacian Frequency Representation and Recursive Deployment, and then introduce the structure of the proposed ASDN.

3.1 Any-scale SR method

The any-scale SR method has two steps: Laplacian Frequency Representation and Recursive Deployment. Laplacian Frequency Representation generates HR images at decimal scales within a continuous range, and Recursive Deployment determines the number of recursions N and the small ratio r used at each recursion for any-scale SR prediction. To minimize the number of training samples, we restrict the small decimal ratios to r ∈ (1,2]. The HR image for an upscaling ratio R can then be obtained by repeatedly upscaling with a small ratio r and passing the result through the network, N times in total.

3.1.1 Laplacian frequency representation

To generate SR at decimal scales in a continuous range, the intuitive approach is to train the network with randomly sampled, densely spaced scales in that range. However, we find this approach difficult because it requires a large number of training scales and considerable computation to optimize the network. To address this problem, we introduce the Laplacian Frequency Representation as an intermediate representation of the high-frequency image information of SR results.

As shown in Fig. 2, the proposed Laplacian Frequency Representation has L Laplacian pyramid levels, and each pyramid level l learns the high-frequency image information of the HR image $O_{r_{l}}$ at scale $r_{l}$ from training samples of the corresponding scale:

$$ r_{l}=1+\frac{l}{L-1}, \quad l=0,\dots,L-1 $$

(1)

Because image edges are scalable and carry most of an image's high-frequency information, we can interpolate the high-frequency details of the SR result at any small decimal scale r from its two neighboring Laplacian pyramid levels.

For a given scale factor r in this continuous range, the Laplacian-frequency-represented HR image $O_r$ is defined as

$$ O_{r}=O_{r_{i}}+w_{r}*P_{i} $$

(2)

where

$$ P_{i}=O_{r_{i-1}}-O_{r_{i}}, i=1,...,L-1 $$

(3)

Here the phase index is $i=\lceil (L-1)(r-1)\rceil$, so that $r_{i-1} < r \le r_{i}$, and $w_r$ is the weight of the edge information for SR at scale r. We define the weight as the normalized distance from r to $r_i$ within phase $P_i$:

$$ w_{r}= (L-1)*(r_{i}-r) $$

(4)
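
Equations 1 to 4 can be checked with a short plain-Python sketch. It treats each pyramid level as a flat list of pixel values (a hypothetical stand-in for the IRB outputs) and assumes L = 11 as in Section 3.2.1; the helper names are illustrative, not from the paper. The `round` call guards against floating-point error when r falls exactly on a level scale: for example, 10 × (1.1 − 1) evaluates slightly above 1 in binary arithmetic, which would otherwise push the phase index up by one.

```python
import math


def level_scales(L=11):
    """Eq. 1: r_l = 1 + l / (L - 1) for l = 0, ..., L - 1."""
    return [1 + l / (L - 1) for l in range(L)]


def phase_and_weight(r, L=11):
    """Phase index i = ceil((L - 1)(r - 1)) and weight w_r from Eq. 4."""
    if not 1 < r <= 2:
        raise ValueError("r must lie in (1, 2]")
    # round() absorbs floating-point error, e.g. 10 * (1.1 - 1) > 1 in binary.
    i = max(1, math.ceil(round((L - 1) * (r - 1), 9)))
    r_i = 1 + i / (L - 1)
    w = (L - 1) * (r_i - r)  # 0 at r = r_i, tends to 1 as r -> r_{i-1}
    return i, w


def represent(levels, r, L=11):
    """Eqs. 2-3: O_r = O_{r_i} + w_r * P_i with P_i = O_{r_{i-1}} - O_{r_i}."""
    i, w = phase_and_weight(r, L)
    upper, lower = levels[i], levels[i - 1]
    return [o_i + w * (o_prev - o_i) for o_i, o_prev in zip(upper, lower)]


# Toy levels (hypothetical data): one pixel equal to 10 * r_l, one constant pixel.
levels = [[10 * s, 0.0] for s in level_scales()]
print(phase_and_weight(1.25))   # phase 3, weight close to 0.5
print(represent(levels, 1.25))  # first pixel close to 12.5
```

Because $w_r$ falls linearly from 1 to 0 as r moves from $r_{i-1}$ to $r_i$, the represented image coincides with a trained level whenever r hits a level scale exactly.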

This interpolated representation can be regarded as computing the missing high-frequency details of HR images at given scales, hence the name Laplacian Frequency Representation. Further evaluation in the experiment section, covering both the accuracy of the Laplacian Frequency Representation and the density of Laplacian pyramid levels, shows that the represented SR results closely match the directly learned results and that performance is stable once the Laplacian pyramid levels are sufficiently dense. We therefore train the Laplacian pyramid levels with a deep neural network on several scales and reconstruct HR images at continuous decimal scales in the range with the Laplacian Frequency Representation.

3.1.2 Recursive deployment

For SR at an upsampling ratio R in the larger range, it is infeasible to train on samples of every scale to learn the corresponding mapping functions. To minimize training sample demands, we reuse the mapping network learned for SR at decimal scales in (1,2]. Our approach rests on the observation that any decimal upscaling ratio R can be expressed as the product of N small decimal ratios r in (1,2]. The HR image for R can therefore be generated by gradually upscaling and recursively passing the image through the mapping network N times with small decimal ratios r ∈ (1,2].

The integer N denotes the number of recursions, and the small ratio r is the upsampling ratio at each recursion. To determine the best choice of N and r for any-scale SR, we perform several comparison experiments in the experiment section. We observe that using larger upscaling ratios r in the early recursions, together with fewer recursions N, gives better performance than other choices of N and r.

Therefore, for any scale factor R, the number of recursions N is

$$ N=\lceil{\log_{2} R}\rceil $$

(5)

The upscaling ratio $r_n$ at recursion n is defined as

$$ r_{n}= \begin{cases} 2 & \text{if } n \leq N-1 \\ \dfrac{R}{2^{N-1}} & \text{if } n = N \end{cases} $$

(6)

With N and r defined for R as above, if N = 1, then r = R and the HR image is produced by a single pass through the network. Otherwise, the coarse HR image from the previous recursion is bicubic-upscaled by the small ratio $r_n$ and used as the input at the current recursion. For better SR performance, the first N − 1 recursions use $r_{n}=2$, and the N-th recursion uses $r_{n}=\frac{R}{2^{N-1}}$, which lies in (1,2] because $2^{N-1} < R \le 2^{N}$.
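
The recursion schedule of Eqs. 5 and 6 reduces to a few lines. This plain-Python sketch (the function name is a hypothetical helper, not from the paper) returns the ratios $r_1, \dots, r_N$ for a scale R > 1; their product is R and each ratio lies in (1,2].

```python
import math


def recursion_schedule(R):
    """Return [r_1, ..., r_N] following Eqs. 5-6 for a scale factor R > 1.

    recursion_schedule is an illustrative helper name.
    """
    if R <= 1:
        raise ValueError("R must be greater than 1")
    N = math.ceil(math.log2(R))        # Eq. 5: number of recursions
    last = R / 2 ** (N - 1)            # Eq. 6: final ratio, lies in (1, 2]
    return [2.0] * (N - 1) + [last]    # Eq. 6: ratio 2 for the first N - 1 passes


print(recursion_schedule(3))    # [2.0, 1.5]
print(recursion_schedule(6.5))  # [2.0, 2.0, 1.625]
```

Putting the fractional part in the last pass follows the recommendation from the experiments: larger ratios early and as few recursions as possible.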

3.2 Any-scale SR deep network

In this section, we build a deep neural network to predict the Laplacian Frequency Representation from the input images.

3.2.1 Network architecture

The Laplacian Frequency Representation in ASDN consists of L = 11 Laplacian pyramid levels for SR in the scale range (1,2]. Each Laplacian pyramid level is a reconstructed HR image containing high-frequency details. Because SR at different scales is closely related [12], our network for the Laplacian Frequency Representation follows the multi-scale parallel framework [28]: a Feature Mapping Branch (FMB) is shared across scales, and HR images are restored by separate Image Reconstruction Branches (IRBs). Sharing the FMB greatly reduces computation, while separate IRBs simplify the original learning problem and lead to accurate results.

The feature mapping branch (FMB) of the Laplacian Frequency Representation is a deep convolutional neural network $H = f_{fmb}(I)$. As shown in Fig. 3, the FMB combines a Bi-Dense structure [25] for efficient feature learning with channel attention modules [28] that highlight high-frequency context information. In each Dense Attention Block (DAB), a channel attention module (see Fig. 3c) is placed directly after the concatenated feature channels. The high-frequency information in the concatenated channel features is therefore highlighted before it proceeds to the next block, allowing the network to focus on the more useful channels and improving reconstruction performance.

The other part of the network, $f_{irb}$, is the image reconstruction branch (IRB), which produces the Laplacian pyramid levels. Across the levels $O_{r_{i}}=f_{irb}(H), i=0,...,10$, fine textures appear at different locations; these textures usually carry high-frequency information, while smooth areas carry mostly low-frequency information. To recover high-frequency details for SR at different scales, it is therefore helpful to mask out the discriminative high-frequency locations with a spatial attention mechanism [16]. As shown in Fig. 3a, the learned high-level features are first restored to image space by a three-channel convolutional layer at each Laplacian pyramid level. The restored image then passes through the spatial attention (SA) [9] unit in Fig. 3c, which adaptively masks out the high-frequency information in HR images at different scales. To preserve information in smooth areas and concentrate training on high-frequency information, the interpolated input LR image is added to the network output through an identity skip connection (SC) to generate the HR image.

To train ASDN and generate the Laplacian Frequency Representation, one IRB is randomly selected at each update and attached after the FMB. For practical applications that only require SR at specific scales, ASDN can be fine-tuned into a fixed-scale network (FSDN) on image samples of those scales to further improve reconstruction accuracy. FSDN shares the network structure of ASDN, except that a deconvolutional layer for the specific scale is inserted at the front of each IRB, following common multi-scale single upsampling SR networks [15, 28, 29].

4 Experiments

In this section, we describe the implementation details of our models, including hyper-parameters and training and testing details. We then compare the proposed any-scale network and the fine-tuned fixed-scale model with several state-of-the-art SR methods on fixed-scale and any-scale benchmarks, through quantitative, qualitative, and any-scale comparisons. Finally, we evaluate the effectiveness of the proposed any-scale method and study the contribution of the different components of the proposed any-scale deep network.

4.1 Implementation details

Network settings

In the proposed ASDN, all convolutional layers have 64 filters and a 3 × 3 kernel, except the image-restoration layers in the IRBs and the convolutional layers in the CA and SA units. The image-restoration layers have 3 filters, and all convolutional layers in the CA and SA units use 1 × 1 kernels, following the same setting as CSFM [9]. Each 3 × 3 convolutional layer zero-pads the boundaries before convolution so that all feature maps keep the same size as the input of each level. ASDN and FSDN share the same FMB structure, in which 16 DABs are densely connected and each DAB has 8 dense layers. In FSDN, the deconvolutional layers follow single upsampling networks [15, 29] to upscale the feature maps by the corresponding scales.
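
For reference, the settings above can be collected into a small configuration object. This is only a summary sketch: the class and field names are hypothetical, and the number of pyramid levels comes from Section 3.2.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ASDNSettings:
    """Hyper-parameters stated in Section 4.1 (field names are illustrative)."""
    num_filters: int = 64        # filters in every 3 x 3 convolutional layer
    kernel_size: int = 3         # main kernel size
    restore_filters: int = 3     # image-restoration layers in each IRB output RGB
    attention_kernel: int = 1    # 1 x 1 convolutions inside CA and SA units
    num_dabs: int = 16           # densely connected DABs in the shared FMB
    layers_per_dab: int = 8      # dense layers inside each DAB
    num_levels: int = 11         # Laplacian pyramid levels, one IRB each

    @property
    def padding(self) -> int:
        """Zero padding keeping feature maps input-sized (odd kernel, stride 1)."""
        return self.kernel_size // 2

    @property
    def dense_layers_in_fmb(self) -> int:
        return self.num_dabs * self.layers_per_dab


settings = ASDNSettings()
print(settings.padding, settings.dense_layers_in_fmb)  # 1 128
```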

Training details

The original training images come from the DIV2K dataset [1] and the Flickr2K dataset [1]. The LR inputs for ASDN are bicubic-interpolated from the training images with 11 decimal ratios r evenly distributed over (1,2]. In each training batch, 16 augmented RGB patches of size 48 × 48 are extracted from LR images as input, and all LR images in a batch are drawn from the training samples of one randomly selected scale among the 11. Data augmentation consists of random horizontal flips and 90-degree rotations applied to each patch. To fine-tune FSDN, the input LR images are downscaled by one of the scale factors 2×, 3×, 4×, and 8×. In each training batch, 96 × 96 patches serve as targets, paired with LR RGB patches of the corresponding scale, to optimize the scale-specific modules. ASDN and FSDN are both implemented on the Torch platform and optimized by Adam with an L1 loss, setting β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. The learning rate is initially set to 10⁻⁴ and halved every 2 × 10⁵ minibatch updates, for 10⁶ minibatch updates in total.
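
The optimization schedule and per-batch sampling described above can be sketched as follows. This is a plain-Python illustration, not the authors' Torch code: patches are lists of rows, the helper names are hypothetical, and Adam itself is not implemented, with its stated settings kept only as constants.

```python
import random

BASE_LR = 1e-4              # initial learning rate
HALVE_EVERY = 200_000       # halve the learning rate every 2 x 10^5 updates
TOTAL_UPDATES = 1_000_000   # 10^6 minibatch updates in total
ADAM_BETAS = (0.9, 0.999)   # stated Adam settings (optimizer not implemented here)
ADAM_EPS = 1e-8
NUM_SCALES = 11             # training scales, one per Laplacian pyramid level
BATCH_SIZE = 16             # LR patches per batch (48 x 48 for ASDN)


def learning_rate(step):
    """Step decay: start at BASE_LR and halve every HALVE_EVERY updates."""
    return BASE_LR * 0.5 ** (step // HALVE_EVERY)


def augment(patch, rng):
    """Random horizontal flip followed by a random number of 90-degree rotations."""
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    for _ in range(rng.randrange(4)):
        patch = [list(row) for row in zip(*patch[::-1])]  # rotate 90 degrees clockwise
    return patch


def sample_scale_index(rng):
    """Hypothetical helper: one training scale (and IRB) is drawn per batch."""
    return rng.randrange(NUM_SCALES)


rng = random.Random(0)
for step in (0, 200_000, 999_999):
    print(step, learning_rate(step), sample_scale_index(rng))
print(augment([[1, 2], [3, 4]], rng))
```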

Testing details

Our networks are tested on five widely used SR benchmark datasets: Set5 [2], Set14 [26], BSD100 [22], Urban100 [11], and Manga109 [19]. To test the any-scale network (ASDN) at an arbitrary scale s, the test images are first downscaled by the factor s to form the LR images. If s is not larger than 2, the LR images are upsampled by s and fed into ASDN with the two Laplacian pyramid levels neighboring s enabled, and the HR images are predicted by interpolating these two levels according to Eq. 2. If s is larger than 2, the number of recursions is $N=\lceil \log_{2} s\rceil$. At each recursion n, the output of the previous recursion is upscaled and passed through ASDN with ratio $r_n$ given by Eq. 6; the initial recursion uses the LR images as input. To test the fixed-scale network (FSDN), the test images are downscaled by the fixed scale s and fed into FSDN with the corresponding scale-specific modules enabled to produce the output.
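
The test-time procedure for ASDN can be summarized as a driver around two hypothetical callables: `model(x, level)`, standing in for one forward pass with a single IRB enabled, and `resize(x, factor)`, standing in for bicubic upsampling. The sketch uses 1-D lists and a toy nearest-neighbour resize so that it runs without a deep learning framework; only the control flow (Eqs. 2 to 6) mirrors the text.

```python
import math


def blend_levels(model, x, r, L=11):
    """Run the two pyramid levels neighbouring r and interpolate them (Eqs. 2-4)."""
    i = max(1, math.ceil(round((L - 1) * (r - 1), 9)))  # phase index
    w = (L - 1) * ((1 + i / (L - 1)) - r)               # weight w_r
    upper, lower = model(x, i), model(x, i - 1)
    return [u + w * (low - u) for u, low in zip(upper, lower)]


def any_scale_sr(lr, s, model, resize, L=11):
    """Upscale lr by any s > 1: one pass if s <= 2, otherwise recursive passes."""
    if s <= 1:
        raise ValueError("s must be greater than 1")
    n = math.ceil(math.log2(s))                       # Eq. 5
    ratios = [2.0] * (n - 1) + [s / 2 ** (n - 1)]     # Eq. 6
    out = lr
    for r in ratios:
        x = resize(out, r)                 # predefined (bicubic) upsampling by r
        out = blend_levels(model, x, r, L)
    return out


# Toy stand-ins (hypothetical): nearest-neighbour resize and an identity network.
def toy_resize(x, factor):
    n = round(len(x) * factor)
    return [x[min(int(k / factor), len(x) - 1)] for k in range(n)]


def toy_model(x, level):
    return list(x)


print(len(any_scale_sr([1.0, 2.0, 3.0, 4.0], 3.0, toy_model, toy_resize)))  # 12
```

With s = 3 the driver makes two passes (ratios 2 and 1.5), enabling levels 10 and 9 and then levels 5 and 4.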

4.2 Comparison with state-of-the-art methods

To confirm the ability of the proposed methods, we first compare them qualitatively and quantitatively with state-of-the-art SR algorithms at the standard fixed scales 2×, 3×, 4×, and 8×, including predefined upsampling methods (SRCNN [4], VDSR [12], DRRN [7], MemNet [21], and SRMDNF [27]) and single upsampling methods (RDN [29], LapSRN [13], EDSR [15], and RCAN [28]).

4.2.1 Quantitative comparison

We compare the performance of our any-scale SR networks with state-of-the-art methods on the five challenging benchmark datasets. Table 1 shows quantitative comparisons for ×2, ×3, ×4, and ×8 SR. For a fair comparison with recent single upsampling networks, we also fine-tune ASDN on fixed ×2, ×3, ×4, and ×8 SR samples and report the result as FSDN. FSDN outperforms the other state-of-the-art methods, except RCAN on some datasets. On Urban100, which consists of building images with many straight lines, RCAN outperforms FSDN; we attribute this to RCAN's larger number of channel attention modules, which are sensitive to sharp edges during reconstruction. On the other datasets, the reconstruction accuracy of FSDN is comparable to that of RCAN. This indicates that the network framework shared by FSDN and ASDN is effective at learning SR mapping functions.

Table 1 Quantitative evaluation of state-of-the-art SR algorithms

Owing to the strength of this framework, ASDN performs favorably against existing methods, especially the predefined upsampling methods. Note that ASDN is not trained on any 3×, 4×, or 8× SR samples, yet its results are comparable to those of EDSR. ASDN falls behind some single upsampling models for two main reasons. First, those models are trained with fixed-scale SR samples and customized for deployment at 2×, 3×, 4×, and 8×, whereas ASDN is trained only with scales in (1,2]. Second, upsampling layers [20] can improve reconstruction performance: in our experiments, FSDN (the upsampling version of ASDN) gains more than 0.1 dB PSNR over ASDN at 2×. However, some upsampling layers, such as transposed convolution layers, apply only to integer scales [20]. Although the meta-upscale layer [8] can upscale images by decimal scales, those scales must be trained before deployment. We therefore trade some reconstruction accuracy for continuous-scale SR by using a predefined upsampling structure, which only needs to be trained with several representative scales. ASDN still performs strongly at the standard fixed scales compared with existing predefined upsampling deep methods. Regarding speed, ASDN takes 0.5 seconds to process a 288×288 image for 2× SR on a Titan X GPU, while FSDN takes about 0.04 seconds to generate a 288×288 image for 2× SR.

4.2.2 Any-scale comparison

To evaluate the effectiveness of ASDN for SR at arbitrary upscaling ratios, we first compare it with other methods. Bicubic interpolation serves as the reference, and several deep network frameworks (EDSR, RDN, VDSR) are retrained with the proposed any-scale SR method and the same training data as ASDN; we denote these as EDSR-Conv, RDN-Conv, and VDSR-Conv. Meta-EDSR and Meta-RDN [8] are dynamic meta-upsampling models trained with scale factors from ×1 to ×4 at a stride of 0.1.

Table 2 reports the results in PSNR. On the 9 trained scales from ×1.1 to ×1.9, ASDN achieves state-of-the-art performance. The table also shows ASDN's effectiveness on scales not seen during training and indicates the effective scale range of our any-scale SR network. For scales outside (1,2], ASDN is comparable to Meta-EDSR but slightly behind Meta-RDN, because ASDN's results at these scales come from recursive deployment, whereas Meta-RDN is trained on these scales directly. Although recursively deployed SR results are slightly worse than directly deployed ones, recursive deployment still effectively generates SR at scales not trained before. In this way, ASDN needs only 11 training scales for any-scale SR.

Table 2 Results of any-scale SR on different methods tested on BSD100

Figure 4 shows any-scale SR results over a continuous scale range. We test the networks at random decimal scales distributed over the commonly used range ×2 to ×8 on Set5 and plot the results as curves. The results show that ASDN and the models trained with our any-scale SR method can effectively reconstruct HR images at continuous upscaling ratios. ASDN outperforms all other methods: it is generally 0.15 dB better than EDSR-Conv and 0.6 dB better than VDSR-Conv, and it consistently stays more than 3 dB PSNR above Bicubic across the continuous scale range. This demonstrates that ASDN can effectively reconstruct HR images at continuous upscaling ratios and that our any-scale training method is applicable to many deep CNNs.

4.2.3 Qualitative comparison

We show visual comparisons on the test datasets for 2×, 4×, and 8× SR. For 2× enlargement on Set14 (Fig. 5), FSDN suppresses bias artifacts and recovers the cloth pattern and text closer to the ground truth than all other methods, while ASDN tends to produce less biased images than the other methods. For 4× enlargement of parallel straight lines (Fig. 6), our methods generate clearer building lines, while other methods suffer from blurring artifacts. RCAN tends to generate misleading strong edges because of its many channel attention modules, whereas ASDN and FSDN generate softer patterns closer to the ground truth. Reconstruction performance at 8× is analyzed further in Fig. 7: FSDN restores sharper characters than the compared networks, and ASDN recovers more accurate textures from the distorted LR image than many fixed-scale methods.

4.3 Study of any-scale methods

We study the effects of the two components of the any-scale SR method: Laplacian Frequency Representation and Recursive Deployment.

4.3.1 Laplacian frequency representation

To evaluate the accuracy of the Laplacian Frequency Representation for continuous-scale SR, we compare its reconstruction results with directly deployed HR images at 100 scales in the range (1,2].

We first convert EDSR, RDN, and ASDN into predefined upsampling networks with a single reconstruction branch and train them on SR samples of these 100 scales, denoting them EDSR-100, RDN-100, and ASDN-100, to generate HR images of Set5 at all 100 scales. We then rebuild the single-branch predefined upsampling EDSR-100 and RDN-100 with 11 parallel IRBs as EDSR-Conv and RDN-Conv, as described in Section 4.2.2, and train them with the same method and data as ASDN. As shown in Fig. 8a, the Laplacian-frequency-represented HR images have quality similar to that of the directly deployed HR images.

To analyze how the density of Laplacian pyramid levels affects SR performance, we train ASDN on DIV2K with 5, 9, and 17 evenly distributed decimal upscaling ratios in (1,2], which divide the Laplacian Frequency Representation into 4, 8, and 16 phases; we denote these models ASDN-4, ASDN-8, and ASDN-16, respectively. Figure 8b shows the performance of the three versions at scales in (1,2]; to make the differences easier to see, we plot selected sub-ranges of (1,2]. ASDN-4 generally falls behind ASDN-8 and ASDN-16, while ASDN-8 and ASDN-16 almost overlap. The density of Laplacian pyramid levels thus influences SR performance: up to a point, training with more densely spaced scales improves performance, but the gain saturates beyond a certain density, such as 10 phases. Consequently, a few Laplacian pyramid levels suffice to generate HR images at any decimal scale in (1,2].

4.3.2 Recursive deployment

To investigate the effect of recursive deployment on HR images of larger decimal scales, we compare recursive deployment with direct deployment at scales ×2, ×3, and ×4. As the recursive models, we train VDSR-Conv, EDSR-Conv, RDN-Conv, and ASDN on 11 evenly distributed decimal upscale ratios in (1,2], and the HR images are produced by upscaling twice with ratios $\times \sqrt{2}$, $\times \sqrt{3}$, and $\times \sqrt{4}$, respectively. For a fair comparison, we train VDSR-Conv, EDSR-Conv, RDN-Conv, and ASDN on ×2, ×3, and ×4 SR images as the direct deployment models. Table 3 lists the PSNR of recursive and direct deployment. Recursive deployment generally yields lower SR performance than direct deployment, but the gap narrows as the scale increases. Since the decline stays within an acceptable range and becomes smaller at larger upscale ratios, we adopt recursive deployment for SR at higher upscale ratios.
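The equal-ratio recursion used for Table 3 can be sketched as below: a target scale R is reached in N passes of ratio R^(1/N), which for N = 2 gives √2, √3 and √4 = 2, all inside the (1,2] range the models were trained on. Here `sr_step` is a hypothetical stand-in for one forward pass of an any-scale model and is assumed to take the desired output shape; a nearest-neighbor NumPy resize is used only so the sketch runs. Intermediate sizes are computed from the LR size, and the last step targets round(R · size) exactly, so rounding errors do not accumulate across recursions (naively multiplying a 10-pixel side by √3 twice would end at 29 rather than 30 pixels).

```python
import numpy as np

def resize_nearest(img, out_shape):
    """Placeholder upscaler: nearest-neighbor resize of a 2-D array."""
    h, w = img.shape
    out_h, out_w = out_shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows[:, None], cols[None, :]]

def recursive_deploy(lr, total_scale, n_steps, sr_step=resize_nearest):
    """Upscale `lr` by `total_scale` in `n_steps` equal-ratio recursions."""
    ratio = total_scale ** (1.0 / n_steps)
    if not 1.0 < ratio <= 2.0 + 1e-9:
        raise ValueError("per-step ratio must stay in the trained (1, 2] range")
    h0, w0 = lr.shape
    out = lr
    for k in range(1, n_steps + 1):
        # Sizes are derived from the LR size; the final step hits the exact target.
        s = total_scale if k == n_steps else ratio ** k
        out = sr_step(out, (round(h0 * s), round(w0 * s)))
    return out
```

For ×3 on a 10 × 10 input, the intermediate result is 17 × 17 and the output is 30 × 30, matching the size a direct ×3 model would produce.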

Table 3 PSNR of recursive deployment and direct deployment on SR for ×2, ×3, ×4


To determine the best choice of recursion count N and upscale ratios r for recursive deployment, we also explore various combinations of N and r for deploying any-scale HR images. Table 4 reports the performance of HR images deployed with different strategies using ASDN on Set5. A larger upscale ratio r combined with fewer recursions N yields better performance. Furthermore, using larger upscale ratios in the early recursions produces better results than using smaller ones. For these reasons, for large-scale SR with target scale R, we recommend $N=\lceil \log_{2} R\rceil$ recursions, using the largest upscale ratio r = 2 in the first N − 1 recursions, so that the final recursion applies the remaining ratio $R/2^{N-1}$, which lies in (1,2].
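The recommended schedule can be written as a short function. This is a sketch of the rule stated above, with `recursion_schedule` a hypothetical name: since $2^{N-1} < R \le 2^{N}$ when $N=\lceil \log_{2} R\rceil$, the leftover ratio after N − 1 steps of ×2 always falls in (1,2].

```python
import math

def recursion_schedule(R):
    """Per-step upscale ratios for target scale R > 1.

    N = ceil(log2 R); the first N - 1 steps use the largest ratio r = 2
    and the last step applies the remainder R / 2**(N - 1), in (1, 2].
    """
    if R <= 1:
        raise ValueError("target scale must exceed 1")
    n = math.ceil(math.log2(R))
    last = R / 2 ** (n - 1)
    return [2.0] * (n - 1) + [last]
```

For example, ×3 becomes two recursions of ×2 then ×1.5, and ×5 becomes ×2, ×2, ×1.25; the large ratios come first, as the Table 4 comparison suggests.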

Table 4 PSNR of recursive deployment and direct deployment on SR for ×2, ×3, ×4. Bold indicates the best performance


4.4 Model analysis

4.4.1 Number of parameters

To demonstrate the compactness of our models, Fig. 9 compares their performance and number of parameters with those of existing deep SR networks. Our models balance parameter demands against performance. VDSR, DRRN, LapSRN, and MemNet are lightweight networks that visibly sacrifice performance to keep their parameter counts low, and ASDN outperforms all the other predefined upsampling methods by over 0.5 dB on Set14 for 2× enlargement. Furthermore, FSDN achieves the best results with a moderate number of parameters compared with all the other upsampling methods.
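Parameter counts like those in Fig. 9 are dominated by convolution weights. As a hedged illustration (not ASDN's actual layer configuration, which is not listed here), a k × k convolution from C_in to C_out channels has k·k·C_in·C_out weights plus C_out biases; the layer stack below is purely hypothetical.

```python
def conv2d_params(in_ch, out_ch, kernel=3, bias=True):
    """Learnable parameters of a k x k 2-D convolution."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

# Hypothetical stack: 1 -> 64 channels, three 64 -> 64 layers, 64 -> 1 channel
layers = [(1, 64)] + [(64, 64)] * 3 + [(64, 1)]
total = sum(conv2d_params(c_in, c_out) for c_in, c_out in layers)

print(conv2d_params(64, 64))  # 36928
```

Because the count grows with the product of input and output channels, widening a network increases its size much faster than adding the same number of layers at a fixed width.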

4.4.2 Ablation study

In this section, we evaluate the influence of different network modules: channel attention (CA) in the FMB, spatial attention (SA) in the IRB, and the skip connection (SC) between input and output. To demonstrate the effect of CA in the proposed network structure, we remove CA from the FMB. Table 5 shows that without CA, the PSNR on Set5 (×2) is lower than that of the model with CA. To investigate the effect of SA, we remove SA from ASDN and compare it with the network that includes SA: SA improves PSNR by 0.02 dB and 0.01 dB in the models with and without CA, respectively. We further investigate the contribution of SC by comparing models with and without it; adding a global skip connection between the network input and output generally improves PSNR by 0.04 dB on Set5. Overall, incorporating attention modules into the network design helps reconstruct the residual high-frequency information.
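The three ablated components can be illustrated with a minimal NumPy sketch. It is not the paper's FMB/IRB code: CA follows the squeeze-and-excitation form used by RCAN (Zhang et al. 2018), SA is replaced by a simplified 1 × 1-projection mask, all weights are hypothetical inputs with biases omitted, and the global SC is written as element-wise addition, assuming that in a predefined-upsampling network the input and output share the same size.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w_down, w_up):
    """SE/RCAN-style channel attention on a (C, H, W) feature map.

    w_down: (C // reduction, C), w_up: (C, C // reduction); hypothetical weights.
    """
    pooled = feat.mean(axis=(1, 2))             # global average pooling -> (C,)
    hidden = np.maximum(w_down @ pooled, 0.0)   # channel reduction + ReLU
    weights = sigmoid(w_up @ hidden)            # one weight in (0, 1) per channel
    return feat * weights[:, None, None]

def spatial_attention(feat, w):
    """Simplified spatial attention: a 1x1 projection yields one weight per pixel."""
    mask = sigmoid(np.tensordot(w, feat, axes=(0, 0)))  # (H, W)
    return feat * mask[None, :, :]

def with_global_skip(x, body):
    """Global skip connection: the body only has to predict the residual detail."""
    return x + body(x)
```

CA reweights whole channels while SA reweights individual pixel positions, which is why the two can contribute separately in Table 5; the skip connection lets the network focus on high-frequency residuals instead of reproducing the low-frequency content already present in the input.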

Table 5 Investigation of channel attention (CA), spatial attention (SA), and skip connection (SC). Bold indicates the best performance


5 Conclusion

In this paper, we propose an any-scale deep network (ASDN) that generates HR images of any scale with a single unified network by adopting our any-scale SR method, which combines Laplacian Frequency Representation for SR within small continuous scale ranges and Recursive Deployment for larger-scale SR. The any-scale SR method reduces the number of training scale samples required and accelerates network convergence. Extensive comparisons show that ASDN is superior to most state-of-the-art methods on both fixed-scale and any-scale benchmarks.

References

Agustsson E, Timofte R (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In: The IEEE conference on computer vision and pattern recognition (CVPR) workshops, vol 3, p 2
Bevilacqua M, Roumy A, Guillemot C, Alberi-Morel ML (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding
Burt PJ, Adelson EH (1987) The Laplacian pyramid as a compact image code. In: Readings in computer vision, pp 671–679. Elsevier
Dong C, Loy CC, He K, Tang X (2014) Learning a deep convolutional network for image super-resolution. In: European conference on computer vision, pp 184–199. Springer
Gao G, Zhu D, Yang M, Lu H, Yang W, Gao H (2018) Face image super-resolution with pose via nuclear norm regularized structural orthogonal procrustes regression. Neural Comput & Applic, 1–11
Haris M, Shakhnarovich G, Ukita N (2018) Deep backprojection networks for super-resolution. In: Conference on computer vision and pattern recognition
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hu X, Mu H, Zhang X, Wang Z, Tan T, Sun J (2019) Meta-SR: a magnification-arbitrary network for super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1575–1584
Hu Y, Li J, Huang Y, Gao X (2019) Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Transactions on Circuits and Systems for Video Technology
Huang G, Sun Y, Liu Z, Sedra D, Weinberger KQ (2016) Deep networks with stochastic depth. In: European conference on computer vision, pp 646–661. Springer
Huang JB, Singh A, Ahuja N (2015) Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 5197–5206
Kim J, Kwon Lee J, Mu Lee K (2016) Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 1646–1654
Lai WS, Huang JB, Ahuja N, Yang MH (2017) Deep Laplacian pyramid networks for fast and accurate super-resolution. In: IEEE Conference on computer vision and pattern recognition, vol 2, p 5
Lai WS, Huang JB, Ahuja N, Yang MH (2018) Fast and accurate image super-resolution with deep Laplacian pyramid networks. IEEE Transactions on Pattern Analysis and Machine Intelligence
Lim B, Son S, Kim H, Nah S, Lee KM (2017) Enhanced deep residual networks for single image super-resolution. In: The IEEE conference on computer vision and pattern recognition (CVPR) workshops, vol 1, p 4
Liu Y, Wang Y, Li N, Cheng X, Zhang Y, Huang Y, Lu G (2018) An attention-based approach for single image super resolution. In: 2018 24th international conference on pattern recognition (ICPR), pp 2777–2784. IEEE
Lu H, Li Y, Nakashima S, Kim H, Serikawa S (2017) Underwater image super-resolution by descattering and fusion. IEEE Access 5:670–679
Lu T, Wang J, Zhang Y, Wang Z, Jiang J (2019) Satellite image super-resolution via multi-scale residual deep neural network. Remote Sens 11(13):1588
Matsui Y, Ito K, Aramaki Y, Fujimoto A, Ogawa T, Yamasaki T, Aizawa K (2017) Sketch-based manga retrieval using Manga109 dataset. Multimed Tools Appl 76(20):21811–21838
Schulter S, Leistner C, Bischof H (2015) Fast and accurate image upscaling with super-resolution forests. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3791–3799
Tai Y, Yang J, Liu X, Xu C (2017) MemNet: a persistent memory network for image restoration. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 4539–4547
Timofte R, De Smet V, Van Gool L (2014) A+: Adjusted anchored neighborhood regression for fast super-resolution. In: Asian conference on computer vision, pp 111–126. Springer
Tong T, Li G, Liu X, Gao Q (2017) Image super-resolution using dense skip connections. In: Computer vision (ICCV), 2017 IEEE international conference on, pp 4809–4817. IEEE
Wang D, Lu H, Yang MH (2015) Robust visual tracking via least soft-threshold squares. IEEE Trans Circuits Sys Vid Techn 26(9):1709–1721
Wang Y, Shen J, Zhang J (2018) Deep bi-dense networks for image super-resolution. In: 2018 Digital image computing: techniques and applications (DICTA), pp 1–8. IEEE
Zeyde R, Elad M, Protter M (2010) On single image scale-up using sparse-representations. In: International conference on curves and surfaces, pp 711–730
Zhang K, Zuo W, Zhang L (2018) Learning a single convolutional super-resolution network for multiple degradations. In: IEEE Conference on computer vision and pattern recognition, vol 6
Zhang Y, Li K, Li K, Wang L, Zhong B, Fu Y (2018) Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 286–301
Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y (2018) Residual dense network for image super-resolution. In: The IEEE conference on computer vision and pattern recognition (CVPR)


Author information

Yucheng Wang
Present address: ByteDance, Beijing, China

Authors and Affiliations

University of Technology Sydney, Sydney, Australia
Jialiang Shen, Yucheng Wang & Jian Zhang


Corresponding author

Correspondence to Jialiang Shen.


About this article

Cite this article

Shen, J., Wang, Y. & Zhang, J. ASDN: A Deep Convolutional Network for Arbitrary Scale Image Super-Resolution. Mobile Netw Appl 26, 13–26 (2021). https://doi.org/10.1007/s11036-020-01720-2


Accepted: 30 November 2020
Published: 22 February 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11036-020-01720-2
