1 Introduction

In this paper, we describe a filter pruning method for efficient CNNs that is based on the correlation among the feature maps generated by the corresponding filters. This kind of pruning benefits the model in several ways: it removes redundant feature maps, reduces the model size, lowers the computational cost, and saves FLOPs. Over the past few years, researchers have achieved impressive performance with CNNs in several demanding tasks across different areas, namely image recognition [1, 2], web search [3], speech recognition [4], and NLP. Nonetheless, CNNs are computationally intensive by nature, and as networks have grown deeper, their memory footprint, power consumption, and number of floating-point operations (FLOPs) have also increased dramatically. The reason behind this growth is the increase in parameter count and in convolution operations. Such high-capacity networks incur substantial inference costs, especially on resource-constrained devices such as mobile phones or embedded sensors, which have limited computational and power resources.

A substantial amount of research has been carried out to compress large CNNs or to learn more efficient CNN models directly. To mitigate the high resource requirements of CNNs, researchers have presented numerous approaches for compressing and accelerating different models without a noticeable loss in accuracy. Among these, pruning is the most widely used technique for network compression. It is further divided into two sub-categories, namely weight pruning [5,6,7] and filter pruning [8,9,10]. Weight pruning creates unstructured sparsity because it eliminates individual parameters within a filter [40]; this leads to irregular memory access, which hurts overall efficiency. Filter pruning, in contrast, eliminates entire filters and leaves a structured model. Although it is a simple technique, it is quite demanding, because eliminating filters from one layer considerably changes the input of the subsequent layer. Filter pruning is therefore more efficient than weight pruning [25]: it decreases the model size, minimizes the computational cost, and saves FLOPs, which is the key focus of our proposed work.

Additionally, filter pruning methods generally involve some measure of filter importance. A number of past studies [2, 13] estimate the value of the filters by scale factors or the L1-norm, neglecting the amount of information contained in the feature maps. Moreover, the authors of [7, 10] showed that a large number of the feature maps output by the middle layers of CNNs are mostly zeros or zero matrices, which reveals that not all the filters in the architecture are valuable.

Furthermore, we created a correlation-based feature selector (CFS), which boosts the efficiency of pruning by removing the need to set a pruning rate layer by layer. Pruning filters across multiple layers gives a general picture of the effectiveness of the model and results in a slimmer architecture. We examine the behavior of CFS under several conditions and show that CFS can recognize effective feature maps and eliminate less important ones. When two variables are highly correlated, they replicate the same information, so discarding one of them causes little loss of information.

Fig. 1

Strongly correlated feature maps are identified, and one of each pair of duplicate feature maps is pruned

Numerous statistics can be employed to analyze the relationship between two random variables, for instance correlation, range, standard deviation, and mean. However, feature selection algorithms based on statistical significance tests require the features to be mutually independent, whereas in practice there is always some correlation between the features [45]. Therefore, in this paper we use a Pearson correlation-based [11] technique to compute the correlation between two feature maps: the higher the correlation between two feature maps, the greater the replication of information between them. One of the two feature maps is then eliminated, together with its filter, as described in Fig. 1. Correlation plays a major role in data science for feature selection and helps to find effective features.

Fig. 2

Overview of our introduced technique, which prunes filters according to the correlation score between two feature maps. It computes the correlation between each pair of adjacent feature maps of the pre-trained model (left side). For the i-th layer, two output feature maps of the convolutional filters are extracted and fed into the correlation module to measure their correlation. Feature maps with high correlation scores duplicate information, so one of the feature maps, and its filter, is eliminated (right side). In the meantime, the corresponding channels of each filter in the (i+1)-th layer are also deleted to stay consistent with the input. Finally, the filters of the convolutional layers are pruned layer by layer

2 Related work

In general, the challenge of compressing CNNs has been considered extensively in the literature, although interpretability is not a driving aspect in most of these studies. In the conventional way to compress CNNs, weights, but not filters, are pruned and quantized, which is commonly termed "weight compression". A few of these approaches are optimal brain surgeon [14], deep compression [15], optimal brain damage [16], and, more recently, SqueezeNet [17].

The primary goal of filter pruning is to estimate the importance of filters and then eliminate the unimportant ones [2, 18, 19]. After every pruning step, re-training is essentially required to recover the drop in accuracy. The authors of [20] estimated the importance of a filter over a subset of the training data based on its output feature map. In [14], pruning was performed with a greedy technique that assessed filter importance by checking the accuracy of the model right after pruning the filter. Feature maps, in other words activation pruning, are used in [21] with the aim of obtaining faster CNNs. This technique can also be seen as eliminating filters from the input at a particular location, but those filters are mostly kept at other locations, which hardly ever results in any filter compression. In our work, CFS selects less important filters based on the correlation value between successive corresponding feature maps. We apply Pearson correlation to calculate the correlation value between two feature maps. The CFS approach differs from other approaches and can tackle their weaknesses, such as ignoring the spatial information in feature maps, since we flip the feature maps by row and measure their spatial information with correlation to evaluate the usefulness of the corresponding filters. We then create a feature selection module to extract the output of each filter and calculate their correlation weights. These modules are placed between each two successive feature maps of the filters of the pre-trained model, as illustrated in Fig. 2. For the filter pairs whose two corresponding feature maps have the higher correlation value, one of the two filters is pruned. After pruning, we fine-tune the pruned model to recover performance, and in several scenarios we can even obtain greater accuracy. Lastly, the pruning and fine-tuning procedure is iterated a few times to achieve a slimmer model. Moreover, we also study the relationship between the pruning ratio and the number of filters to highlight the sharing of information in every convolutional layer of a CNN model.

Pruning a fully trained network is fruitful in several ways compared with training a network containing fewer filters from scratch, because there are many uncertainties, such as the choice of architecture and how many filters to start with. Even though many hyper-parameter optimization approaches have been presented to tackle this issue [22,23,24], the large number of feasible architectures and filter counts leads to a high computational cost, in a combinatorial way, as in other model-selection problems [25]. Additionally, recent results suggest that for large-scale CNNs such as ResNet and VGG, the accuracy of a pruned network is slightly higher than that of the same network trained from scratch [10]. In contrast to large-scale CNNs, for small-scale CNNs training a network from scratch is likely to achieve an accuracy similar to the pruned one. In many transfer-learning applications based on well-trained networks, the algorithms can achieve much higher accuracy than training from scratch given the same number of filters and the same architecture [21].

Many studies have focused on reducing the number of multiplications as well as the number of parameters. In [41], the authors used the Strassen algorithm with Pan's modification to reduce the number of multiplications in CNNs. Li [42] presented structured channel weight sharing for compression (SCWC), which uses the distributive property to reduce the number of multiplications in CNNs. The authors of [43] proposed two densely connected CNNs, named DenseDsc and Dense2Net. In DenseDsc, depth-wise separable convolution is applied to enhance performance, while in Dense2Net group convolution is applied to improve parameter efficiency.

Besides, the model presented in [26] introduced sparsity into the model parameters and therefore required the support of sparse libraries to achieve the anticipated compression results. This technique provides an inadequate compression rate in terms of total run-time memory (TRM) and FLOPs, although it delivers a good compression rate for weight storage with limited FLOPs savings. The pruning approach for filter importance presented in [27] has specific constraints as prerequisites that are not generally met. Models compressed by these approaches therefore retain redundancy, since they do not consider filter redundancy during pruning.

Alternatively, approaches based on weight quantization [28, 29] have been used in previous research for model compression. The work in [30] performed compression based on quantization of float values for model storage, whereas [7] performed compression through binarization, in which every float value is quantized to a binary value. Researchers have also utilized Bayesian approaches for network quantization. In contrast, our proposed technique removes unimportant feature maps to decrease the size of the model, save FLOPs, and lower the computational cost.

We evaluate our approach on various well-known datasets and different CNN models. For VGG16 on CIFAR100, we obtain approximately 59.6% parameter pruning and a 46.4% reduction of FLOPs with 0.17% accuracy loss. For a model with less redundancy, such as ResNet-50, we also achieve approximately 44.6% parameter elimination without loss of accuracy. In the coming sections, we provide the details of our correlation-based filter pruning approach.

3 Methodology

We first present how to determine the correlation scores of two feature maps. Then our feature map selection and filter pruning schemes are given. Lastly, we discuss the compression of computational cost.

3.1 Calculate correlation scores of the two successive feature maps

A number of past studies [2, 10, 13] estimate the value of the filters through scale factors or the L1-norm and neglect the amount of information contained in the feature maps. Some past studies [7, 10] have shown that a large number of the feature maps output by the middle layers of CNNs are mostly zeros or zero matrices, which reveals that not all the filters in the architecture are valuable.

To find the usefulness of filters, we apply correlation to measure the information in feature maps. Correlation plays a major role in data science for feature selection and helps to find effective features according to the correlation score. Since the outputs of different convolutional layers differ substantially in the amount of information they carry, the feature map weights are calculated in every layer individually. In particular, to avoid the conditional outcome of a single image, we randomly choose a large number of images from the training set to compute the average correlation weights of the filters.

Suppose \(H_i\) is the height and \(W_i\) the width of the output feature maps, and \(m_i\) denotes the number of filters in the \(i\)-th convolutional layer, where each filter produces one feature map. Further, \(N\) is the number of randomly selected images input into the model. The value of \(N\) directly impacts the memory of the system: the greater the value of \(N\), the more memory it consumes. It is worth noting that the value of \(N\) is the same for all considered datasets. The lowest final accuracies are obtained with \(N\) set to 16, 32, 50, and 64 instances, the highest with \(N\) set to 256 and 512 instances, while \(N\) of 100, 128, 150, and 200 instances yields average final accuracies. Therefore, the higher the value of \(N\), the greater the final accuracy, although a much larger value can also affect the final accuracy, as shown in Fig. 3 (the figure only covers the ImageNet dataset for the VGG16 model). Let \(X_{i, j}^{(n)}\) be the \(j\)-th output feature map matrix of layer \(i\) for the \(n\)-th image, and \(Y_{i, k}^{(n)}\) the \(k\)-th output feature map matrix of the same layer \(i\) for the \(n\)-th image; both matrices are converted into the feature map vectors given below:

Fig. 3

Impact of N on VGG16 accuracy using ImageNet dataset

$$\begin{aligned} \hat{X}_{i, j}^{(n)}= & {} \left( x_{i, j, 1}^{(n)}, x_{i, j, 2}^{(n)}, \ldots , x_{i, j, L_{i}}^{(n)}\right) \end{aligned}$$
(1)
$$\begin{aligned} \hat{Y}_{i, k}^{(n)}= & {} \left( y_{i, k, 1}^{(n)}, y_{i, k, 2}^{(n)}, \ldots , y_{i, k, L_{i}}^{(n)}\right) \end{aligned}$$
(2)

where \(L_{i}=H_{i} \times W_{i}\). Finally, we normalize both \(\hat{X}_{i, j}^{(n)}\) and \(\hat{Y}_{i, k}^{(n)}\) using the following equations.

$$\begin{aligned} p_{i, j, l}^{(n)}= & {} \frac{x_{i, j, l}^{(n)}-\min _{l}\left\{ x_{i, j, l}^{(n)}\right\} }{\max _{l}\left\{ x_{i, j, l}^{(n)}\right\} -\min _{l}\left\{ x_{i, j, l}^{(n)}\right\} } \end{aligned}$$
(3)
$$\begin{aligned} P_{i, j}^{(n)}= & {} \left( p_{i, j, 1}^{(n)}, p_{i, j, 2}^{(n)}, \ldots , p_{i, j, L_{i}}^{(n)}\right) \end{aligned}$$
(4)

Similarly, for \(\hat{Y}_{i, k}^{(n)}\):

$$\begin{aligned} p_{i, k, l}^{(n)}= & {} \frac{y_{i, k, l}^{(n)}-\min _{l}\left\{ y_{i, k, l}^{(n)}\right\} }{\max _{l}\left\{ y_{i, k, l}^{(n)}\right\} -\min _{l}\left\{ y_{i, k, l}^{(n)}\right\} } \end{aligned}$$
(5)
$$\begin{aligned} P_{i, k}^{(n)}= & {} \left( p_{i, k, 1}^{(n)}, p_{i, k, 2}^{(n)}, \ldots , p_{i, k, L_{i}}^{(n)}\right) \end{aligned}$$
(6)

Then, for the \(n\)-th image and the \(i\)-th convolutional layer, the Pearson correlation between the \(j\)-th and \(k\)-th feature map vectors is given as:

$$\begin{aligned} \rho \left( P_{i, j}^{(n)}, P_{i, k}^{(n)}\right) =\frac{\sum _{l=1}^{L_{i}}\left( p_{i, j, l}^{(n)}-\overline{p_{i, j}^{(n)}}\right) \left( p_{i, k, l}^{(n)}-\overline{p_{i, k}^{(n)}}\right) }{\sqrt{\sum _{l=1}^{L_{i}}\left( p_{i, j, l}^{(n)}-\overline{p_{i, j}^{(n)}}\right) ^{2}}\sqrt{\sum _{l=1}^{L_{i}}\left( p_{i, k, l}^{(n)}-\overline{p_{i, k}^{(n)}}\right) ^{2}}} \end{aligned}$$
(7)

where \(\overline{p_{i, j}^{(n)}}\) is the mean of \(P_{i, j}^{(n)}\) and \(\overline{p_{i, k}^{(n)}}\) is the mean of \(P_{i, k}^{(n)}\). The value of \(\rho \left( P_{i, j}^{(n)}, P_{i, k}^{(n)}\right)\) ranges from \(-1\) to 1. A value of 0 means that \(P_{i, j}^{(n)}\) and \(P_{i, k}^{(n)}\) are uncorrelated, whereas a value close to 1 means they are strongly correlated. Feature maps with a greater correlation value are considered redundant; therefore, only those feature maps with low redundancy between successive feature maps are selected. The pairs with the lowest Pearson correlation scores among neighboring feature maps are forwarded to the selected feature map set. We thus keep the best feature maps and discard the redundant ones. This process continues layer by layer until we obtain all the effective feature maps from all layers. Algorithm 1 shows the steps of our introduced method.

Algorithm 1
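As a companion to Algorithm 1, the following is a minimal NumPy sketch (ours, not the authors' released code) of how the correlation scores of Eqs. (1)-(7) could be computed for one convolutional layer; the names `correlation_scores` and `min_max_normalize` are illustrative.

```python
import numpy as np

def min_max_normalize(v, eps=1e-8):
    """Min-max normalization of a flattened feature map (Eqs. 3 and 5)."""
    return (v - v.min()) / (v.max() - v.min() + eps)

def correlation_scores(feature_maps):
    """Average Pearson correlation of each pair of successive feature maps.

    feature_maps: array of shape (N, H, W, m) -- N sampled images and
    m feature maps of size H x W produced by one convolutional layer.
    Returns m-1 scores, one per adjacent pair (j, j+1).
    """
    n, h, w, m = feature_maps.shape
    flat = feature_maps.reshape(n, h * w, m)            # Eqs. (1)-(2)
    scores = np.zeros(m - 1)
    for j in range(m - 1):
        corrs = []
        for img in range(n):
            p = min_max_normalize(flat[img, :, j])      # Eq. (4)
            q = min_max_normalize(flat[img, :, j + 1])  # Eq. (6)
            corrs.append(np.corrcoef(p, q)[0, 1])       # Pearson, Eq. (7)
        scores[j] = np.nanmean(corrs)   # average over N images; skips constant (all-zero) maps
    return scores

# A pair (j, j+1) with a high score duplicates information, so one of the
# two feature maps (and its filter) becomes a pruning candidate.
```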

3.2 Strategies for filter pruning

To recognize the unimportant filters of a pre-trained network, the CFS is created and placed between every two successive convolutional layers of the network. As illustrated in Fig. 2, the output of the \(i\)-th convolutional layer is fed into the correlation weights module to calculate the correlation score of each two consecutive feature maps with the algorithm explained in the previous section. A high correlation score shows that these feature maps duplicate information, so one of them should be removed, and the corresponding filter in the \(i\)-th convolutional layer is less valuable. We then eliminate the feature maps with duplicate information by pruning all their incoming and outgoing connections. With that, all the unimportant filters of the \(i\)-th layer and the corresponding feature maps input to the next layer are eliminated, along with the corresponding channels of every filter in the next layer.
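A brief illustration of this step, assuming the TensorFlow/Keras weight layout (kernel height, kernel width, input channels, output channels); the helper name `prune_filter` is ours:

```python
import numpy as np

def prune_filter(w_i, b_i, w_next, filter_idx):
    """Drop filter `filter_idx` from layer i and the matching input
    channel of layer i+1, so the two layers stay dimensionally consistent."""
    w_i_pruned = np.delete(w_i, filter_idx, axis=3)        # remove the filter itself
    b_i_pruned = np.delete(b_i, filter_idx, axis=0)        # remove its bias term
    w_next_pruned = np.delete(w_next, filter_idx, axis=2)  # remove the channel it fed
    return w_i_pruned, b_i_pruned, w_next_pruned
```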

Fig. 4

Illustration of the method to prune cross-layer architectures. a Original ResNet bottleneck block, b CFS module with bottleneck block. A Batch Norm layer is inserted before every convolutional layer, and the ReLU is placed before the CFS module

3.2.1 Setting up pruning threshold

The pruning threshold is set according to the Symmetrical Uncertainty (SU) coefficient in each convolutional layer. First, the Symmetrical Uncertainty scores of the feature maps in each layer are placed in ascending order. They are then gathered from the largest value downwards until the set pruning ratio is surpassed. The last gathered value is applied as the threshold of the corresponding layer, and all feature maps and corresponding filters whose Symmetrical Uncertainty score is higher than the threshold are eliminated. Finally, we obtain a slimmer architecture with fewer parameters, less run-time memory, and less storage.
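A minimal sketch of this thresholding rule, assuming a one-dimensional array of per-feature-map scores for a single layer; the function name and the tie-handling are our choices:

```python
import numpy as np

def layer_threshold(scores, pruning_ratio):
    """Return the score value used as this layer's pruning threshold."""
    n_to_prune = int(np.ceil(pruning_ratio * len(scores)))
    if n_to_prune == 0:
        return np.inf                    # nothing is pruned in this layer
    ordered = np.sort(scores)[::-1]      # gather from the largest value downwards
    return ordered[n_to_prune - 1]       # maps scoring at or above this are pruning candidates
```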

3.2.2 Pruning by iteration and fine-tune process

After pruning, we might temporarily experience some loss of accuracy; however, it can largely be recovered through the subsequent fine-tuning process. In some cases, we can even obtain greater accuracy than the baseline model. For whole-architecture pruning, past work generally prunes and fine-tunes the filters layer by layer, or retrains the model after every pruning and fine-tuning pass. In contrast, we prune and fine-tune the model iteratively, and there is no need to retrain the model from scratch. Prototyping this algorithm on LeNet shows that it is efficient: after a few epochs we obtain greater compression and can even achieve better classification accuracy.
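The iterative schedule can be written as a short loop; `prune_step` and `fine_tune` below are hypothetical caller-supplied callables standing in for the routines sketched earlier, not functions from the paper:

```python
def iterative_pruning(model, prune_step, fine_tune, rounds=3, ratio=0.10):
    """Alternate pruning and fine-tuning; no retraining from scratch."""
    for _ in range(rounds):
        model = prune_step(model, ratio)  # one CFS pruning pass over all layers
        model = fine_tune(model)          # short fine-tune to recover accuracy
    return model
```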

Table 1 VGG16 Pruning for CIFAR10 dataset

3.2.3 Strategy to adjust residual models (ResNet)

The presented filter pruning approach can be used effortlessly on plain CNN models, e.g. AlexNet and VGGNet. However, some modifications are needed when it is applied to complex models with cross-layer connections, for example ResNet. In ResNet, the output of the last convolutional layer and the identity mapping have the same number of feature maps and the same size, which makes them harder to reduce. As illustrated in Fig. 4, our CFS modules are inserted between the first and second convolutional layers. For the third convolutional layer of the bottleneck block, we only eliminate the channels of every filter to make them compatible with the input feature maps and do not delete any filters, because its output has to align with the identity mapping, and 1 x 1 filters have few parameters anyway.
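As a small illustration of this adjustment, the sketch below trims only the input channels of the third 1 x 1 convolution of a bottleneck block, leaving its filters (and hence its output shape, which must match the identity branch) untouched; the Keras weight layout is assumed and the helper name is ours:

```python
def prune_bottleneck_third_conv(w3, kept_channels):
    """w3: weights of shape (1, 1, in_channels, out_channels);
    kept_channels: indices of feature maps kept after pruning the second conv."""
    return w3[:, :, kept_channels, :]   # input channels trimmed, filters untouched
```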

Table 2 Pruning results of the VGG-16 model performed on CIFAR-10/100

4 Computational complexity analysis and compression

As highlighted in Sect. 3.1, the input of the \(i\)-th convolutional layer is an \(H_{i-1} \times W_{i-1} \times m_{i-1}\) tensor of feature maps, from which it generates an \(H_{i} \times W_{i} \times m_{i}\) tensor, where \(m_{i}\) and \(m_{i-1}\) are the numbers of feature maps. Suppose the \(i\)-th convolutional layer is parameterized as \(T_{i} \times T_{i} \times m_{i} \times m_{i-1}\), where \(T_i\) denotes the spatial dimension of each filter. The convolutions then have a computational complexity of \(T_{i} \times T_{i} \times m_{i} \times m_{i-1} \times H_{i} \times W_{i}\). Let the pruning ratio of the \(i\)-th layer be \(p_i\) and the corresponding filter pruning rate be \(\hat{p}_{i}\). The number of filters of the \(i\)-th layer then decreases from \(m_{i}\) to \(m_{i}\left( 1-\hat{p}_{i}\right)\), and the number of channels of the filters in the same layer decreases from \(m_{i-1}\) to \(m_{i-1}\left( 1-\hat{p}_{i-1}\right)\). Finally, we obtain the compression rate in computational complexity for this pruned layer as follows:

$$\begin{aligned} \begin{aligned}&1-\frac{T_{i} \times T_{i} \times m_{i-1}\left( 1-\hat{p}_{i-1}\right) \times m_{i}\left( 1-\hat{p}_{i}\right) \times H_{i} \times W_{i}}{T_{i} \times T_{i} \times m_{i-1} \times m_{i} \times H_{i} \times W_{i}} \\&=1-\left( 1-\hat{p}_{i}\right) \left( 1-\hat{p}_{i-1}\right) \end{aligned} \end{aligned}$$
(8)
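As a quick worked example of Eq. (8) (our numbers, not a reported result), suppose both the current and the previous layer are pruned at a rate of 30%, i.e. \(\hat{p}_{i}=\hat{p}_{i-1}=0.3\):

$$\begin{aligned} 1-\left( 1-0.3\right) \left( 1-0.3\right) =1-0.49=0.51 \end{aligned}$$

so the multiply-accumulate operations of that layer shrink by roughly 51%, which is of the same order as the network-wide FLOPs reductions reported in Sect. 5.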

5 Experiments

We evaluate our introduced CFS on various state-of-the-art models and datasets: the CIFAR10 and CIFAR100 datasets are used with the VGG16 model, while the CIFAR10 dataset is used with the ResNet50 and ResNet56 models. All experiments are carried out with the TensorFlow and Keras frameworks on an NVIDIA Tesla P100 GPU using the Google Colab Pro version.

5.1 Implementation settings

Initially, the architectures are trained from scratch to compute the classification accuracies, which we take as the baselines. All images are pre-processed before being fed into the model: they are randomly cropped to 32 x 32, and horizontal flipping is used with the padding set to four. The mini-batch size is set to 100 for training and 1000 for testing in the VGGNet model, and to 64 for training and 256 for testing in the ResNet models. All networks are trained and fine-tuned using Stochastic Gradient Descent (SGD) for 160 iterations on both datasets. We set the initial learning rate to 0.1 and reduce it to 0.01 and 0.001 after 50% and 75% of the iterations, respectively. To avoid overfitting, we also apply weight initialization.
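A hedged TensorFlow/Keras sketch of this schedule is given below; the model and data pipeline are assumed to exist elsewhere, and we read the 160 training iterations as 160 epochs for the purpose of the learning-rate steps:

```python
import tensorflow as tf

EPOCHS = 160  # training rounds from the text

def lr_schedule(epoch):
    """0.1 initially, 0.01 after 50% and 0.001 after 75% of training."""
    if epoch < 0.5 * EPOCHS:
        return 0.1
    if epoch < 0.75 * EPOCHS:
        return 0.01
    return 0.001

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
callbacks = [tf.keras.callbacks.LearningRateScheduler(lr_schedule)]
# model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, batch_size=100, epochs=EPOCHS, callbacks=callbacks)
```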

5.2 VGGNet pruning

According to [1], the convolutional layers of VGGNet differ in information concentration and robustness; therefore, we set a separate pruning rate for every convolutional layer. Based on the architecture of VGGNet, the layers of the network are divided into three groups: (1) layers with 512 filters, (2) layers with 256 filters, and (3) layers with at most 128 filters. We set a different pruning rate p for each group, namely 1.5p, p, and 0.5p respectively. Finally, we prune the VGG16 network iteratively with r = 10%. Our introduced CFS obtains better results than other filter pruning approaches. After only the second epoch, the CFS approach can prune approx. 50% of the parameters and even obtain an accuracy gain of 0.30%. After two more epochs, the parameter saving reaches approx. 90% and the FLOPs reduction 52.4%, with an accuracy gain of 0.05%. Table 1 shows the layer-wise pruning of parameters, the FLOPs saving, and the elimination of activation maps. A detailed illustration of the results is given in Table 2a.
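The group-wise rates can be expressed as a small helper (illustrative only; the grouping boundaries follow the text above):

```python
def vgg_layer_pruning_rate(num_filters, p):
    """1.5p for 512-filter layers, p for 256-filter layers, 0.5p otherwise."""
    if num_filters >= 512:
        return 1.5 * p
    if num_filters >= 256:
        return p
    return 0.5 * p
```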

We continue our training for VGG16 with the CIFAR100 dataset, using the same settings throughout the experiment. This time our model obtains a FLOPs reduction of approx. 47% in just three epochs, as shown in Table 2b. Compared to CIFAR10, the pruning ratio is not as large, because CIFAR100 has 10 times more classes and requires more features to perform the classification task. Further, we trained the VGG16 model with the ImageNet dataset, where our method achieves 35.67% parameter pruning with 34.2% of FLOPs saved in only the third epoch.

5.3 ResNet pruning

We use two architectures of the ResNet family, ResNet50 and ResNet56, with the bottleneck structure, to check the performance of the presented algorithm. As mentioned in the summary of the ResNet architectures, the ReLU and Batch-Norm layers are inserted before every convolutional layer in the bottleneck blocks. Since skip connections are present in the ResNet blocks and the useful information is distributed across the entire architecture, we apply the same pruning ratio to all layers. Furthermore, there are fewer redundant parameters in ResNet because the information is shared across the model via the skip connections; therefore, we prune all ResNet models in single-shot pruning.

First, we prune the ResNet50 model using the ImageNet dataset. With the pruning ratio set to 20%, we achieve nearly the same accuracy as the baseline architecture with approx. 40.5% of the parameters eliminated and a FLOPs reduction of approx. 48.2%. When the pruning ratio is set to 30%, our CFS can eliminate approx. 44.6% of the parameters with a 51.6% FLOPs reduction and a 0.56% loss of accuracy, as shown in Table 3a. It can be seen that ResNet with the bottleneck design contains less redundancy in computations and parameters, unlike VGGNet.

Table 3 Pruning results in ResNet models performed on CIFAR-10 and ImageNet
Fig. 5

Pruning results of ResNet50 on ImageNet regarding different correlation pruning ratios

Fig. 6

Pruning results of ResNet56 on CIFAR-10 regarding different correlation pruning ratios

We continue by applying our CFS approach to the deeper ResNet56 network on the CIFAR10 dataset, using the same settings as for ResNet50. Our algorithm outperforms the baseline model with a reduction of approx. 25.9% of the parameters and approx. 30.2% of the FLOPs, with an accuracy gain of 0.20%. When we increase the pruning ratio to 30%, our approach obtains about 52.2% parameter elimination and saves about 55.6% of the FLOPs with only a 0.31% accuracy degradation, as illustrated in Table 3b. Furthermore, we apply our CFS algorithm to ResNet18 on the ImageNet dataset, where it prunes approx. 42.31% of the parameters with 30.66% of the FLOPs saved, as given in Table 3c.

Fig. 7

Pruning comparison of FLOPs, parameters, and filters

6 Discussion

To analyze the effectiveness of our presented algorithm more broadly, we evaluate the accuracy of the architecture for different pruning rates and compression rates. As illustrated in Figs. 5, 6, and 7, at the start the classification accuracy of the pruned architecture increases over the baseline network and then decreases as the pruning rate grows. While the pruning rate is below 25%, nearly 50% of the parameters are eliminated, and our approach incurs no loss of accuracy and sometimes even achieves a marginal accuracy improvement. This shows that our CFS algorithm can eliminate unimportant information and strengthen the effective set of features. This discussion only covers the ResNet models.

7 Conclusion

In this article, we present a simple and efficient approach that estimates the effectiveness of convolutional filters on the basis of the duplication present in the two successive feature maps generated by these filters. Our approach uses correlation to compute the redundancy introduced by the feature maps and to measure the usefulness of the related convolutional filters. A correlation feature selector is created to design the pruning schemes. To overcome the dimensionality mismatch problem in the ResNet model during pruning, new pruning schemes are presented. Furthermore, this article also highlights the sharing of information in each convolutional layer, with results reflecting that the filters of many layers contribute little to the final accuracy of the model. Extensive experiments show the advantage of our method compared to the presently available approaches. Finally, for the deeper and more complex ResNet56 model on the CIFAR10 dataset, our presented approach can eliminate 52.2% of the parameters along with a 55.6% FLOPs reduction and a 0.09% accuracy gain, which shows the strength of the proposed algorithm.