1 Introduction

In recent years, deep learning techniques have been used for data analysis in fields such as computer vision [1, 2] and natural language processing [3, 4], achieving unprecedented success. Meanwhile, the advantages that Convolutional Neural Networks (CNNs) show in image classification [5,6,7], detection [8, 9] and segmentation [10, 11] have attracted increasing research interest. To improve the fitting ability of the models, the structure of neural networks has become deeper and the number of parameters keeps growing. On the one hand, the accuracy of neural networks on image recognition tasks is improved. On the other hand, the demand for computing and storage resources grows accordingly [12, 13]. The cost of advanced Graphics Processing Unit (GPU) equipment for training and inference limits the usage of complex models, let alone on resource-constrained edge devices such as portable mobile devices and wearables. Previous studies [14,15,16] have revealed that there is much redundancy in deep networks. Although these redundant parameters preserve the uniformity of model structures, they increase the space and time complexity of the networks, so their negative impact on application outweighs their positive impact on accuracy. Therefore, various network compression methods have been developed to balance the tradeoffs between redundant parameters, model running time, and accuracy.

Fig. 1

Framework of TopologyHole. The left column shows the inference process of the normal pre-trained model. Each group of convolutional layer and ReLU activation layer generates a set of feature maps. In the middle column, we evaluate each set of feature maps according to the definition of TopologyHole, sort the feature maps generated in the same layer, and remove the filters with high TopologyHole. In the right column, we fine-tune the pruned model to recover the accuracy loss caused by pruning

Fig. 2

Average TopologyHole of feature maps from different convolutional layers and architectures on CIFAR-10. Specifically, the subtitle of each subgraph gives the layer whose feature maps, output by the ReLU after the convolutional layer, the TopologyHole is extracted from. The abscissa of each subgraph represents the indices of the feature maps of the current layer, which correspond one-to-one to the indices of the filters of that convolutional layer. The ordinate is the batch index of the training images (each batch size is set to 128). Different colors represent different TopologyHole values

Generally, network compression methods include parameter quantization [15, 17, 18], knowledge distillation [19,20,21], low-rank approximation [22,23,24], compact model design [25, 26], and network pruning [12, 27]. Quantization maps the network weights to a smaller range of values and fewer storage bits. Knowledge distillation utilizes the knowledge of a teacher network and transfers it to a compact distilled model. Low-rank approximation uses matrix or tensor decomposition techniques to decompose the original convolution filters. Moreover, compact model design aims to develop specially structured convolution kernels or compact convolution computing units to reduce the computational complexity of the model.

Network pruning is well supported on mainstream hardware. It aims at cutting off the redundant weights or filters of CNNs, along with their associated computations, and generating more compact subnetworks [15, 27, 28]. In contrast to weight-level pruning, filter-level pruning results in models with structured sparsity and only needs the Basic Linear Algebra Subprograms (BLAS) library to realize its speedup, which makes it more flexible in practical implementation. In this paper, our research focuses on structured filter pruning. Empirically, we categorize existing structured filter pruning methods into three groups: property-based pruning, imposed scaling factor-based pruning, and propagation-based pruning. Compared with the other two groups, the calculation in property-based pruning is more straightforward and requires no additional constraints or factors, so the mainstream filter pruning methods [27, 29,30,31] mainly belong to this group.

However, most property-based pruning methods are devoted to establishing a connection between the weights of the convolutional layers and the importance of filters, and prune filters through the smaller-norm-less-important criterion. This criterion requires that the deviation of the filter norms be significant and that the norms of the filters to be pruned be close to zero. These two prerequisites greatly limit the effectiveness of norm-based methods. We therefore develop a feature map-based pruning method. A feature map is generated after the convolutional layer and ReLU layer and is in one-to-one correspondence with a filter of the convolutional layer. In our opinion, the information in the feature maps represents the importance of the corresponding filters better than other intrinsic properties of the pre-trained model, such as the weights of the convolutional layers. By analyzing APoZ [30], the most popular feature map-based pruning method, we find that simply calculating the percentage of zero activations of the feature map cannot sufficiently represent the importance of the corresponding filter. We attribute this problem to its data-driven scheme and heavy reliance on the input distribution, which makes its effect unstable and is not conducive to deployment. A data-independent feature map-based filter pruning scheme is therefore more convincing.

To explore the inherent characteristics of feature maps, we introduce the concept of geometric topology. Topology is the study of properties of geometric figures or spaces that remain the same under continuous changes of shape. Genus is a scalar used in topology to distinguish between topological spaces. In this paper, we propose the TopologyHole method, which defines a new metric inspired by the genus to evaluate the information of a feature map, as shown in Fig. 1. We find that, according to the definition of TopologyHole, the mean TopologyHole of the feature maps generated by the same filter is relatively stable no matter how the input data of the CNN is distributed, as shown in Fig. 2. This phenomenon implies that TopologyHole captures implicit information of the feature maps and can serve as a criterion for filter pruning. Based on this, the filters generating feature maps with high TopologyHole can be removed, because feature maps with high TopologyHole have greater sparsity and more redundant information. Extensive experiments show that high-TopologyHole feature maps contribute little to the prediction accuracy of the model.

We conducted extensive pruning experiments with different models, including VGG-16 [13], GoogLeNet [6] and ResNet [5], on two benchmark datasets, CIFAR-10 [32] and ImageNet [33]. The results show that our TopologyHole outperforms existing filter pruning methods and achieves state-of-the-art performance. We also carry out experiments showing that removing the filters generating feature maps with high TopologyHole is more reasonable than removing those with low TopologyHole.

To summarize, our main contributions are:

  • We define a novel metric, TopologyHole, to describe the amount of information contained in the feature map. As a data-independent metric, TopologyHole is friendly to deployment because it does not depend on the input data distribution.

  • Compared with APoZ, the most popular feature map-based pruning method, our TopologyHole additionally takes the positional relations of the zero activations in the feature map into account, which makes it a more convincing criterion.

  • Through a large number of experiments, we confirm that feature maps with high TopologyHole are less critical to preserving accuracy. Thus, the filters generating these feature maps can be removed first.

  • Extensive experiments demonstrate the efficacy of our TopologyHole in making tradeoffs between model complexity and accuracy. The proposed method achieves better performance at similar compression rates than various state-of-the-art methods [29, 31, 34,35,36,37,38].

The rest of the paper is organized as follows. Section 2 summarizes related works on network compression and classifies them into three groups. Section 3 describes the details of the proposed pruning method. The experimental settings and results are illustrated and discussed in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Related work

Filter pruning is a structured pruning method that discards whole filters from convolutional layers. The core of a filter pruning method lies in how to quantify the importance of each filter in the network. Based on where the filter selection strategy extracts its evaluation metric, we categorize existing filter pruning approaches into the following three groups:

  (1) Property-based pruning: The filters are pruned by a specific property of the pre-trained model. Li et al. [27] calculated the \(L_1\)-norm of each filter in the convolutional layers and argued that filters with small norms could be pruned first because they are less important. He et al. [29] calculated the geometric median of the filters within the same layer and pruned the filters closest to it. Lin et al. [38] encoded the second-order information of pre-trained weights, which enabled the representation capacity of pruned networks to be recovered with a simple fine-tuning procedure. Hu et al. [30] pruned filters whose corresponding feature maps, output by ReLU, have a high percentage of zero activations. Lin et al. [31] evaluated the rank of feature maps and pruned filters generating low-rank feature maps.

  (2) Imposed scaling factor-based pruning: Huang et al. [34] first introduced an imposed scaling factor to scale the outputs of specific model structures. Lin et al. [37] introduced the idea of generative adversarial learning and removed basic structures, including channels, branches, or blocks, through the sparsification of a soft mask. Tian et al. [39] introduced a trainable collaborative layer to jointly prune and learn neural networks in one go.

  (3) Propagation-based pruning: Luo et al. [35] utilized the input of the next layer to guide the pruning of the output channels of the current layer. Yu et al. [36] used a propagation algorithm to push the importance scores of the final responses back layer by layer and pruned the filters with the least importance. Molchanov et al. [40] defined filter importance as the change in loss caused by removing a specific filter from the network and used a Taylor expansion to approximate it. Lian et al. [41] introduced an evolutionary algorithm into the search for the most suitable number of pruned filters for every layer.

Discussion The imposed scaling factor-based pruning methods inevitably introduce additional computation, and the propagation-based pruning methods are always data-dependent, so their use of training data is computationally intensive. Therefore, we focus on property-based pruning methods. Among the property-based methods, the traditional smaller-norm-less-important criterion [27] is not convincing because two prerequisites limit its usage (as discussed in Sect. 1). Considering that feature maps correspond one-to-one to the filters of the convolutional layer, some researchers have begun to explore the possibility of pruning according to the feature maps. To the best of our knowledge, APoZ [30] is the most popular method for pruning the network through feature maps. We highlight our advantages over this approach as follows: (1) based upon extensive statistical validation, we empirically demonstrate that the mean TopologyHole of the feature maps generated by the same filter is relatively stable; therefore, our TopologyHole is a data-independent metric that is friendly to deployment because, unlike APoZ, it does not depend on the input data distribution; (2) our TopologyHole considers not only the percentage of zero activations of the feature map, as APoZ does, but also the positional relations of the zero activations; (3) the experiments show that our TopologyHole is more effective than APoZ, with more FLOPs reduction on CIFAR-10, as shown in Table 4.

The following sections introduce the mathematical definition of TopologyHole and demonstrate its efficacy in filter pruning for modern CNN models on two benchmark datasets, CIFAR-10 and ImageNet.

3 Methodology

3.1 Preliminaries

We assume a pre-trained CNN has L convolutional layers. Note that a convolutional layer here includes the pooling, batch normalization and ReLU activation that follow it. We use the thop library to calculate the FLOPs and the number of parameters of the network. The deep CNN can be parameterized by \({W_{L^i}} = \left\{ {w_1^i,w_2^i,\ldots ,w_{{N_{i + 1}}}^i} \right\} \in {{\mathbb{R}}^{{N_{{i} + 1}} \times {N_i} \times {K_i} \times {K_i}}}\), where i denotes the i-th convolutional layer, \({{N_{{i} + 1}}}\) is the number of filters, \({{N_{i}}}\) is the number of input channels of \({{L^i}}\), and \({K_i}\) represents the kernel size. When image data is fed into the model, each filter \(w_j^i \in {{\mathbb{R}}^{{N_i} \times {K_i} \times {K_i}}}\) outputs a feature map \(o_j^i \in {{\mathbb{R}}^{{h_i} \times {w_i}}},1 \le j \le {N_{{i} + 1}}\), where \({h_i}\) and \({w_i}\) are the height and width of the feature map, respectively.
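
As a quick sanity check of this notation, the tensor shapes can be inspected directly in PyTorch; the layer sizes below are arbitrary examples chosen for illustration and are not values used in the paper.

```python
import torch
import torch.nn as nn

# An example convolutional layer L^i with N_i = 64 input channels,
# N_{i+1} = 128 filters, and kernel size K_i = 3.
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
x = torch.randn(1, 64, 32, 32)      # one input tensor with N_i channels
out = torch.relu(conv(x))           # feature maps after Conv + ReLU

print(conv.weight.shape)  # torch.Size([128, 64, 3, 3]) -> N_{i+1} x N_i x K_i x K_i
print(out.shape)          # torch.Size([1, 128, 32, 32]) -> one h_i x w_i map per filter
```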

In the pruning phase, we first determine the number of feature maps to be removed from the number of output channels \({N_{i + 1}}\) of the current layer and the pruning rate \({P_i}\). Then, we rank the feature maps according to the importance defined by the evaluation strategy and remove the corresponding \({N_{i + 1}} \times {P_i}\) least important filters. At the same time, the input tensor of layer \({L^{i + 1}}\) also becomes smaller, further reducing the FLOPs and the number of parameters.

3.2 TopologyHole

For network pruning methods with manually determined pruning rates, the difference lies in the criteria used to judge the unimportant filters of each layer. Compared with the other groups of pruning methods, property-based pruning focuses on a specific property of the pre-trained model, e.g., the percentage of zero activations, the rank of the feature maps, or the \(L_p\)-norm of the filters, and prunes the corresponding filters of lower importance.

Fig. 3

TopologyHole counting standard and equivalent topology structure. The upper and lower feature maps are extracted from the output of the same (Conv+ReLU) layer of the neural network. They have different weight distributions, but according to our TopologyHole counting standard they have the same number of TopologyHoles, which means they are equivalent to the right-center topology structure with two TopologyHoles

We define TopologyHole to measure the information richness of feature maps:

$$\begin{aligned} S_{j}^i\left( I \right) = \frac{1}{{\text {batch}}}\sum _{{\text {num}} = 1}^{{\text {batch}}} {\mathbf{TopologyHole}} \left( o_j^i\right) _{{\text {num}}}, \end{aligned}$$
(1)

where \({\mathbf{TopologyHole}}\left( \cdot \right)\) denotes the TopologyHole count of a feature map for input data I, \(S_{j}^i\) is the resulting measurement (i.e., the average number of TopologyHoles) for the j-th filter of the i-th convolutional layer, and batch is the number of input images.

We confirm that a filter whose feature maps have a larger TopologyHole contains less information. Specifically, the counting of TopologyHoles conforms to the following criteria:

  • In the feature map matrix of size \({{h_i} \times {w_i}}\) output by the convolutional layer \({L^i}\), a feature cell with a value of 0 is counted as one TopologyHole, as shown in Fig. 3-\(\textcircled{1}\).

  • If several feature cells with a value of 0 are connected, they are collectively counted as one TopologyHole, and the individual cells are not counted repeatedly, as shown in Fig. 3-\(\textcircled{2}\). Note that cells touching only diagonally are not considered connected.

  • If any cell of a connected group of zero-valued feature cells is located on the outermost border of the feature map matrix, the whole connected group is not counted as a TopologyHole, as shown in Fig. 3-\(\textcircled{3}\).

Therefore, by counting TopologyHoles according to the above three principles, we conclude that each of the two feature maps in Fig. 3 contains two TopologyHoles.
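
For illustration, the three counting rules can be implemented with a simple flood fill over the zero-valued cells of a single feature map. The following Python/NumPy sketch is our own minimal example (the function name is hypothetical, not the authors' reference code): it labels 4-connected zero regions and discards any region that touches the outermost border.

```python
import numpy as np
from collections import deque

def count_topology_holes(fmap: np.ndarray) -> int:
    """Count TopologyHoles of one 2-D feature map (after ReLU).

    A hole is a 4-connected region of zero-valued cells that does not touch
    the outer border of the map (the three rules above). Illustrative sketch only.
    """
    h, w = fmap.shape
    visited = np.zeros((h, w), dtype=bool)
    holes = 0
    for m in range(h):
        for n in range(w):
            if fmap[m, n] != 0 or visited[m, n]:
                continue
            # Flood-fill the 4-connected zero region starting at (m, n).
            queue = deque([(m, n)])
            visited[m, n] = True
            touches_border = False
            while queue:
                p, q = queue.popleft()
                if p in (0, h - 1) or q in (0, w - 1):
                    touches_border = True  # rule 3: border regions are not counted
                for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    r, s = p + dp, q + dq
                    if 0 <= r < h and 0 <= s < w and fmap[r, s] == 0 and not visited[r, s]:
                        visited[r, s] = True
                        queue.append((r, s))
            if not touches_border:
                holes += 1  # rules 1-2: one hole per connected zero region
    return holes
```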

3.3 Tractability

As mentioned above, we find that, according to the definition of TopologyHole, the mean TopologyHole of the feature maps generated by the same filter is relatively stable no matter how the data fed to the CNN is distributed, as shown in Fig. 2. Therefore, we can use a small batch of image data to calculate the average TopologyHole of each feature map.
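
One straightforward way to realize this in PyTorch is to register forward hooks on the ReLU modules of the pre-trained model and average a per-map score over a few calibration batches, which corresponds to Eq. (1). The sketch below is an assumption-laden illustration: it presumes the model exposes its activations as nn.ReLU modules and takes the per-map scoring function (e.g., the count_topology_holes helper sketched above) as an argument.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def average_channel_scores(model: nn.Module, loader, score_fn,
                           num_batches: int = 5, device: str = "cuda"):
    """Estimate the average per-channel score S_j^i of Eq. (1) from a few batches.

    score_fn maps a single 2-D feature map (NumPy array) to a scalar, e.g. a
    TopologyHole counter. Returns {relu_name: tensor of per-channel averages}.
    """
    sums, seen, handles = {}, {}, []

    def make_hook(name):
        def hook(_module, _inputs, output):
            fmaps = output.detach().cpu().numpy()          # (batch, channels, h, w)
            batch, channels = fmaps.shape[0], fmaps.shape[1]
            scores = torch.tensor(
                [[score_fn(fmaps[b, c]) for c in range(channels)] for b in range(batch)],
                dtype=torch.float32)
            sums[name] = sums.get(name, torch.zeros(channels)) + scores.sum(dim=0)
            seen[name] = seen.get(name, 0) + batch
        return hook

    # Hook every ReLU so we observe the feature maps of each Conv+ReLU pair.
    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            handles.append(module.register_forward_hook(make_hook(name)))

    model.eval().to(device)
    for i, (images, _labels) in enumerate(loader):
        if i >= num_batches:
            break
        model(images.to(device))

    for handle in handles:
        handle.remove()
    return {name: sums[name] / seen[name] for name in sums}
```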


To illustrate that pruning using TopologyHole is very effective, we summarize the TopologyHole counting method and TopologyHole pruning process in Algorithm 1 and Algorithm 2, respectively.

In Algorithm 1, the feature map \(o_j^i\) of the j-th filter of \({L^i}\) is fed into the TopologyHole counter. We then traverse each pixel of \(o_j^i\). If its value is not 0, it is skipped and the next pixel is examined. If its value is 0, the pixel is assigned the current TopologyHole count, and p and q store the current pixel position m and n, respectively. Whenever one of the four pixels connected to the pixel at (p, q) also has value 0, it is assigned the same TopologyHole count value; this detection and assignment is extended to the connected zero-valued pixels until all pixels in the connected zero region carry the same TopologyHole count value. The TopologyHole count is then updated, and the traversal continues with the next pixel as above.

The details of the TopologyHole pruning procedure are explained in Algorithm 2. Before selecting the filters to prune, the training data X and the pruning rate \(P_i\) of each layer are given to the pre-trained full model. Then, the TopologyHole of each filter in the same convolutional layer is calculated as \(S_j^i\) by Eq. 1 and Algorithm 1. We sort \(\left\{ {\left. {S_{j}^i,1 \le j \le {N_{i + 1}}} \right\} } \right.\) and remove the \({N_{i + 1}} \times {P_i}\) filters with the highest TopologyHole. Thus, a pruned model inheriting the weights of the full pre-trained model is obtained. The precision loss caused by pruning is recovered by retraining the pruned model for an appropriate number of epochs, and the desired pruned model is finally obtained.
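
To make the selection step of Algorithm 2 concrete, the snippet below shows one possible way to turn the per-filter scores \(S_j^i\) of a layer and its pruning rate \(P_i\) into the indices of the filters to keep; the helper name is ours, and the actual removal of weights and the fine-tuning step follow the usual structured-pruning workflow and are omitted here.

```python
import torch

def select_filters_to_keep(scores: torch.Tensor, pruning_rate: float) -> torch.Tensor:
    """Selection step of Algorithm 2 for one convolutional layer.

    scores: 1-D tensor with the average TopologyHole S_j^i of each of the
            N_{i+1} filters; pruning_rate: P_i, fraction of filters to remove.
    Filters with the highest TopologyHole are pruned first.
    """
    num_filters = scores.numel()
    num_pruned = int(num_filters * pruning_rate)
    # Ascending sort: the head holds the lowest-TopologyHole (kept) filters,
    # the tail holds the highest-TopologyHole (pruned) filters.
    order = torch.argsort(scores)
    keep = order[: num_filters - num_pruned]
    return torch.sort(keep).values  # restore the original channel ordering

# Example: a layer with 8 filters pruned at rate 0.5 keeps the 4 lowest-scoring ones.
# select_filters_to_keep(torch.tensor([3., 9., 1., 7., 2., 8., 0., 6.]), 0.5)
# -> tensor([0, 2, 4, 6])
```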

4 Experiments

4.1 Experimental settings

Models and datasets To illustrate that our TopologyHole pruning method compresses and accelerates models effectively on both small and large datasets, we choose CIFAR-10 [32] and ImageNet [33] as the two benchmarks. Meanwhile, we compare the pruning performance of our method with that of the state-of-the-art (SOTA) on several mainstream models, including VGG-16, ResNet-56/110 and GoogLeNet on CIFAR-10, and ResNet-50 on ImageNet.

Implementation details We use PyTorch 1.7.1 to implement our TopologyHole filter pruning method on an Intel(R) Core(TM) i7-9700K CPU at 3.60 GHz and two NVIDIA RTX 3090 GPUs with 24 GB memory each. Our TopologyHole method is a one-shot pruning approach, and we adopt Stochastic Gradient Descent (SGD) as the optimizer with an initial learning rate of 0.01 for CIFAR-10 and 0.1 for ImageNet. For CIFAR-10, the pruned model is fine-tuned for 300 epochs (reducing the learning rate at the 150-th and 225-th epoch) for ResNet-56/110 and GoogLeNet, and for 150 epochs (reducing the learning rate at the 50-th and 100-th epoch) for VGG-16. The batch size, momentum, and weight decay for the above four model architectures are set to 256, 0.9, and 0.005, respectively. For ImageNet, the pruned model is fine-tuned for 90 epochs, reducing the learning rate at the 30-th, 60-th and 80-th epoch. The batch size and momentum for ImageNet are the same as those for CIFAR-10, except that the weight decay is set to 0.0001. For a fair and accurate comparison, we utilize the built-in function thop.profile to calculate FLOPs and Params.
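
For completeness, the FLOPs and Params figures in the tables can be obtained with the thop.profile call mentioned above. The snippet below is a minimal usage example; the torchvision ResNet-50 and the 224x224 input size are our illustrative assumptions, and note that thop reports multiply-accumulate counts, which this paper refers to as FLOPs.

```python
import torch
from thop import profile
from torchvision.models import resnet50

model = resnet50()                   # full (unpruned) model for the baseline numbers
dummy = torch.randn(1, 3, 224, 224)  # use 1 x 3 x 32 x 32 for CIFAR-10 models
flops, params = profile(model, inputs=(dummy,))
print(f"FLOPs: {flops / 1e9:.2f} B, Params: {params / 1e6:.2f} M")
```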

Evaluation metrics For a fair comparison with other algorithms, we measure the top-1 accuracy of the pruned model on CIFAR-10 and the top-1 and top-5 accuracy on ImageNet. We also report the floating point operations (FLOPs), the number of parameters (Params), and the pruning rate (PR) relative to the full model to evaluate the effectiveness of the different pruning criteria. FLOPs reflects the computational cost of the model, i.e., the time complexity of the network, and Params reflects the size of the model, i.e., the space complexity of the network.

4.2 Comparison on CIFAR-10

To show that our proposed TopologyHole filter pruning method is suitable for pruning different model architectures and achieves excellent performance, compression experiments are carried out on several models, including ResNet-56/110, VGG-16, and GoogLeNet. We report the top-1 accuracy of the model pruned by each method. Different pruning methods adopt different FLOPs and Params baselines. If the baselines in the original papers differ from those in this paper, their specific values in the tables are replaced by N/A, while the drop percentages (i.e., PR, pruning rate) are retained. If the papers to which the methods belong do not report the same evaluation metrics as ours, the vacant metrics are also replaced by N/A. The results of our proposed method are in bold. All results on the CIFAR-10 dataset are presented in Tables 1, 2, 3 and 4. M/B means million/billion, respectively, and the entries are sorted according to FLOPs (PR).

Table 1 Results of ResNet-56 on CIFAR-10

ResNet-56 Results for ResNet-56 are presented in Table 1. Our TopologyHole removes around 47.4\(\%\) of the FLOPs and 42.8\(\%\) of the parameters while obtaining a top-1 accuracy of 93.76\(\%\). Compared to the 93.26\(\%\) of the original full model, the accuracy is improved by 0.50\(\%\). Compared with other property-based pruning methods, HRank (2020) [31] and FilterSketch (2021) [38], TopologyHole shows a clear superiority.

Table 2 Results of ResNet-110 on CIFAR-10
Table 3 Results of VGG-16 on CIFAR-10
Table 4 Results of GoogLeNet on CIFAR-10
Table 5 Results of ResNet-50 on ImageNet

ResNet-110 Table 2 shows that TopologyHole outperforms the SOTAs in both accuracy and time complexity reduction (i.e., FLOPs reduction). Specifically, with 71.6\(\%\) FLOPs reduction and 68.3\(\%\) parameters reduction which are much more than those of FilterSketch and HRank, TopologyHole achieves 93.59\(\%\) top-1 accuracy, 0.15\(\%\) better than FilterSketch and 0.23\(\%\) better than HRank. Compared with other SOTAs, like FPGM (2019) [29], TopologyHole with more FLOPs reduction (59.6\(\%\) of TopologyHole vs. 52.3\(\%\) of FPGM) obtains 94.12\(\%\) top-1 accuracy, 0.38\(\%\) better than that of FPGM.

VGG-16 Table 3 displays the pruning results of VGG-16. TopologyHole reduces the FLOPs of VGG-16 by 58.1\(\%\) and the parameters by 81.6\(\%\) while obtaining the top-1 accuracy at 93.93\(\%\), only losing 0.03\(\%\) accuracy relative to the original full model. TopologyHole significantly outperforms other SOTAs.

GoogLeNet For GoogLeNet, as shown in Table 4, TopologyHole removes 73.7\(\%\) of the FLOPs and 65.9\(\%\) of the parameters with a negligible accuracy drop (94.89\(\%\) for TopologyHole vs. 95.05\(\%\) for the baseline). It is significantly better than APoZ (2016) [30] and HRank, both of which prune based on feature maps like our TopologyHole. Especially compared to APoZ, TopologyHole achieves better results in both accuracy and pruning rate (94.89\(\%\) by TopologyHole vs. 92.11\(\%\) by APoZ for accuracy, 73.79\(\%\) by TopologyHole vs. 50.0\(\%\) by APoZ for FLOPs, and 65.9\(\%\) by TopologyHole vs. 53.7\(\%\) by APoZ for Params). This shows that although both our TopologyHole and APoZ involve the zero activations of feature maps, the TopologyHole counting criterion is completely different from that of APoZ (see the details in Sect. 3.2). The TopologyHole of feature maps can better serve as a discriminative property for identifying redundant filters.

4.3 Comparison on ImageNet

We also explore the performance of our proposed TopologyHole filter pruning method on the ImageNet dataset with ResNet-50, a popular CNN. A comparison of pruning ResNet-50 on ImageNet with our TopologyHole and other effective pruning criteria is shown in Table 5. We adopt top-1 and top-5 accuracy as well as FLOPs and parameters reduction as evaluation metrics. We also report the accuracy gap between the pruned model and the original full model. For FLOPs and parameters, if the baselines in the methods' original papers differ from those in this paper, their specific values in the table are replaced by N/A, while the drop percentages (i.e., PR, pruning rate) are retained. If the papers to which the methods belong do not report the same evaluation metrics as ours, the vacant metrics are also replaced by N/A. The results of our proposed method are in bold. M/B means million/billion, respectively, and the entries are sorted according to FLOPs (PR). The original performance of the full ResNet-50 on ImageNet is 76.15\(\%\) top-1 accuracy and 92.87\(\%\) top-5 accuracy with 4.11 billion FLOPs and 25.55 million Params. Compared with other pruning criteria, our TopologyHole performs better in all aspects. Specifically, with 45.0\(\%\) FLOPs and 40.9\(\%\) Params reduction, our TopologyHole pruning method achieves 75.71\(\%\) top-1 accuracy and 92.66\(\%\) top-5 accuracy while compressing the FLOPs by 1.82\(\times\) and the Params by 1.69\(\times\).

Fig. 4

Comparison of accuracy versus FLOPs (top) and accuracy versus remaining Params (bottom) for the baseline and for four network architectures (ResNet-56, ResNet-110, VGG-16, GoogLeNet) pruned by seven SOTAs, high TopologyHole and low TopologyHole on the CIFAR-10 dataset. Top-left indicates better performance

4.4 Ablation study

We conduct additional ablation studies to further analyze the effectiveness and universality of the TopologyHole filter pruning method.

Filter selection criteria We conduct experiments on ResNet-56/110 and VGG-16 to demonstrate that it is more appropriate to prune the filters with the highest TopologyHole than those with the lowest TopologyHole. The TopologyHole pruning variants we compare are (1) high TopologyHole, i.e., pruning the filters with the highest TopologyHole, which is the principle followed by our TopologyHole filter pruning method, and (2) low TopologyHole, i.e., pruning the filters with the lowest TopologyHole. With the same pruning rate of FLOPs and Params, the model obtained by high-TopologyHole pruning performs better than that obtained by low-TopologyHole pruning, as shown in Fig. 4. To highlight the effectiveness of TopologyHole, we also include seven SOTAs. Among them, APoZ (2016) and HRank (2020) are feature map-based methods like our TopologyHole, while FPGM (2019) and FilterSketch (2021) are other property-based methods. To cover the other two groups of structured pruning methods, we also select recent works for comparison (GAL (2019) and PBT (2021) for imposed scaling factor-based pruning, and SST (2021) for propagation-based pruning).

Varying pruning rates To understand the TopologyHole filter pruning method more comprehensively, we test the accuracy of ResNet-56 under different filter pruning rates; the accuracies of the pruned models with respect to the filter pruning rate are shown in Fig. 5. As the filter pruning rate increases, the accuracy of the pruned model first rises above the baseline model and then drops approximately linearly. Note that for this line chart we set the filter pruning rate of each layer of the model to be the same, which is obviously not the optimal per-layer choice, so the accuracies of these pruned models are not comparable to those shown in Table 1.

Fig. 5

Accuracies and FLOPs/parameters of ResNet-56 pruned by TopologyHole with respect to the filter pruning rate on CIFAR-10

5 Conclusion

Inspired by geometric topology, this paper proposes a novel network pruning method called TopologyHole filter pruning. We define a measurement, called TopologyHole, to describe a feature map and associate it with the importance of the corresponding filter. First, we find that the average TopologyHole of the feature maps generated by the same filter is relatively stable, which means TopologyHole is valid as a metric for filter pruning. Then, through a large number of experiments, we demonstrate that preferentially pruning the filters with high-TopologyHole feature maps achieves competitive performance compared to the state-of-the-art. Specifically, with 45.0\(\%\) FLOPs and 40.9\(\%\) parameters reduction, our TopologyHole pruning method achieves 75.71\(\%\) top-1 accuracy, only a 0.44\(\%\) drop relative to the original full model, and 92.66\(\%\) top-5 accuracy, while compressing the time complexity by 1.82\(\times\) and the space complexity by 1.69\(\times\). Therefore, the proposed TopologyHole filter pruning shows excellent results in reducing the time and space complexity of the network while limiting the loss of precision of the pruned model.