1 Introduction

Security inspection checks packages against specific criteria to reveal potential risks and ensure public safety, and it is widely applied in real-world scenarios such as public transportation and sensitive facilities. In recent years, several surveys have provided in-depth reviews of developments in this field (Akcay & Breckon, 2022; Mery et al., 2020; Velayudhan et al., 2022). In practice, there is an ever-increasing demand for inspectors to monitor the X-ray images generated by security inspection machines and identify potentially prohibited items, such as guns, ammunition, explosives, and corrosive, toxic, and radioactive substances. Unfortunately, it is highly challenging for inspectors to accurately and efficiently localize prohibited items hidden in messy objects, which poses a great threat to safety.

Fig. 1 Comparisons between the natural image (the 1st row) and X-ray image (the 2nd row)

Deep learning technologies have sparked tremendous progress in the computer vision community (Ren et al., 2015; Liu et al., 2016; Tian et al., 2019; Ji et al., 2020b, a; Li et al., 2020; Cai et al., 2020), which makes it possible to inspect prohibited items automatically: security inspectors need to quickly identify the locations and categories of prohibited items with the help of computer vision technology. However, most previous object detection algorithms are designed for objects in natural images and are not optimal for detection in X-ray images, for the following reasons. Firstly, X-rays have strong penetrating power, and different materials absorb X-rays to different degrees, resulting in different colors. Secondly, the contours of the occluder and the occluded objects in an X-ray image are mixed together. As shown in Fig. 1, compared with natural images, X-ray images present quite different appearances and edges of objects and background, which brings new challenges in appearance modeling for X-ray detection. To advance prohibited item detection in X-ray images, several recent attempts have been devoted to constructing security inspection benchmarks (Mery et al., 2015; Akcay & Breckon, 2017; Akcay et al., 2018; Miao et al., 2019; Wei et al., 2020). However, most of them fail to meet the requirements of real-world applications for three reasons. (1) Existing datasets (such as GDXray) are characterized by small volume and very few categories of prohibited items (e.g., knife, gun, and scissors). For example, some common prohibited items such as power bank, lighter, and sprayer are not involved in previous datasets. (2) Some real-world scenarios require high-level security based on accurate predictions of masks and categories of prohibited items. Most existing benchmarks (e.g., SIXray and OPIXray) only offer image-level or bounding-box-level annotations, which are not optimal in such scenarios. (3) Detecting prohibited items hidden in messy objects is one of the most significant challenges in security inspection. Unfortunately, few studies have been developed toward this goal due to the lack of comprehensive datasets covering such cases. These challenges urgently call for a large-scale prohibited item benchmark and an efficient and effective method.

On the other hand, we observe that models trained on datasets dominated by images with prohibited items are error-prone when processing samples without any prohibited items. We argue that this issue arises from the fact that mainstream training schemes exclude all images without any bounding box by default during the pre-processing stage. Although this has no significant influence on general datasets (e.g., COCO (Lin et al., 2014) and PASCAL VOC (Everingham et al., 2010)), where annotated samples account for the majority, it incurs a dilemma in security inspection, since images with prohibited items are rare in real scenarios. Such cases reflect the large gap between artificial datasets and real scenarios. Therefore, to obtain a more stable and robust model for security inspection, the dataset should be constructed to match real scenarios as closely as possible.

Fig. 2 Example images in the PIDray dataset with 12 categories of prohibited items. Each image is provided with image-level and instance-level annotation. For clarity, we show one category per image

To this end, we present a large-scale prohibited item detection dataset (PIDray) for real-world applications. The PIDray dataset covers 12 categories of prohibited items in X-ray images. From the annotated exemplars in Fig. 2, we can observe that each image contains at least one prohibited item annotated with both a bounding box and a mask. Notably, for fine-grained investigation, the test set is divided into three parts, i.e., easy, hard, and hidden. In particular, the hidden subset focuses on prohibited items deliberately hidden in messy objects (e.g., changing the item shape by wrapping wires around it). To the best of our knowledge, it is the dataset with the largest volume and variety of annotated images with prohibited items in this domain to date.

Based on the observation that images without prohibited items account for the majority of the proposed dataset, giving it a long-tailed distribution, we propose a divide-and-conquer pipeline, which adopts a tree-like structure to suppress the influence of samples without prohibited items in a coarse-to-fine manner. Specifically, a sample first passes through the coarse-grained node, which determines whether it contains a prohibited item. If so, the sample is fed to the subsequent fine-grained node for task-specific operations (e.g., detection or segmentation); if not, it is treated as a sample without any prohibited items. The key insight of our method is that the distribution of the proposed dataset enables us to cast the task as a multi-task learning problem: binary classification in the first node balances the head and tail categories in a coarse-grained manner, and task-specific operations in the later node refine the predictions in a fine-grained manner.

For the object detection and instance segmentation tasks, we utilize this pipeline to construct a strong baseline on top of two- or one-stage detectors such as Cascade Mask R-CNN (Cai & Vasconcelos, 2019), where we also augment the FPN structure with a dense attention module. Concretely, we first use both spatial- and channel-wise attention mechanisms to exploit discriminative features, which is effective for locating prohibited items deliberately hidden in messy objects. Then we design a dependency refinement module to explore the long-range dependencies within the feature map. Extensive experiments on the proposed dataset show that our method performs favorably against state-of-the-art methods.

Meanwhile, to fully unleash the potential of the proposed dataset, we establish a multi-label classification task for this benchmark and extend the divide-and-conquer pipeline to this domain to alleviate the long-tailed distribution issue. Specifically, the first coarse-grained node is tasked with binary classification, filtering out the head category without prohibited items. After that, the fine-grained node is dedicated to multi-label classification of the tail categories with prohibited items. The experimental results demonstrate that our design is a simple yet effective scheme for the multi-label classification task on the PIDray dataset.

To sum up, the main contributions of this work are five-fold.

  • Towards prohibited item detection in real-world scenarios, we present a large-scale benchmark, i.e., PIDray, formed by 124,486 images in total. To the best of our knowledge, it is the X-ray detection dataset with the largest volume and variety of annotated images with prohibited items to date. Meanwhile, it is the first benchmark dedicated to cases where prohibited items are deliberately hidden in messy objects.

  • We provide various tasks besides object detection on the proposed PIDray dataset to fully unleash its potential in real-world applications, including instance segmentation and multi-label classification.

  • We propose a divide-and-conquer pipeline to address the long-tailed distribution issue in the PIDray dataset, which adopts a tree-like structure to suppress the influence of samples without prohibited items in a coarse-to-fine manner.

  • With our novel divide-and-conquer pipeline, we deliver strong task-specific baselines across object detection, instance segmentation, and multi-label classification tasks on the PIDray dataset and verify its generalization ability on common datasets (e.g., COCO and PASCAL VOC).

  • Extensive experiments, carried out on the PIDray dataset and general datasets, verify the effectiveness of the proposed method compared to state-of-the-art methods.

This paper extends an early conference version in Wang et al. (2021). The main new contributions are as follows. (1) We enlarge PIDray by introducing 76,809 new X-ray images without prohibited items to bridge the gap between artificial datasets and natural scenarios. (2) We enrich the applications of PIDray by introducing a new task of multi-label classification, which further unleashes the potential of PIDray. (3) We propose a novel divide-and-conquer pipeline for developing strong task-specific baselines on PIDray to facilitate future research. (4) More thorough experiments are conducted on PIDray with in-depth analysis to show the effectiveness of our approach. (5) Besides the experiments on PIDray, we further verify the generalization ability of our pipeline on common benchmarks (e.g., COCO, PASCAL VOC).

The remainder of this paper is organized as follows. Section 2 briefly reviews research directions relevant to our method. In Sect. 3, we describe the construction of the PIDray dataset in detail. In Sects. 4 and 5, the task-specific strong baselines are realized under the guidance of the divide-and-conquer pipeline. In Sect. 6, extensive experimental results are reported and analyzed, including comparisons between the proposed method and state-of-the-art approaches, validation of the generalization ability on general datasets, and comprehensive ablation studies. Finally, Sect. 7 concludes the paper.

2 Related Work

This section reviews six major research directions closely related to our work, i.e., prohibited item benchmarks, object detection, the attention mechanism, multi-label classification, multi-scale feature fusion, and long-tailed distribution issues.

2.1 Prohibited Item Benchmarks

Due to discrepancies in penetrating capability, different materials tend to present various colors under X-rays. This property incurs additional challenges in cases where objects overlap. Moreover, like natural images, X-ray images exhibit challenging characteristics, e.g., intra-class variance and distribution imbalance. Recently, a few datasets have been collected to advance prohibited item detection. Concretely, Mery et al. (2015) collect the GDXray dataset for nondestructive testing. GDXray contains three categories of prohibited items: gun, shuriken, and razor blade. Without complex backgrounds or overlap, it is easy to recognize or detect objects in this dataset. Differing from GDXray, Dbf\(_6\) (Akcay & Breckon, 2017), Dbf\(_3\) (Akcay et al., 2018) and OPIXray (Wei et al., 2020) cover complicated backgrounds and overlapping data, but the volumes of images and prohibited items are still insufficient. Recently, Liu et al. (2019a) release a dataset containing 32,253 X-ray images, of which 12,683 include prohibited items. This dataset has 6 types of items, but none are strictly prohibited, such as mobile phones, umbrellas, computers, and keys. Miao et al. (2019) provide a large-scale security inspection dataset called SIXray, which covers 1,059,231 X-ray images with image-level annotations. However, the proportion of images containing prohibited items in the dataset is very small (i.e., only \( 0.84\% \)). In addition, the dataset covers 6 categories of prohibited items, but only 5 of them are annotated. Unlike the aforementioned datasets, we propose a new large-scale security inspection benchmark that contains over 47K images with prohibited items, covering 12 categories with pixel-level annotations. Towards real-world application, we focus on detecting deliberately hidden prohibited items.

2.2 Object Detection

Object detection is a long-standing problem in the computer vision community. Generally speaking, modern object detectors fall into two groups: two-stage and one-stage detectors.

2.2.1 Two-stage Detectors

R-CNN (Girshick et al., 2014) exemplifies the first research line and proves that CNNs can dramatically improve detection performance. However, it is time-consuming since each region proposal is processed individually. To close this gap, Fast R-CNN (Girshick, 2015) utilizes an RoI pooling layer to extract fixed-size features for each proposal from the feature map of the full image. Therein, selective search, which is utilized to generate proposal regions, becomes a major bottleneck. Faster R-CNN (Ren et al., 2015) replaces selective search with an efficient Region Proposal Network (RPN) and has derived numerous variants. For example, FPN (Lin et al., 2017a) assembles low-resolution features with high-resolution features through a top-down pathway and lateral connections, which greatly improves the detection of objects at different sizes. Mask R-CNN (He et al., 2017) attaches a mask branch to the Faster R-CNN (Ren et al., 2015) structure to improve detection performance via a multi-task learning scheme. Cascade R-CNN (Cai & Vasconcelos, 2018) introduces the classic cascade structure into the Faster R-CNN (Ren et al., 2015) framework, which consistently improves detection accuracy in a cascade manner. Libra R-CNN (Pang et al., 2019) develops a simple yet effective strategy to alleviate the issue of data imbalance in the training process.

Table 1 Comparison of the dataset statistics with existing X-ray benchmarks

2.2.2 One-stage Detectors

OverFeat (Sermanet et al., 2013) is one of the seminal deep-learning-based one-stage detectors. Later, tremendous efforts were made to develop one-stage object detectors, such as SSD (Liu et al., 2016), DSSD (Fu et al., 2017), and the YOLO series (Redmon et al., 2016; Redmon & Farhadi, 2017, 2018). Notably, even though the YOLO series achieves high speed, it suffers from larger localization errors than two-stage detectors, and its performance on small objects remains unsatisfactory. SSD combines multi-scale feature maps to predict a fixed number of bounding boxes and then uses NMS to reduce region redundancy, achieving a decent trade-off between speed and accuracy. Yet the multi-scale features involved in SSD are independent of each other. Therefore, RetinaNet (Lin et al., 2017b) attempts to fuse feature maps at different scales and proposes the focal loss, making one-stage detectors rival two-stage detectors. Besides, these anchor-based methods usually encounter two problems: 1) the number of anchor boxes needs to be large; 2) anchors introduce a number of hyper-parameters and design choices, which complicates the network architecture. Recently, numerous anchor-free approaches have attracted considerable research attention by formulating objects as key points, such as CornerNet (Law & Deng, 2018), CenterNet (Duan et al., 2019), and FCOS (Tian et al., 2019). These methods present a simplified detection scheme free of the limitations of anchors. Besides, DETR (Carion et al., 2020) pioneers feeding serialized features into a Transformer architecture for prediction. DDOD (Chen et al., 2021) investigates entangled conjunctions in the training pipeline and proposes a simple yet effective disentanglement method to boost performance.

Notably, numerous works have investigated existing object detectors in the security inspection field. For example, Saavedra et al. (2021) propose to enrich the GDXray dataset via the GAN paradigm, which greatly compensates for the extremely scarce samples, and evaluate the performance of a wide range of existing frameworks, including SSD300, YOLOv2, YOLOv3, and RetinaNet. Similarly, Akcay and Breckon (2017) validate R-CNN, R-FCN, YOLOv2, etc., on their proposed dataset. Unlike them, we not only evaluate a diverse set of detection frameworks but also propose a novel detection framework for security inspection.

2.3 Attention Mechanism

Recently, inspired by the human perceptual vision system, the attention mechanism has been successfully applied to a wide variety of visual understanding tasks, such as image recognition, image captioning, and visual question answering. The core of the attention mechanism is to emphasize the relevant parts while suppressing irrelevant ones. Toward stable and discriminative representations, considerable developments have been investigated in the literature. As a pioneering work, RAM (Mnih et al., 2014) applies a Recurrent Neural Network to locate informative regions recursively, inspiring a number of follow-ups in CNN networks. Later, SENet (Hu et al., 2018) proposes the Squeeze-and-Excitation module to re-calibrate dependencies from the channel perspective. Its successors, such as GSoP-Net (Gao et al., 2019) and FcaNet (Qin et al., 2021), try to improve the squeeze module, while the follow-up ECA-Net (Wang et al., 2020) pays more attention to the excitation module. However, these methods only focus on the channel aspect of the CNN, neglecting the great potential of the spatial perspective. CBAM (Woo et al., 2018) jointly explores the inter-channel and inter-spatial relationships between features. Nevertheless, it struggles to capture long-range dependencies in the image. The Non-Local network (Wang et al., 2018) captures long-range dependencies and calculates the weighted representation of a certain position in the feature map considering the contributions of all other positions, yet it requires a heavy computation cost. To solve this issue, Cao et al. (2019) design a lightweight module in conjunction with a simplified Non-Local pipeline. CCNet (Huang et al., 2019) presents the Recurrent Criss-Cross Attention (RCCA) module, simplifying the global self-attention of the Non-Local network.

2.4 Multi-Label Classification

Multi-label classification has recently drawn increasing research attention. There are two primary paradigms for multi-label classification: methods based on attention mechanisms and those dedicated to specialized loss functions.

The attention mechanism underpins the first paradigm. CSRA (Zhu & Wu, 2021) proposes a class-specific residual attention module that adaptively allocates spatial attention weights to each category. MCAR (Gao & Zhou, 2021) presents an attention mechanism to capture the most informative local region and then feeds the scaled input to the backbone again. Moreover, numerous methods introduce the Transformer (Vaswani et al., 2017) structure, since despite considerable efforts to apply conventional attention mechanisms, their ability to capture long-range dependencies is inferior to transformer-like structures. Specifically, Query2Label (Liu et al., 2021) introduces the transformer structure to encode and decode category-related features in the feature map, but its computational burden is heavy. Therefore, ML-Decoder (Ridnik et al., 2021b) simplifies the structure, designing a novel decoder that generalizes to scenarios with a large number of categories. Methods of the other paradigm seek to improve performance via specialized loss functions. For example, Asymmetric Loss (Ridnik et al., 2021a) proposes to dynamically reduce the effect of negative samples and even discard samples suspected to be mislabeled. Previous methods suffer heavily from partial annotation problems in existing large-scale datasets. Class-aware Selective Loss (Ben-Baruch et al., 2021) estimates the class distribution for the partial labeling problem and uses a dedicated asymmetric loss in later training to put more emphasis on the contribution of labeled data rather than unlabeled data.

2.5 Multi-Scale Feature Fusion

Multi-scale feature fusion arises naturally in various computer vision tasks because objects usually appear at different sizes in an image. Numerous works attempt to utilize the intermediate feature maps of the backbone to improve detection accuracy. For instance, FPN (Lin et al., 2017a) pioneers the fusion of features from multiple layers in CNN-based structures. It establishes a top-down path to provide informative features to the lower-level feature maps and jointly performs probing at different scales. Later, much effort focused on improving the fusion capability between different layers in a more efficient manner, e.g., Gong et al. (2021); Pang et al. (2019); Liu et al. (2018, 2019b); Ghiasi et al. (2019); Chen et al. (2020). Concretely, to enhance the information propagation from upper to lower levels, Gong et al. (2021) propose a fusion factor that plays a key role in fusing feature maps at different scales. Liu et al. (2019b) present an adaptive spatial fusion method that greatly improves detection accuracy with little computation overhead. Recently, more complex frameworks for feature fusion (Ghiasi et al., 2019; Chen et al., 2020) have been explored in the literature. NAS-FPN (Ghiasi et al., 2019) leverages Neural Architecture Search (NAS) to optimize the FPN topology. Differing from NAS-FPN, FPG (Chen et al., 2020) proposes a deep multi-pathway feature pyramid structure to improve the generalization ability. Unlike the original FPN structure, CARAFE (Wang et al., 2019a) offers a lightweight learnable module to perform up-sampling. Li et al. (2022b) seek to reconstruct a feature pyramid for backbones without a hierarchical structure.

As many previous works (Lin et al., 2017a; Liu et al., 2018) show the importance of multi-scale feature fusion, we argue that it is key to solving the problem of prohibited item detection. In X-ray images, many important details of objects, such as texture and appearance information, are missing. Moreover, the contours of objects overlap, which also brings great challenges to detection. Multi-scale feature fusion jointly considers the low-level layers with rich detail information and the high-level layers with rich semantic information. Therefore, we propose a dense attention module to capture the relations between feature maps across different stages at inter-channel and inter-pixel positions.

2.6 Long-Tailed Distribution Issue

The long-tailed distribution issue refers to the phenomenon that a small proportion of the categories accounts for a large proportion of the overall dataset, which makes the network struggle to capture discriminative information from the tail categories. To this end, tremendous research efforts have been made in the literature, including class re-balancing (Kang et al., 2019; Wang et al., 2019b; Cui et al., 2019), information augmentation (Yin et al., 2019; Li et al., 2021), and module improvement (Zhou et al., 2020; Xiang et al., 2020; Cai et al., 2021).

Methods in the first research line rely on class re-balancing strategies to alleviate the long-tailed distribution issue. For instance, Kang et al. (2019) decouple the learning process into representation learning and classification and investigate the influence of different class-balancing strategies. DCL (Wang et al., 2019b) proposes to recognize tail categories by sampling them more frequently in the later stage of training. Class-Balanced Loss (Cui et al., 2019) develops a new re-weighting scheme according to the effective number of samples per class. These works assign different influence weights to different samples, alleviating to some extent the problem that tail-class samples receive insufficient attention. Methods in the second stream rely on information augmentation to compensate for the lack of samples in the tail classes. For example, Feature Transfer Learning (Yin et al., 2019) leverages class information in the head to perform feature amplification on the tail. MetaSAug (Li et al., 2021) presents an approach based on implicit semantic data augmentation (ISDA) (Wang et al., 2019c) and adopts meta-learning to learn to transform semantic orientations automatically. Methods in the last stream draw support from module design. BBN (Zhou et al., 2020) divides the network into two separate branches: one branch applies a naturally random and uniform sampling strategy for training, and the other samples more tail classes. LFME (Xiang et al., 2020) divides the dataset into subsets with smaller "class longtailness", optimizes multiple models individually, and applies knowledge distillation to learn a student model from multiple teacher models. However, these methods need to optimize different parts of the network alternately. In contrast, the proposed network is trained in an end-to-end fashion and can be easily plugged into existing architectures, such as SSD, Faster R-CNN, and Cascade R-CNN, proving the flexibility of our pipeline in the security inspection scene.

3 Dataset PIDray

In this section, we describe the construction of the PIDray dataset in detail, including data collection, annotation, and statistics.

3.1 Dataset Collection

The PIDray dataset is collected in various scenarios (e.g., airports, subway stations, and railway stations) where we are allowed to set up a security inspection machine. For strong generalization, we deploy 3 security inspection machines provided by different manufacturers to collect X-ray data. Images generated by different machines usually present certain variances in the size and color of the objects and background. When packages go through the security inspection machine, they are completely cropped out under the guidance of the blank parts of the image. In most cases, the X-ray image has a fixed height, while its width depends on the size of the package.

The details of the collection process are as follows: when a person goes through security inspection, we randomly put pre-prepared prohibited items in his/her carry-on baggage. Meanwhile, the rough location of the item is recorded, which guarantees that the subsequent annotation work can proceed smoothly. The PIDray dataset involves a total of 12 categories of prohibited items, namely gun, knife, wrench, pliers, scissors, hammer, handcuffs, baton, sprayer, power bank, lighter, and bullet. For diversity, \( 2 \sim 15 \) instances are prepared for each kind of prohibited item. It took more than six months to collect the 124,486 images of the PIDray dataset. Besides, Fig. 3 summarizes the distribution of the tail categories in the dataset. Note that all images are saved in PNG format.

Fig. 3 Class distribution of the PIDray dataset. The blue bars represent the number of each class in the PIDray dataset

3.2 Data Annotation

For the annotation task, we recruited interns and Master's and Ph.D. students who are familiar with this research as volunteers. Toward high-quality annotations for each image, we offer training to the recruited volunteers so that they can identify prohibited items in X-ray images more quickly and accurately. After that, 5 volunteers are responsible for filtering out samples without any prohibited items from the dataset and annotating the image-level labels, which greatly facilitates the subsequent annotation work. For the fine-grained annotation, we organize over 10 volunteers to annotate our dataset with the labelme tool. Specifically, we leverage a multi-step strategy to ensure high-quality annotation. First, a group of volunteers who are familiar with the topic and an expert (e.g., a Ph.D. student working in related areas) manually annotate each prohibited item in the images. Then, a group of experts carefully inspects the initial annotations. If the annotation results are not unanimously approved by the experts, they are returned to the labeling team for adjustment or refinement. Each image generally takes about 3 minutes to annotate, and each volunteer spends about 10 hours a day annotating. During the annotation process, we label both the bounding box and the segmentation mask of each instance. Finally, multiple rounds of double-checking are performed to minimize errors.

3.3 Data Statistics

To the best of our knowledge, the PIDray dataset is the largest X-ray prohibited item detection dataset to date. It covers 124,486 images and 12 classes of prohibited items. As Table 2 shows, we divide the images into 76,913 (roughly \( 60\% \)) for training and 47,573 (the remaining \( 40\% \)) for testing. Besides, according to the difficulty of prohibited item detection, we split the test set into three subsets, i.e., easy, hard, and hidden. In detail, an image in the easy mode contains only one prohibited item, an image in the hard mode contains more than one prohibited item, and an image in the hidden mode contains deliberately hidden prohibited items. Fig. 4 provides several examples from the test set with different difficulty levels.

Table 2 Statistics of the PIDray dataset
Fig. 4 Examples of test sets with different difficulty levels in the proposed PIDray dataset. From top to bottom, the degree of difficulty gradually increases

Fig. 5 The overall framework of the proposed method for the object detection/instance segmentation task. Specifically, the coarse-grained node \( \mathcal {R}_0 \) is tasked with determining whether a sample contains prohibited items or not. The fine-grained node \( \mathcal {R}_1 \) focuses on task-specific improvements

4 Methodology of Object Detection and Instance Segmentation

For object detection and instance segmentation, we adopt the divide-and-conquer pipeline to alleviate the influence of the overwhelming number of samples without prohibited items on the task-specific operations. As Fig. 5 illustrates, the input sample first passes through the node \( \mathcal {R}_0 \), which predicts whether it contains prohibited items. The node \( \mathcal {R}_0 \) produces a confidence score \( \phi \) in the interval [0, 1]; when \( \phi > 0.5 \), the current sample is predicted to contain prohibited items, and otherwise it is treated as a normal sample. In our case, only samples with prohibited items are fed into the subsequent task-specific processing, which encourages the node \( \mathcal {R}_1 \) to focus more on samples with prohibited items. Meanwhile, we impose a BCE loss on the output of the node \( \mathcal {R}_0 \) to guide the gradient via back-propagation. Generally, the proportion of samples without prohibited items within a batch is uncertain. Hence, we multiply the original loss of the node \( \mathcal {R}_1 \) by the proportion of samples with prohibited items in the current batch to avoid collapse due to few samples passing through the node \(\mathcal {R}_1\), which assures the stability of the training process.
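For illustration, the following PyTorch-style sketch captures one training iteration of this pipeline. It is a minimal sketch, not our released implementation: the callables `r0` and `r1` and the routing on ground-truth image-level labels during training are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def training_step(r0, r1, images, targets, has_item):
    """One training iteration of the divide-and-conquer pipeline (sketch).

    r0:       coarse-grained node, maps images to scores phi in [0, 1]
    r1:       fine-grained node, returns its task-specific loss
    has_item: (B,) float tensor, 1 if an image holds a prohibited item
    """
    phi = r0(images)                                  # (B,) confidence scores
    loss_r0 = F.binary_cross_entropy(phi, has_item)   # BCE constraint on R0

    # Only samples containing prohibited items reach node R1, so R1
    # concentrates on the task-specific operations.
    pos = has_item.bool()
    if pos.any():
        loss_r1 = r1(images[pos], [t for t, keep in zip(targets, pos) if keep])
        # Scale R1's loss by the in-batch proportion of positive samples
        # to keep training stable when few samples pass through R1.
        loss_r1 = loss_r1 * pos.float().mean()
    else:
        loss_r1 = phi.new_zeros(())

    return loss_r0 + loss_r1
```

At inference time, the predicted score \( \phi \) takes over the routing: samples with \( \phi \le 0.5 \) bypass \( \mathcal {R}_1 \) and are reported as normal.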

4.1 Binary Classification on the Node of \(\mathcal {R}_0 \)

Following the principle of divide-and-conquer, the coarse-grained node \( \mathcal {R}_0 \) (see the bottom left of Fig. 5) is tasked with extracting features from the multi-scale features \(\mathcal {F}_1\)-\(\mathcal {F}_5\) to perform binary classification, whose goal is to determine whether the input sample contains prohibited items. To fully unleash the potential of the multi-scale features, we first aggregate them to obtain a fused feature \( \mathcal {F}^{'} \). Previous methods usually compress the feature map into a one-dimensional feature via an average pooling operation; unfortunately, this process tends to lose information essential for class prediction. Instead, we design a cross-attention module that facilitates the model to encode more informative features. Concretely, we take a learnable embedding as the query, each pixel on the feature map as the value, and each pixel with position embedding as the key. For the query, the global dependencies can be captured by Equ. (1)-(2).

$$\begin{aligned} \alpha _i = softmax\left( \dfrac{W_Q(Q) W_K(E_i)}{\sqrt{d}}\right) \end{aligned}$$
(1)
$$\begin{aligned} h = \sum _{i=1}^{HW} \alpha _i W_V(F'_i) \end{aligned}$$
(2)

where \(W_Q(\cdot )\), \(W_K(\cdot )\) and \(W_V(\cdot )\) are linear projections for the query, key, and value, respectively; E refers to the spatial embedding obtained from the sum of \(\mathcal {F}^{'}\) and its position encoding (Carion et al., 2020); d is the dimension of the hidden layer; \(\alpha _i\) is the attention score of each position; and h denotes the representation of a specific head. Next, we feed the concatenation of the multi-head representations into a feed-forward neural network to obtain the final feature representation, as expressed in Equ. (3).

$$\begin{aligned} V = FFN\left( W_H\left( \prod _{j=1}^{N} h_j\right) \right) \end{aligned}$$
(3)

where N indicates the number of heads, \(\prod \) denotes the concatenation operation, \(W_H(\cdot )\) is the linear projection fusing the multi-head information, and \(FFN(\cdot )\) denotes a position-wise feed-forward network (Vaswani et al., 2017). Finally, we adopt a fully connected layer followed by a sigmoid layer as a classifier to project this feature vector V into the score \(\phi \).
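A minimal PyTorch sketch of this cross-attention pooling is given below; the class name, head count, and FFN width are illustrative assumptions, while the computation follows Equ. (1)-(3).

```python
import torch
import torch.nn as nn

class CrossAttentionPool(nn.Module):
    """Pools a feature map into one vector with a learnable query:
    attention scores over all HW positions (Equ. 1), a weighted sum
    per head (Equ. 2), and an FFN over the concatenated heads (Equ. 3)."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learnable embedding
        self.w_q, self.w_k, self.w_v = (nn.Linear(dim, dim) for _ in range(3))
        self.w_h = nn.Linear(dim, dim)                     # fuses the heads
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, feat, pos_enc):
        # feat: (B, C, H, W) fused pyramid feature F'; pos_enc likewise.
        B, C, H, W = feat.shape
        v = feat.flatten(2).transpose(1, 2)              # (B, HW, C) values
        e = (feat + pos_enc).flatten(2).transpose(1, 2)  # (B, HW, C) keys E

        def split(x):  # (B, N, C) -> (B, heads, N, head_dim)
            return x.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)

        q = split(self.w_q(self.query.expand(B, -1, -1)))
        k, val = split(self.w_k(e)), split(self.w_v(v))
        att = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5  # Equ. (1)
        att = att.softmax(dim=-1)
        h = (att @ val).transpose(1, 2).reshape(B, 1, C)        # Equ. (2)
        return self.ffn(self.w_h(h)).squeeze(1)                 # Equ. (3): (B, C)
```

In use, `feat` is the fused feature \( \mathcal {F}^{'} \), and the returned vector V is mapped to \( \phi \) by a fully connected layer with a sigmoid.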

Fig. 6 The structure of the Dense Attention Module. DAM is mainly formed by the (a) channel attention, (b) spatial attention, and (c) dependency refinement modules

4.2 Detection/Segmentation on the Node of \(\mathcal {R}_1 \)

As discussed in the introduction, the fine-grained node \( \mathcal {R}_1 \) focuses on task-specific improvements, i.e., contributions to the tasks of object detection and instance segmentation. In our design, we choose a two-stage object detection framework, e.g., Cascade Mask R-CNN (Cai & Vasconcelos, 2019), to realize the node \( \mathcal {R}_1 \). Typically, two-stage object detection frameworks consist of a backbone, a Feature Pyramid Network (FPN) (Lin et al., 2017a), and detection heads (a Region Proposal Network (RPN) followed by succinct heads). The detection head often operates simultaneously on the five feature maps of different scales output by the FPN to improve the detection of objects of different sizes. After the RPN provides proposals, the succinct heads (e.g., a simple convolutional layer) are applied to the pooled feature grid to predict the bounding boxes and masks of instances. Nevertheless, previous approaches usually adopt the feature pyramid structure to exploit multi-scale feature maps while fusing features only in adjacent layers. The performance tends to be sub-optimal owing to the scale variation of objects in complex scenes.

To address the above issue, we design the dense attention module in our study. The proposed framework consists of five Dense Attention Modules (DAMs) (see the bottom right of Fig. 5), denoted as \(\{DAM_1, DAM_2,\ldots , DAM_5\}\). Each DAM takes all multi-scale feature maps in the pyramid as input. Let the feature maps in the pyramid be \(\{\mathcal {F}_1,\mathcal {F}_2,\ldots ,\mathcal {F}_5\}\). The workflow of \(DAM_i\) is depicted in Fig. 6. Taking \(\mathcal {F}_i\) as an example, we first rescale all the input feature maps to match the size of \(\mathcal {F}_i\): we adopt max pooling to down-sample feature maps larger than \(\mathcal {F}_i\) and nearest interpolation to up-sample those smaller than \(\mathcal {F}_i\). The five rescaled feature maps are then summed and fed into the Channel-wise Attention Module and the Spatial-wise Attention Module to obtain channel-wise and spatial-wise weights, respectively. In other words, we re-calibrate the importance of the various feature maps from the channel and spatial perspectives. Subsequently, multiplication emphasizes the attentive channels and spatial regions. After that, the sum of the channel and spatial features is fed into the dependency refinement module. Finally, we add the result directly to \(\mathcal {F}_i\) via a lateral skip connection. Note that each layer performs the above three steps on the feature map. After assembling the original and enhanced maps, the multi-scale representation is fed into the detection heads for the final prediction.
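The per-level workflow can be summarized by the skeleton below, assuming all pyramid levels share the same channel width. The `nn.Identity()` placeholders stand in for the three sub-modules sketched in the following listings, and `adaptive_max_pool2d` approximates the max-pooling-based down-sampling.

```python
import torch.nn as nn
import torch.nn.functional as F

class DAM(nn.Module):
    """Sketch of the i-th Dense Attention Module: rescale all pyramid
    levels to level i, fuse them, re-weight the fused map by channel and
    spatial attention, refine dependencies, and add the result to F_i."""

    def __init__(self, level):
        super().__init__()
        self.level = level                   # index i of the target level
        # Placeholders; the real sub-modules are sketched below.
        self.channel_att = nn.Identity()
        self.spatial_att = nn.Identity()
        self.refine = nn.Identity()

    def forward(self, feats):                # feats: [F1, ..., F5]
        target = feats[self.level]
        h, w = target.shape[-2:]
        fused = 0
        for f in feats:
            if f.shape[-1] > w:              # larger map: max-pool down
                f = F.adaptive_max_pool2d(f, (h, w))
            elif f.shape[-1] < w:            # smaller map: nearest up-sample
                f = F.interpolate(f, size=(h, w), mode="nearest")
            fused = fused + f
        # Re-calibrate from channel and spatial perspectives, then refine
        # long-range dependencies (Fig. 6c).
        out = fused * self.channel_att(fused) + fused * self.spatial_att(fused)
        out = self.refine(out)
        return target + out                  # lateral skip connection to F_i
```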

Channel-wise Attention Module. As shown in Fig. 6 (a), we feed the channel-wise attention module with the aggregated representation to allocate channel-wise weights to the various feature maps. In detail, the aggregated representation is squeezed along the channel-wise dimension by a fully connected layer. The derived vector is then fed into fully connected layers followed by a softmax layer to obtain the corresponding channel scores.

Spatial-wise Attention Module. As Fig. 6 (b) illustrates, the aggregated representation is fed into the spatial-wise attention module to re-calibrate the spatial-wise weights of the feature maps. Unlike the channel-wise attention module, the pooling operations are performed along the channel dimension. We first apply max pooling and average pooling to the input individually. Then we concatenate the two resulting feature maps and feed them into five convolutional layers followed by a softmax layer to obtain the \(H\times W\) spatial-wise scores.
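The two attention branches can be sketched as follows. For brevity, the spatial branch below uses two convolutional layers instead of the five described above, and the channel branch approximates the squeeze step with global average pooling; both simplifications are illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeezes the fused map spatially, then predicts per-channel
    scores with fully connected layers and a softmax (Fig. 6a)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                    # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))               # squeeze to (B, C)
        w = self.fc(s).softmax(dim=1)        # channel scores
        return w.view(*w.shape, 1, 1)        # broadcastable (B, C, 1, 1)

class SpatialAttention(nn.Module):
    """Pools along channels with max and average, concatenates both
    maps, and predicts an HxW score map via convolutions (Fig. 6b)."""
    def __init__(self, hidden=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, 3, padding=1))

    def forward(self, x):                    # x: (B, C, H, W)
        pooled = torch.cat([x.max(dim=1, keepdim=True).values,
                            x.mean(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        logits = self.conv(pooled)                                # (B, 1, H, W)
        b, _, h, w = logits.shape
        return logits.flatten(2).softmax(dim=-1).view(b, 1, h, w)
```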

Fig. 7 The overall architecture of the proposed method for the multi-label classification task. The coarse-grained node \( \mathcal {R}_0 \) is tasked with binary classification, while the fine-grained node \(\mathcal {R}_1 \) is dedicated to the multi-label classification of the categories of prohibited items

Dependency Refinement. The spatial and channel attention adopted above primarily focus on fusing the five feature maps according to their contributions. Generally, constructing long-range dependencies between distant pixels plays an indispensable role in achieving better performance. The Non-Local Block (Wang et al., 2018) is known for efficiently capturing long-range dependencies. Thus, we introduce its simplified version to compensate for this drawback in the proposed method. As Fig. 6 (c) depicts, toward accurate results, we introduce such a module to perform dependency refinement on the aggregated feature maps.
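A sketch of this simplified block, following the global-context formulation of Cao et al. (2019), is shown below; the reduction ratio is an illustrative choice.

```python
import torch
import torch.nn as nn

class DependencyRefinement(nn.Module):
    """Simplified Non-Local block in the spirit of Cao et al. (2019):
    a single global context vector is gathered with a 1x1 attention map
    and broadcast back through a channel transform."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mask = nn.Conv2d(channels, 1, kernel_size=1)   # attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1))

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        att = self.mask(x).flatten(2).softmax(dim=-1)       # (B, 1, HW)
        ctx = torch.bmm(x.flatten(2), att.transpose(1, 2))  # (B, C, 1)
        ctx = ctx.view(b, c, 1, 1)                          # global context
        return x + self.transform(ctx)                      # broadcast fusion
```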

5 Methodology of Multi-Label Classification

Since the proposed dataset can also serve as a benchmark for the multi-label classification task, we extend the spirit of divide-and-conquer to this task. Specifically, we feed the input image \(\mathcal {I}\) into the backbone to obtain the basic feature map \(\mathcal {F}\). Previous methods feed this representation directly into a multi-label classifier to finalize the prediction. However, this scheme is heavily subject to the long-tailed issue, resulting in degraded performance. Observing the distribution of our proposed dataset, we offer a tree-like framework for this task. As shown in Fig. 7, the coarse-grained node predicts whether prohibited items exist, while the fine-grained node focuses on predicting the specific prohibited classes. Interestingly, such a simple yet efficient scheme yields significant performance gains over state-of-the-art methods.

5.1 Binary Classification on the Node \( \mathcal {R}_0 \)

As stated above, the coarse-grained node \( \mathcal {R}_0 \) is tasked with binary classification based on the basic feature map \( \mathcal {F} \). In our design, we realize the node \( \mathcal {R}_0 \) with a light-weight network mainly composed of a multi-head attention module (Vaswani et al., 2017), a position-wise feed-forward network (FFN), and a binary classifier; see the structure in the bottom left of Fig. 7. In detail, we treat a learnable embedding as the query, the feature map with position embedding as the key, and the original feature map as the value. Following Equ. (1)-(3), we explore the global dependencies for the query within each head and feed the concatenation of the multi-head representations into a feed-forward neural network to generate the final feature representation. Experimentally, this scheme enables our model to pay more attention to the discriminative regions. Finally, the feature representation is projected to a score \(\phi \) with a binary classifier.

5.2 Multi-label Classification on the Node \(\mathcal {R}_1 \)

With the help of the node \( \mathcal {R}_0 \), the fine-grained node \( \mathcal {R}_1 \) is dedicated to the multi-label classification of the prohibited item categories. For convenience, we reuse the structural design of the node \( \mathcal {R}_0 \). The node \( \mathcal {R}_1 \) is responsible for generating class-aware representations according to class-specific queries: this module allocates different attention weights over the spatial positions of the feature map according to the specific queries and generates class-aware representations. Finally, each class-aware representation is mapped to a score via an independent binary classifier. The only difference is that the node \( \mathcal {R}_0 \) needs a single binary classifier while the node \( \mathcal {R}_1 \) is equipped with multiple binary classifiers. The structure of \( \mathcal {R}_1 \) in the bottom right of Fig. 7 illustrates the details.
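A minimal sketch of the node \( \mathcal {R}_1 \) built on the standard `nn.MultiheadAttention` is given below; the per-class classifier weights and the FFN width are illustrative assumptions, and the 12 queries match the 12 categories of PIDray.

```python
import torch
import torch.nn as nn

class ClassQueryHead(nn.Module):
    """Node R1 for multi-label classification: one learnable query per
    prohibited-item class attends over the feature map, and each
    class-aware vector gets its own binary classifier."""
    def __init__(self, dim, num_classes=12, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_classes, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(),
                                 nn.Linear(dim * 4, dim))
        # One independent binary classifier per class (vs. the single
        # classifier in node R0).
        self.classifiers = nn.Parameter(torch.randn(num_classes, dim))

    def forward(self, feat, pos_enc):            # feat: (B, C, H, W)
        b = feat.size(0)
        v = feat.flatten(2).transpose(1, 2)      # (B, HW, C) values
        k = (feat + pos_enc).flatten(2).transpose(1, 2)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, K, C)
        h, _ = self.attn(q, k, v)                # class-aware representations
        h = self.ffn(h)                          # (B, K, C)
        logits = (h * self.classifiers).sum(-1)  # per-class dot product
        return logits.sigmoid()                  # (B, K) class scores
```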

5.3 Loss Function

Our loss function mainly consists of two parts. One is the binary cross-entropy loss of the node \( \mathcal {R}_0 \), which is formulated as Equ. (4).

$$\begin{aligned} \mathcal {L}_{bc}=-[y\ln (p) + (1-y)\ln (1-p)] \end{aligned}$$
(4)

where p is the prediction confidence, and y is the corresponding binary label.

The other one is the multi-label classification of the node \( \mathcal {R}_1 \), here we adopt the Asymmetric Loss (Ridnik et al., 2021a), which is presented as Equ. (5).

$$\begin{aligned} \mathcal {L}_{ml}=-\frac{1}{C} \sum _{k=1}^{C}\left\{ \begin{array}{ll} \left( 1-p_{k}\right) ^{\gamma ^+} \ln \left( p_{k}\right) , &{} y_{k}=1 \\ \left( p_{k}\right) ^{\gamma ^-} \ln \left( 1-p_{k}\right) , &{} y_{k}=0 \end{array}\right. \end{aligned}$$
(5)

where \(p_k\) is the prediction of the k-th class, \(y_k\) is the corresponding label of the k-th class, and \(\gamma ^+\) and \(\gamma ^-\) are two hyper-parameters.

In a nutshell, the total loss function is expressed as below:

$$\begin{aligned} \mathcal {L} = \lambda *\mathcal {L}_{ml} + \mathcal {L}_{bc} \end{aligned}$$
(6)

where \( \lambda \) is the hyper-parameter balancing the two loss terms; in our experiments, its value is set to the proportion of samples with prohibited items in the batch.
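For clarity, Equ. (4)-(6) can be written compactly as below; the clamping constant for numerical stability and the joint averaging over batch and classes are illustrative choices.

```python
import torch

def binary_loss(p, y):
    """Equ. (4): binary cross-entropy for node R0."""
    p = p.clamp(1e-6, 1 - 1e-6)
    return -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=2.0):
    """Equ. (5): asymmetric loss for node R1; p, y have shape (B, C)."""
    p = p.clamp(1e-6, 1 - 1e-6)
    pos = (1 - p) ** gamma_pos * torch.log(p)       # y_k = 1 branch
    neg = p ** gamma_neg * torch.log(1 - p)         # y_k = 0 branch
    return -(y * pos + (1 - y) * neg).mean()

def total_loss(phi, y_bin, p_ml, y_ml):
    """Equ. (6): lambda is the in-batch proportion of samples that
    contain prohibited items."""
    lam = y_bin.float().mean()
    return lam * asymmetric_loss(p_ml, y_ml) + binary_loss(phi, y_bin)
```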

6 Experiments

In this section, we conduct extensive experiments on the PIDray dataset to systematically validate the effectiveness of the proposed method. Specifically, we first describe the implementation details and evaluation metrics. Then we report performance comparisons between the proposed method and state-of-the-art methods on the PIDray dataset. Next, experiments on general image datasets (i.e., COCO and PASCAL VOC) demonstrate the generalization ability of the proposed model. Finally, we present insightful analyses that verify the importance of the critical components of our method via ablation studies.

Table 3 Overall evaluation of object detection and instance segmentation

6.1 Implementation Details

We adopt the MMDetection toolkit as our training platform, running on a machine with four NVIDIA RTX 3090 GPUs. Our method is implemented in the PyTorch deep learning framework. For a fair comparison, all the compared methods are trained on the training set and evaluated on the test set of the PIDray dataset. For the object detection and instance segmentation tasks, the proposed pipeline is realized on top of Cascade Mask R-CNN (Cai & Vasconcelos, 2019), with ResNet-101 as the backbone. According to our statistics, the average resolution of the images in our dataset is about \( 500 \times 500 \); hence, we resize the images to \( 500\times 500 \) for the compared detectors. The entire network is optimized with stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate is set to 0.02 and the batch size to 16. We train for 12 epochs and 24 epochs for two-stage and one-stage detectors, respectively. Unless otherwise specified, other hyper-parameters follow the default settings of MMDetection.
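For reference, the detection settings above roughly correspond to the following MMDetection-style configuration fragment; the key names follow MMDetection 2.x conventions and may vary across versions.

```python
# Illustrative configuration fragment matching the settings above.
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
data = dict(samples_per_gpu=4, workers_per_gpu=2)      # 4 GPUs x 4 = batch 16
runner = dict(type='EpochBasedRunner', max_epochs=12)  # 24 for one-stage
# Images are resized in the data pipeline, e.g.:
train_resize = dict(type='Resize', img_scale=(500, 500))
```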

We perform all experiments for the multi-label classification task on a machine with eight NVIDIA RTX 3090 GPUs. The proposed model is trained for 80 epochs with an early stopping strategy. We use the Adam optimizer with true weight decay (Loshchilov & Hutter, 2017) of \(1\times 10^{-2}\) and the one-cycle policy (Smith & Topin, 2019) to optimize the proposed model. Notably, for ResNet-101-based models, we set a maximum learning rate of \(8\times 10^{-5}\) and a batch size of 288. For TResNetL-based models, we set a maximum learning rate of \(1.2\times 10^{-4}\) and a batch size of 288. For CvT-21-384-based models, we set a maximum learning rate of \(1\times 10^{-4}\) and a batch size of 240. For CvT-w24-384, we set a maximum learning rate of \(5\times 10^{-5}\) and a batch size of 40. Regarding the hyper-parameters \(\gamma ^+\) and \(\gamma ^-\) in Equ. (5), we set them to 0 and 2, respectively, for ResNet-101-based models, and to 0 for the other models.

6.2 Evaluation Metrics

Following the common metrics of MS COCO (Lin et al., 2014), we evaluate the performance of the compared methods in terms of the AP and AR metrics on our PIDray dataset. The scores are averaged over multiple Intersection over Union (IoU) thresholds; we use 10 IoU thresholds between 0.50 and 0.95. Specifically, the AP score is averaged across all 10 IoU thresholds and all 12 categories. To better assess a model, we look at various data splits: \( AP_{50} \) and \( AP_{75} \) are calculated at IoU = 0.50 and IoU = 0.75, respectively. Note that many prohibited items in the PIDray dataset are small (area \( < 32^2 \)), which is evaluated by the \( AP_S \) metric. Besides, the AR score is the maximum recall given a fixed number of detections (e.g., 1, 10, 100) per image, averaged over the 12 categories and 10 IoU thresholds. To evaluate the performance on small objects, we additionally adopt \(AP_S^{25}\), \(AP_S^{50}\), \(AR_S^{25}\) and \(AR_S^{50}\), which represent the AP and AR at IoU thresholds of 0.25 and 0.50, respectively. To further evaluate the detection performance, we also introduce \(\text {MR}^{-2}\), an important metric in the field of pedestrian detection, and report the miss rate at different FPPI (false positives per image), analogously to the P-R curve.
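These COCO-style metrics can be computed with the standard pycocotools workflow, as in the sketch below; the file names are placeholders.

```python
# Minimal AP/AR evaluation with pycocotools; file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO('pidray_test_annotations.json')     # ground-truth annotations
dt = gt.loadRes('detections.json')            # model predictions

coco_eval = COCOeval(gt, dt, iouType='bbox')  # use 'segm' for mask AP
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP, AP50, AP75, AP_S, and AR@{1,10,100}
```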

We use mean Average Precision (mAP) as the evaluation metric for the multi-label classification task. It first calculates the Average Precision (AP) for each category, i.e., the area under the Precision-Recall curve, and then averages the AP over all categories.
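A small sketch of this metric, assuming the scores and binary labels are stored as NumPy arrays:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def multilabel_map(y_true, y_score):
    """y_true, y_score: (num_samples, num_classes) arrays; returns mAP,
    the mean over classes of the area under each P-R curve."""
    aps = [average_precision_score(y_true[:, k], y_score[:, k])
           for k in range(y_true.shape[1])]
    return float(np.mean(aps))
```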

Fig. 8 Visual comparison on the detection task. GT indicates ground truth; FPN denotes the results generated by Cascade Mask R-CNN with FPN; PAFPN denotes the results generated by Cascade Mask R-CNN with PAFPN; SDANet denotes the results generated by SDANet; and Ours indicates the results generated by the proposed method

Fig. 9 Visual comparison on the instance segmentation task. GT indicates ground truth; FPN denotes the results generated by Cascade Mask R-CNN with FPN; PAFPN denotes the results generated by Cascade Mask R-CNN with PAFPN; SDANet denotes the results generated by SDANet; and Ours indicates the results generated by the proposed method

Table 4 The detection performance of small objects

6.3 Overall Evaluation

Firstly, as presented in Table 3, we report quantitative comparisons between our method and numerous one-stage and two-stage state-of-the-art object detectors. Notably, since some methods do not support the instance segmentation task, we fill the corresponding entries in Table 3 with the placeholder '-'. Our method achieves superior performance in terms of all metrics on the various subsets of the PIDray dataset. For example, compared with the second-best methods, our method gains absolute \(3.9\%\) and \(2.7\%\) AP improvements for the two sub-tasks on the hidden test set, which strongly demonstrates the effectiveness of the proposed pipeline. Compared with the SOTA threat detection method DOAM (Wei et al., 2020), which employs two sub-modules to generate an attention distribution map, our method also shows better results. DOAM is built on the assumption that the contours of prohibited items in X-ray images are intact; since a large number of prohibited items in our dataset are deliberately hidden, it does not perform well here. Besides, it changes the input dimension of the backbone network, impairing the feature extraction ability. LIM (Tao et al., 2021b) proposes an FPN-like structure and a bidirectional propagation module to suppress the irrelevant information near the prohibited items. However, it introduces a heavy calculation burden and does not fully consider the positive-negative sample proportion in the security inspection scene, which greatly limits its performance. As evidenced in Figs. 8 and 9, our method shows obvious advantages over the other methods. For example, in the first column of Fig. 8, all methods except ours miss the power bank deliberately hidden among messy objects. In the second and fifth columns, one can see that these baselines over-identify or misidentify some items. In the sixth column, these baselines are prone to error when processing images without any prohibited items. Figure 9 exhibits a similar visual trend. For instance, the baselines generate incomplete masks in the boxed areas of the second, third, and fifth columns, while in the fourth column their results show incomplete coverage or over-flowing masks. These visual comparisons reveal that the previous baselines struggle to capture the features of hidden items, while our approach detects prohibited items effectively, especially those deliberately hidden. We believe this is because the proposed pipeline endows the model with a more comprehensive understanding of the characteristics of the proposed dataset. Moreover, our contributions to the task-specific node bring two benefits. First, the attention-wise modules densely propagate semantic information across multiple layers. Second, the dependency refinement module explores long-range dependencies among feature maps. These complementary design choices enable our method to detect deliberately hidden items effectively.

In Table 4, we also report the detection performance on small objects. As can be seen, our method enhances performance consistently across the evaluation metrics. An interesting observation is that the network paradigm, i.e., one-stage vs. two-stage, is more crucial when dealing with tiny items. Compared with Cascade Mask R-CNN, our method in the two-stage framework gains 9.9/11.5, 9.0/8.2, 8.6/8.9, and 8.8/8.0 on \(\text {AP}_{25}^{S}\), \(\text {AP}_{50}^{S}\), \(\text {AR}_{25}^{S}\) and \(\text {AR}_{50}^{S}\), respectively, which illustrates the effectiveness of our method. Considering that recall is an important metric in the field of security, we summarize the experimental results in terms of the AR and \(\text {MR}^{-2}\) metrics in Table 5. Likewise, our method outperforms existing methods significantly. Moreover, we present the P-R curves for different IoU thresholds in Fig. 10, demonstrating that our method has obvious advantages over all existing methods.

Table 5 AR and \(\text {MR}^{-2}\) of different methods

Secondly, we summarize in Table 6 the comparisons between our method and state-of-the-art works on the PIDray dataset for the multi-label classification task. Compared with these approaches, the proposed method shows superior performance, regardless of the backbone or image resolution. In detail, our method achieves the best result among approaches based on ResNet-101 (He et al., 2016), outperforming the second-best method Q2L (Liu et al., 2021) by an absolute improvement of 1.53 points in terms of mAP. Regarding the methods built on top of TResNetL (Ridnik et al., 2020), the proposed method also shows superiority. When coupled with the CvT (Wu et al., 2021) series backbones, our method consistently performs best, boosting accuracy by 1.45 and 0.75 points, respectively. It can be concluded that these state-of-the-art methods still face challenges due to the apparent discrepancies between X-ray and natural images, while our approach relies on the divide-and-conquer pipeline and the attention mechanism to alleviate this issue effectively.

6.4 Evaluation on Other Security Inspection Benchmarks

In this section, we extend our method to other security inspection datasets, i.e., OPIXray V2 (Wei et al., 2020) and SIXray (Miao et al., 2019), and present an analysis of the dilemmas of COMPASS-XP (Lewis, 2019) and GDXray (Mery et al., 2015).

Fig. 10 The precision-recall curves at IoU thresholds 0.5, 0.6, 0.7, and 0.8

Table 6 Overall evaluation of multi-label classification

6.4.1 Evaluation on OPIXray V2

OPIXray V2 (Wei et al., 2020) contains 18,885 X-ray images with five classes, namely folding knife, straight knife, scissor, utility knife, and multi-tool knife. However, only 8,885 images contain prohibited items. To evaluate the effectiveness of the proposed method, we choose two baselines, i.e., SSD300 and Cascade RCNN, and follow the MMDetection default settings to train each method for 120 epochs. Table 7 shows the experimental results. Therein, DOAM (Wei et al., 2020) leverages the distinct appearance information of the detection target. However, the input dimension change caused by the DOAM module greatly limits the feature extraction ability of the backbone. By contrast, there is a great improvement when SSD300 is coupled with our divide-and-conquer pipeline (see the third row). A similar trend holds for the two-stage paradigm between Cascade RCNN and its variant with our divide-and-conquer pipeline. The above experiments demonstrate that our method shows strong performance, which proves the generality and flexibility of our pipeline to a certain degree.

6.4.2 Evaluation on SIXray

The SIXray dataset is an image-level annotated dataset of the security inspection scene. It contains 1,059,231 X-ray images, of which 8,929 contain prohibited items (see the 2nd row of Fig. 11). It has three splits, i.e., SIXray10, SIXray100 and SIXray1000, whose positive-negative proportions are 1:10, 1:100 and 1:1,000, respectively. The comparisons are presented in Table 8. As can be seen, our method achieves the best performance among existing methods. Therein, CHR (Miao et al., 2019) inserts reversed connections in the backbone to deliver high-level visual cues that assist mid-level features, and designs a class-balanced loss function to alleviate the influence of negative samples. However, the changes to the backbone hardly guarantee its generality, drastically degrading performance. Q2L utilizes the Transformer structure to replace the conventional fully connected layer in the classification network. Nevertheless, it is mainly designed for general scenes without fully considering the negative samples in security inspection scenarios, which limits its ability in this field. Different from them, we design a divide-and-conquer pipeline to alleviate the influence of negative samples, which is conducive to better performance. As the positive-negative proportion decreases (SIXray10 and SIXray100), the performance of CHR drops dramatically. On SIXray1000, there are only about 1,000 positive samples, which limits the representation learning capability of the network.

Table 7 AP results (VOC calculation method) on OPIXray V2 dataset
Fig. 11

Examples from the other security inspection datasets

6.4.3 Problems of GDXray

GDXray (Mery et al., 2015) is a grayscale X-ray image dataset covering castings, welds, baggage, natural objects, and settings (see the 3rd row of Fig. 11). The data relevant to the security screening scenario is the Baggage split. It contains numerous image series, each of which covers only one scene, and the images within a series differ only slightly in rotation. Therefore, the images show extremely high similarity, as the third row of Fig. 11 shows. After inspection, we choose the B0009-B0038 series as the training set, which contains 181 images, and the B0039-B0043 series as the test set, which contains 59 images. Based on this, we test Cascade R-CNN and its variant with our method, and we gain a 0.7 improvement on this dataset (Table 9), which proves the effectiveness of our method.

Table 8 The mAP results on SIXray10, SIXray100, SIXray1000 dataset
Table 9 AP results (VOC calculation method) on GDXray dataset
Table 10 Evaluation results on the MS COCO and PASCAL VOC detection datasets
Table 11 Generality of the proposed divide-and-conquer pipeline on different detection frameworks

6.4.4 Problems of COMPASS-XP

Notably, COMPASS-XP (Lewis, 2019) has only 1,928 X-ray images but 370 general classes (not specific to the security inspection field). Moreover, each image contains only one object (see the last row of Fig. 11), which is far from realistic scenes and means each class contains very limited samples. Such a small amount of data cannot support training our model, which explains the absence of a performance comparison on COMPASS-XP.

6.5 Evaluation on General Dataset

In this part, we shift attention toward general detection datasets to validate the generalization ability of the proposed method on natural images. The experiments are performed on MS COCO (Lin et al., 2014) and PASCAL VOC (Everingham et al., 2010b), two popular datasets in the natural-image detection domain. For a fair comparison, we follow the default experiment settings in MMDetection. The experimental results are reported in Table 10. Compared with the baseline methods (Cascade Mask R-CNN for MS COCO2014 and MS COCO2017, Cascade R-CNN for PASCAL VOC), we achieve gains of \( +0.8 \) AP, \( +0.4 \) AP, and \( +0.4 \) AP on MS COCO2014, MS COCO2017, and PASCAL VOC, respectively. These results demonstrate that our method is not only effective for the detection of prohibited items but also suitable for general scenarios.

6.6 The Generality of Divide-and-Conquer Pipeline

Our proposed divide-and-conquer pipeline can be easily plugged into any threat detection framework. Therefore, we introduce the pipeline into different detection frameworks without any dataset-specific adjustments to the model architecture and report the results in Table 11. The results demonstrate that our pipeline enjoys high flexibility and adaptability: it brings significant improvements to both one-stage and two-stage methods. Thus, we conclude that the divide-and-conquer pipeline helps existing detectors handle security inspection datasets, which strongly validates the contributions of this paper.
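As a rough illustration of how the pipeline wraps an existing detector, consider the following PyTorch sketch. The component names (`backbone`, `coarse_node`, `fine_head`) are hypothetical stand-ins for the pieces of any one- or two-stage framework; only the gating control flow is the point here, not the exact implementation used in the paper.

```python
import torch
import torch.nn as nn

class DivideAndConquerDetector(nn.Module):
    """Hypothetical wrapper: a coarse-grained binary node gates a fine-grained head."""

    def __init__(self, backbone, coarse_node, fine_head, threshold=0.5):
        super().__init__()
        self.backbone = backbone        # shared feature extractor (e.g., ResNet+FPN)
        self.coarse_node = coarse_node  # binary node: any prohibited item present?
        self.fine_head = fine_head      # original detection/segmentation head
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, image):           # for clarity, a single image per call
        feats = self.backbone(image)
        # Probability that the image contains a prohibited item (assumed scalar logit).
        p = torch.sigmoid(self.coarse_node(feats))
        if p.item() < self.threshold:
            return []                   # treated as a normal image; head is skipped
        return self.fine_head(feats)
```

Because the wrapper only touches the interface between backbone and head, swapping in a different `fine_head` is all that is needed to move between one-stage and two-stage detectors.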

6.7 Ablation Study

In this section, we perform ablation studies on the necessity of samples without prohibited items, the effect of the divide-and-conquer pipeline, and the contributions of the fine-grained node to specific tasks.

6.7.1 Necessity of Samples Without Prohibited Items

In real-world scenarios, it is not always true that only images with prohibited items need to be detected. Hence, an open question is whether a model trained on a dataset in which samples with target objects account for the majority can work well in security inspection. In Table 12, we report the necessity of samples without prohibited items in the PIDray dataset for the object detection task in terms of mAP and Error Rate, where Error Rate evaluates the ability of the network to predict whether or not an image contains prohibited items. We adopt Cascade Mask R-CNN (Cai & Vasconcelos, 2019) as the test model. During testing, to compute the Error Rate, we define an image as a sample with prohibited items if any of its predicted bounding boxes has a confidence greater than 0.5. By comparison, we conclude that introducing samples without prohibited items into the training set not only improves detection accuracy \( (+0.5) \) but also significantly reduces the Error Rate \( (-16.6) \). As evidenced in Fig. 12, a model trained on a training set that excludes samples without prohibited items still falsely circles an item on a normal test image.
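The Error Rate computation described above can be summarized by the following sketch; the function name and data layout are ours, chosen for illustration.

```python
def detection_error_rate(predictions, has_item_labels, conf_thresh=0.5):
    """Sketch of the detection-side Error Rate in Table 12.

    predictions: per-image lists of predicted box confidence scores.
    has_item_labels: per-image booleans, True if the image truly
    contains a prohibited item.
    """
    errors = 0
    for scores, has_item in zip(predictions, has_item_labels):
        # An image counts as "with prohibited items" if any box exceeds the threshold.
        predicted_positive = any(s > conf_thresh for s in scores)
        errors += (predicted_positive != has_item)
    return errors / len(has_item_labels)
```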

Table 12 Influence of samples without prohibited items. Error Rate measures whether the model correctly predicts if an image contains prohibited items
Fig. 12

The influence of images without prohibited items

Table 13 Influence of samples without prohibited items on the multi-label classification task

We also verify this argument on the multi-label classification task, where the model should predict 0 for every category when no prohibited items are present. We use Q2L with the CvT-w24-384 backbone as the test model. Regarding the Error Rate, we define the input as a sample without prohibited items when all twelve categories in the dataset present a confidence score below 0.5. As shown in Table 13, training on the full dataset significantly improves mAP (+3.45) and reduces the Error Rate (\(-\)61.7). Further, we report the false positive (FP) rate: if there are no samples without prohibited items in the training dataset, the network tends to wildly guess at least one category, leading to an abnormally high rate of misjudgments.
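Analogously, the classification-side Error Rate reduces to the following decision rule. This is a sketch with our own naming, assuming raw per-category logits that are sigmoid-normalized before thresholding.

```python
import numpy as np

def classification_error_rate(logits, has_item_labels, conf_thresh=0.5):
    """Sketch of the classification-side Error Rate in Table 13.

    logits: (N, 12) array of per-category scores for the 12 PIDray classes.
    An image is predicted 'without prohibited items' only when every
    category confidence falls below conf_thresh.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits)))          # sigmoid per category
    predicted_positive = (probs >= conf_thresh).any(axis=1)
    return float((predicted_positive != np.asarray(has_item_labels)).mean())
```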

Based on the above observation, we conclude that there is a strong necessity to introduce samples without prohibited items into the PIDray dataset.

Table 14 Effectiveness of divide-and-conquer pipeline
Table 15 Effectiveness of key modules

6.7.2 Effect of the Divide-and-Conquer Pipeline

To verify the effect of our proposed divide-and-conquer pipeline, we choose Cascade Mask R-CNN (Cai & Vasconcelos, 2019) and DDOD (Chen et al., 2021) as baselines and add a coarse-grained node between the backbone and the detection head. As Table 14 shows, we achieve significant improvements (+0.9 and +1.7 detection AP) over the two baselines. We attribute this to the fact that, with the help of the coarse-grained node, the original fine-grained node can focus more on the samples with prohibited items. When detached from the coarse-grained node, the task-specific node is overwhelmed by samples without prohibited items, resulting in degraded performance.
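For concreteness, a minimal sketch of such a coarse-grained node is given below; the channel size, pooling choice, and use of the coarsest FPN level are our assumptions rather than the exact design.

```python
import torch
import torch.nn as nn

class CoarseGrainedNode(nn.Module):
    """Minimal sketch of a coarse-grained node between backbone and head."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, 1)  # binary: prohibited item present?

    def forward(self, fpn_feats):
        # Pool the coarsest FPN level into a single vector and classify it.
        x = self.pool(fpn_feats[-1]).flatten(1)
        # Returns a logit; trained with BCE against image-level positive/negative labels.
        return self.fc(x)
```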

6.7.3 Contributions of the Fine-Grained Node to Object Detection/ Instance Segmentation

We conduct a set of experiments to verify the design of the proposed Dense Attention Modules (DAMs) in the fine-grained node. We use Cascade Mask R-CNN (Cai & Vasconcelos, 2019) with a ResNet-101-FPN backbone and our divide-and-conquer pipeline as the baseline. The experimental performance is presented in Table 15. We can see that the modules in our method improve the baseline strikingly when added one by one, and the assembly of these modules contributes to even better performance. We believe this is because the proposed DAMs not only fuse the feature maps in the FPN at each scale according to their importance but also alleviate the long-range dependency problem between pixels to a certain extent.
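The following sketch illustrates only the two ingredients isolated by this ablation, i.e., importance-weighted fusion of FPN levels and spatial self-attention for long-range dependencies; the actual DAM design differs in its details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleFusion(nn.Module):
    """Sketch: weighted FPN-level fusion followed by spatial self-attention."""

    def __init__(self, channels=256, num_levels=4):
        super().__init__()
        self.level_logits = nn.Parameter(torch.zeros(num_levels))  # learned importance
        self.attn = nn.MultiheadAttention(channels, num_heads=8, batch_first=True)

    def forward(self, feats, target_size):
        # Resize every FPN level to one scale and fuse them by softmax weights.
        resized = [F.interpolate(f, size=target_size, mode='bilinear',
                                 align_corners=False) for f in feats]
        w = torch.softmax(self.level_logits, dim=0)
        fused = sum(wi * fi for wi, fi in zip(w, resized))
        # Self-attention over spatial positions models long-range dependencies.
        b, c, h, wd = fused.shape
        tokens = fused.flatten(2).transpose(1, 2)          # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(b, c, h, wd)
```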

6.7.4 Contributions of the Fine-Grained Node to Multi-Label Classification Task

To demonstrate the task-specific contribution to multi-label classification, we conduct ablation experiments with different network components on top of the CvT-21-384 backbone. In the first two rows of Table 16, we use global max pooling (GMP) or global average pooling (GAP) to process the feature map \(\mathcal {F}\) and then deliver it to the classifier directly. In the third row, we adopt different queries to obtain the corresponding class-aware feature vectors, which are used to finalize the multi-label classification. As seen from Table 16, this strongly proves the effectiveness of our designed module. Next, we introduce the divide-and-conquer pipeline, which brings an improvement of 0.37 in terms of mAP, demonstrating that the divide-and-conquer pipeline is also conducive to the multi-label classification task.
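A compact sketch of the query-based variant (row 3 of Table 16) is given below; the 384-d feature dimension follows CvT-21-384, but the module names and layout are our assumptions for illustration.

```python
import torch
import torch.nn as nn

class ClassQueryHead(nn.Module):
    """Sketch: class-aware queries attend to the feature map, vs. plain GMP/GAP."""

    def __init__(self, dim=384, num_classes=12):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_classes, dim))  # one query per class
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.fc = nn.Linear(dim, 1)  # per-class binary logit

    def forward(self, feat_tokens):  # feat_tokens: (B, H*W, dim), flattened feature map
        # Rows 1-2 of Table 16 would instead use:
        #   gmp = feat_tokens.max(dim=1).values  or  gap = feat_tokens.mean(dim=1)
        # Row 3: each class query cross-attends to the feature map.
        q = self.queries.unsqueeze(0).expand(feat_tokens.size(0), -1, -1)
        class_feats, _ = self.attn(q, feat_tokens, feat_tokens)  # (B, 12, dim)
        return self.fc(class_feats).squeeze(-1)                  # (B, 12) logits
```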

Table 16 Comparison with the contribution of different parts of multi-label classification task

7 Conclusion

In this paper, we construct a challenging dataset (namely PIDray) for prohibited item detection, especially for cases where the prohibited items are hidden in other objects. All images with prohibited items are annotated with bounding boxes and instance masks. To the best of our knowledge, PIDray is the dataset with the largest volume and variety of annotated images with prohibited items to date. Meanwhile, we design a divide-and-conquer pipeline to make the proposed model more suitable for real-world applications. Specifically, we adopt a tree-like structure to suppress the influence of the long-tail issue in the PIDray dataset: the first coarse-grained node is tasked with binary classification to alleviate the influence of the head category, while the subsequent fine-grained node is dedicated to the specific tasks on the tail categories. Based on this simple yet effective scheme, we offer strong task-specific baselines across object detection, instance segmentation, and multi-label classification on the PIDray dataset and verify its generalization ability on common datasets (e.g., COCO and PASCAL VOC). We hope the proposed dataset will help the community establish a unified platform for evaluating prohibited item detection methods in real applications. For future work, we will consider the object orientation factor and plan to extend the current dataset with more images and richer annotations for comprehensive evaluation.