1 Introduction

Bird strike accidents cause not only financial losses but also human casualties. According to the Federal Aviation Administration (FAA), from 1990 to 2019 there were more than 220 thousand wildlife strikes with civil aircraft in the USA alone, and 97% of all strikes involved birds. The estimated economic loss could be as high as $500 million per year. Furthermore, more than 200 human fatalities and 300 injuries have been attributed to bird strikes. Bird strikes happen mostly near or at airports during takeoff, landing and associated phases: about 61% of bird strikes with civil aircraft occur during the landing phases of flight (descent, approach and landing roll), and 36% occur during the takeoff run and climb. Since this airspace falls under the airport's responsibility, preventing bird strikes is one of its most significant safety concerns. Although various systems have been designed to prevent bird strikes, accidents keep occurring as commercial activities and flights increase. Improving the performance of bird strike prevention systems remains a research challenge. One fundamental limitation is the lack of large-scale data collected at real-world airports. On the one hand, real-world airports have strict security and privacy rules regarding camera system deployment. On the other hand, developing a large-scale dataset is inevitably expensive, involving a series of time-consuming and laborious tasks.

Fig. 1.

(a) The deployment of a network of cameras in a real-world airport (b) A bird example in AirBirds (c) Examples of birds in CUB [25, 27], Birdsnap [1], NABirds [24] and CIFAR10 [10].

Existing relevant datasets are either small in size or not dedicated to bird strike prevention. The wildlife strike database created by the FAA provides valuable information, but each record contains only a few fields in text form, such as date and time, aircraft and airport information, and environmental conditions, lacking informative pictures and videos. The dataset developed by Yoshihashi et al. [29] aims at preventing birds from hitting turbine blades in a wind farm, rather than in real-world airports, and its size is less than one seventh of ours. Well-known datasets like ImageNet [5], COCO [14], VOC [6] and CIFAR [10] collect millions of common objects and animals, including birds, but they are developed for research on general image recognition, object detection and segmentation. Another branch of datasets, such as the CUB series [25, 27], Birdsnap [1] and NABirds [24], containing hundreds of bird species, focuses on fine-grained categorization and part localization, and the size of these datasets is less than 50% of ours. One of the most significant differences between the above-mentioned datasets and ours is that birds in previous datasets are carefully selected and tailored: they are often centered in the image, occupy the main part of the image and have clear outlines, as Fig. 1c shows.

However, birds in images captured at real-world airports are unlikely to have these idealized characteristics. The deployment of a network of cameras around a runway in a real-world airport is shown in Fig. 1a. Each camera is responsible for monitoring an area of hundreds of meters, so the flying birds that appear are tiny even in a high-resolution image. For example, in our dataset the average size of all annotated birds is smaller than 10 pixels in the 1920\(\times \)1080 images, only \(\sim \)0.5% of the image width, as Fig. 1b shows.

To advance the research and practical solutions for bird strike prevention, we collaborated with a real-world airport for two years and finally present AirBirds, a large-scale challenging dataset consisting of 118,312 time-series images at 1920\(\times \)1080 resolution with 409,967 bounding box annotations of flying birds. The images are extracted from videos recorded by a network of cameras over one year, from September 2020 to August 2021, and thus cover various bird species in different seasons. Diverse scenarios are also included in AirBirds, e.g., changing lighting and 13 meteorological conditions. Planning, deployment and joint commissioning of the monitoring system lasted one year, data collection took another whole year, and the subsequent cleaning, labeling, sorting and experimental analysis consumed 12 more months in parallel. To the best of our knowledge, AirBirds is the first large-scale challenging image dataset that collects flying birds in real airports for bird strike prevention. The core contributions of this paper are summarized as follows.

  • A large-scale dataset, namely AirBirds, consisting of 118,312 time-series images at 1920\(\times \)1080 resolution containing flying birds in real-world airports, is publicly presented, with 409,967 instances carrying careful manual bounding box annotations. The dataset covers various kinds of birds across 4 seasons and diverse scenarios, including day and night and 13 meteorological and lighting conditions, e.g., overcast, sunny, cloudy, rainy, windy and hazy.

  • To highlight the significant differences from other relevant datasets, we compile comprehensive statistics on AirBirds and compare it with them. Three features are appealing. (i) The images in AirBirds are taken exclusively at a real-world airport, providing rare first-hand sources for research on bird strike prevention. (ii) Abundant bird instances in different seasons and changing scenarios are covered, as the data collection spans a full year. (iii) The distribution of AirBirds is distinct from existing datasets, since 88% of instances are smaller than 10 pixels and the remaining 12% are between 10 and 50 pixels in 1920\(\times \)1080 images.

  • To understand the difficulty of AirBirds, a wide range of strong baselines are evaluated on this dataset for bird discovery. Specifically, 16 detectors are trained from scratch on AirBirds with careful configuration and parameter optimization. The consistently unsatisfactory results reveal the non-trivial challenges of bird discovery and bird strike prevention in real-world airports, which deserve further investigation.

As far as we know, bird strike prevention remains an open research problem since it is not well solved by existing technologies. We believe AirBirds will benefit researchers, facilitate the field and push the boundary of practical solutions in real-world airports.

2 Related Work

In this section, we review datasets that are either closely relevant to bird strike prevention or contain information transferable to this topic.

Table 1. Comparisons of AirBirds and relevant datasets. Density is the average number of instances per image. Duration refers to the period of data collection.

FAA Wildlife Strike Database. One of the most relevant datasets is the Wildlife Strike Database maintained by the FAA. This database contains more than 220K records of reported wildlife strikes since 1990, and 97% of strikes involve birds. The detailed description of each incident covers bird species, date and time, airport information, aircraft information, environmental conditions, etc. An obvious limitation is that the contents of this database are mainly in text form, lacking informative pictures and videos.

Bird Dataset of a Wind Farm. Yoshihashi et al. developed this dataset for preventing birds from striking the turbine blades in a wind farm [29]. 32,000 birds and 4,900 non-birds are annotated in total for experiments on two-class categorization. As in AirBirds, the ratio of bird size to image size is extremely small. However, compared with AirBirds' year-long data collection, this dataset was collected over only 3 days, so it contains far fewer samples and scenarios.

Bird Datasets with Multiple Species. Bird species probably provide valuable information for bird strike prevention. Another branch of relevant datasets, such as the CUB series [25, 27], Birdsnap [1], NABirds [24] and VB100 [7], focuses on fine-grained categorization of bird species. Images in these datasets are mainly collected from public sources, e.g., Flickr, or by professionals. One of the most significant differences between this branch and AirBirds is that birds in these datasets are carefully tailored: they are often centered in the image, occupy its main part and have clear outlines. However, birds captured in real-world airports are unlikely to have these idealized characteristics. Moreover, bounding box annotations are absent in some of them, e.g., VB100 [7], making them unsuitable for research on tiny bird detection.

Well-Known Datasets Containing Birds. Commonly used datasets in computer vision, such as ImageNet [5], COCO [14], VOC [6] and CIFAR [10], are also relevant, as the bird is one of their predefined categories and numerous samples exist. However, these datasets are designed for research on general image classification, object detection and segmentation, not for bird strike prevention, and their data distributions differ from that of AirBirds, so limited information can be transferred to this task.

The comparisons with related work are summarized in Table 1. AirBirds offers the most instances, the longest duration and the richest scenarios in image form.

Fig. 2.

The number of images per month in AirBirds.

3 AirBirds Construction

This section describes the process of constructing the AirBirds dataset, including raw data collection and the subsequent cleaning, annotation, splitting and sorting.

3.1 Collection

To cover diverse scenarios and prepare adequate raw data, we recorded at a real-world airport (Shuangliu International Airport, Sichuan Province, China) over the 4 seasons of a whole year. Data collection started in September 2020 and ended in August 2021.

Considering frequent takeoffs and landings, airport runways and their surroundings are the major monitoring areas. We deployed a network of high-resolution cameras along the runways, as Fig. 1a shows. All deployed cameras use identical configurations: the model is AXIS Q1798-LE, recording 1920\(\times \)1080 images at 25 frames per second. Given the vast volume of raw data and a limited number of disks, saving all videos was infeasible. We therefore split into two parallel groups, one for data collection and the other for data processing, so that disk space could be recycled once the processing group finished.

3.2 Preprocessing

This step processes the raw videos month by month and saves 1920\(\times \)1080 images in chronological order. At 25 frames per second, the raw videos contain numerous redundant images, so a suitable sampling strategy is required to avoid a dense distribution of similar scenarios. One crucial observation is that the video clips in which flying birds appear are very sparse compared with the rest. Hence, we first manually locate all clips containing birds, then sample one of every 5 consecutive frames within those clips rather than over the entire video, resulting in an average of 300+ images per day, \(\sim \)10,000 images per month and 118,312 in total. The number of images per month is shown in Fig. 2, and the 13 meteorological conditions with the corresponding numbers of days are depicted in Fig. 3.
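To make the sampling step concrete, the following is a minimal Python/OpenCV sketch, assuming the bird-containing clips have already been located manually and are supplied as (start, end) frame ranges; the function name and the video-reading details are illustrative, not part of the original pipeline.

```python
import cv2

def sample_frames(video_path, clips, stride=5):
    """Sample one of every `stride` consecutive frames, restricted to
    the clips where flying birds were observed.

    clips: list of (start_frame, end_frame) index pairs, inclusive.
    Returns a list of (frame_index, image) pairs.
    """
    cap = cv2.VideoCapture(video_path)
    sampled = []
    for start, end in clips:
        for idx in range(start, end + 1, stride):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the frame
            ok, frame = cap.read()
            if ok:
                sampled.append((idx, frame))
    cap.release()
    return sampled
```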

Fig. 3.

The number of days of each weather condition in AirBirds.

3.3 Annotation

To ensure quality and minimize costs, we divided the labeling process into three rounds. The first round, which generates initial bounding box annotations for birds in the images, is done by machines. The second round refines these annotations manually via a team of employed workers; note that the team does not have to discover birds from scratch. In the third round, we verify the manual annotations and require further improvement of low-quality ones.

Algorithm 1. Generating initial bounding box annotations of flying birds via background subtraction.

Discovering tiny birds in images with broad scenes is not a simple task for humans. In the first round, we therefore developed an algorithm that generates initial annotations and runs on a computer. The idea is related to background subtraction in image processing. In our context, the cameras are fixed in the real-world airport, so the background of each monitoring view is static. Since the images in each sequence are in chronological order, computing the pixel differences between two consecutive frames removes the static part, namely the background, while moving targets, such as flying birds in the monitoring areas, are likely to stand out. Algorithm 1 describes the detailed process. Initially, we treat the first frame as the background, convert it to gray mode, apply Gaussian blur to the gray image, denote the output as b, and remove the first image from the input sequence \(\mathcal {S}\). The set of initial bounding box annotations \(\mathcal {B}\) starts empty. We then traverse the images \(I_i\) in \(\mathcal {S}\). In each iteration, \(I_i\) is converted to a gray image \(g_i\), to which Gaussian blur is applied to produce a denoised image \(c_i\). Next, we compute the difference d between b and \(c_i\). Regions in d whose pixel values lie in the range [min, max] are considered areas of interest; e.g., if the pixel differences of the same area in two consecutive frames exceed 30, there are probably moving targets in that area. A dilation operation expands the contours of those areas to find possible moving objects \({\textbf {c}}_i\), including flying birds. Heuristic rules then filter the candidates by object size, e.g., large targets such as airplanes, working vehicles and workers are removed, yielding \({\textbf {b}}_i\), which is inserted into \(\mathcal {B}\). Finally, we set the background b to \(c_i\) and move forward. The key steps of this algorithm are visualized in Fig. 4.
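The following is a minimal Python/OpenCV sketch of Algorithm 1. The thresholds min = 25 and max = 255 follow Sect. 4.2, while the Gaussian kernel size and the area bounds of the size filter are illustrative assumptions, since the paper does not report them.

```python
import cv2

def initial_annotations(frames, min_thr=25, max_thr=255,
                        min_area=1, max_area=400):
    """Frame-differencing sketch of Algorithm 1.

    frames: iterable of BGR images in chronological order.
    Returns a list of per-frame candidate boxes (x, y, w, h).
    min_area/max_area are assumed size filters that drop noise and
    large movers such as airplanes, vehicles and workers.
    """
    frames = iter(frames)
    # Treat the first frame as the initial background b.
    b = cv2.GaussianBlur(cv2.cvtColor(next(frames), cv2.COLOR_BGR2GRAY),
                         (21, 21), 0)
    all_boxes = []
    for img in frames:
        g = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # gray image g_i
        c = cv2.GaussianBlur(g, (21, 21), 0)        # denoised image c_i
        d = cv2.absdiff(b, c)                       # difference with background
        # Keep regions whose pixel difference lies in [min_thr, max_thr].
        _, mask = cv2.threshold(d, min_thr, max_thr, cv2.THRESH_BINARY)
        mask = cv2.dilate(mask, None, iterations=2)  # expand contours
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = []
        for cnt in contours:
            x, y, w, h = cv2.boundingRect(cnt)
            if min_area <= w * h <= max_area:       # heuristic size filter
                boxes.append((x, y, w, h))
        all_boxes.append(boxes)
        b = c                                       # current frame becomes background
    return all_boxes
```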

Refinement is required since the previously discovered moving objects are not necessarily birds. In the second round, we cooperated with a team of workers to accomplish this task. Following predefined instructions, every image is zoomed in to 250+% to check the initial annotations in detail, and the team mainly handles 3 types of issues arising from the first round: (i) adding missed annotations, (ii) deleting false-positive annotations, (iii) updating inaccurate annotations. In the third round, we went through the annotations refined by the team, requiring further improvement where inappropriate.

3.4 Splits

To facilitate further exploration of bird strike prevention based on this dataset, it is necessary to split AirBirds into training and test sets.

Three key aspects deserve attention when splitting the dataset. First, the ratio between the sizes of the training and test sets should be appropriate. Second, the two sets should have similar distributions. Third, given the chronological ordering, a complete sequence should go into either the training or the test set, never be split across them.

Fig. 4.

Visualization of key steps in Algorithm 1.

In the end, we assign 98,312 images to the training set and keep the remaining 20,000 images in the test set, a nearly 5:1 ratio. All images are publicly available, as are all labels except those of the test set. A validation set is not explicitly distinguished, since the primary evaluation should take place on the test set; users can customize the ratio between training and validation sets individually. We are actively building an evaluation server where the test labels will be kept.
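As an illustration of the third constraint above, here is a sketch of a sequence-level split; the 'sequence_id' field and the grouping logic are hypothetical, as the paper does not describe its file layout.

```python
import random
from collections import defaultdict

def split_by_sequence(image_records, test_fraction=20000 / 118312, seed=0):
    """Assign whole image sequences to train or test, never splitting one.

    image_records: list of dicts with a 'sequence_id' key (hypothetical
    field; adapt to the actual file layout).
    """
    groups = defaultdict(list)
    for rec in image_records:
        groups[rec['sequence_id']].append(rec)
    seq_ids = list(groups)
    random.Random(seed).shuffle(seq_ids)  # keep the two distributions similar
    train, test, n_total = [], [], len(image_records)
    for sid in seq_ids:
        # Fill the test set first until it reaches the target fraction.
        bucket = test if len(test) < test_fraction * n_total else train
        bucket.extend(groups[sid])
    return train, test
```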

In addition, the images in AirBirds can be divided into 13 groups according to the 13 scenarios shown in Fig. 3. This division is easy to achieve since each image is recorded on a specific day and each day corresponds to one meteorological condition, according to the official weather report. Based on this division, we can evaluate the difficulty of bird discovery under different scenarios in real-world airports.

4 Experiments

In this section, a series of comprehensive statistics and experiments based on AirBirds are presented. First, we investigate the data distribution of AirBirds and compare it with relevant datasets to highlight their significant differences. Second, a wide range of state-of-the-art detectors are evaluated on the developed dataset for bird discovery, and the results are analyzed in detail to understand the non-trivial challenges of bird strike prevention. Third, the effectiveness of Algorithm 1 is evaluated, since it plays an important role in the first round of annotation when constructing AirBirds.

4.1 Distribution

In this subsection, we investigate the distribution of AirBirds and compare it with relevant datasets. Figure 5 shows the distributions of bounding box width and height in different datasets; objects in AirBirds are clearly much smaller. Further, Fig. 6 depicts the proportions of objects of various sizes in relevant datasets: 88% of all instances in AirBirds are smaller than 10 pixels, and the remaining 12% fall mainly in the interval [10, 50). The data distribution in real-world airports therefore differs significantly from those of web-crawled and tailor-made datasets.
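For reference, such proportions can be reproduced from the annotations with a few lines of NumPy; taking object size as the longer box side is our assumption, since the paper does not state the exact size definition.

```python
import numpy as np

def size_proportions(widths, heights, edges=(0, 10, 50, 300, np.inf)):
    """Fraction of instances per size interval, e.g., (0,10), [10,50), ...

    Size is taken as max(width, height) in pixels (our assumption).
    """
    sizes = np.maximum(np.asarray(widths), np.asarray(heights))
    counts, _ = np.histogram(sizes, bins=edges)
    return counts / counts.sum()
```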

Fig. 5.

Distributions of width and height of annotated instances in COCO and AirBirds.

4.2 Configurations

A wide range of detectors are tested on AirBirds for bird discovery. Before reporting their performance, we elaborate on the specific modifications made to adapt the detectors to the AirBirds dataset. Concretely, we customize the following settings.

Models. To prevent AirBirds from favoring a certain type of detector, various kinds of strong baselines are picked for evaluation, including one-stage, multi-stage, transformer-based, anchor-free and other types of models; see Table 2.

Devices. 6 NVIDIA RTX 2080Ti GPUs are used during training, and a single GPU is used during testing for all models.

Data Format. The annotations in AirBirds follow the YOLO [16] style. We convert them to COCO format when training models other than YOLOv5, as sketched below.
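The following is a minimal sketch of such a conversion, assuming one YOLO txt file per image (normalized 'class cx cy w h' rows), a single bird category and fixed 1920\(\times \)1080 images; file naming is illustrative.

```python
import json
import os

def yolo_to_coco(label_dir, image_size=(1920, 1080),
                 out_file='airbirds_coco.json'):
    """Convert YOLO txt labels (rows of 'class cx cy w h', normalized)
    into a COCO-style json with a single 'bird' category.

    Assumes one txt file per image; file naming is illustrative.
    """
    W, H = image_size
    images, annotations, ann_id = [], [], 0
    for img_id, name in enumerate(sorted(os.listdir(label_dir))):
        images.append({'id': img_id, 'width': W, 'height': H,
                       'file_name': name.replace('.txt', '.jpg')})
        with open(os.path.join(label_dir, name)) as f:
            for line in f:
                _, cx, cy, w, h = map(float, line.split())
                bw, bh = w * W, h * H  # to absolute pixels
                annotations.append({'id': ann_id, 'image_id': img_id,
                                    'category_id': 1, 'iscrowd': 0,
                                    'bbox': [cx * W - bw / 2,
                                             cy * H - bh / 2, bw, bh],
                                    'area': bw * bh})
                ann_id += 1
    with open(out_file, 'w') as f:
        json.dump({'images': images, 'annotations': annotations,
                   'categories': [{'id': 1, 'name': 'bird'}]}, f)
```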

Anchor Ratios and Scales. Since objects in AirBirds differ notably in size from those in the commonly used COCO dataset, the anchor ratios and scales must be adapted for anchor-based detectors to train successfully. Applying k-means clustering to the labels of AirBirds, we set the ratios to [\(\frac{8}{13}\), \(\frac{9}{12}\), \(\frac{11}{9}\)] and the scales to [2\(^0\), 2\(^{\frac{1}{3}}\), 2\(^{\frac{2}{3}}\)]; a sketch of the clustering follows.
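As a sketch of this step, a plain Euclidean k-means over (w, h) pairs is shown below; whether the original clustering operates in (w, h) space or uses an IoU-based distance, as in YOLO, is not stated in the paper, so this is only one plausible variant.

```python
import numpy as np

def kmeans_anchors(wh, k=3, iters=100, seed=0):
    """Cluster (width, height) pairs to derive anchor shapes.

    wh: (N, 2) array of box widths and heights in pixels.
    Returns the k cluster centers and their h/w aspect ratios.
    """
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each box to its nearest center, then recompute centers.
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = wh[assign == j].mean(axis=0)
    return centers, centers[:, 1] / centers[:, 0]
```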

Learning Schedules. All models are trained from scratch with optimized settings, e.g., training epochs, learning rate, optimizer and batch size. These settings are summarized in Sect. 2 of the supplementary material.

Fig. 6.

Comparisons of the proportions of objects of different sizes in the datasets Birdsnap, COCO, VOC and AirBirds. The numbers in each bar are percentages. Object size in pixels is divided into 4 intervals: (0,10), [10,50), [50, 300) and [300,\(+\infty \)).

Algorithm 1. The pixel-difference thresholds min and max in Algorithm 1 are set to 25 and 255, respectively.

4.3 Results and Analysis

Both accuracy and efficiency are equally important for bird discovery in a real-world airport. Accuracy is measured by average precision (AP) and efficiency by frames per second (FPS). Results are recorded in Table 2.

In terms of accuracy, the primary metric AP is unsatisfactory: the highest score, achieved by YOLOv5, is only 11.9, and all other models score below 10. We also compare the performance of these detectors on COCO and AirBirds, shown in Fig. 7. For the same detector, the gap is surprisingly large; for instance, the AP score of EfficientDet-D2 on COCO exceeds the one on AirBirds by 41.5 (= 42.1 − 0.6).

Table 2. Comparisons of various kinds of object detectors on the AirBirds test set. The column AP\(_l\) has been removed, as there are few large objects in AirBirds and the corresponding scores are all 0. EffiDet: EfficientDet-D2; Faster: Faster RCNN; Cascade: Cascade RCNN; Deform: Deformable DETR. These abbreviations have the same meaning in the following figures and tables.
Table 3. Comparisons of Algorithm 1 and YOLOv5 in terms of precision, recall and F1 score. Algorithm 1 runs on a common computer, and YOLOv5 is tested on a 2080Ti GPU.

Besides, the precision-recall relationship is also investigated, with results shown in Fig. 8. In all curves, precision decreases as recall increases, because more and more false positives are produced as more birds are recalled. YOLOv5 outperforms the others, although its precision drops to 0 when recall reaches 0.7.

At this point, we wonder whether these detectors are well trained on AirBirds, so their training losses are visualized in Fig. 9. The losses of all detectors drop rapidly in the initial iterations and then progressively flatten, indicating that training proceeds normally and converges.

In terms of efficiency, YOLOv5 again outperforms the others, surpassing 100 FPS on a 2080Ti GPU. However, most detectors fail to operate in real time even with GPU acceleration, which violates a key requirement of bird strike prevention.

We also investigate why a wide range of detectors work poorly on AirBirds; the reasons are detailed in Sect. 3 of the supplementary material due to space limitations.

In short, existing strong detectors show decent performance on commonly used datasets, e.g., COCO and VOC. However, even with carefully customized configurations, they leave significant room for improvement when validated on AirBirds. The results also imply the non-trivial challenges of bird strike prevention research in real-world airports, where AirBirds can serve as a valuable benchmark.

Fig. 7.

Comparison of the performance of representative detectors on AirBirds and COCO.

Fig. 8.

Precision-Recall curves of different detectors in VOC [6] style.

Fig. 9.

Training losses of different types of detectors.

4.4 Effectiveness of the First Round of Annotations

As mentioned in Sect. 3, Algorithm 1 provides the first round of bounding box annotations for possible flying birds, and these annotations are saved. Here we validate its effectiveness and compare it with the best performing detector, YOLOv5. Unlike average precision, which sets strict IoU thresholds between detections and ground truth, precision, recall and F1 score are more meaningful metrics for evaluating initial annotations.

Table 3 shows that Algorithm 1 recalls more than 95% of birds in the initial round, which saves workers the enormous effort of discovering birds from scratch in subsequent rounds and thus reduces costs. The results also indicate that sequence information is helpful for detecting tiny flying birds, as the input images of Algorithm 1 are in chronological order. The star symbol in the second row of Table 3 indicates that the results of Algorithm 1 are obtained on an ordinary computer (i5 CPU, 16 GB memory) without GPU support.
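For clarity, a sketch of how such precision, recall and F1 scores can be computed by greedily matching detections to ground truth boxes is given below; the IoU threshold of 0.5 is our assumption, since the paper does not report the matching criterion.

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def prf1(detections, groundtruth, iou_thr=0.5):
    """Greedy one-to-one matching of detections to ground truth boxes.

    detections, groundtruth: lists of (x, y, w, h) boxes for one image.
    iou_thr is an assumed matching threshold.
    """
    matched, tp = set(), 0
    for det in detections:
        # Find the best unmatched ground truth box above the threshold.
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(groundtruth):
            v = iou(det, gt)
            if j not in matched and v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(groundtruth) if groundtruth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```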

5 Conclusion

In this paper, we present AirBirds, a large-scale challenging dataset for bird strike prevention constructed directly at a real-world airport, to close the notable gap in data distribution between the real world and other tailor-made datasets. Thorough statistical analysis and extensive experiments are conducted on the developed dataset, revealing the non-trivial challenges of bird discovery and bird strike prevention in real-world airports. These challenges deserve further investigation, for which AirBirds can serve as a first-hand and valuable benchmark.

We believe AirBirds will alleviate the fundamental limitation posed by the lack of a large-scale dataset dedicated to bird strike prevention in real-world airports, benefiting researchers and the field. In the future, we will develop advanced detectors for flying bird discovery based on AirBirds.