1 Introduction

The advantages of drones over conventional platforms (manned aerial vehicles and satellites) are lower-altitude flight, images with high spatial resolution, and low-cost operation and maintenance for monitoring and sensing environments. In recent years, the capabilities of drones have steadily improved, making them a major field instrument for researchers. Thus, an increasing number of companies are adopting drones, whose simple mechanics suit surveillance and infrastructure inspection applications. Generally, drones can fly at various speeds indoors or outdoors and control their position around targets and obstacles using various sensors to perceive their environment. All of these advantages and features make them increasingly suitable replacements for human operations in situations in which experts cannot participate, especially under dangerous, difficult, expensive or exhausting conditions Kanellakis and Nikolakopoulos (2017). Drones can be controlled remotely from a ground control station (GCS) by a pilot (remotely piloted aerial system (RPAS)) or can be automated by onboard, programmable sensors. As a vehicle, a drone comprises supporting hardware, such as sensors, microcontrollers and ground stations, and software, including communication protocols and user interfaces. Computer vision plays a vital role in most unmanned aerial vehicle (UAV) applications. Computer vision aims to interpret the 3D world as metric data by processing 2D image planes in different applications. Each computer vision system should address four tasks, namely, acquiring, processing, analyzing and understanding digital videos and images Elharrouss et al. (2020). Interpreting images assists in automating real-world problems, especially those that are difficult for the average human to perceive. Computer vision methods in drone applications range from basic aerial imagery to highly complex tasks such as aerial refueling or rescue operations, and performing these applications accurately requires reliable decision-making and precise maneuvering Al-Kaff et al. (2018).

In this paper, we present a survey of works that have introduced databases of videos and images captured by drones for computer vision, together with works that have used these databases. We have categorized the applications into three groups. The first group is related to remote sensing, with challenges such as camera calibration, image matching, and aerial triangulation. The second group concerns drone navigation itself, in which computer vision methods address challenges such as flight control, visual localization and mapping, and target tracking and obstacle detection. The third group is dedicated to using images and videos captured by drones in applications such as surveillance, agriculture and forestry, animal detection, disaster detection, and face recognition. This survey summarizes the knowledge generated by 228 articles and provides insights based on many additional articles and supporting literature. A statistical report on the surveyed literature from 2005 until the present (October 2020) is shown in Fig. 1. As the figure shows, academic interest, measured by papers that provide databases on the topic across the three categories, has grown from 2017 until the present (October 2020). All works fall into 116 journal papers, 75 conference papers, 28 preprints, 6 reports and 3 books/theses.

Fig. 1 Number of papers used in this survey by publication year (from 2005 to October 2020)

For researchers, our survey serves as an introduction to open research. Additionally, we provide an overview of the existing literature and present databases for remote sensing and navigation based on computer vision, along with applications related to images captured by drones. Because of the breadth of the research area, the following considerations guided our literature survey:

  • An accurate, sharp boundary cannot be drawn between military/security and civil applications. We try to include articles on civil and commercial applications of drones, which generally can be used in both contexts.

  • We concentrate on outdoor applications, many of which can also be used in indoor environments.

  • Since the infrastructure of each application in computer vision is a database, we have focused on databases in various applications and works related to these databases.

  • We explored papers that explicitly introduce or use drone-based databases in their title, abstract, or keywords, or that used databases in their experimental results section under any relevant term or description. Our search keywords were “vehicle, UAV, drone, unmanned aircraft, unmanned aerial system (UAS), remotely piloted aerial system (RPAS), and remotely piloted vehicle”, but were not limited to these words. We then chose only papers related to computer vision and those that created a database. It should be noted that topics related to drones such as operations planning of mobile robots (including ground-based drones), mobile sensors, vehicle routing and machine scheduling are not part of this survey.

  • Papers were considered if they met the following publishing criteria: peer-reviewed English journals, peer-reviewed conference proceedings, or recent manuscripts from open-source archives. Additionally, we have tracked all the studies from authors who are distinguished experts in the field. Due to the large number of publications, we are unable to include them all; however, we have tried to include the important articles on the topic.

  • In all sections (text, tables, figures, and plots), papers are sorted by category and publication year.

This paper presents comprehensive insights into the evaluation and benchmarking of videos and images captured by drones based on the three categories, as shown in Fig. 2.

We provide a background on drones and their developed applications based on computer vision in Sect. 2. In Sect. 3, we summarize databases related to remote sensing and navigation groups. Based on our survey categories in the literature, we then describe applications that can be applied to images and videos captured from drones in Sect. 4. Section 5 is devoted to open challenges and research that can be done in the future. Section 6 outlines future directions and concludes our article.

Fig. 2 Overview of our survey structure

2 Background

In this section, we first present the history of and developments in drone technology, followed by a brief description of the types of drones and cameras used. We then briefly review previous surveys and, finally, give a general description of drone-based computer vision.

2.1 History and developments

The history of drone use dates back to the First Italian War of Independence (1849), when the Austrian Empire designed a system of unmanned hot air balloons to drop bombs on Venice. This development led to the use of hot air balloons and kites for communication during the American Civil War and the Spanish–American War, and military use endured and developed into the twenty-first century. Advances accelerated as Cold War tensions between the U.S. and the Soviet Union increased, during which the U.S. government started a UAS research program under the code name “Red Wagon”. In parallel with these advances, the first version of the Global Positioning System (GPS), based on a global satellite navigation system, was introduced by the Defense Advanced Research Projects Agency (DARPA). The genesis of commercial drone applications was in 2006, as shown in Fig. 3, which summarizes the commercial aspects of drones until the present. Da-Jiang Innovations (DJI), a leader in the commercial and civilian drone industry, created the first commercial drone in 2006 and has steadily developed drones for various applications around the world. Since 2012, the Federal Aviation Administration (FAA) has, under U.S. law, managed the integration of small drones into the airspace and reports details each year, including the number of drones and distance limitations. In 2013, Amazon announced plans to deliver products by drone. For more information regarding the history and development of drones, readers can refer to Rakha and Gorodetsky (2018).

As shown in Fig. 3, since 2018, researchers in drone-based computer vision have produced and developed databases. To the best of our knowledge, no field or industry has presented a comprehensive review of studies using databases of videos and images captured by drones. In addition, since each computer-vision-based study has specific database needs in various applications, a summary of all of them is useful for continuing research. This survey is dedicated to gathering, comparing, contrasting, and assessing current and emerging research in drone fields based on the databases created.

Fig. 3 A historical timeline of UAS technology developments based on commercial aspects

2.2 UAVs and cameras types

Drones fly without needing roads, and thus, they can reach difficult locations for various aims. Many companies have produced drone models for different missions to reduce labor costs. In the production process, issues such as the weight of the aircraft, and thus its energy consumption, thermal control, and cabin pressurization, are important. Figure 4 illustrates several models of drones.

Fig. 4 Different models of drones

To sense different situations, a variety of sensors is needed. For example, to sense the environment and estimate position and orientation in space, exteroceptive and proprioceptive sensors such as the global positioning system (GPS) are mounted on drones. In addition, drones can be equipped with different types of sensors to extract useful data and information. Ultrasonic sensors and visual stereo or monocular camera systems can be used directly to detect and avoid obstacles and to map 3D environments. These can be integrated with laser range finders and inertial measurement units (IMUs) to provide more accurate results and visual-inertial ego-motion estimation. Some examples of modular vision systems are depicted in Fig. 5. In this survey, we explore images and videos, and accordingly, we consider studies that include a camera as a primary or secondary sensor.

Fig. 5 Different models of cameras used in computer vision applications

2.3 Relation to previous surveys

A number of representative surveys concerning drone-based computer vision have been presented, as summarized in Table 1. The research reviewed in Colomina and Molina (2014), Pádua et al. (2017) was dedicated to presenting 3D reconstruction and geometric correction methods. Reference Xiang et al. (2018) focused on surveying issues specific to aerial remote sensing data processing, such as image matching and dense image matching. As mentioned in Xiang et al. (2018), other drone data processing technologies and their recent advances were presented with a focus on deep learning and related methods for geometric processing of drone data. References Kanellakis and Nikolakopoulos (2017) and Al-Kaff et al. (2018) provide comprehensive reviews of navigation systems, including advances in computer vision. Recent developments in the procedures and methodologies of drone-based thermal imaging were detailed in Rakha and Gorodetsky (2018). In addition, some surveys reviewed specific applications of UAVs in remote sensing fields, such as agriculture Gago et al. (2015), forestry Yuan et al. (2015), disaster response Adams and Friedland (2011), Giordan et al. (2018) and surveillance Puri (2005), Kanistras et al. (2015). Extensive work on other hot issues, such as optimization approaches for civil applications Otto et al. (2018) and machine learning approaches Choi and Cha (2019), was explored separately. Considering the gaps discussed above, it is imperative to provide a comprehensive survey of drones centered on drone-based computer vision methods, their databases, recent applications, and future directions. A thorough review and summarization of existing work is essential for further progress in drone computer vision, particularly for researchers wishing to enter the field. The objectives of this paper are the following:

  • a systematic survey of computer vision methods based on databases, categorized into three different themes (in each section, we provide a critical overview of the databases and the methods applied to them);

  • a detailed overview of recent potential applications of drones in computer vision tasks;

  • a discussion of the future directions and challenges of drones from the point of view of databases.

Table 1 List of related surveys on UAVs in recent years

2.4 General description of UAV-based computer vision

Today, computer vision methods are applied in most drone applications. By developing computer vision algorithms, decreasing their errors and embedding them into sensors, drones can be used not only for simple applications such as photography and filming but also for more complex tasks. After images and videos are obtained by drone-mounted cameras, tasks related to them (e.g., image processing and analysis to collect scene information, including drone attitude and position) can be considered. Additionally, the distance of the drone from buildings should be considered; this distance depends on the laws of the specific country (for example, in the U.S., the FAA controls and manages the rules), but for commercial purposes, it is approximately 5 m. The variations in this distance over the years can be found in Rakha and Gorodetsky (2018). The term computer vision covers the characterization and analysis of the real 3D world from 2D image planes. Implementing a computer vision system involves three fields, namely, image processing, pattern recognition and machine learning. In the first step, image processing methods prepare the images and videos through operations such as noise removal and morphological filtering. Then, depending on the application, several methods are applied to the processed images to extract features and patterns. Finally, machine learning methods learn the various patterns to automate the process. In newer machine learning methods, such as deep learning, all of these steps, or at least two of them, are integrated. Computer vision, in general, focuses on interactions with the environment as well as the basic applications of machine inspection, navigation, 3D model building, and surveillance. Another context related to drones is imaging, which includes the process of producing images and involves both image processing and computer vision. Consequently, the development of drones and their corresponding capabilities in computer vision can be used in object recognition, object tracking, pose estimation, ego-motion estimation, optical flow, and scene reconstruction Kanellakis and Nikolakopoulos (2017).
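
To make the three-step pipeline concrete, the following minimal sketch chains the steps together, assuming OpenCV and scikit-learn are available; the HOG parameters and the linear SVM are illustrative choices, not drawn from any particular surveyed paper.

```python
import cv2
import numpy as np
from sklearn.svm import LinearSVC

# Step 1: image processing -- denoise and clean up an aerial frame.
def preprocess(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    denoised = cv2.medianBlur(gray, 5)                          # noise removal
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(denoised, cv2.MORPH_OPEN, kernel)   # morphology

# Step 2: pattern extraction -- describe each patch with HOG features.
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def extract_features(patch_gray):
    patch_gray = cv2.resize(patch_gray, (64, 64))
    return hog.compute(patch_gray).ravel()

# Step 3: machine learning -- learn to separate the extracted patterns.
def train_classifier(patches_bgr, labels):
    X = np.stack([extract_features(preprocess(p)) for p in patches_bgr])
    return LinearSVC().fit(X, labels)
```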

In the following, we present information about events related to computer vision and drones, as shown in Table 2. It should be noted that three workshops on computer vision problems for drones were presented in conjunction with ICCV 2017, ECCV 2018 and CVPR 2019. Each year since 2013, the International Conference on Unmanned Aircraft Systems has presented new issues in the field. Additionally, many competitions are organized that use images and videos captured by UAVs, such as Kristan et al. (2017). As shown in our references, the journals attracting the most attention in the field were Sensors, Remote Sensing, and the IEEE Transactions journals. It should be noted that most detection methods in computer vision applications are real-time methods, and since most researchers cannot reproduce real drone flight conditions for their methods, in the following sections, we consider papers that provide a database for these purposes. In this survey, we explored all databases, including RGB, thermal and multispectral images and videos. Additionally, we consider the databases in terms of their type of availability: public, private and upon request.

Table 2 The journals and conferences attracting the most attention in the field of UAV-based computer vision

3 Remote sensing and navigation databases

With the increase in the number of drone applications in recent decades, advances in photogrammetry and remote sensing have turned into a commercial competition. In remote sensing, it is important to know the quality of the information acquired by the sensors. Remote sensing based on drones provides high-resolution images and videos at a low photographic altitude, as well as other data at spatial, spectral and temporal scales, compared with satellite and manned aerial remote sensing. Camera calibration, image matching, aerial triangulation, dense reconstruction, image stitching, and multisensor registration are computer vision problems in remote sensing, and these problems have recently been explored in a survey Xiang et al. (2018). Large databases are important not only for evaluating traditional methods but also for applying new approaches, such as deep learning models Elharrouss et al. (2019). However, in recent years, only a few works have provided publicly available databases, an area that requires more effort. Preparing a standard database requires following a series of rules; Reference Long et al. (2020) discussed the rules for creating a standard database for remote sensing applications. The remote sensing databases are as follows.

The International Society for Photogrammetry and Remote Sensing (ISPRS) and EuroSDR presented a database Nex et al. (2015) for image orientation and dense matching. The database provided oblique airborne, UAV-based and terrestrial images captured from Dortmund, Germany, and Zurich, Switzerland. Additionally, terrestrial laser scanning, aerial laser scanning, topographic networks, and GNSS points accompany it as ground truth data. In addition, 3D coordinates on checkpoints (CPs) and cross-sections and residuals on generated point cloud surfaces were presented.

To mosaic images captured by drones, Xu et al. (2016) presented a large database that can also be used for image matching and camera calibration. Images with a resolution of 3680 by 2456 pixels, from flying heights of 558 m, 405 m, and 988 m, were captured over Yongzhou, Hechi and HeJiangdong of Hunan Province, China. One of the drones used was a Pix4D drone with a Panasonic DMC-GF1 camera with a 20 mm focal length lens mounted on it.

In Al Kaff (2017), three database groups were introduced, and some state-of-the-art image matching methods were applied to them. The images were captured by a quadcopter drone at a resolution of 1270 by 720 pixels from flying heights of 61.1 m, 78.6 m, and 153.6 m, in both outdoor and indoor scenarios.

The fisheye image database Yin et al. (2018) was created with the aims of camera calibration, evaluating distortion parameter settings, and rectifying images. Additionally, a deep learning method based on an end-to-end multi-contextual collaborative network was presented that estimates the distortion parameters and subsequently removes them from captured images. As recommended in Xiang et al. (2018), the database can be used for evaluating the camera calibration of drones.
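
Once the distortion parameters are estimated (whether by such a network or by classical calibration), removing them is a standard remapping step. The sketch below uses OpenCV's fisheye model; the intrinsics K and distortion coefficients D are placeholder values, not taken from the paper.

```python
import cv2
import numpy as np

img = cv2.imread("fisheye_frame.jpg")                  # hypothetical input frame
# Placeholder intrinsics K and fisheye distortion coefficients D; in practice
# these would come from calibration or from an estimation network.
K = np.array([[420.0, 0.0, 640.0], [0.0, 420.0, 360.0], [0.0, 0.0, 1.0]])
D = np.array([-0.05, 0.01, 0.0, 0.0])

# Remap the distorted pixels back onto an ideal pinhole image plane.
undistorted = cv2.fisheye.undistortImage(img, K, D, Knew=K)
cv2.imwrite("rectified.jpg", undistorted)
```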

To estimate 3D pose, a synthetic drone-assistant database was introduced in Albanis et al. (2020). In the study, the DJI Mavic Enterprise drone was equipped with a HoloLens 2.0 external color camera. The database contains both the egocentric view of a cooperative drone and the exocentric view of the user.

Some sample images from the databases are shown in Fig. 6.

Fig. 6 Samples of the database images related to remote sensing methods: a Nex et al. (2015), b Xu et al. (2016), c Al Kaff (2017), d Yin et al. (2018), and e Albanis et al. (2020)

Accurate flight stabilization and automation are the targets of modern drones, leading to navigation systems that surpass previous systems in speed, accuracy, and autonomy. The main part of an autonomous UAV is the navigation system and its supporting subsystems. The supporting subsystems (pose estimation, obstacle detection, and visual servoing) use data captured by various sensors and integrate the data for the navigation system. One of the important tasks in the system is estimating the pose of the drone in terms of position (x, y, z) and orientation (u, v, w); the remaining tasks, such as detecting obstacles and tracking static and dynamic targets, are handled by other subsystems, which are finally integrated. Today, due to the increase in vision-based sensors and the improvement of computer vision methods, companies tend to design and produce drone navigation systems that use cameras and analyze their data Al-Kaff et al. (2018). Accordingly, the three subsystems of pose estimation, obstacle detection, and visual servoing should be redesigned based on computer vision methods. In the navigation systems group, the two survey papers presented in 2017 Kanellakis and Nikolakopoulos (2017) and 2018 Al-Kaff et al. (2018) did not explore any databases; therefore, we introduce the databases of this group in this section.
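
As an illustration of the pose estimation subsystem, the sketch below recovers the camera (and hence drone) pose from known 3D-2D landmark correspondences with OpenCV's solvePnP. The landmark coordinates, pixel observations, and intrinsics are placeholder values; a real system would obtain them from a map and from calibration.

```python
import cv2
import numpy as np

# Known 3D landmark positions in the world frame (placeholder values, metres).
object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]],
                         dtype=np.float64)
# Their observed 2D projections in the current camera image (pixels).
image_points = np.array([[320, 240], [420, 238], [424, 340], [318, 342]],
                        dtype=np.float64)
# Pinhole intrinsics, here assumed to come from a prior calibration.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)           # rotation matrix from the Rodrigues vector
position = (-R.T @ tvec).ravel()     # camera (drone) position in the world frame
print("estimated position (x, y, z):", position)
```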

The Video Verification of Identity (VIVID) database Collins et al. (2005) includes images captured on a runway at changing drone flight heights, in both visible and thermal IR imagery, for the aim of tracking a vehicle. In addition, the authors provided ground truth for the tracking task and a website where researchers can test new methods. The original videos are in AVI format, and their frames are provided separately.

The database presented in Zimmermann et al. (2009) is not based on images captured by a drone; however, many researchers use it for drone-based pose estimation and tracking Luna (2013). The database includes three objects (MOUSEPAD (MP), TOWEL, and PHONE) whose positions in each frame were labeled as ground truth.

Reference Pestana et al. (2013) presents a database for navigation purposes that many researchers have used. To collect the database, an AR Drone 2.0 was flown in unstructured conditions at flying heights ranging from 1 to 2 m and from 10 to 15 m. The database is useful for training state-of-the-art methods and deep learning networks.

Reference Tian et al. (2016) presented a database for adjusting the brightness of two matched images, which can also be used for other image processing steps. The study area was the northwestern part of the Sichuan Basin, China, captured by a drone at a height of 400 m and a speed of 50 km/h, equipped with a nonmeasurement array charge-coupled device (CCD) camera with a resolution of 0.3 m.
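
Brightness adjustment between two overlapping images is commonly done by histogram matching. The sketch below is a generic grayscale implementation, assuming OpenCV and NumPy, and is not necessarily the method of Tian et al. (2016); the tile file names are placeholders.

```python
import cv2
import numpy as np

def match_histogram(source, reference):
    """Remap source grey levels so their histogram follows the reference's."""
    src_hist, _ = np.histogram(source.ravel(), 256, (0, 256))
    ref_hist, _ = np.histogram(reference.ravel(), 256, (0, 256))
    src_cdf = np.cumsum(src_hist) / source.size
    ref_cdf = np.cumsum(ref_hist) / reference.size
    # For each source level, pick the reference level with the nearest CDF value.
    lut = np.searchsorted(ref_cdf, src_cdf).clip(0, 255).astype(np.uint8)
    return lut[source]

left = cv2.imread("tile_a.png", cv2.IMREAD_GRAYSCALE)    # hypothetical tiles
right = cv2.imread("tile_b.png", cv2.IMREAD_GRAYSCALE)
right_adjusted = match_histogram(right, left)            # brightness-matched
```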

Reference Robicquet et al. (2016) introduced a database for navigation aims such as multitarget tracking and trajectory prediction. The Stanford Drone Dataset (SDD) includes images and videos recorded by a quadcopter drone (a 3DR Solo) equipped with a 4K camera at a flying height of 8 m over intersections of the Stanford University campus, at a resolution of 1400 by 1904 pixels. Additionally, because it provides comprehensive ground truth, the database is suitable for testing deep learning methods Wang et al. (2018).

In Rozantsev et al. (2017), to evaluate navigation problems such as obstacle detection, two databases were created, one of which was based on drones. A camera-equipped drone flew in various weather conditions and at various flying heights, recording the environment at a resolution of 752 by 480 pixels. The authors evaluated their approach, a convolutional neural network (CNN), on the databases and, because the method is CNN-based, provided image patches alongside the original-size images.

The UAV mosaicking and change detection (UMCD) database Avola et al. (2018) includes images and videos captured at low altitude for mosaicking and change detection. Compared with other aerial databases that pursue many goals, this database focuses on these two. The videos were recorded by a drone of the National Marine Electronics Association (NMEA) at flying heights from 6 m to 15 m, at speeds from 2 m/s to 12 m/s, with spatial resolutions ranging from 720 by 540 (4:3, standard definition) up to 1920 by 1080 (16:9, high definition) pixels per frame.

In Bharati et al. (2018), a database for detecting obstacles and tracking moving objects with a forward-looking drone camera was presented, together with a method based on a kernelized correlation filter (KCF) framework tested on it. The database covers variations in scale, axial and planar rotation, partial occlusion, illumination, and camera stability.
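
For readers who want to experiment with KCF-based tracking, a minimal sketch using the OpenCV implementation (from opencv-contrib) follows; the video path and initial bounding box are placeholders, and this library tracker stands in for, rather than reproduces, the authors' framework.

```python
import cv2

cap = cv2.VideoCapture("uav_sequence.mp4")     # hypothetical input video
ok, frame = cap.read()
tracker = cv2.TrackerKCF_create()              # kernelized correlation filter
tracker.init(frame, (150, 100, 60, 40))        # initial (x, y, w, h) of target

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)         # track the target frame by frame
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("KCF tracking", frame)
    if cv2.waitKey(1) == 27:                   # Esc quits
        break
```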

Reference Chen and Lee (2018) presented the National Chiao Tung University (NCTU) campus database for detecting obstacles such as pedestrians, cars, trees, leaves, trunks, trucks, poles and buses during autonomous drone flight. The drone used in the research was a small quadrotor equipped with a Pixhawk flight controller and an Nvidia TX2 embedded system suitable for applying deep learning methods. Additionally, the authors applied a deep learning network (UAVNet) to patches extracted from the database.

In Loquercio et al. (2018), a database (original images and patch-based images) and a deep learning method (DroNet) for an autonomous flight system were presented. DroNet is a CNN-based deep learning method aimed at flying drones over city streets. A forward-looking camera was mounted on a Parrot Bebop 2.0 drone with flying heights ranging from 5 to 30 m. Additionally, in Palossi et al. (2019), the database was extended with images captured by a COTS Crazyflie 2.0 nano quadrotor.

Reference Müller et al. (2018) presented a simulator (Sim4CV), along with a related database, covering many computer vision applications and suitable for autonomous drone flights and moving objects. The authors collected images and videos from two drones flying at speeds of 4 m/s, 6 m/s, and 8 m/s and equipped with stabilized cameras. Additionally, the Sim4CV project presented a deep learning method for the aims above and displays comprehensive information on its website.

Reference Mantegazza et al. (2018) introduced a database for autonomous flight and moving object detection. The quadrotor drone captured images at an altitude of 1–2 m for 45 min. The authors applied several state-of-the-art machine learning methods and deep learning networks and compared them on the database.

A synthetic 3D database obtained by flying a drone at high speed in suburban and urban areas was presented in Marcu et al. (2018). The database can be used to estimate depth and safe landing areas and to test deep learning methods in an environment with obstacles. The authors provided additional data, such as RGB, depth and safe-landing information, from Google Earth.

In Kang et al. (2019), a database of images captured by a Crazyflie 2.0 nano drone equipped with a 3.4-g monocular camera at an altitude of 40 cm and a speed of 30 cm/s was presented to address autonomous flight challenges. The images were collected at Cory Hall at UC Berkeley, and a deep reinforcement learning method was tested on them for evaluation. Although the research was designed for indoor scenarios, the database can also be used for outdoor scenarios.

The benchmarking database Backes et al. (2019) was designed for flood mapping and modeling using images captured by drones. A Pix4D drone was flown at heights of 50 m and 60 m to take high-resolution images for 3D mapping. Accurate models can help people affected by floods, especially in urban areas.

The database presented in Karaduman et al. (2019) can be useful for patrolling and tracking challenges by detecting the drone route. The drone speed and flight altitude were 50 km/h and 100 m, respectively. For the results achieved by the method presented in the paper, readers can refer to its supplementary material.

Some image samples from the databases are shown in Fig. 7. Also, the databases presented in the navigation and remote sensing groups are summarized in Table 3.

Fig. 7 Samples of the database images related to navigation methods: a Collins et al. (2005), b Zimmermann et al. (2009), c Pestana et al. (2013), d Tian et al. (2016), e Robicquet et al. (2016), f Rozantsev et al. (2017), g Avola et al. (2018), h Bharati et al. (2018), i Chen and Lee (2018), j Loquercio et al. (2018), k Müller et al. (2018), l Mantegazza et al. (2018), m Marcu et al. (2018), n Kang et al. (2019), o Backes et al. (2019), p Karaduman et al. (2019), and q a chart of the number of images, frames, and videos for the databases (the order corresponds to Table 3)

Table 3 List of the databases used by navigation and remote sensing groups

4 Applications of images and videos captured by drones

This section is dedicated to the use of images and videos captured by drones in various applications, such as surveillance, agriculture and forestry, animal detection, disaster detection, and face recognition, as shown in the categories of Fig. 2. For each subgroup, we present new methods based on databases.

4.1 Surveillance

One of the important applications of drones is surveillance. We divided this application into traffic, crowd, and object detection.

4.1.1 Traffic and crowd detection

The significant increase in the number of vehicles in urban areas and on roadways has led transportation managers to propose new capabilities and systems for traffic surveillance and related issues. One such system is the use of drones and the devices mounted on them, in contrast to traditional technologies such as inductive loop detectors. The use of drones not only increases mobility and coverage but also costs significantly less to operate than manned aerial vehicles (MAVs). In Kanistras et al. (2015), Puri (2005), two surveys of drone-based systems for traffic monitoring and management are presented. In the following, we explore databases created for traffic issues.

The VIRAT video database Oh et al. (2011) includes videos of both humans and vehicles, in single-object and two-object categories, with annotated details. The database was collected by a camera on a drone with an aerial video resolution of 640 by 480 pixels in natural scenes, with people performing normal actions in standard contexts against uncontrolled, cluttered backgrounds. Therefore, the database can be used in continuous visual event recognition (CVER), in which an event can be recognized.

Reference Liu and Mattyus (2015) presented aerial images captured by a drone over Munich, Germany, equipped with a German Aerospace Center (DLR) 3K camera system with a resolution of 5616 by 3744 pixels at a flying height of 1000 m. The database is suitable for detecting vehicles in multiclass and multidirectional scenarios. The authors applied a method based on a fast binary detector using integral channel features in a soft cascade structure.

A video database of a car parking lot, aimed at privacy inspection and covering three categories (normal, suspicious, and illicit behaviors), was presented in Bonetto et al. (2015). A DJI Phantom 2 Vision+ mini-drone with a mounted full-HD camera was used to collect the videos. The database was manually annotated for persons and vehicles in each scene using the ViPER-GT tool in XML format. Additionally, a method using privacy filters was applied to evaluate the database's goal.

To collect the images in Xu et al. (2016), a quadcopter (Phantom 2) with a GoPro Hero Black Edition 3 camera (resolution of 1920 by 1080) was used. Scenarios covering different weather conditions, locations, times and flight altitudes were considered (refer to Table 1, page 12 in Xu et al. (2016)). Additionally, for traffic monitoring, a method based on the Viola-Jones (V-J) detector and a linear support vector machine (SVM) classifier with HOG features (HOG + SVM) was proposed.
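
As a hedged illustration of the Viola-Jones stage of such a pipeline, the sketch below runs an OpenCV cascade classifier over an aerial frame; the cascade file for vehicles ("cars.xml") is hypothetical, since OpenCV does not ship one, and the HOG + SVM verification stage is only indicated in a comment.

```python
import cv2

# A cascade trained for vehicles is assumed; "cars.xml" is a placeholder name.
cascade = cv2.CascadeClassifier("cars.xml")
frame = cv2.imread("aerial_frame.jpg")                 # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Viola-Jones stage: fast candidate regions at multiple scales.
candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in candidates:
    # In the paper's pipeline, each candidate would then be verified by the
    # HOG + SVM classifier; here we only draw the Viola-Jones proposals.
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite("detections.jpg", frame)
```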

In Najiya and Archana (2018), a method for traffic surveillance was presented that detects vehicles, along with the volume, speed, and density of bidirectional flow, based on enhanced videos, a Kanade–Lucas–Tomasi (KLT) tracker, an SVM, and connected graphs. The method was applied to the presented database of drone-collected videos with a resolution of 336 by 596.
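
A minimal sketch of the KLT step with OpenCV's pyramidal Lucas-Kanade tracker follows; the frame file names are placeholders, and converting pixel displacement to vehicle speed would additionally require the ground sampling distance and frame rate.

```python
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Pick corner features on the previous frame, then track them with KLT.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)
nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)

good_old = pts[status.ravel() == 1].reshape(-1, 2)
good_new = nxt[status.ravel() == 1].reshape(-1, 2)
flow = good_new - good_old               # per-feature displacement (pixels)
print("median displacement:", np.median(flow, axis=0))
```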

Reference Kyrkou et al. (2018) presented different deep learning models for traffic monitoring on a database created by drones under different illumination, viewpoint, and occlusion conditions. Since the speed of transmitting and processing data from the drone to the GCS is vital, the authors designed a lightweight CNN for this aim and compared it with other deep networks.

Reference Ke et al. (2018) introduced a database for traffic surveillance by drones over different roadway segments; images were captured with an orthographic camera at a resolution of 60 by 40. Additionally, the authors proposed a deep learning method to address irregular ego-motion, low estimation accuracy in dense traffic, and high computational complexity. It should be noted that the database is an updated version of the one developed in Ke et al. (2017).

The database of original images and patches presented in Zhu et al. (2018) is suitable for detecting, counting, and tracking vehicles and for recognizing their location and type (car, bus, or truck). It was collected with a Zenmuse X3 camera at a resolution of 3840 by 2178 mounted on an Inspire 1 Pro quadcopter in sunny and cloudy weather. Since the authors provided the images as 512 by 512 patches, deep learning methods can be applied to the database.

Crowd detection is one of the challenging problems in surveillance and behavioral analysis that attracts researchers in the drone field. Upright views, detecting crowd boundaries in places such as sports stadiums, drone locations and flight altitudes, and moving objects have been explored in drone-based images Minaeian et al. (2015). In the following, we explore databases created for crowd issues.

In Tzelepi and Tefas (2017), a drone video and image database was created from videos collected from YouTube and from the senseFly example-drone and UAV123 databases. The database is for detecting human crowds in applications in which crowd and non-crowd scenes must be classified. To solve this problem, the authors proposed a deep learning method. Additionally, patch-based images are publicly available for studies that use deep learning approaches.

The database presented in Al-Sheary and Almagbile (2017) includes three subgroups of images. The first group was collected via a low-altitude Pix4D drone with a Canon camera over Leftous. The second group consists of images downloaded from the internet, while the third group consists of images captured over Mecca. To evaluate the database, the authors tested a segmentation method to extract the crowd.

In Almagbile (2019), images at different orientations and positions, with resolutions of 691 by 1359, 683 by 471, and 689 by 1366 pixels, were captured to detect and count people. The authors tested a method that uses features from accelerated segment test (FAST) and filters to extract crowd features.
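
As an illustration of using FAST responses as a crowd-density cue, the sketch below counts OpenCV FAST corners per image tile; the grid size and detector threshold are illustrative, and the paper's filtering stages are not reproduced.

```python
import cv2
import numpy as np

img = cv2.imread("crowd_aerial.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
fast = cv2.FastFeatureDetector_create(threshold=25)
keypoints = fast.detect(img, None)

# Crude crowd-density proxy: count FAST corners falling in each grid cell.
h, w = img.shape
grid = np.zeros((8, 8), dtype=int)
for kp in keypoints:
    x, y = kp.pt
    grid[min(int(y * 8 / h), 7), min(int(x * 8 / w), 7)] += 1
print("corners per cell:\n", grid)
```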

A drone-based vehicle re-identification (ReID) database was presented in Wang et al. (2019). Two DJI Phantom 4 drones captured vehicles in different locations with diverse view angles and flight altitudes. In addition, a deep learning method for vehicle ReID was tested.

To explore congested urban environments in traffic monitoring, a new database (pNEUMA) was presented in Barmpounakis and Geroliminis (2020). The images were captured by 10 consumer DJI quadcopter drones equipped with cameras with a resolution of 4096 \(\times\) 2160 pixels. The study site covered an area of 10 km with a 10 km road network, low-, medium-, and high-volume arterials, more than 100 intersections, and more than 30 bus stops.

Reference Chen et al. (2020) extracted vehicle trajectories from images recorded by a DJI Mavic professional drone. Images were collected at a resolution of 3840 \(\times\) 2160 pixels at altitudes of 223 m and 281 m. Both free-flow and congested scenarios were considered in the database. Three procedures were applied to the database: region of interest (ROI) selection, the kernelized correlation filter (KCF), and transforming positions from Cartesian coordinates in the video to Frenet coordinates.
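
The Cartesian-to-Frenet step can be sketched as projecting each vehicle position onto a polyline reference path, where s is the arc length to the closest path point and d is the signed lateral offset. The function below is a generic construction with a hypothetical centreline, not the authors' code.

```python
import numpy as np

def cartesian_to_frenet(path, p):
    """Project point p onto polyline path; return (s, d)."""
    best = (np.inf, 0.0, 0.0)   # (distance, s, d) of best segment found so far
    s0 = 0.0                    # arc length accumulated up to current segment
    for a, b in zip(path[:-1], path[1:]):
        v = b - a
        L = np.linalg.norm(v)
        t = np.clip(np.dot(p - a, v) / L**2, 0.0, 1.0)  # param along segment
        q = a + t * v                                   # closest point on it
        dist = np.linalg.norm(p - q)
        if dist < best[0]:
            side = np.sign(v[0] * (p - a)[1] - v[1] * (p - a)[0])  # left/right
            best = (dist, s0 + t * L, side * dist)
        s0 += L
    return best[1], best[2]

centreline = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 5.0]])  # road path
print(cartesian_to_frenet(centreline, np.array([12.0, 2.0])))  # -> (s, d)
```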

The DroneVehicle database Zhu et al. (2020a) was created by the Lab of Machine Learning and Data Mining, Tianjin University, China. The database was recorded by both RGB and infrared cameras mounted on a drone and covers scenarios such as urban roads, residential areas, parking lots and highways; objects such as cars, buses, trucks and vans; and both sparse and crowded scenes.

A database for detecting and segmenting vehicles in images captured by drones was presented in Zhang et al. (2020). A DJI Matrice 200 quadcopter equipped with a Zenmuse X5S gimbal and camera collected the images at resolutions ranging from 960 \(\times\) 540 pixels to 5280 \(\times\) 2970 pixels. A Multi-Scale and Occlusion Aware Network (MSOA-Net), comprising a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention-based Triple Head Network (RATH-Net), was tested on the database.

Several deep learning methods were evaluated on a database of drone images presented in Lyu et al. (2020). The database is suitable for semantic segmentation in complex urban scenes for applications such as robotics and autonomous driving. The image resolutions were 4096 \(\times\) 2160 pixels and 3840 \(\times\) 2160 pixels.

Some image samples from the databases are shown in Fig. 8. Also, the databases presented in the traffic and crowd tasks are summarized in Table 4.

Fig. 8 Samples of the database images related to traffic and crowd methods: a Oh et al. (2011), b Liu and Mattyus (2015), c Bonetto et al. (2015), d Xu et al. (2016), e Al-Sheary and Almagbile (2017), f Tzelepi and Tefas (2017), g Najiya and Archana (2018), h Kyrkou et al. (2018), i Ke et al. (2018), j Zhu et al. (2018), k Almagbile (2019), l Wang et al. (2019), m Barmpounakis and Geroliminis (2020), n Chen et al. (2020), o Zhu et al. (2020a), p Zhang et al. (2020), q Lyu et al. (2020), and r a chart of the number of images, frames, and videos of the databases (the order corresponds to Table 4)

Table 4 List of the databases used in traffic and crowd tasks

4.1.2 Object detection

Object detection (segmenting scenes into certain classes such as humans, buildings, or cars) is a basic step in computer vision that covers different areas in the field, such as image retrieval and video surveillance. In the following, we explore drone-based object detection databases.

In Saif et al. (2014), a dynamic motion model (DMM) was applied to the UAV video database (actions1.mpg and actions2.mpg) from the Center for Research in Computer Vision (CRCV) at the University of Central Florida, while in Maria et al. (2016), a database of YouTube videos was collected to detect cars in a scene.

The UAV123 database Mueller et al. (2016) introduced 123 videos captured by drones at low altitudes for tracking problems, together with a simulator to evaluate moving targets in real time. Attributes such as aspect ratio change, full and partial occlusion, low resolution, illumination variation, fast motion, and camera motion are provided for researchers.

The Okutama-Action database and its annotations Barekatain et al. (2017) for concurrent human action detection present several challenging issues, such as a non-static camera with abrupt motion, dynamic transitions of actions, multiple concurrent actions and multi-labeled actors. The database was recorded by two drones with 4K cameras at angles of 45 or 90 degrees and flying heights of 10–45 m.

The 360-degree videos presented in Cehovin Zajc et al. (2017) can be used in active-camera robotics applications such as circling over a target object. The videos, captured by a drone with a Ricoh Theta 360-degree camera over objects of different sizes, also have annotated frames.

A car parking database (CARPK) was presented in Hsieh et al. (2017), in which challenges for object counting in parking lots were considered. A Phantom 3 Professional drone at a flying height of 40 m recorded high-resolution videos. Additionally, the authors tested an object-counting method on the database based on layout proposal networks (LPNs) and spatial kernels.

The UAVDT benchmark Du et al. (2018) is a database for object detection and tracking that includes high-density, small-object, camera-motion, and real-time challenges with attributes such as different weather conditions (daylight, night and fog), flying altitudes (10–30 m), and camera views (front view, side view and bird's-eye view). The authors reported that state-of-the-art methods achieved disappointing results on the database because of the new challenges it presents. A developed version of the database was presented in Yu et al. (2020).

\(UG^{2}\) Vidal et al. (2018) includes uncontrolled videos at resolutions of 600 by 400 to 3840 by 2026, recorded by drone and collected from the YouTube website. The database provides challenges related to glare, lens flare, low image quality, and camera shaking, and includes images converted to patches for testing deep learning methods.

Mivia, a research laboratory of the University of Salerno, presented the Mivia database Carletti et al. (2018) for multiobject tracking. The database is a collection of DJI F-450 drone videos recorded by a mounted Nilox F60 camera at variable altitudes, speeds, and angles (yaw and pitch). Additionally, the authors proposed a method based on local data association with a backward chain for multiobject tracking.

Reference Zhu et al. (2018) is a report of the Vision Meets Drone 2018 challenge workshop, held in conjunction with the 15th European Conference on Computer Vision (ECCV 2018), which also presented a database for the Vision Meets Drone Video Detection and Tracking (VisDrone-VDT2018) challenge. The database was further developed in Zhu et al. (2018), Du et al. (2019), Zhu et al. (2020b, 2020c).

Reference Xu et al. (2018) describes a database downloaded from the DJI website, consisting of videos captured by various types of drones and cameras and suitable for the low-power object detection challenge (LPODC). Additionally, the paper reports the results of the System Design Contest (SDC) held in conjunction with the 55th Design Automation Conference (DAC) in 2018.

The Urban Drone dataset (UDD) Chen et al. (2018) includes images over Beijing, Huludao, Zhengzhou, and Cangzhou (China) collected by a DJI Phantom 4 drone at flying heights of 60–100 m with resolutions of 4K (4096 by 2160 pixels) and 12 M (4000 by 3000 pixels). The images can also be fed into deep learning networks.

Finally, the UAVP100 database Wang et al. (2019) was designed for tracking people (online single-person tracking (OSPT)) using DJI Phantom 4, Inspire 2 and Spark drones at flying heights of 5–30 m with a camera resolution of 1920 by 1080 pixels. The challenges explored when collecting the database are similar to those of UAV123.

Reference Qi et al. (2019) presented a database based on images from other databases and the authors' own drone. The aim of the database is to detect and track objects. Several scenarios, such as parking lots, street views, social parties, and traveling, were explored in the study.

The Virtual AeriaL Image Dataset (VALID) Chen et al. (2020) is a virtual database whose images can be treated as if captured by drones. The authors presented comprehensive ground truth suitable for image segmentation over 30 categories in 6 different virtual scenes and 5 ambient conditions (sunny, dusk, night, snow, and fog).

The ERA (Event Recognition in Aerial videos) database was presented in Mou et al. (2020). The database was collected for recognizing events in drone-recorded footage from YouTube. Several deep learning methods were tested on the database, which covers 25 classes of events such as traffic congestion, harvesting, ploughing, constructing, police chases, conflicts, baseball, basketball, and boating.

Reference Mandal et al. (2020) introduced a moving object recognition (MOR) database based on videos recorded by drones. The videos were captured over highways, flyovers, traffic intersections, urban areas, and agricultural regions. Image resolutions range from 1280 \(\times\) 720 pixels to 1920 \(\times\) 1080 pixels. In addition, a deep learning method was tested on the database.

The EyeTrackUAV2 database Perrin et al. (2020) is useful for exploring saliency research related to drones. The EyeLink 1000 Plus eye-tracking system was used to conduct the experiment and create gaze information. Image resolutions of 1280 \(\times\) 720 pixels and 720 \(\times\) 480 pixels were considered. Additionally, the database is suitable for testing deep learning approaches.

Some image samples from the databases are shown in Fig. 9. Also, the databases presented for object detection are summarized in Table 5.

Fig. 9 Samples of the database images related to object detection methods: a Maria et al. (2016), b Mueller et al. (2016), c Barekatain et al. (2017), d Cehovin Zajc et al. (2017), e Hsieh et al. (2017), f Du et al. (2018), g Vidal et al. (2018), h Carletti et al. (2018), i Zhu et al. (2018), j Zhu et al. (2018), k Xu et al. (2018), l Chen et al. (2018), m Wang et al. (2019), n Qi et al. (2019), o Chen et al. (2020), p Mou et al. (2020), q Mandal et al. (2020), r Perrin et al. (2020), and s a chart of the number of images, frames, and videos for the databases

Table 5 List of databases used in object detection

4.2 Agriculture and forestry

Today, compared with satellite imagery, there is growing interest in using drones to provide effective solutions in autonomous applications, such as inspecting the state of farming. From the viewpoint of farmers, drones can provide a bird's-eye view over their fields, leading to precise monitoring systems for crop and water status and biomass estimation Adão et al. (2017).

Reference Zarco-Tejada et al. (2014) used the database presented by the Institute for Sustainable Agriculture (IAS) of the Spanish Council for Scientific Research (CSIC). The database was obtained by consumer-grade cameras at a resolution of 4000 by 3000 and a flying height of 200 m for tree height estimation.

In Turner et al. (2014), a collection of ultrahigh-resolution visible, multispectral and thermal images was captured by three sensors mounted on an oktokopter drone: a Canon 550D digital single-lens reflex (DSLR) camera (resolution of 5184 by 3456 pixels), a FLIR Photon 320 uncooled thermal sensor (resolution of 324 by 256 pixels) and a Tetracam mini-MCA sensor with six channels (resolution of 1280 by 1024). It was demonstrated that drones carrying multiple sensors can accurately map vegetation canopies.

Reference Tripicchio et al. (2015) describes a collection of drone videos that can be used for analyzing soil characteristics. The videos, captured by an Asus Xtion Pro sensor, contain RGB and depth data. In addition, a new approach to classifying plowed fields with the sensor was studied. Finally, two different metrics, a re-orientation method based on principal component analysis (PCA) and a Delaunay triangulation method, were developed for this purpose.
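
A sketch of the two building blocks, assuming NumPy and SciPy: PCA aligns the point cloud with the dominant field plane, and Delaunay triangulation meshes the in-plane coordinates. The random points stand in for real RGB-D field data, and this is only a generic illustration of the techniques named in the paper.

```python
import numpy as np
from scipy.spatial import Delaunay

pts = np.random.rand(200, 3)          # stand-in for an RGB-D field point cloud

# PCA: eigenvectors of the covariance; the smallest-eigenvalue axis is the
# field-surface normal, so expressing points in this basis "re-orients" them.
centred = pts - pts.mean(axis=0)
_, eigvecs = np.linalg.eigh(np.cov(centred.T))   # columns: ascending variance
aligned = centred @ eigvecs
height = aligned[:, 0]                # variation along the surface normal

# Delaunay triangulation of the in-plane coordinates gives a surface mesh
# whose per-triangle roughness could then be measured.
tri = Delaunay(aligned[:, 1:])
print(len(tri.simplices), "triangles; height std:", height.std())
```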

In Dandois et al. (2015), images were captured by a Canon ELPH 520 HS digital camera on board a hobbyist, commercial multirotor ArduCopter drone over a temperate deciduous forest in Maryland, USA. The database is suitable for producing 3D multispectral point clouds at different flying heights.

Reference Oppenheim et al. (2017) presented a database for detecting and counting yellow tomato flowers in a greenhouse. The images were captured by a smartphone LG-G4 camera and a Canon PowerShot 590IS, at resolutions of 5312 by 2988 and 3264 by 1832, respectively, mounted on a drone with top and front views.

In Kragh et al. (2017), a multimodal database for obstacle detection in agriculture was presented, based on a DJI Phantom 4 drone equipped with three sensors: web, thermal and stereo cameras at resolutions of 1920 by 1080, 640 by 512, and 1024 by 544 pixels, respectively, flown at altitudes of 1.5–50 m. The database comprises approximately 2 h of data from a grass-mowing scenario in Denmark.

In Murugan et al. (2017), a multispectral image database for agricultural monitoring of a large farm in Roorkee, Uttarakhand, India, was presented. The drone used was a DJI Phantom flown at an altitude of 100 m with a high-definition 4K-resolution RGB camera mounted on it. Additionally, the database can be used for image segmentation based on a multichannel imaging process.

The authors of Escalante et al. (2019) designed and produced a hexacopter drone equipped with six 700-KV brushless motors and four 40 A electronic speed controllers for monitoring barley fields in the state of Nuevo Leon, Mexico. They used a Parrot Sequoia multispectral sensor to capture multispectral images in the red, green, red-edge, and near-infrared channels at a resolution of 1.2 Mpx and a flying height of 24.4 m. Additionally, a deep learning method was applied to the database.

To recognize bayberry trees, a database of drone images was collected in Wang and Luo (2019). The database can be used to extract tree position and crown information and to estimate yield. The drone used in the study was a DJI Phantom 4, which took aerial photographs of Dayangshan Forest Park, Yongjia County, Zhejiang Province, at a resolution of 5472 \(\times\) 3648 pixels from January 23 to 24, 2019. A deep learning method based on Mask RCNN (Mask Region Convolutional Neural Network) was tested on the database.

Some image samples from the databases are shown in Fig. 10. Also, the databases presented for agriculture and forestry methods are summarized in Table 6.

Fig. 10 Samples of the database images related to agriculture and forestry methods: a Tripicchio et al. (2015), b Escalante et al. (2019), c Oppenheim et al. (2017), d Kragh et al. (2017), e Murugan et al. (2017), f Zarco-Tejada et al. (2014), g Dandois et al. (2015), h Turner et al. (2014), and i Wang and Luo (2019)

Table 6 List of the databases used in agriculture and forestry

4.3 Animal detection

Another drone application that has recently been growing is monitoring animals over large areas with the aim of detecting, counting, and tracking them. In the following, we explore drone-based animal detection databases.

A conservation animal database was collected in van Gemert (2014) for localizing and counting animals such as rhinos and elephants; the database is especially suitable for animal detection and counting. An Ascending Technologies Pelican quadcopter drone with a GoPro Hero 3 (Black Edition) action camera (resolution of 1920 by 1080 pixels) recorded the videos. In addition, an object recognition method based on three lightweight detectors was evaluated on the database.

Reference Chamoso et al. (2014) presented a database for detecting cattle in areas with very large numbers of animals, captured by a multirotor drone equipped with a GoPro Hero 5 full-HD auxiliary camera with a resolution of 1080 pixels. The database was evaluated with a CNN architecture for animal detection and counting and can therefore be used for applying deep learning methods.

A wildlife monitoring database (koala tracking and detection above the canopy) Gonzalez et al. (2016) was created with an S800 EVO Hexacopter drone over the Sunshine Coast, 57 km north of Brisbane, Queensland, Australia. RGB and thermal images and videos were obtained by a Mobius RGB camera (resolution of 1080) and a FLIR thermal camera (resolution of 640 by 510), respectively.

A data augmentation process was applied to the database provided in Okafor et al. (2017) to develop it for animal detection and deep learning approaches. The images were taken by a DJI Phantom 3 drone. To obtain promising and accurate results with deep learning approaches, the authors applied a data augmentation method to the database; data augmentation is an important step in deep learning methods for increasing the training data. Additionally, several deep learning methods were evaluated on the database.
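
A minimal augmentation sketch (flips, rotations, brightness jitter) with OpenCV and NumPy follows; the exact transformations used in the paper may differ, and the image path is a placeholder.

```python
import cv2
import numpy as np

def augment(img, rng):
    """Return one randomly transformed copy of an aerial image."""
    if rng.random() < 0.5:
        img = cv2.flip(img, 1)                           # horizontal flip
    img = np.rot90(img, rng.integers(0, 4)).copy()       # 0/90/180/270 degrees
    gain = rng.uniform(0.7, 1.3)                         # brightness jitter
    return np.clip(img.astype(np.float32) * gain, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = cv2.imread("drone_animals.jpg")                  # hypothetical image
batch = [augment(image, rng) for _ in range(8)]          # 8 augmented copies
```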

For detecting and enumerating marine wildlife over breeding colonies in eastern Canada, a database was collected by a senseFly eBee drone equipped with two sensors: an RGB camera (Canon S110, image resolution of 12 megapixels) and a thermal infrared camera (senseFly LLC Thermomapper, image resolution of 640 by 512) Seymour et al. (2017). Moreover, an animal counting method based on a polygon/convex hull proportion combined with a high-pass filter was applied to the database.

Reference Kellenberger et al. (2018) introduced a database for detecting animals over the Kuzikus Wildlife Reserve in eastern Namibia. A Canon PowerShot S110 RGB camera and multispectral and thermal sensors, with a resolution of 3000 by 4000, were mounted on the wing of a senseFly eBee. The database is suitable for exploring the challenge of monitoring and covering large areas and for applying deep learning methods.

To detect and count sheep over the Pirinoa region of New Zealand, a database was presented in Sarwar et al. (2018) and evaluated with a deep learning method based on region-based convolutional neural networks (R-CNNs). The results showed that R-CNNs hold great promise for sheep detection and counting compared with plain CNNs. The database was captured by a drone with an image resolution of 20,148 by 1080 pixels at an altitude of 80 m.

Since analyzing the population and migration of marine animals such as stingrays and dolphins is important for biologists, Saqib et al. (2018) presented drone videos with a resolution of 3840 by 2160 pixels for stingrays and 4096 by 2160 pixels for dolphins over beaches in Queensland, Australia. A deep learning method based on Faster R-CNN was tested on the database and obtained better results than CNNs and R-CNNs.
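
As a hedged illustration, the sketch below runs a COCO-pretrained torchvision Faster R-CNN on a single frame; the surveyed papers trained their own models on their aerial databases, so this stands in only for the inference pattern, and the frame path is a placeholder.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained detector; the papers fine-tuned on their own aerial data.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("beach_frame.jpg").convert("RGB")   # hypothetical frame
with torch.no_grad():
    output = model([to_tensor(frame)])[0]              # boxes, labels, scores

keep = output["scores"] > 0.5                          # confidence threshold
boxes = output["boxes"][keep]
print(f"{len(boxes)} detections above threshold")
```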

For the counting, assistance, and management of cattle, a DJI Phantom 4 drone with a flight time of 28 min and an image resolution of 4000 by 3000 pixels flew over Kumamoto, Japan Shao et al. (2019). The database comprises four sets of normal, truncated, blurred, and occluded images captured in different weather conditions and areas. In addition, a CNN method was applied to the database, and a three-dimensional model was used on the images to achieve more accurate results. The database is thus suitable for testing deep learning methods.

A database was introduced in Sykora-Bodie et al. (2017) and developed in Gray et al. (2019) for sea turtle detection during a mass nesting event on the coast of Ostional, Costa Rica. The database was obtained by flying a Canon PowerShot S110 near-infrared (NIR) camera at a height of 90 m. Moreover, to increase the quality of the images, a post-processing step based on a threshold function was applied to the database.
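
The paper's exact threshold function is not specified here, so the sketch below stands in with Otsu thresholding on a grayscale NIR frame; the file name is a placeholder.

```python
import cv2

nir = cv2.imread("nesting_beach_nir.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
# Otsu's method picks the threshold automatically; this is only a generic
# stand-in for the paper's (unspecified) threshold function.
_, mask = cv2.threshold(nir, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cleaned = cv2.bitwise_and(nir, nir, mask=mask)       # keep above-threshold pixels
cv2.imwrite("postprocessed.png", cleaned)
```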

Reference Rahnemoonfar et al. (2019) presented a collection of images captured by a fixed-wing drone from the Measurement Analytics Lab (MANTIS) at Texas A&M University-Corpus Christi, operating under a blanket Certificate of Authorization (COA) and equipped with a Canon IXUS 127 HS 16.1 MP RGB camera with a resolution of 3456 by 4608 pixels, over the Welder Wildlife Foundation in Sinton. The database aims to cover animal detection and counting and was evaluated with a deep learning method.

A DJI Phantom 4 Pro drone with a 20-megapixel camera (resolution of 4864 \(\times\) 3648 pixels) was used to collect the cattle database introduced in Barbedo et al. (2019). The database was collected at the Canchim farm, São Carlos, Brazil, on 11 dates during 2018. One of the study's aims was to determine the ideal ground sample distance (GSD). In addition, a deep learning method was applied to the database.

Mask R-CNN, applied to detect cattle and sheep, was the first method tested on the database presented in Xu et al. (2020). The database was collected at the Tullimba Research Feedlot (AEC18-038), owned by the University of New England, New South Wales, Australia, and on surrounding farmlands (AEC19-009) across the seasons from summer to spring (February to October). The images, with a resolution of 4096 \(\times\) 2160 pixels, were captured by an integrated PTZ camera mounted on a MAVIC PRO drone. Additionally, a preprocessing step prepared the images for use in deep learning methods.

Some image samples from the databases are shown in Fig. 11. Also, the databases presented in animal detection are summarized in Table 7.

Fig. 11

Samples of the database images related to animal detection methods: a van Gemert (2014), b Chamoso et al. (2014), c Gonzalez et al. (2016), d Okafor et al. (2017), e Seymour et al. (2017), f Kellenberger et al. (2018), g Sarwar et al. (2018), h Saqib et al. (2018), i Shao et al. (2019), j Gray et al. (2019), k Rahnemoonfar et al. (2019), l Barbedo et al. (2019), m Xu et al. (2020), and n a chart of the number of images, frames, and videos for each database (there is no information on the number of images for references Gonzalez et al. (2016) and Seymour et al. (2017))

Table 7 List of the databases used in animal detection

4.4 Disaster detection

Drone-based imagery has been taking on a growing and important role in disaster analysis owing to its suitability for real-time tasks, its high-spatial-resolution images, its oblique imagery, and so on. These capabilities lead to effective results in detecting cracks and damage and help transportation planners make sound decisions. In the following section, we introduce drone-based databases related to disaster analysis.

The drone used in Jeon et al. (2013) was equipped with several sensors, including a mirrorless camera, a GPS, an IMU, and a sensor integration and synchronization module. The authors designed and produced a micro drone carrying a Sony NEX-55 camera with a resolution of 4912 by 3264 pixels that captured images at an altitude of 100 m. The database can be used for disaster detection and monitoring.

The purpose of Ofli et al. (2016) was to provide a database for disaster response, wildlife protection, and anti-poaching efforts. The SAVMAP project research was a collaboration between Drone AdventuresFootnote 26 and the EPFL Cooperation & Development CenterFootnote 27. The authors also presented a solution based on machine learning approaches: features were extracted using the histogram of oriented gradients (HOG), and several machine learning methods, such as SVM and logistic regression, were evaluated accordingly; a compact sketch of such a pipeline follows.
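The sketch below shows a HOG-plus-classifier pipeline with scikit-image and scikit-learn; the window size, HOG parameters, and synthetic stand-in patches are our assumptions, not the paper's settings.

```python
# HOG features with SVM and logistic regression classifiers (illustrative;
# parameters and the synthetic data are assumptions).
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

def hog_features(patch):
    patch = resize(patch, (64, 64))                # assumed window size
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

rng = np.random.default_rng(0)
patches = [rng.random((80, 80)) for _ in range(40)]   # stand-in image patches
labels = rng.integers(0, 2, 40)                       # 1 = target, 0 = background

X = np.array([hog_features(p) for p in patches])
svm = LinearSVC().fit(X, labels)
logreg = LogisticRegression(max_iter=1000).fit(X, labels)
print(svm.score(X, labels), logreg.score(X, labels))  # toy training accuracy
```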

The UAV Mosaicking and Change Detection (UMCD) database Avola et al. (2017) was used to support five tasks: object detection, people search and rescue, people and vehicle classification, military camp monitoring, and urban area monitoring. The tasks are suitable for mosaicking and change detection methods at low altitude. The database includes two sets of 30 and 20 videos, respectively.

In Kakooei and Baleghi (2017), oblique images were collected for disaster assessment, including the Haitian earthquake of 2010Footnote 28, Hurricane Irene of 2011Footnote 29, Hurricane Sandy of 2012Footnote 30, the Illinois tornadoes of 2015Footnote 31, and abc7chicagoFootnote 32. The database is suitable for earthquake and hurricane assessment. Moreover, a segmentation algorithm was considered for estimating facade and building damage in the areas.

In Bejiga et al. (2017), two databases were introduced for search and rescue (SAR) operations using drone images. The first database was a collection of different videos of a ski area, gathered from the web at a resolution of 1280 by 720 pixels, and the second was captured by a CyberFed “Pinocchio” hexacopter equipped with a GoPro camera over a mountain close to the city of Trento at flying heights of 2–4 m for low flights and 20–40 m for high flights. Additionally, the database is appropriate for applying deep learning methods.

The database used in Attari et al. (2017) was provided by the World Bank in collaboration with the Humanitarian UAV Network (UAViators) during Cyclone Pam in Vanuatu in 2015. The database was targeted at damage monitoring and object detection in affected environments. In addition, a deep learning method (Nazr-CNNFootnote 33) was proposed for this goal. Therefore, the database can be used to compare deep learning methods.

The L’Aquila database Duarte et al. (2017) was collected from the damage left by the 2009 earthquake in L’Aquila, Italy. The database was obtained by flying an Aibot X6 hexacopter equipped with a Sony ILCE-6000 camera at an altitude of 100 m. The database is appropriate for damage detection; a segmentation method based on CNNs was tested on it, and the authors additionally presented a solution using a sparse point cloud.

In Li et al. (2018), a database of five different scenes (urban, suburban, rural, wilderness, and green land) was collected from an airborne drone and can be used for scene recognition and damage detection. The authors used superpixel-based features to segment and detect the damage, and an SVM classifier was considered for classifying the scenes.

Reference Xu et al. (2018) introduced three databases of drone earthquake images over three locations: Mirabello, Italy, 2012; Lushan County in Sichuan Province, China, 2013; and Hanwang County in Sichuan Province, China, 2008. The databases were captured by multirotor, rotor, and fixed-wing drones. To segment the damaged areas, feature extraction was based on geometrical features and classification on k-nearest neighbors (KNN); a toy sketch of this kind of classifier is given below. Additionally, the database images were generated in a 3D point cloud format.
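The following sketch illustrates KNN classification over simple geometric region features; the feature names and synthetic data are placeholders for illustration, not the paper's actual features.

```python
# KNN over geometric region features (toy data; features are placeholders).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Each row stands for one segmented region: [area, perimeter, roughness].
X = rng.random((200, 3))
y = rng.integers(0, 2, 200)          # 1 = damaged, 0 = intact (toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("toy accuracy:", knn.score(X_te, y_te))
```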

Fig. 12

Samples of the database images related to disaster methods: a Jeon et al. (2013), b Ofli et al. (2016), c Avola et al. (2017), d Kakooei and Baleghi (2017), e Bejiga et al. (2017), f Attari et al. (2017), g Duarte et al. (2017), h Li et al. (2018), i Xu et al. (2018), j Kamilaris and Prenafeta-Boldú (2018), k Li et al. (2019), and l a chart of the number of images, frames, and videos for each database (there is no information on the number of images for references Kakooei and Baleghi (2017) and Li et al. (2019))

Table 8 List of the databases used in disaster detection

Reference Kamilaris and Prenafeta-Boldú (2018) presented a small drone-captured database for disaster detection and monitoring. The database contains images of fires, earthquakes, collapsed buildings, tsunamis, and flooding, as well as “non-disaster” scenes. Deep learning methods were evaluated on the database.

Finally, reference Li et al. (2019) introduced a damaged-building assessment database based on images from Hurricane Sandy in 2012 and Hurricane Irma, collected by Drexel University at a resolution of 1920 by 1080 pixels. The database has classes labeled undamaged buildings, damaged buildings, and ruins. Additionally, deep learning methods can be applied to the database; a fine-tuning sketch for these three classes is shown below.
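As an illustration of what applying deep learning to these three classes might look like, here is a fine-tuning sketch with a small torchvision CNN; the architecture, optimizer, and random stand-in batch are our choices, not those of the cited work.

```python
# Fine-tuning a small CNN for three damage classes (illustrative choices).
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 3)   # undamaged / damaged / ruins

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One toy training step on random tensors standing in for real image batches.
images = torch.randn(4, 3, 224, 224)
targets = torch.randint(0, 3, (4,))
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```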

Some image samples from the databases are shown in Fig. 12. Also, the databases presented in disaster detection are summarized in Table 8.

4.5 Face recognition

Since drone videos are most often captured from a top view, face and action recognition are challenging problems that must be solved when inspection and security matter for such videos. In the following, we introduce drone-based databases related to face recognition problems.

A very challenging database for human identity recognition was presented in Oreifej et al. (2010). The database is appropriate for detecting, segmenting, aligning, and recognizing humans viewed from aerial cameras under low resolution and adverse conditions. The images were captured by a drone and evaluated with weighted region matching (WRM) for feature extraction and an SVM for classification.

In Davis et al. (2013), a database was created to support low-cost facial detection and recognition tasks using an AR.Drone 1.0 that captured images with a resolution of 640 by 480 pixels. The feature extraction method applied to the database was based on local binary patterns (LBP), and a KNN classifier was trained on the features; a sketch of this kind of pipeline follows.
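The sketch below shows an LBP-histogram descriptor with a nearest-neighbor classifier in the spirit of the description; the LBP radius, number of points, and synthetic face crops are assumptions.

```python
# LBP histograms as face descriptors with a 1-NN classifier (illustrative).
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.neighbors import KNeighborsClassifier

def lbp_histogram(gray, P=8, R=1):                 # assumed LBP parameters
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

rng = np.random.default_rng(0)
faces = [(rng.random((64, 64)) * 255).astype(np.uint8) for _ in range(10)]
labels = list(range(10))                           # one identity per crop

X = np.array([lbp_histogram(f) for f in faces])
knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(knn.predict(X[:1]))                          # nearest enrolled identity
```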

The mobile reidentification platforms (MRPs) database Layne et al. (2014) is a collection of images captured at a resolution of 640 by 360 pixels by a quadcopter drone. The database was the first platform for mobile reidentification to be used for face recognition. In addition, the authors used several feature extraction and classifier methods to evaluate the database.

DroneFace Hsu and Chen (2017) is a database that simulated a drone at an altitude of 1–5 m using a GoPro Hero3 camera. The aim of the database is face recognition with frontal and side portrait images. Additionally, the authors evaluated the database with several methods, such as wavelet-transform and LBP features with an SVM classifier; a minimal wavelet-feature sketch is given below.
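The sketch below computes one plausible wavelet-based descriptor with PyWavelets; the 'haar' wavelet, single decomposition level, and sub-band statistics are assumptions, not the paper's exact features.

```python
# One-level 2-D DWT sub-band statistics as a face descriptor (illustrative).
import numpy as np
import pywt

def wavelet_features(gray):
    cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")     # approximation + details
    # Summarize each sub-band by its energy and standard deviation.
    return np.array([stat(band) for band in (cA, cH, cV, cD)
                     for stat in (lambda b: float(np.mean(b ** 2)), np.std)])

rng = np.random.default_rng(0)
print(wavelet_features(rng.random((64, 64))))      # 8-dimensional vector
```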

The IARPA Janus Surveillance (IJB–S) database Kalka et al. (2018) was presented for face recognition. Images were collected by a small fixed-wing drone and by Panasonic WV-SW3955 and Speco O4P30X6 dome cameras with resolutions of 1280 by 960 and 2592 by 1520 pixels, respectively.

The DroneSURF database, for exploring the challenges of motion, pose variation, illumination, and background in face recognition, was introduced in Kalra et al. (2019). The images were captured by a DJI Phantom 4 at a variety of altitudes and settings for active and passive surveillance scenarios.

A new database (Drone-Action) was presented in Perera et al. (2019) for action recognition based on person images captured by a drone equipped with a GoPro Hero 4 Black camera. The images have HD (1920 \(\times\) 1080 pixels) format. The actions were classified into three categories: following, side-view, and front-view actions. Deep learning methods were tested on the database, showing that it is suitable for such methods.

The PRAI-1581 database Zhang et al. (2020) was introduced for person re-identification based on images captured by two DJI consumer drones at flying heights ranging from 20 to 60 m. Several state-of-the-art methods, including deep learning methods, were tested on the database, which is therefore suitable for evaluating such approaches.

Reference Grigorev et al. (2020) presented a database for person re-identification purposes. A remotely operated quadcopter carrying an HD camera collected images with a resolution of 1920 \(\times\) 1080 pixels at a height of 25 m. The ground truth of the database includes 18 attributes such as gender (male and female) and type of lower-body clothing (pants and overcoat). Additionally, a deep learning method was applied to the database.

Some image samples from the databases are shown in Fig. 13. Also, the databases presented in face recognition are summarized in Table 9.

Table 9 List of the databases used in face recognition
Fig. 13

Samples of the database images related to face recognition methods: a Oreifej et al. (2010), b Davis et al. (2013), c Layne et al. (2014), d Hsu and Chen (2017), e Kalka et al. (2018), f Kalra et al. (2019), g Perera et al. (2019), and h Zhang et al. (2020) (there is no image sample for reference Grigorev et al. (2020))

5 Open research

As the FAA predicts, the number of drones will exceed 4 million units Boroujerdian et al. (2018); therefore, the design and implementation of accurate systems for different applications will play an important role. More research needs to be done in the community, and this research will not happen unless researchers provide more databases for different purposes. Additionally, the domain of applications will expand, and new applications and problems will be introduced; without accompanying databases, others will be unable to build on them.

One of the new applications is cinematography (for movies and sports) by drones. Although the application is currently operated manually, autonomous approaches based on machine learning and computer vision are being developed. However, several challenges exist, such as tracking fast and unpredictably moving targets. To handle some of these challenges, researchers can use videos from certain websitesFootnote 34 Huang et al. (2019).

Another application in the area is archeology, which can use computer vision to document archeological sites, including 3D maps, orthophotos, and thermal images Xiang et al. (2018). To the best of our knowledge, there is no database for this application.

Recently, indoor approaches for drones such as in Kaufmann et al. (2018) have been introduced, and public databases are needed to facilitate more research. It should be noted that methods used outdoors can also be used in indoor approaches.

One of the new applications related to surveillance and traffic is monitoring and tracking pedestrian movement for future cities, especially detecting pedestrians, vehicles, and cyclists at traffic intersections to determine transit times Zhu et al. (2019).

Growing numbers of data-center and repository websites, similar to DronestagramFootnote 35, are required for researchers to share their achievements in the field. As mentioned in Hochmair and Zielstra (2015), the Dronestagram project provides a space for sharing photos captured by drones; information such as drone models, camera models, and upload dates is shown on the website, and the first photo was uploaded in July 2013. Reference Johnson et al. (2017) also introduced some other hosting servicesFootnote 36,Footnote 37,Footnote 38 and provided a websiteFootnote 39 for consulting companies or volunteer groups that do not have any space to share their data (especially images and videos).

By exploring the tables presented in this survey, researchers can define new projects and present new databases for different applications. For example, providing databases related to disaster detection, such as fire detection, can be very useful for assistance and rescue. As shown in Table 3 for the remote sensing and navigation databases, 8 databases have not yet been used, and researchers can adopt them for these topics.

6 Conclusion

Today, drones play an important role in automating processes that are too hard for humans. In this paper, we have surveyed applications related to drones and computer vision methods. We categorized images and videos captured by drones into three groups: remote sensing (camera calibration, image matching, and aerial triangulation), navigation (flight control, visual localization and mapping, and target tracking and obstacle detection), and applications related to the sensed environment (surveillance, agriculture and forest, animal detection, disaster detection, and face recognition). We focused on databases for these three categories. Finally, we presented open research directions based on the information obtained in the survey. As mentioned in the open research section, researchers in the field still need to present databases for existing applications and develop databases for new ones. Additionally, because the number of drones based on new hardware is growing exponentially and the rapid advancement of drones is unstoppable, increasingly powerful and accurate software with embedded computer vision methods is essential. This aim will not be achieved unless databases are presented for the various applications so that methods proposed by other researchers can be applied and tested.