1 Introduction

Artificial intelligence is built on data, which has become a defining element of modern science. In the past decade, machine learning techniques have evolved to the point where they routinely process data sets far larger than any individual could curate. Particularly in the field of computer vision, large-scale data sets improve machine learning performance so dramatically that deep neural networks can match human performance on high-quality images [31]. The amount of labelled visual data available for various computer vision tasks (such as image classification, segmentation, detection, and tracking) has reached billions of high-quality images [125], available worldwide to researchers and engineers.

Publicly accessible visual data comes in a variety of formats. Although the available data is overwhelmingly composed of visible-band, or "RGB", images, public access to other modalities, such as multi/hyperspectral, magnetic resonance (MR), computerised tomography (CT), and synthetic aperture radar (SAR) imagery, is also possible. One relatively less public modality is infrared (IR) imagery, which corresponds to images constructed from radiation in an invisible portion of the electromagnetic spectrum known as the infrared band.

All kinds of objects emit infrared radiation [46]. With its low radiation absorption, high contrast, and capacity for hot-target detection, the IR band is popular and practical in civil and military applications [84]. IR imaging is used in many tasks, such as object detection, object segmentation, classification, and motion detection. However, in contrast to visible-band imagery, IR images are difficult to access for several reasons. To begin with, the technology behind most IR imaging systems is too expensive for widespread use in consumer electronics. Moreover, since most IR vision applications serve military or medical purposes, their data is often inaccessible due to security concerns or intellectual property rights. As a result, publicly available infrared image and video sets are limited compared to large-scale labelled visible-band image and video sets.

The primary purpose of this article is to compile a list of publicly available infrared image and video sets for artificial intelligence and computer vision researchers. We mainly focus on IR image and video sets collected and labelled for computer vision applications such as object detection, object segmentation, classification, and motion detection. We categorise 109 different publicly available or private sets according to their sensor types, image resolution, and scale. We describe each set in detail regarding its collection purpose, operation environment, optical system properties, and area of application.

The number of survey studies on IR vision algorithms and technologies is increasing [48, 49, 83, 107]. However, to the best of our knowledge, no published survey reviews IR image or video sets. Our aim is to compile a collection of sets so that researchers in computer vision and deep learning can identify a visual corpus with the necessary properties and compare it with the other sets already available. We therefore believe this survey can contribute to new algorithms in deep learning and vision research using spectra beyond the visible. By scanning public academic sources, we compile this list of image and video sets collected with IR imaging equipment. Furthermore, so that the reader can fully evaluate the different properties of IR image and video sets, we also provide a background on the fundamentals of infrared imagery, covering the principles of infrared radiation, infrared sensors, infrared optics, and the application fields of IR imagery.

The remainder of this paper is organised as follows: Section 2 covers a general overview of IR radiation, IR detectors, IR optics and related applications. Section 3 begins with an analysis of the statistical significance of the entire corpus and then presents the compiled sets as a list with brief descriptions. Finally, Section 4 presents conclusions and future directions.

2 Fundamentals of infrared imagery

2.1 Infrared radiation

The discovery of IR radiation dates back to an experiment by Frederick William Herschel more than 200 years ago, using prisms and basic temperature sensors to measure the wavelength distribution of the stellar spectrum [23]. However, its widespread use is relatively new, starting in the early 20th century with the understanding of Planck's law and blackbody radiation, aided by modern physics and quantum theory [20, 57]. Today it is almost common knowledge that, according to specific laws of physics, objects emit radiation over a broad region of wavelengths called the electromagnetic spectrum (ES). The IR region of this spectrum corresponds to wavelengths from the nominal red edge of the visible spectrum, around 700 nanometres, to 1 millimetre. IR wavelengths in this region are conventionally categorised into five spectral sub-bands: 0.7 μm to 1.4 μm is called the near-infrared (NIR), 1.4 μm to 3 μm the short-wave infrared (SWIR), 3 μm to 8 μm the mid-wave infrared (MWIR), 8 μm to 15 μm the long-wave infrared (LWIR), and finally 15 μm to 1000 μm the far-infrared (FIR) (see Table 1).

Table 1 The IR spectrum
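To make the sub-band definitions above concrete, Wien's displacement law relates a blackbody's temperature to its peak emission wavelength. The short Python sketch below is an illustrative aside (not part of the surveyed material); it shows that room-temperature objects peak squarely in the LWIR band, while the Sun peaks in the visible.

```python
# Wien's displacement law: lambda_max = b / T (illustrative sketch).
WIEN_B_UM_K = 2897.77  # Wien's displacement constant, in micrometre-kelvins


def peak_wavelength_um(temperature_k: float) -> float:
    """Wavelength (micrometres) at which a blackbody at the given
    temperature emits most strongly."""
    return WIEN_B_UM_K / temperature_k


# A ~300 K object (room temperature) peaks near 9.7 um, i.e. in the
# LWIR band -- which is why LWIR imagers see terrestrial scenes without
# any illumination; the ~5800 K Sun peaks near 0.5 um, in the visible.
```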

The conventional categorisation of IR sub-bands defined in Table 1 is correlated with how IR radiation is absorbed, reflected or transmitted by the atmosphere. The region of the IR spectrum where there is relatively little absorption of terrestrial thermal radiation by atmospheric gases is called the IR atmospheric window, roughly between 1 and 15 μm. The absorption of IR radiation depends on various atmospheric conditions such as altitude, latitude, solar zenith angle, and water vapour. Figure 1 depicts a synthetic spectrum of atmospheric transmission between 0.7 and 30 μm, created using the ATRAN module [69]. For instance, as seen in Fig. 1, the atmospheric transmittance of the NIR sub-band is relatively high, which makes it an effective spectrum for active night vision systems (i.e. systems with a radiation source illuminating the scene).

Fig. 1

Synthetic spectrum of atmospheric transmission between 0.7-30 μ m, created with ATRAN module [69]

It is also seen in Fig. 1 that much of the IR spectrum is not suitable for everyday applications because IR radiation is absorbed by water or carbon dioxide in the atmosphere. However, there are a number of wavelength bands with low absorption, which actually create the IR sub-bands known as the short, medium and long-wavelength IR bands, abbreviated as SWIR, MWIR and LWIR respectively.

Visible, NIR and SWIR light (0.35-3 μm) correspond to a band of high atmospheric transmission and peak solar illumination. This is why most optical systems include detectors sensitive to these bands for the best clarity and resolution. However, without moonlight or artificial illumination, SWIR imaging systems are known to provide poor or no imagery of objects below 300 K. Since SWIR imaging systems predominantly use reflected light, their output is comparable to grey-scale visible images in resolution and detail.

The MWIR band (also referred to as the 'MIR') also provides partial regions of nearly lossless atmospheric transmission, with the added benefit of reduced ambient and background noise. This region is referred to as the "thermal infrared". The radiation in this sub-band is emitted by the object itself; hence passive imaging is utilised. Two principal factors determine how bright an object appears in the MWIR spectrum: the object's temperature and its emissivity (E). Emissivity is a physical property of a material that describes how efficiently it re-radiates absorbed energy.
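The dependence of apparent brightness on temperature and emissivity can be illustrated with the Stefan-Boltzmann law. The hedged sketch below computes the total radiant exitance of a grey body; it is an illustrative aside only (real MWIR radiometry would integrate Planck's law over the sensor's band rather than over all wavelengths).

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiant_exitance_w_m2(temperature_k: float,
                          emissivity: float = 1.0) -> float:
    """Total power radiated per unit area of a grey body (W/m^2),
    integrated over all wavelengths: M = emissivity * sigma * T^4."""
    return emissivity * SIGMA * temperature_k ** 4


# A perfect blackbody at 300 K radiates ~459 W/m^2; a grey body with
# emissivity 0.5 at the same temperature radiates half as much, which
# is why two equally warm objects can differ in apparent brightness.
```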

The LWIR band spans roughly 8-15 μm, with almost no atmospheric absorption in the 9-12 μm region. Because LWIR sensors can construct an image of a scene from passive thermal emissions alone, requiring no active illumination, this region is also considered "thermal infrared". The LWIR band is better than MWIR for imaging through smoke or atmospheric particles (aerosols); therefore, surveillance applications usually prefer LWIR technology. On the other hand, for very long-range detection (10 km or more), MWIR has greater atmospheric transmission than LWIR in most atmospheric conditions.

Although the FIR spectrum nominally extends to 1 mm, the atmosphere absorbs almost all IR radiation with wavelengths above 25 μm. Hence, atmospheric FIR spectroscopy can only be effectively utilised in the limited spectrum below 25 μm. This region is also an atmospheric thermal band, which we experience in the form of heat waves. For astronomical observation outside the atmosphere, the entire FIR spectrum is utilised.

For a general overview of the subject and the fundamentals of radiometry, the reader may refer to [81].

2.2 Infrared detectors

One of the fundamental parts of an IR electro-optical system is the detecting sensor. In order to capture the IR signature of a scene, a detector sensitive to IR radiation is needed. IR-sensitive detectors capture the IR radiation emitted by the objects and the scene, and convert it into electrical signals. Objects with different temperatures and emissivities emit different levels of radiation, so the camera produces electrical signals of different amplitudes. These electrical signals are used to produce the IR image.

Detectors are the core of an IR imaging system. Historically, IR detectors can be grouped into three generations. The first generation consists of single-cell detectors: in order to create an image plane, the infrared beam emitted from a scene reaches a reflective surface (i.e. a mirror), and as the mirror is deflected by two-dimensional rotary actuators, the focused infrared beam traces out a two-dimensional pattern of the target image plane. In contrast, second-generation systems comprise an array of detectors with an optical mirror system that rotates on a single axis only. Finally, modern third-generation IR optical systems have two-dimensional array detectors, known as focal plane arrays (FPA), so the system needs no mirror system to scan different parts of the scene [10, 46]. Third-generation IR detectors are, in principle, quite similar to modern digital cameras.

In order to measure IR detector performance, three principal metrics are utilised: photosensitivity (or responsivity), noise-equivalent power (NEP), and detectivity (D*).

Photosensitivity, or responsivity, is defined as the output signal per watt of incident energy. The output may vary according to the type of detector: for example, while photovoltaic detectors usually output a photocurrent (i.e. in amperes), photoconductive detectors output a voltage. Photosensitivity is related to the magnitude of the sensor's response and is expressed as follows:

$$ R=\frac{S}{PA} $$
(1)

where S is signal output, P is incident energy and A is the detector’s active area [46, 115].

The signal-to-noise ratio (SNR) for a given input flux level is an important parameter used to determine IR image sensitivity [18]. NEP is the quantity of incident light when the SNR is 1 and expressed as follows:

$$ NEP=\frac{PA}{(S/N)\sqrt{\Delta}} $$
(2)

where N is the noise output and Δ is the noise bandwidth (S, P and A are as in (1)).

Detectivity D* (normalised detectivity) is the photosensitivity per unit active area of a detector and is expressed as follows:

$$ D^{*}=\frac{\sqrt{A}}{NEP} $$
(3)
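The three metrics in (1)-(3) can be wired together in a few lines. The sketch below simply transcribes the formulas; the numeric values in the comment are made-up illustrative inputs, not data from any real detector.

```python
def responsivity(signal, incident_power, active_area):
    """Eq. (1): R = S / (P * A)."""
    return signal / (incident_power * active_area)


def noise_equivalent_power(incident_power, active_area, snr,
                           noise_bandwidth_hz):
    """Eq. (2): NEP = P * A / ((S/N) * sqrt(bandwidth))."""
    return (incident_power * active_area) / (snr * noise_bandwidth_hz ** 0.5)


def detectivity(active_area, nep):
    """Eq. (3): D* = sqrt(A) / NEP."""
    return active_area ** 0.5 / nep


# With P = 1, A = 1, SNR = 10 and a 1 Hz noise bandwidth, NEP works
# out to 0.1 and D* to 10 (in the corresponding mixed units).
```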

Technologically, IR detectors are classified into two main groups: thermal detectors and photon (quantum) detectors (see Table 2) [93]. Thermal detectors, which include thermocouples, thermopiles, pyroelectric detectors and bolometers, sense the heating effect of absorbed infrared energy. They are constructed from metal compounds or semiconductor materials and are low-cost. These detectors operate at room temperature, and their sensitivity is independent of wavelength; consequently, they are capable of capturing scenes in all IR sub-bands. However, they suffer from slow response times, low sensitivity, and low resolution.

Table 2 Types of infrared detectors

In contrast to thermal detectors, photon detectors directly count photons of IR radiation. Different technologies operationalise these types of sensors, such as photoconductors, photodiodes, Schottky barrier detectors, and quantum well detectors [93]. Compared to thermal sensors, they are more sensitive and operate faster. However, these detectors do not operate at room temperature and require cooling. In addition, they are made from materials such as InSb, HgCdTe, and GaAs/AlGaAs, whose sensitivity depends on photon absorption, and are therefore more expensive. They also cover a more limited IR spectrum. Photon detectors are usually utilised when a high-sensitivity response is required at a specific wavelength.

Comparative studies on thermal and photon detectors show that both sensor types have their pros and cons [51, 93, 115]. Photon detectors are favoured at specific wavelengths and lower operating temperatures, whereas thermal detectors are favoured at a very long spectral range [92]. Photon detectors are fundamentally limited by generation-recombination noise arising from photon exchange with a radiating background. Thermal detectors are fundamentally limited by temperature fluctuation noise arising from radiant power exchange with a radiating background [62].

2.2.1 IR detector raw output

The raw pixel output of an IR detector is the irradiance (i.e. the flux of infrared energy per unit area) transformed into quantised n-bit values. These values lie within the limits of the so-called "dynamic range": the difference between the largest and smallest signal value the detector can record or reproduce. However, the raw pixel values are usually not uniformly distributed within the dynamic range; in practice, a raw IR detector output is usually confined to a very limited portion of it. In Fig. 2, a 16-bit IR detector raw output (taken from [13]), its 16-bit raw pixel histogram and the enhanced image are depicted.

Fig. 2

(a) A 16-bit raw IR image, (b) its 16-bit raw pixel histogram (x-axis has logarithmic scale), (c) the enhanced image and (d) the false-colour image are depicted. (The picture is taken from The LTIR Dataset [13])

IR electro-optical systems that provide a visual output for human users enhance the raw detector output using contrast-enhancing histogram shaping methods [101]. These systems usually provide 8-bit contrast-enhanced images as output; the aim is to increase the contrast of the raw IR image for the human observer. As seen in Fig. 2a, the raw image is barely visible to the human eye. Due to the irreversibility of most image enhancement algorithms, the bit range decreases at the price of sacrificing information. This enhancement is usually a default process for visible-band cameras. On the other hand, systems running intelligent IR image processing algorithms, such as tracking, detection, and recognition, utilise the raw pixel output, since the raw output represents the actual irradiance values collected from the scene and has a higher dynamic range. The raw output of the electro-optical system usually has the same bit depth as the IR detector, such as 11 or 14 bits. In the following section, when analysing the various image and video sets, the raw or enhanced nature of the pixel values for a given set is specifically indicated.

Some thermal cameras utilise false colours for their 8-bit contrast-enhanced output. This is usually done for temperature mapping in cameras used for temperature measurement. In Fig. 2d, an example of a false-colour contrast-enhanced infrared image is depicted.
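A minimal version of the histogram shaping described above can be sketched as follows. This is plain global histogram equalization with NumPy; real IR systems use proprietary and often more elaborate shapings, so treat this purely as an assumption-laden illustration of the 16-bit-to-8-bit collapse.

```python
import numpy as np


def equalize_to_8bit(raw: np.ndarray) -> np.ndarray:
    """Map a raw 16-bit IR frame (uint16) to a contrast-enhanced 8-bit
    image via global histogram equalization. The mapping is irreversible:
    the 16-bit radiometric information is collapsed into 256 levels."""
    hist = np.bincount(raw.ravel(), minlength=65536)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(hist)[0][0]]      # CDF at first occupied bin
    denom = max(raw.size - cdf_min, 1)
    lut = np.clip(np.round((cdf - cdf_min) * 255.0 / denom),
                  0, 255).astype(np.uint8)
    return lut[raw]
```

A frame whose values occupy only a narrow slice of the 16-bit range (like Fig. 2a) is stretched to span the full 0-255 display range.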

2.3 Infrared optics

IR imaging technology was founded in the late 1920s with the understanding of photon emission, and improvements continue today [20]. IR imaging is based on a fundamental concept in geometrical optics called the ray model, which ignores diffraction and assumes that light travels in straight lines from a source point. Each location in the scene can be treated as a source point, and these source points emit different levels of radiation that create the IR scene [18].

In geometrical optics, an image is constructed via an optical material by focusing the rays collected from the scene onto an image plane. Hence, the optical material used in an infrared system needs to be transparent (i.e. with transmittance close to 1.0) at the wavelengths the detector is sensitive to. The percentage of incident light that passes through a material at a given wavelength of radiation is defined as its electromagnetic transmission, also known as transmittance.

When choosing the correct optical material for an IR imaging system, there are three main points to consider. The first is the thermal properties of the material. Optical materials are typically placed in environments with varying temperatures, where they can be subjected to significant heating. To ensure the desired performance, the coefficient of thermal expansion (CTE) of the material should be evaluated. Secondly, as mentioned above, sufficient transmittance of the material at the given wavelength is a must. In Fig. 3, the transmittance of different materials in the IR sub-bands is depicted. For example, if the system is intended to operate in the LWIR band, germanium (Ge) optics with a thickness of 1 mm are preferable to sapphire optics of the same thickness.

Fig. 3

Transmittance of different materials in IR sub-bands

Another factor in choosing a suitable optical material is the refractive index, a measure of how fast radiation travels through the material. Refractive indices in the IR vary widely among materials, allowing more flexibility in system design; however, high-index materials reflect a considerable fraction of incident radiation at their surfaces. As a solution, anti-reflection coatings are applied to materials used for IR optics, which also limits them to a desired band within the IR spectrum.
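The reflection losses that motivate anti-reflection coatings follow directly from the refractive index. The sketch below uses the normal-incidence Fresnel formula for an uncoated surface (an idealised approximation that ignores absorption and multiple internal reflections) to show why a high-index material such as germanium is unusable without coatings.

```python
def fresnel_reflectance(n: float) -> float:
    """Normal-incidence reflectance of one uncoated air/material
    interface: R = ((n - 1) / (n + 1))^2."""
    return ((n - 1.0) / (n + 1.0)) ** 2


def two_surface_transmittance(n: float) -> float:
    """Fraction of light passing an uncoated window (two surfaces),
    ignoring absorption and multiple internal reflections."""
    return (1.0 - fresnel_reflectance(n)) ** 2


# Germanium (n ~ 4.0 in the LWIR) loses ~36% of incident radiation per
# uncoated surface, so an uncoated Ge window transmits only ~41%.
```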

For more information on the subject, the reader may refer to [28].

2.4 IR electro-optical system properties

There are some important parameters used in selecting appropriate equipment and characterising the performance of IR systems. The parameters that measure the performance of an IR electro-optical system depend on its ability to detect IR radiation and resolve the temperature differences in the scene. The contrast in an IR image arises from variations in temperature and emissivity. The parameters that may affect the performance of an IR electro-optical system include spectral range, normalised detectivity, temperature range, absolute accuracy, repeatability, frame rate, spatial resolution and thermal sensitivity [113]. These parameters are briefly explained below:

  • Spectral range: refers to the wavelength range in which the IR system will operate.

  • Normalised detectivity (D*): as defined in (3), is one of the widely used parameters to compare the performance of IR detectors.

  • Temperature range: or the operating temperature, is the minimum and maximum temperatures that can be measured by the IR electro-optical system, expressed in units of K, °C, or °F.

  • Absolute Accuracy: is a measure of how accurately the system detects the actual temperature and is denoted by temperature units. Related to this measure, Repeatability is defined as the consistency of the system accuracy.

  • Frame rate: is the number of frames displayed per second. For monitoring moving objects, higher frame rate cameras are mostly preferred [10]. It has a unit of Hz.

  • Spatial resolution: also referred to as the “instantaneous field-of-view” (IFOV), is the imaging system’s ability to differentiate the details of objects within a single pixel-sized FOV. It is a measure of solid angle, hence represented in steradians. As spatial resolution increases, so does image quality [10].

  • Thermal sensitivity: is the smallest temperature change detectable by the IR imaging system. The three most common measures of thermal sensitivity are the “Noise Equivalent Temperature Difference” (NETD), “Minimum Resolvable Temperature Difference” (MRTD) and “Minimum Detectable Temperature Difference” (MDTD) [113]. It is expressed in temperature units (K, °C, or °F).
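Several of the parameters above follow from basic system geometry. The sketch below estimates the per-pixel IFOV from detector pitch and focal length under the small-angle approximation; the 17 μm pitch and 100 mm lens are illustrative values, not taken from any system in Table 3.

```python
def ifov_mrad(pixel_pitch_um: float, focal_length_mm: float) -> float:
    """Per-pixel instantaneous field of view in milliradians, using
    the small-angle approximation IFOV = pitch / focal length."""
    return (pixel_pitch_um * 1e-6) / (focal_length_mm * 1e-3) * 1e3


def ground_sample_m(ifov_milli_rad: float, range_m: float) -> float:
    """Footprint of one pixel on a target at the given range."""
    return ifov_milli_rad * 1e-3 * range_m


# A 17 um pitch detector behind a 100 mm lens has an IFOV of 0.17 mrad;
# at 1 km, one pixel then covers about 0.17 m on the target.
```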

In order to choose the right camera for the right application, all of the aforementioned parameters should be taken into account. There are numerous commercial IR electro-optical systems available in the market. In Table 3, we provide a selection of six different near-infrared electro-optical systems, with their comparative parameters, so as to give the reader a sense of the systems engineering perspective of IR electro-optical system selection.

Table 3 A selection of commercial NIR electro-optical systems and their properties

2.5 Applications of IR electro-optical systems

The development of IR sensing technologies has resulted in countless applications, which we divide into four major categories: military & surveillance, industrial, medical, and scientific. Each category is briefly explained below. The IR image and video sets provided in the next section are categorised according to these application titles.

  • Military & Surveillance Applications: The military and surveillance field, which also encapsulates law enforcement and rescue applications, covers a wide variety of applications across all IR sub-bands. Warfare applications include target tracking/detection/acquisition on various platforms such as missile seeker heads, forward-looking infrared (FLIR) systems, infrared search and track (IRST) systems, and directional infrared countermeasure (DIRCM) systems. Regarding law enforcement and rescue applications, night vision systems, reconnaissance and surveillance, fire fighting and rescue in smoke, locating earthquake victims, forest fire detection, and radiation thermometry are prime examples.

  • Industrial Applications: Industrial applications of IR imaging systems include the utilization of IR sensing technology in various industrial fields, such as infrared heating in process control, nondestructive inspection of thermal insulators, hidden piping location detection, diseased tree and crop detection, hot spot detection, brake lining, industrial temperature measurement, clear-air turbulence detection, pipeline leak and petrol spill detection, just to name a few.

  • Medical Applications: In medicine, IR technology is fundamentally used for diagnosis, such as early cancer detection, determining the optimum site for amputation, determining the location of the placenta, detecting strokes and vein blockages before they occur, monitoring wound healing, and detecting infection. Due to its non-invasive nature, IR technology in medicine provides information about conditions that are directly or indirectly related to the focused region of the body (such as hands [102]), as well as facilitating the assessment of treatment.

  • Scientific Applications: In nearly every scientific field, from remote sensing and meteorology to material science and microbiology, from engineering to biology, IR imaging technologies are used. In this paper, when categorizing a set as ”scientific”, we took into account its use outside of the other categorised sectors, namely military, surveillance, industrial, and medical. An image or video set, for example, is classified as both Military & Surveillance and Scientific if it has the capacity to support both types of applications.

3 IR image & video sets

The paper analyses 109 IR image and/or video sets and lists them in Table 4. A total of 77 are public, i.e. they offer public download links, while 3 are private and require payment. The remaining 29 sets can be downloaded free of charge but require registration by contacting the owning institution. The entire corpus includes nearly 20 million still images and video frames. In the following, we provide the statistical details of the compiled list of IR image and video sets in terms of application fields, included object categories, resolution, annotation, and preprocessing, before presenting the list with brief descriptions.

Table 4 From left to right, the columns depict the name and the reference, a sample image, the included object classes (if any), total number of frames, pixel resolution of images, the optical system (if specified), pixel bit depth (and if any histogram equalization - HE applied), a brief description and the application fields of the given dataset, respectively

Table 4 provides a sample image and a brief description for every set. There are also separate columns for technical details, such as the types of annotated classes, total number of frames, image resolutions, sensor types, image bit depths, and application fields. The description additionally specifies whether a collection is accessible to everyone (pub), accessible to paying users only (pri), or requires registration (rr). For more details, we suggest the reader consult the References section for an online link to each image set.

3.1 Application fields

As mentioned in the previous section, the application fields for IR image and video sets are grouped under four main titles: Military & Surveillance (Mil. & Sur.), Industrial, Medical and Scientific. As seen in Fig. 4a, Military & Surveillance comprises 65.2% of the total volume of images and video frames, clearly demonstrating the importance of IR imaging in this sector. Sets collected for scientific applications cover 25.9% of the corpus, while medical applications cover 8.6%, most likely due to the legal challenges involved in collecting or publishing health informatics data. Industrial applications account for a marginal share, probably because industrial data is rarely published in the public domain. In Table 4, the application field for every individual set is indicated in the right-most column (titled “App.”).

Fig. 4

Distributions of various attributes (application field, resolution, ground truth labelling, object category and image enhancement) in the entire collected image/frame corpus are depicted in pie charts

3.2 Resolution and sensor

The IR image and video sets listed in this survey range from LD-- (resolutions below low definition) to HD++ (resolutions above UHD). Depending on the application, resolution plays a significant role. Most surveillance systems require HD or better resolutions for accuracy, whereas LD and standard-definition (SD) systems are ideal when the computational capabilities of the system are limited. Figure 4b shows that although three-quarters of the corpus is SD++ or lower, the remainder is nearly UHD or better. It is worth noting that the sets with UHD or better resolutions are recent, indicating a clear future trend. In Table 4, the actual resolution for every individual set is indicated in column four (titled “res”).

In addition, the optical equipment used to collect each set is provided in column five (titled “Sensor”). Compared to RGB cameras, IR optical systems support different kinds of calibration and can produce characteristic output that may not be replicable even with similar equipment. Today’s RGB cameras usually apply the same preprocessing and aim to produce nearly identical output, whereas with IR vision it becomes important to know the equipment parameters in order to recreate similar scenes or images. Therefore, in the “Sensor” column of Table 4, we provide collection-equipment details for the sets that openly specify them.

3.3 Annotations and object categories

Many computer vision applications annotate data with labels for purposes such as detection, tracking, and recognition. Data annotation/labelling is an expensive effort that enables supervised learning, and hence deep learning when the annotated data is sufficiently large in scale. Similarly, some of the IR sets listed in this survey are annotated with various labels. As shown in Fig. 4c, about 33.5% of the entire corpus is annotated. For some sets, these annotations are bounding-box locations of objects, whereas for others they are global labels for entire images. The majority of the corpus is not labelled; we suspect many annotations are withheld from the public due to their commercial value. Once again, it is worth noting that the labelled sets are recent, indicating another future trend.

In Table 4 (column three, titled “Classes”), the categories of any existing annotations for a given set are provided. The entire collection of sets includes a wide range of object annotation categories. The objects are categorised under seven titles in Fig. 4d, namely biometrics, environments, humans, vehicles, animals, unknown and uncategorised. Biometrics annotations include IR images of faces, irises, ears and/or fingerprints, and cover the majority of the annotations with a 52.1% share. Human annotations, including pedestrians, runners, sportsmen, etc., cover 37.7%. Vehicles of different sorts, such as cars, bicycles, motorcycles, aircraft and boats, cover 8.6% of the label annotations. A small number of animal-class annotations take 1.4%. There is also a marginal share of annotations related to environmental objects (terrain, roads, clouds), various objects like food, or uncategorised application-specific labels.

3.4 Image enhancement

As mentioned previously in Section 2.2.1, IR electro-optical systems that provide a visual output for human users usually enhance the raw detector output using contrast-enhancing histogram shaping methods, whereas IR image processing systems running algorithms such as tracking, detection, and recognition utilise the raw pixel output, which usually has the same bit depth as the IR detector. In most cases, the histogram-enhanced image is the only accessible output of an IR optical system, and the details of the enhancement algorithm are rarely provided to the user. Most systems apply different algorithms suited to their design requirements, such as the level of contrast or real-time operation. As seen in Fig. 4e, only a minority of 5.3% of the entire corpus of collected frames are raw detector outputs. The “bit” column in Table 4 gives the bit depth of an image/frame for a given set. The number (8, 11, 16, etc.) corresponds to the image bit depth. For some sets, the bit depth is indicated as “8*”, showing that the images/frames are in 24-bit RGB (i.e. 8 bits per channel) format. The abbreviation “HE” indicates that a histogram enhancement process was applied, whereas “RAW” indicates that the raw detector output is accessible. The type of enhancement technique is not indicated in the table because this information is not available for most of the collection equipment.

4 Conclusions and future directions

In this survey, we compile a list of publicly available IR image and video sets for artificial intelligence and computer vision researchers. We mainly focus on IR image and video sets collected and labelled for computer vision applications such as object detection, object segmentation, classification, and motion detection. We categorise 109 publicly available or private sets according to their sensor types, image resolution, and scale. The list includes a brief description of each set. The statistical details of the entire corpus of IR image & video sets are provided in terms of application fields, included object categories, resolution, annotations, sensor types and preprocessing details.

We believe that this survey, with solid introductory references to the fundamentals of IR imagery, will serve as a guideline for computer vision and artificial intelligence researchers who want to delve into the spectra beyond the visible domain. Today, IR cameras are being integrated into smartphones, making IR imaging a reality in the consumer market. Within a short time, the IR domain will host a large number of pre-trained deep learning models. This collection can therefore support future research on deep learning models for vision problems such as IR domain adaptation, multi-modal vision, and fusion. Such an approach may yield IR sub-band-specific deep feature extractors usable for a variety of vision tasks, though these models would require very large-scale sets. A crucial future practice would be the ongoing updating of this survey, especially given the possibility that annotated IR sets may soon become available in vast quantities.