
1 Introduction

The transportation industry is constantly evolving to improve the driver and passenger experience and road safety. Modern vehicles are equipped with innovative features such as driver assistance and route guidance systems (e.g., automatic emergency braking, adaptive cruise control, lane departure warning, speed assistance monitoring), fatigue detection [1], and voice control. Along with population growth, these enhancements have increased the demand for vehicles, resulting in road congestion and higher accident rates. According to the World Health Organization [2], about 1.3 million people die each year in road accidents caused by human errors, including speeding, disobeying traffic laws, driving under the influence of alcohol or other substances, and distracted driving.

To lower the high rate of accidents, scientists needed a way to reduce the risk of human error while driving; for this reason, multiple sensors can be deployed inside and outside the vehicle to add an extra layer of safety. Some essential sensors are placed on the engine to inform the driver of the state of the brakes, the engine temperature, the engine oil level, etc. Nowadays, new 2D and 3D sensing technologies deployed inside the car enable drivers to adjust different pieces of equipment with simple hand gestures, without looking away from the road. For example, a touch screen can be used to adjust the air conditioner and the side windows, and a camera installed on the steering wheel can detect specific gestures of the driver in order to answer phone calls or skip songs. In addition, a camera attached to the rear-view mirror monitors the driver’s state and detects any signs of tiredness on his face. Concerning the sensors used outside the car, in 2000, Nissan’s Infiniti luxury division added cameras to the rear of its vehicles that broadcast live video to the drivers to help them view the area behind the car while reversing [3]. In 2003, Toyota launched the Intelligent Parking Assist option by installing ICS sensors on the front and rear bumpers of the vehicle to enable driverless parallel parking [4]. In 2014, Elon Musk introduced self-driving capabilities with the Tesla Model S, using many sensors all around the car, from short-range and long-range cameras to radars and ultrasonic sensors. In 2020, Waymo deployed a self-driving taxi for public use that implemented LiDAR technology as its primary sensing method [5]. These new services are enabled by embedded sensors such as GPS sensors, ultrasonic sensors, cameras, motion sensors, etc.

This work focuses on a new type of sensing technology used for object detection, known as LiDAR (Light Detection and Ranging). It uses laser beams to generate three-dimensional maps of the vehicle’s surrounding environment and has been adopted as a reliable sensing technology by multiple companies such as Waymo, Tesla, and Cruise. LiDAR is becoming an increasingly stable and reliable sensing method, mainly for autonomous vehicles. Several surveys have introduced LiDAR technology [6,7,8]. Surveys such as [6] and [7] focused on the sensing mechanism of the LiDAR sensor and the way it functions, while [8] gave a detailed taxonomy of 3D object detection methods based on LiDAR sensing data. In this survey, we focus on the use of LiDAR technology, mainly in the vehicular field.

In this paper, we present a survey of LiDAR sensing technology. We include a general introduction to the LiDAR sensing system and its different fields of application, while focusing on the vehicular field. In addition, we give a brief comparison between this technology and other sensing technologies and present its main advantages and disadvantages. Finally, we present the different steps of the object detection process using LiDAR technology and its different methods of application.

The rest of the paper is organized as follows. In Sect. 2, we briefly introduce the LiDAR technology, explaining its operating mode and range of applications. Section 3 focuses on LiDAR technology in the vehicular field. In Sect. 4, we conclude our paper.

2 LiDAR Technology

In the context of smart mobility, different types of sensors can be deployed to provide new functionalities and, in particular, to ensure vehicle safety. As mentioned in [9], the employed sensors can be classified into two categories: (i) in-vehicle sensors, including ultrasonic sensors, RADAR, GPS, gyroscope, accelerometer, and LiDAR sensors; (ii) in-road sensors, such as pneumatic road tubes, inductive loop detectors, magnetic sensors, piezoelectric sensors, infrared sensors, acoustic array sensors, radio-frequency identification, and LiDAR sensors. Each sensing technology has its advantages and disadvantages; in Table 1, we summarize the most used ones.

Table 1. Sensing technologies.

As shown in Table 1, several sensors can be deployed to ensure road safety. For example, RGB cameras can be placed on roadsides to detect vehicles, congested points of traffic, and even the license plates of cars in case of road violations. RADAR sensors can be used to measure the speed of moving vehicles, while ultrasonic sensors are widely used in parking assistance systems. In addition to the services mentioned above, these sensors can also be combined into a detection system to scan the environment around a moving vehicle and detect obstacles in its path. In this work, we will focus on LiDAR technology, as it provides high-spatial-resolution range information, useful for many road safety applications such as ADAS (Advanced Driver Assistance Systems) [10], inspection of railroad infrastructure [11, 12], and inspection of road pavement condition [13].

In this section, we first explain how LiDAR technology works, then we enumerate the different types of LiDAR sensors and their respective fields of application.

2.1 LiDAR Scanning

a. LiDAR System Components: LiDAR belongs to the Optical Wireless Communication (OWC) technologies. It is considered an active remote sensing system because it generates beams of light (ultraviolet, visible, or near-infrared) to detect objects, measure their distance, and generate very high-resolution three-dimensional maps called point clouds. Every LiDAR system is composed of three essential elements:

  • The LiDAR sensor: These pieces of equipment come in different shapes and sizes, but most of them share a general component structure, as illustrated in Fig. 1:

    • The Transmitter: it represents the light source (e.g., a laser, an LED, or a VCSEL diode) that generates and emits laser beams in pulses from the sensor toward the objects.

    • Scanner and optics: a combination of plane mirrors, a polygon mirror, and a dual-axis scanner is used to adjust the angle and range of the laser beams.

    • Photodetector: also known as the receiver electronics or photodiode, it is the light sensor responsible for collecting the laser beams reflected off the objects and converting them into an electrical signal. There are two principal photodetector technologies used in LiDAR: solid-state detectors (e.g., photodiodes) and photomultipliers.

  • Position and navigation systems: these include a GPS receiver and an IMU (Inertial Measurement Unit), usually needed when the sensor is attached to a moving platform (car, airplane, satellite, etc.) to identify the location and orientation of the LiDAR sensor in X, Y, Z space, alongside the characteristics of the surrounding objects, such as their distance, size, and shape.

  • A computer and software: they are used to correlate all the information from the LiDAR sensor and the navigation system and to generate the point clouds.

Fig. 1.
figure 1

General representation of a LiDAR sensor [6]

b. LiDAR Functioning: LiDAR technology is similar to RADAR in that both sensors generate and send out multiple waves of energy that travel from the transmitter to the objects around them. Then, based on the time-of-flight principle illustrated in Fig. 2, they measure the distance separating them from those objects using the following equation:

$$\begin{aligned} D=c* \frac{t}{2} \end{aligned}$$
(1)

where:

  • t: represents the time of flight

  • c: is the constant value of the speed of light

Fig. 2.
figure 2

Time-of-flight principle
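As a quick numerical check of Eq. (1), a pulse that returns after \(t = 100\) ns (an illustrative value chosen here, not taken from a specific sensor) corresponds to a target at

$$\begin{aligned} D = 3\times 10^{8}\,\mathrm {m/s} \times \frac{100\times 10^{-9}\,\mathrm {s}}{2} = 15\,\mathrm {m}. \end{aligned}$$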

The main difference between the two technologies is that RADAR uses radio waves while LiDAR uses light waves. Part of the transmitted signal is reflected off the objects and collected by the receiver component of the LiDAR sensor. This reflected energy, known as the intensity, is then processed together with the Global Positioning System and the Inertial Measurement Unit data to determine the location and orientation of the LiDAR sensor in X, Y, Z space, alongside the characteristics of the surrounding objects, such as their distance, size, and shape.

c. LiDAR Data: The LiDAR sensor collects the laser beams reflected from the objects, then processes and stores them in files called point clouds, which contain information on a significant number of 3D elevation points in matrix form.

The four main characteristics of each LiDAR point are its three-dimensional coordinates x, y, and z and its intensity value, which represents the strength of the returned laser pulse. Other optional pieces of information can be generated by specific sensors:

  • Point classification: each point is assigned a class that defines the type of object from which it was reflected. The American Society for Photogrammetry and Remote Sensing (ASPRS) defines these classifications. For example, as shown in Table 2, each point is given one of twenty classes.

  • RGB: some sensors can assign a color to each point of the point cloud based on the intensity of the returned laser beams (points with higher intensity receive warmer-toned colors).

  • GPS time: this attribute is usually assigned when using a mobile LiDAR sensor (e.g., attached to a moving vehicle) to record when the laser beam was emitted from the sensor.

Table 2. Classification value and meaning for LiDAR points [?]
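As an illustration of the attributes discussed above, a LiDAR point record can be represented as a structured array. The field names and data types in the following minimal Python sketch are assumptions chosen for illustration, not a standard layout:

```python
import numpy as np

# Illustrative (non-standard) layout of a LiDAR point record holding the
# attributes discussed above: coordinates, intensity, ASPRS class, GPS time.
point_dtype = np.dtype([
    ("x", np.float64), ("y", np.float64), ("z", np.float64),
    ("intensity", np.uint16),
    ("classification", np.uint8),   # e.g., 2 = ground in the ASPRS scheme
    ("gps_time", np.float64),
])

cloud = np.zeros(3, dtype=point_dtype)
cloud[0] = (12.4, -3.1, 0.2, 840, 2, 315964800.0)   # hypothetical ground return
print(cloud["classification"])
```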

The generated point clouds are stored in files under hundreds of file formats, depending on the LiDAR sensor deployed to scan the area. Still, the majority fall under either the ASCII or the binary format.

The first type uses text to encode information, making it easy to read with text editors and other applications (e.g., Microsoft Excel) and suitable for long-term archiving. However, these files take longer to process, must be read line by line, and are larger than binary files. This format’s most used file types are XYZ, OBJ, PTX, and ASC. The binary format is more compact and can store and transmit more information than the ASCII format; it also allows faster processing and viewing of files. Its main drawback is that simple text editors cannot read it. FLS, PCD, and LAS are some of the most popular binary point cloud formats.

Other file formats, like PLY, FBX, and E57, can store both ASCII and binary forms, taking advantage of both representations. However, since each format has its own properties, it is not advised to convert binary data to ASCII, because the conversion could degrade the information.
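To make the ASCII point cloud format concrete, the following minimal Python sketch loads a hypothetical file (the name `scan.xyz` and the column layout `x y z intensity` are assumptions; actual layouts vary between sensors and export tools) and prints basic statistics:

```python
import numpy as np

# Load a hypothetical ASCII "XYZ" export; each row is assumed to be: x y z intensity.
points = np.loadtxt("scan.xyz")       # shape: (N, 4)

xyz = points[:, :3]                   # 3D coordinates (metres)
intensity = points[:, 3]              # strength of the returned pulse

print(f"{len(points)} points loaded")
print("bounding box min:", xyz.min(axis=0))
print("bounding box max:", xyz.max(axis=0))
print("mean intensity:", intensity.mean())
```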

There is a wide variety of software capable of processing LiDAR point clouds, depending on the format of the files. Open-source software provides a limited number of services; mainly, it is used to visualize and display point clouds (e.g., QGIS 3 [14], Whitebox GAT [15], Fugro Viewer [16], SAGA GIS [17], GRASS GIS [18], MeshLab [19], CloudCompare [20], etc.). Commercial desktop software offers more services and options in addition to a free viewing mode (e.g., Faro Scene [21], Leica Cyclone [22], Trimble RealWorks [23], Bentley Pointools [24], PointCab [25], PointFuse [26], EdgeWise [27], Capturing Reality [28], Autodesk ReCap [29], etc.). Table 3 shows examples of point cloud software and the file formats they can import and export.

Table 3. Point cloud software import and export formats

Although LiDAR data is relatively new, it is available for researchers and scientists to download and experiment with through different websites such as OpenTopography [30], USGS Earth Explorer [31], NOAA Digital Coast [32], and the National Ecological Observatory Network [33]. These websites provide fixed-viewpoint LiDAR data, which is of limited relevance for model training and machine learning. In addition, different companies offer free datasets for scientists to build new machine learning models, such as Waymo [5], the KITTI dataset [34], and Ouster, which, alongside its data, provides dedicated software to display and manipulate the information.

2.2 LiDAR Types

Generally, there are two different types of LiDAR applications: airborne and terrestrial. Each type requires LiDAR sensors with specific characteristics related to the application objective, the size of the area to be scanned, the maximum range of the laser beam needed, and the cost of the sensor.

a. Airborne LiDAR: Airborne LiDAR is an acquisition method in which the LiDAR sensor is attached to an airplane, a helicopter, or a drone to create a top-view point cloud over large areas, as shown in Fig. 3.

Fig. 3.
figure 3

Example of airborne LiDAR scanning method [35]

This system comprises three main elements:

  • The LiDAR scanner

  • A GPS device that detects the position of the aircraft holding the scanner

  • The IMU, responsible for processing the LiDAR data, generating the point cloud, and recording the aircraft’s altitude.

The aircraft’s height affects the accuracy and density of the point clouds generated by this method: the longer the distance between the aircraft and the ground, the lower the quality of the data. Compared with traditional methods, which use high-quality RGB cameras to capture top-view images, airborne LiDAR makes it possible to filter the vegetation from the captured point clouds, leaving only the relevant ground surfaces, as shown in Fig. 4.

Fig. 4.
figure 4

Comparison between LiDAR sensing and photogrammetry [36]

The sensors used for these applications are divided into topographic and bathymetric sensors. Both operate under the same concept, but they differ in the LiDAR scanners’ capabilities. Topographic scanners used to be mounted on airplanes because of their large size (e.g., Leica TerrainMapper-2, Leica SPL100, RIEGL VQ-880-G, Galaxy T2000, ALTM Galaxy, Trimble AX60i, Trimble AX80), but more companies have started manufacturing more compact sensors that produce inferior but acceptable results. Hence, attaching them to small drones (e.g., DJI M600 Pro LiDAR quadcopter, Draganflyer Commander, RIEGL RiCopter LiDAR UAV) became possible. This method generates a colored point cloud of above-land surfaces like railroads, highways, and infrastructure while avoiding the potential terrestrial obstacles that could slow down the process or affect the final result of the captured point cloud. Bathymetric LiDAR sensors are physically larger, more powerful, and require a substantial energy source to function. They are usually mounted on airplanes and used to measure the depth of lakes, seas, and oceans, to locate objects underwater, and to map out the structure of the land below sea level.

b. Terrestrial LiDAR: Terrestrial LiDAR sensors are installed at ground level and classified into mobile and static sensors. With mobile LiDAR, one or more laser scanners are mounted on a moving vehicle (e.g., cars, trains, boats, and vans) to generate dense point clouds along the vehicle’s trajectory. Similar to airborne LiDAR, mobile sensors (e.g., Topcon IP-S3, Ultra Puck, Alpha Prime, HDL-32E, MRS1000, MRS6000, Valeo Scala, Ouster OS0, OS1, OS2, ES2) are usually equipped with a GPS to detect the location of the vehicle and an IMU to process the data coming from the LiDAR sensor and the navigation system.

Static sensors, also known as stationary terrestrial sensors (e.g., Faro Focus 3D X130, Leica C10, Riegl VZ series, Topcon GLS 1500), are commonly used for surveying purposes. They are placed on a fixed tripod at a strategic location to create three-dimensional maps of a specific region from a particular angle. Compared to traditional methods, static LiDAR sensors can scan in every direction, including upwards, and they can easily be relocated after completing one scan, which makes them fully portable.

2.3 LiDAR Applications

LiDAR was first introduced by Malcolm Stitch in 1961 as a technology for satellite tracking. It has evolved over the years and is now successfully deployed in various application fields that require an extensive scanning range and accurate identification and classification of objects, in the presence or absence of light:

  • Agriculture: The agriculture sector is one of the oldest and longest-existing markets, and it always benefits from new technologies. LiDAR technology is very useful in this field: it is possible to attach sensors to drones and capture bird’s-eye-view maps that are later processed to study the soil and the terrain. Based on the height of crops, it is possible to determine the areas with low productivity that need fertilizers, as well as damaged crops and products, which helps the farmer avoid potential financial loss.

  • Archaeology: LiDAR technology has been deployed in the archaeology field because it is a low-cost method that can generate high-resolution three-dimensional maps of archaeological features like ancient caves, roads, fences, terraces, and even boundaries hidden by vegetation, without damaging them. In 2009, the archaeologist Chris Fisher discovered a great city of the Purepecha empire dating back to 1519 [37]; Fisher stated that with traditional radar technology it took them two years to survey only 2 km of the site, but with LiDAR technology it took only 45 min to scan the entire 13 km surface.

  • Forestry: In the forestry field, airborne LiDAR technology has been deployed to study leaf area, measure biomass, estimate canopy heights, and assess the biodiversity of plants, animals, and even fungi. For example, in 2020, LiDAR sensors were used to map the Australian forests damaged by fire and to identify healthy and burned vegetation. Also, the Save the Redwoods League [38] has used LiDAR technology to evaluate the height of trees and learn about the biodiversity of redwood forests.

  • Geology: The point clouds generated by airborne and terrestrial LiDAR have been used in the geology field to study the surface of the Earth, such as river channel banks and terraces, glacial landforms, and the texture of the terrain under the vegetation level, and to observe elevation changes of landscapes between scans over long periods. For example, in 2005, the Mont Blanc massif was the first high alpine mountain to be scanned by LiDAR to detect rockfalls caused by climate change [39]. In addition, this technology was combined with GNSS to locate the Seattle Fault in Washington [40].

  • Atmosphere: There are several applications of LiDAR in atmospheric science. Studying the atmosphere with beams of light goes back to before the Second World War: in 1930, Edward Hutchinson Synge suggested examining the upper atmosphere using beams of light. Either terrestrial or airborne LiDAR can be deployed for atmospheric applications, for example, cloud classification using a powerful laser to retrieve cloud tops, aerosol properties investigated by EARLINET [41], measurement of atmospheric gases (e.g., ozone, water vapor), and measurement of atmospheric temperature from approximately 120 m above ground.

  • Law enforcement: LiDAR technology is used by the police as a speed gun to detect vehicles exceeding the speed limit, or as a method to record crime scenes and help with investigations.

  • Military: The most notable application of the LiDAR system in the military sector is the counter-mine method developed by Areté Associates [42], called ALMDS [43].

  • Mining: LiDAR technology has been applied in the mining field by attaching sensors to wirelessly controlled robots to map the inside of tunnels and create three-dimensional point clouds [44]. In general, the airborne LiDAR method is the most used for the surveillance of mining sites because of its flexibility against obstacles, and the small size of drones allows them to reach confined spaces [45].

  • Physics and astronomy: The Lunar Orbiter Laser Altimeter (LOLA) is an instrument on a Moon-orbiting satellite equipped with a powerful LiDAR that measures the distance to the Moon’s surface with millimeter-level precision and generates topographic maps. Similarly, the Mars Orbiter Laser Altimeter (MOLA) is a LiDAR instrument on a Mars-orbiting satellite used to generate global surveys of the red planet.

  • Robotics: LiDAR technology has been embedded in robots; through the generated three-dimensional maps of the environment, robots can precisely detect and calculate the distance of the objects around them and classify them using machine learning models.

The latest advancements in LiDAR technology, as of the time this paper was published, are the development of solid-state LiDAR sensors, which use no moving parts and are therefore smaller, more reliable, and less expensive. Another recent advancement is multi-spectral LiDAR, which uses multiple wavelengths of light to gather more information about the environment, such as the materials of the objects. Finally, this technology is being integrated into mobile devices like smartphones and tablets, allowing it to be applied to a wider range of applications (e.g., indoor mapping, augmented reality).

Some of the main applications that utilize LiDAR are shown in Fig. 5 with their respective LiDAR sensors.

Fig. 5.
figure 5

LiDAR technology classification, applications, and sensor examples.

In this paper, we will explore the use of LiDAR in the field of autonomous driving and the object detection systems that use terrestrial LiDAR sensors.

3 LiDAR Usage in the Vehicular Field

Beyond autonomous driving, LiDAR technology is a valuable safety mechanism for several other vehicular applications:

  • Improving safety in the railroad field by installing a terrestrial LiDAR at a level crossing to detect obstacles and then alert the train driver [46].

  • Monitoring the state of railway tracks by attaching a LiDAR sensor to the front of the train to detect irregularities [47,48,49] that need to be fixed to avoid future accidents.

  • Detecting objects on the tracks using the airborne LiDAR method [50, 51].

  • Predicting rockfall hazards near railways.

  • Ensuring secure authentication between vehicles in the domain of VANETs [52].

Still, the autonomous vehicle field remains the one that utilizes LiDAR technology the most, as an object detection mechanism [8, 53]. In addition, since the early 2010s, a considerable number of research papers have focused on enhancing the perception of vehicles. In the following, we explore object detection with LiDAR technology in the vehicular field.

3.1 LiDAR-Based Object Detection in the Vehicular Field

In the vehicular domain, object detection approaches rely either on raw LiDAR data or on the data provided by both LiDAR and a camera; indeed, the fusion of LiDAR technology and RGB cameras offers a stable and feasible solution. The raw data coming from either the LiDAR sensor or the RGB camera goes through the following phases:

  1. The first phase is data representation, which is responsible for processing, organizing, and structuring the raw data from the LiDAR sensor for the next step.

  2. The second phase is feature extraction, which is responsible for generating feature maps by extracting different types of features.

  3. The third phase is the object detection model. Different approaches can be applied in this step: regression of bounding boxes, determination of the object orientation, object class prediction, and, in some cases, deduction of the object speed.

  4. A last phase is adopted by models that rely on a two-stage architecture. In these models, the object detection step above acts as the primary detector, extracting the bounding boxes that frame the detected objects; afterward, a second step, called prediction refinement, is applied to fine-tune and improve the results of the first stage.

As illustrated in Fig. 6, the authors in [8] summarize the different methods of each step of the 3D object detection process.

Fig. 6.
figure 6

3D object detection system steps and their respective methods

a. Data Representations: This is the first step of any 3D object detection process. The raw LiDAR point data is refined to enhance the performance of the next phase of the process, feature extraction. As illustrated in Fig. 7, this step includes different methods with different output formats for the LiDAR point cloud data; these methods are explained next.

Fig. 7.
figure 7

Feature extraction output formats [53].

  • Point-based: The concept of this first approach is simple; the point cloud is preserved as a collection of sparse points, and each point is represented by a feature vector generated by combining the features of its neighboring points. But since the cloud is composed of thousands of points, object detection could take a significant amount of time. For this reason, a preprocessing step is required to reduce the size of the point cloud to a pre-defined value [54,55,56,57,58,59]. The reduction of the point cloud size is performed by a procedure known as downsampling, which removes points from the point cloud until reaching the required number of points N (N is the fixed number of points kept in the point cloud). Downsampling can be applied in two ways, either through a random selection method or through the Farthest Point Sampling (FPS) algorithm (a minimal sketch of both point-based downsampling and voxel grouping is given after this list). In the first method, points are picked randomly until N points are selected, which could result in an uneven selection since dense regions of the point cloud have a higher probability of being sampled than sparse ones [54, 55]. The second method starts by picking a point at random and then repeatedly adds to the selection the point farthest from the points already selected, until the desired pre-fixed number of points N is reached; this approach maintains a representation similar to the initial point cloud, but at the cost of time and hardware [59,60,61].

  • Voxel-based: Voxelization assigns each point of the point cloud to a voxel according to its 3D coordinates. A voxel is a cubic element with distinct coordinates in 3D space. This approach divides the point cloud into three-dimensional cuboids [62] that can be uniformly spaced or have different sizes along the x, y, and z Cartesian grid. In the following step, the features of the raw point cloud are computed from the group of points inside each voxel as a single feature vector, instead of being extracted from each point separately, which lowers the computational cost and reduces memory consumption (see the sketch after this list). Some of the features that can be computed for each voxel are (i) the average value of the intensities inside the voxel, (ii) the 3D coordinates of each voxel point, and (iii) the mean distance between each point and the center of its voxel.

  • Pillar-based: This method was introduced by [63]; it partitions the point cloud along the Z-axis (in vertical columns), splitting the 3D space into fixed-size pillars, which can be viewed as voxels unbounded along the Z-axis. As in the voxel-based approach, the allocation of points to the pillars is done through fixed or dynamic voxelization.

  • Frustum-based: The models using this data representation [64,65,66] cut the point cloud into frustums, i.e., sections that lie between two parallel planes of a cone or a pyramid, and then apply feature extraction methods on these sections.

  • 2D Projection-based: This data representation method projects three-dimensional point clouds into two-dimensional ones to reduce the computational cost of processing the data. In the literature, three main projection approaches are proposed and applied in various research projects: the Range View (RV), the Bird’s Eye View (BEV), and the Front View (FV).

  • Graph-based: This last approach converts the point cloud into a graph, where each point is considered a node and each link between it and its neighbors is an edge. However, since the point cloud holds thousands of points, the number of edges connecting points is considerably high, resulting in high computational time and resource consumption. Therefore, this method is preceded by a voxelization step followed by a downsampling phase to preserve specific points [67].
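To make the point-based and voxel-based representations above more concrete, the following Python sketch implements (i) a naive version of farthest point sampling and (ii) voxel grouping with per-voxel mean features. It is a minimal illustration only: the array shapes, the voxel size, and the target point count are illustrative assumptions, and real pipelines rely on optimized (typically GPU-based) implementations.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Naive FPS: start from a random point, then repeatedly keep the point
    farthest from the already-selected set, until n_samples points remain."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=int)
    selected[0] = np.random.randint(n)
    # Distance from every point to its nearest already-selected point.
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for i in range(1, n_samples):
        selected[i] = np.argmax(dist)                    # farthest remaining point
        new_dist = np.linalg.norm(points - points[selected[i]], axis=1)
        dist = np.minimum(dist, new_dist)                # update nearest-selected distances
    return points[selected]

def voxel_mean_features(points: np.ndarray, voxel_size: float = 0.5) -> dict:
    """Assign each point (x, y, z, intensity) to a cubic voxel and return,
    per voxel, the mean coordinates and mean intensity as one feature vector."""
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(int)
    grouped = {}
    for idx, pt in zip(map(tuple, voxel_idx), points):
        grouped.setdefault(idx, []).append(pt)
    return {idx: np.mean(np.stack(pts), axis=0) for idx, pts in grouped.items()}

# Illustrative usage on a synthetic cloud of 10,000 points (x, y, z, intensity).
cloud = np.random.rand(10_000, 4) * [50.0, 50.0, 3.0, 1.0]
kept = farthest_point_sampling(cloud[:, :3], n_samples=1024)
voxels = voxel_mean_features(cloud, voxel_size=0.5)
print(kept.shape, len(voxels))
```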

b. Feature Extraction from LiDAR Data: Feature extraction is the fundamental phase before applying an object detection method. It enhances the system’s performance by providing well-defined and easy-to-process features from the point cloud. There are mainly three classes of features that can be extracted:

  • Local: also known as low-level features, they represent the spatial information of each point in the point cloud. They are usually extracted at the start of the model pipeline.

  • Global: also named high-level features, these encapsulate the shape and geometric relations between a point and its neighbors; they can be extracted from a single network or through a combination of networks.

  • Contextual: these features are the last to be extracted and fed to the object detection phase of the model. They represent the combination of the localization features of points and their semantic value.

Many research methods rely on combining multiple feature extractors to optimize the results of the detection model. There are two different groups of feature extractors, 3D-based and 2D-based. The former is applied directly in the 3D space, while the latter operates on the 2D planes; each type has its distinct architectures and application methods.

c. Object Detection: Object detection is the principal phase of the 3D object detection process. Detection approaches can be classified into five categories based on (1) the feature extraction pattern, (2) the pipeline architecture of the detection module, (3) the detection settings of the approach, (4) the object detection mechanisms, and (5) the type of data used as input, as illustrated in Fig. 8. This section presents these classifications.

Fig. 8.
figure 8

Different classifications of Detection Networks

  1. Feature extraction patterns: The feature extraction process differs from one approach to another. For example, some approaches merge multiple feature extractors to exploit the advantages of different methods, while others use a single method that improves the execution time of the feature extraction phase. In addition, the architecture of the feature extractor varies from one approach to another, aiming to extract rich information while maintaining spatial information to enhance classification and object localization. When working with three-dimensional data, the size and shape of objects constantly change depending on the distance between the targeted object and the sensor and on the angle of detection, so it is necessary to implement networks capable of extracting multi-scale features. Approaches like [68, 69] that operate on 2D images attempted to achieve this objective by performing object detection while resizing the input images, but this comes with a high computational cost. More recent approaches [70] tried another method by increasing the layers of the decoders in the encoder/decoder architecture, which leads to feature vectors with multiple resolutions.

  2. Pipeline detector architecture: Object detection solutions generally follow one of two architectures:

    • The dual-stage approaches: the detection approaches that follow this architecture are composed of two networks. The first starts with a proposal generator (e.g., an RPN) that creates a set of region predictions known as intermediate proposals. Then, a second network, known as the prediction refinement network, takes as inputs the generated proposals and the original point coordinate features and optimizes the localization accuracy of the detected objects.

    • The Single-Stage approaches: these approaches combine the classification and bounding box proposals into a collection of connected layers. They directly apply object classification and generate final bounding box estimations for each part of the feature maps without the need to use the bounding box refinement phase.

    Compared with the dual-stage approach, the single-stage approach is usually more time-efficient, making it more suitable for real-time object detection applications. In contrast, the dual-stage approach can achieve higher precision.

  3. Detection settings: For point cloud data, localizing the detected objects can be achieved using two approaches:

    • Rectangular-shaped cuboids (also known as bounding-box-level localization): This concept revolves around drawing tight bounding box predictions around the detected objects to locate them. Various methods are applied to draw and optimize the bounding boxes. The most used one starts by pre-defining the size of the bounding boxes in the region proposal step, then improves them by adjusting their sizes and orientations.

    • Segmentation masks (also known as mask-level localizations): This concept utilizes point-based data representation to learn and classify each point as a foreground or a background point. Instead of a cuboid bounding box, this approach uses pixel-based masks to segment the objects. In addition, these masks are usually modified to regress bounding boxes.

    For the first approach, during the training phase of the model, the encoder networks utilize the feature vectors generated by the feature extraction phase and the annotation files that store the dimensions of the bounding boxes. The training step of the second approach uses the point-based features extracted from the ground-truth segmentation masks provided by the datasets. Finally, the IoU (Intersection over Union) mechanism is used between the bounding boxes generated by the model and the ground truth provided by the dataset to evaluate the detector’s performance (a minimal IoU sketch is given after this list).

  4. Detection mechanisms: Object detection approaches can be divided into four main techniques based on the methods used to generate the region proposals; they are described in the following:

    • Region Proposal Method: several examples and variations of the region proposal method were developed in the literature, each aiming to improve on the results of its predecessor.

    • Sliding Window Method: the first step of the sliding window detector is to train a CNN on a set of cropped and labeled objects, producing a model that can identify the required objects. Next, the same CNN is used to classify the objects inside an image by receiving multiple parts cropped with a square-shaped frame, known as a “window”, that scans the entire image with a constant stride. Finally, this step is repeated with different window sizes to find the most acceptable result [71]. The main disadvantage of this method is the high inference time when applied to point clouds because of the sparseness of the points.

    • Anchorless Detectors: The anchorless method avoids using many 3D anchors; instead, it follows the binary (foreground/background) segmentation-based detection settings, allowing models to be more memory efficient with lower computational cost. However, compared to the region proposal frameworks, the accuracy of these detectors is lower when detecting large objects (e.g., trucks, cars) and higher for small ones (e.g., cyclists, pedestrians).

    • Hybrid Detectors: STD [57] is a representative dual-stage approach that combines anchors and segmentation to generate region proposals.

  5. Input data type: Regarding the input data utilized in detection models, there are notably two different approaches: either base the solution only on LiDAR point clouds as the primary source of data, or merge them with images collected by RGB cameras.

    • Various approaches rely on the first method because of the rich geometric information the LiDAR sensor provides. LiDAR point clouds can be transformed into BEVs by omitting the height value of the Z-axis and then applying the 2D object detection mechanisms used for RGB images. Models [63] that process the point clouds as 3D voxel or pillar representations are usually more expensive in terms of hardware and time. Finally, other approaches operate directly on the raw point cloud data as it is [56, 57].

    • The approaches [72,73,74] based on both sensing technologies detect objects in more complex scenarios, such as small and distant objects, which is impossible using only LiDAR sensors. The main advantage of using RGB cameras is the generation of dense pixel images over a significant distance (depending on the camera’s performance); still, they do not give any information about the objects’ depth (distance). Combining the two data types makes it possible to take advantage of both the densely pixelated images generated by RGB cameras and the accurate depth provided by LiDAR.

      The use of two different types of data improves the accuracy of the models in the majority of cases, but it comes with several disadvantages:

      • Models require precise calibration and synchronization between the LiDAR and the camera sensors, which makes the accuracy of the solution extremely sensitive to any change in sensor position or viewing angle.

      • These fusion solutions are usually slower than the LiDAR-only solutions due to the large number of images to be processed, the usage of dual-stage architectures, and the deployment of RPNs for bounding box generation.

      • These solutions are highly dependent on the detection performance of the 2D object detectors, and they cannot use the 3D information to enhance the accuracy of the bounding boxes.

      • The approaches relying on extracting and combining the features of multiple views (e.g., MV3D) face the problem of information loss due to the inconsistency of the feature sizes across the BEV projection, the front view projection, and the camera image. Thus, they need to normalize their sizes, which affects the detection performance.
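To illustrate the IoU evaluation mentioned in the detection-settings discussion above, the sketch below computes the 3D IoU of two axis-aligned boxes encoded as (x_min, y_min, z_min, x_max, y_max, z_max). This is a simplified, axis-aligned variant chosen for clarity; vehicular benchmarks typically evaluate rotated (oriented) boxes, often in the BEV plane, and the box values below are hypothetical.

```python
import numpy as np

def iou_3d_axis_aligned(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """3D IoU for axis-aligned boxes encoded as
    [x_min, y_min, z_min, x_max, y_max, z_max]."""
    # Overlap extent along each axis (clamped at zero when the boxes do not meet).
    lower = np.maximum(box_a[:3], box_b[:3])
    upper = np.minimum(box_a[3:], box_b[3:])
    overlap = np.clip(upper - lower, a_min=0.0, a_max=None)
    intersection = overlap.prod()

    vol_a = (box_a[3:] - box_a[:3]).prod()
    vol_b = (box_b[3:] - box_b[:3]).prod()
    union = vol_a + vol_b - intersection
    return float(intersection / union) if union > 0 else 0.0

# Hypothetical predicted vs. ground-truth boxes for a detected car (metres).
predicted    = np.array([10.0, 2.0, 0.0, 14.0, 4.0, 1.6])
ground_truth = np.array([10.5, 2.2, 0.0, 14.5, 4.1, 1.5])
print(f"IoU = {iou_3d_axis_aligned(predicted, ground_truth):.2f}")
```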

3.2 Challenges

The perception system requires a single LiDAR sensor or a group of LiDAR sensors that periodically scan the three-dimensional space around them and store it in point cloud files [8]. Next, it extracts important information and classifies the data by its semantic meaning. LiDAR technology provides 3D point clouds that represent the scenes around the object carrying the sensor. However, some factors make this perception task extremely challenging:

  • The driving environment is highly diverse and changes every second, including the state of the weather. Different studies [75,76,77] have shown that fog and rain can negatively influence the performance of the LiDAR sensor, although LiDAR can still produce better results than other sensing technologies (e.g., RADAR).

  • Objects can be partially or entirely occluded by other objects or by parts of other objects.

  • The shape and size of an object detected by a LiDAR sensor depend on the distance and angle from which the object was detected. As a result, the same entity can have different shapes and sizes, creating confusion when classifying the object.

  • The performance of the LiDAR sensor is dependent on the entire driving domain.

All the factors mentioned above hinder the quality of service that LiDAR can deliver; therefore, multiple approaches have combined LiDAR with different sensing technologies like RGB cameras [78], stereo cameras [79], RADAR [80], and ultrasonic sensors [81]. The combination of a LiDAR sensor and monocular cameras is the most adopted multi-sensing architecture because of LiDAR’s capability to provide depth information, whereas cameras collect information richer in texture [8, 53, 82, 83].

Besides, object detection is an essential step of the autonomous driving process. It relies on the data collected from a LiDAR, or from a LiDAR and RGB cameras, together with a machine-learning algorithm to create prediction models or enhance the performance of previous versions. However, although LiDAR sensors provide high-resolution three-dimensional maps under various lighting conditions, relying on these sensors raises new challenges:

  • The data generated by LiDAR sensors are sparse and unstructured.

  • The volume of the point clouds is large, and their processing requires powerful equipment since the feature extraction and object detection steps are expected to be performed in real time.

  • The processing units are resource-constrained since vehicles are equipped with a limited source of energy (the battery of the vehicle); thus, the use of efficient computational models to process the point clouds is required.


4 Conclusion

In this paper, we presented LiDAR technology, including its functioning mechanism, its types, and its various applications in different fields. We also summarized the main features that can be extracted from LiDAR point clouds and the feature extractors used on this type of data. Our work can still be improved by presenting the different 3D detection methods used by different LiDAR-based models.