1 Introduction

Connected Sensor Systems (also called the Internet of Things, or IoT) are poised for disruptive growth in the near future. The “Sense-Analyze-Respond” paradigm of IoT systems is providing insight, deriving value, and creating new ways to do business across multiple verticals, be it manufacturing, healthcare, transportation, or energy.

A typical IoT system has three components—a sensor system to collect data about physical events, a data transport system to send the collected sensor data to the cloud, and a learning- and knowledge-based analytics system to derive insights about the physical events from the collected data. So far, IoT systems have focused on using specialized sensors to collect more and more information about physical events. However, if we look at human beings as connected entities, we continuously learn about our surroundings from birth through our five senses—the sensed data is converted into knowledge and wisdom by the complex learning systems of our brain. By the same analogy, if the five human senses can be imparted to machines along with sophisticated machine learning techniques, it would be possible to create smarter IoT systems than those available today. This concept of replicating the five human senses on machines, followed by advanced machine-learning-based analytics to derive knowledge and insights, is popularly known as “5 senses computing”. Here we present a glimpse of the “art of the possible” in sensing these five senses, which can either be stand-alone sensors or be mounted on robots—each has significant application in remote inspection and predictive analytics, especially in manufacturing and shop-floor scenarios.

Sight Even though cameras have existed as sensors for a long time, intelligent processing of images and videos can take vision processing to a new level. This includes 3D vision, vision-based measurement, automatic object recognition, and semantic understanding of image and video content [1]. New research in this field is also enabling the unobtrusive use of cameras to sense physical events such as vibration and micro-motion.

Hearing There will also be significant improvements in computers’ ability to hear and understand sound. Source localization and analysis of sound captured unobtrusively by microphones can provide insights into many physical events across verticals [2].

Touch Touch sensing and haptic feedback are poised to become core components of human–computer interaction (HCI). Going further, our skin senses not only touch but also heat. Heat sensors in the form of infrared and thermal cameras can provide considerable insight into a machine’s condition. This information, when coupled with 3D vision information, can be used for predictive analytics [3].

Smell Olfactory sensing in the form of gas sensors and similar devices is slowly becoming mainstream. These sensors, backed by the requisite analytics, will be able to check for molecular biomarkers in the future [4].

Taste Taste sensors that can break down ingredients into their respective chemicals are also being prototyped [5]. While they have more direct applications in the food and beverage industry, their application in other areas cannot be ruled out. Both olfactory and taste sensing can potentially be used in the future for composition analysis and quality checks of components and materials.

Fusion of the above five senses and the associated analytics is also termed “cognitive computing” [6]. In time, cognitive computing will be able to unobtrusively observe and model complex interactions in complex systems—such a sensory-aware machine can use the resulting model to predict system condition and health, leading toward predictive maintenance and quality improvement.

In this paper, we first discuss current application areas in Sect. 2. In Sect. 3 we present related work, and in Sect. 4 we describe our contributions in 3D vision-based measurement and vibration sensing, acoustic source localization, and thermal sensing, and discuss how they add value in the remote inspection and analytics space. In Sect. 5 we present the results and discuss their outcome. Finally, in Sect. 6 we summarize the work and discuss possible future directions.

2 Current Application Areas

The current application areas focus mainly on optical vision systems. Industrial machine vision has evolved into a major contributor to automating high-end factory operations such as quality control, packaging inspection, identification, measurement, counting, and tracking across industries. Machine vision has added speed and efficiency to manual inspection methods, improving reliability and precision. Track-trace-control philosophies have drastically improved factory efficiency in assembly/manufacturing, supply chain, and local warehouse management. Image capture, image processing, and output generation are the standard roles played by industrial cameras and machine vision instruments, and they are capturing the imagination of manufacturers. Common machine vision applications include sorting, quality assurance, robotic guidance, material handling, and optical gauging. Specific usages include automated PCB inspection, sub-assembly inspection, packaging inspection and sorting, reading of serial numbers, and molding flash detection. The advantages offered by these technologies have led to semi- and full automation of production environments and manufacturing industries. The benefits include faster processing speeds, automated decision making, higher-quality finished goods, easy trans-shipment of goods, elimination of waste, and reduction of idle time. Smart, easily configurable cameras have taken on the role of industrial eyes, promoted as a replacement for human-based inspection.

Earlier, machine vision was limited to two-dimensional images and videos. The ready availability of 3D vision software tools and algorithms has opened up the possibility of 3D image processing and object identification. Optical three-dimensional (3D) profilometry is an important application of 3D vision because of its simplicity, flexibility, high accuracy, and non-contact nature. Recent research in imaging sensors and digital projection technology is furthering its progress in high-speed, real-time applications, enabling the reconstruction of the 3D shapes of moving objects and dynamic scenes.

Online real-time vision sensors may be used to measure and inspect process quality. They may also be used to check the dimensions of parts and components, compare them with drawing dimensions, and take real-time decisions on quality. By overcoming the limitations of the camera, image-processing-based parts inspection can open a new area in inspection and quality control. Vision sensors have advanced to the stage where they can replicate functions of human vision, or even achieve what the human eye cannot detect. In the next section we explore the technology of sensing beyond optical vision.

3 Sensing Capability: Related Work

Machines and robots are substituting for humans in everyday tasks and are better suited than humans to completing routine and repetitive work. These situations require robots to be equipped with human-like sensory inputs as well as a robust decision support system to achieve their desired utility. Most commercially deployable robots are equipped with proprioceptive and exteroceptive sensors mainly for autonomous navigation, obstacle avoidance, and path planning, but equipping robots with human-like sensing and cognition remains a next-horizon problem. 2D/3D machine vision is a well-explored area, but operating in extremely smoky, dark environments where visibility is partially or completely occluded remains a challenge. Acoustic imaging capability and thermal sensing to localize and acquire surface temperature profiles, augmented with 3D visual information, are state-of-the-art limitations whose resolution could eventually address the uncertainty problems of mapping unknown environments. Sound source detection is well researched, but applying it to robots so that they work under varied acoustic conditions and in noisy environments is still an open research area. E-noses are effective in laboratory environments, but building a machine olfaction capability that can detect, differentiate, and localize gases, fumes, steam, and haze efficiently in real time is still a long way off. Multi-sensing next-generation robots that create a novel symbiotic autonomy, in which machines are aware of their perceptual, physical, and reasoning limitations and act proactively like humans, are a real necessity.

3.1 Machine Vision

As outlined in Sect. 2, the field of machine vision, or computer vision, has been growing at a fast pace and has numerous applications in the automation of manufacturing systems. The imaging system can be composed of a camera (2D/3D) and a capture system. Capture and processing depend entirely on the type of camera used (2D or 3D), and camera selection depends on the desired use cases and accuracy. We describe 3D systems in detail below.

Automatic remote monitoring of an enclosed space is a compelling research topic nowadays. Constructing a 3D map or perception model of an unknown indoor or outdoor environment using a robotic platform is needed for such a monitoring system. Available IMU sensors and mobile robot kinematics allow 3D reconstruction to be completed in near real time on a very low-cost robotic platform. In a recent work [7], Pradeep et al. described a methodology for markerless tracking and 3D reconstruction of smaller scenes using an RGB camera; it generates high-quality 3D model reconstructions using a webcam. Pizzoli et al. [8] proposed a probabilistic approach in which the depth map is computed by combining Bayesian estimation and convex optimization techniques. All these implementations are limited to small-scene reconstruction and are not suitable for creating an entire 3D environment. The 3D reconstruction of an environment from multiple images or video captured by a single moving camera has been studied for several years and is well known as Structure-from-Motion (SfM); quite powerful SfM pipelines are readily available for 3D reconstruction, as shown in [9, 10]. Recently, smartphones have been used for image acquisition due to their low cost and easy availability. Researchers have used smartphone sensors such as the accelerometer and magnetometer for data collection and 3D reconstruction, which reduces computation [11, 12]; a few works such as [13, 14] have accomplished this, but the output is noisy due to fast and coarse reconstruction. A system capable of dense 3D reconstruction of an unknown environment in real time through a mobile robot requires simultaneous localization and mapping (SLAM) [15].

Another interesting application area for 3D optical vision is quality control. Measurement using computer vision has been studied extensively over the last decade. Researchers have measured straightness defects in hot-rolled steel sheets [16] in real time; the system is designed for easy implementation in an actual plant, but the results are presented without any benchmarking, and the system is designed only for flatness measurement of rolled sheets, so it is not directly applicable to auto-component measurement. A 3D vision-based measurement system using a single moving camera is proposed in [17], but it is limited to estimating the robot calibration offline. The trend of using cameras for vision-based measurement is surveyed in [18]. Rövid [19] proposed a vision-based measurement system for vehicle body inspection in which a rotating gray-code pattern projection and a multi-camera tracking system are used to measure large surfaces; its main drawback is the complexity of the multi-camera and projector setup.

3.2 Thermal Perception for Machines

Unobtrusive heat measurement and monitoring is well accepted in the manufacturing, chemical, automobile, and construction industries. Conventional industrial thermal cameras are still not affordable for everyday use. Conventional thermography for energy measurement and non-invasive assessment relies on 2D thermal images, which have significant limitations such as the lack of information on the shape, geometry, or location of the object of interest in the scene. There is therefore growing interest in representing the environment in 3D with integrated temperature information; the combined information helps detect the object of interest and enables precise volumetric measurement. FLIR launched a low-cost, affordable thermal sensor [20] as a smartphone attachment, which increases the possibility of monitoring and verifying heated regions using such handheld, low-cost mobile sensors.

Several studies have explored the potential of 3D thermal mapping and volumetric inspection, mostly focused on monitoring building power consumption. ThermalMapper [21] is a well-known project which uses a terrestrial laser scanner and a thermal infrared camera on a wheeled robot. Its result is a dense 3D point cloud which can be visualized in both RGB and thermal; volumetric heat measurement and analysis is not part of the presented system. There is a significant cost and mobility difference between these systems and our proposed system, owing to our use of a lightweight (approximately 78 g), low-cost FLIR attachment on a smartphone. In a recent work [22], Vidas et al. present 3D thermal mapping for monitoring building interiors using a Microsoft Kinect [23] and a thermal camera. In computer vision and robotics, RGBD cameras like the Microsoft Kinect facilitate techniques for highly detailed and spatially extended reconstructions [24, 25]. Such costly and bulky coupled sensors are capable of reconstructing in real time [26], but the use of structured-light patterns limits their usage to indoor environments and short-range measurements. The working environment, along with cost, dimensions, and weight, are the main drawbacks of the Kinect as a lightweight, low-cost system. Though active depth sensors have many advantages, there are certain scenarios where passive RGB cameras are preferred for their low power consumption, outdoor capability, and form factor. This has motivated many researchers to investigate methods for 3D reconstruction using only passive cameras.

Industrial thermal cameras can measure temperature accurately from a specified distance, and a few costly cameras provide the dimensions of heated regions in two dimensions; the FLIR smartphone thermal attachment also provides information only in two dimensions, so volumetric measurement is limited. The cost of thermal cameras is another factor that restricts these products to the industrial segment. The FLIR smartphone attachment opens the opportunity for a household product for everyday use, thanks to its reduced cost, the mobility afforded by its small dimensions and weight, and its user-friendliness compared with expensive or bulky thermal systems. Automatic volumetric measurement requires heat analysis in three dimensions, so the state of the art still lacks an autonomous, affordable system capable of area or volumetric measurement of arbitrary heated regions.

3.3 Machine Audition

Machine audition is of great importance when machine vision and visual odometry cannot work effectively because of poor lighting conditions or because the target is not within the field of view. Acoustic source detection, localization, and profiling can be an important part of unobtrusive sensing. In general, the sound that a machine listens to consists not of a single sound source but of multiple sources, and the machine must process all of this data to extract the pertinent information. Biologically inspired sound localization systems can be built using an array of microphones connected to a processor; such a system can also be made to extract a particular sound from among several sources producing simultaneously. Although its spatial resolution is relatively low compared with vision, audition has several unique properties: (a) occlusion never happens in audition, (b) it works equally well in darkness, (c) it is omnidirectional, and (d) it has high time resolution and low computation overhead. Because of these properties, a 3D audio sensing system can be of great utility. Acoustic 3D imaging using audio sensing is another application: acoustic imaging of objects is a well-established technique and takes precedence where optical systems fail due to strong attenuation of EM waves, e.g., underwater SONAR, or in dispersive media such as smoke and haze.
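For concreteness, the geometric core of microphone-pair localization is the standard far-field relation between the time difference of arrival (TDOA) and the source bearing. For two microphones separated by distance d, with c the speed of sound, the bearing θ follows directly from the measured delay τ. This is a textbook result, stated here for clarity rather than taken from the paper:

```latex
\tau = \frac{d \sin\theta}{c}
\quad\Longleftrightarrow\quad
\theta = \arcsin\!\left(\frac{c\,\tau}{d}\right)
```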

3.4 Tactile Sensing and Machine Olfaction

Once a region of interest has been identified in 3D space using the techniques outlined in the above sections, a moving machine carrying all these sensors (e.g., a robot or a drone) can be navigated into proximity of the region. The optical, thermal, audio, and smell sensors obtain better-resolution data as the robot moves nearer. At touching distance from the region, a new modality, tactile sensing, can come into play to deduce further details, just as a human being does by moving closer and touching by hand. Newly reported opto-tactile sensors can be used for object shape, size, and surface assessment, and can enhance 3D object models in mobile robots.

Smell sensing can be applied to gas localization and gas distribution mapping. Such systems can comprise an on-board array of gas sensors on a mobile robot equipped with electronic noses. According to the state of the art, statistical methods for building a 3D depth map using three e-noses mounted at different heights on a mobile robot are becoming possible. The gas distribution model (GDM) can be improved with wind measurements obtained by an ultrasonic anemometer, as illustrated in the sketch below.
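As one concrete illustration of gas distribution mapping, here is a minimal Python sketch in the spirit of kernel-based GDM: each gas-sensor reading is spread onto a 2D grid with a Gaussian kernel, and each cell's estimate is the kernel-weighted average of nearby readings. The grid size, kernel width, and sample readings are our assumptions for illustration, not values from the paper.

```python
# Minimal kernel-based gas distribution mapping sketch: readings taken
# at robot positions are smoothed onto a grid with a Gaussian kernel.
import numpy as np

def kernel_dm(positions, readings, grid_x, grid_y, sigma=0.3):
    gx, gy = np.meshgrid(grid_x, grid_y)
    num = np.zeros_like(gx)
    den = np.zeros_like(gx)
    for (px, py), c in zip(positions, readings):
        # Gaussian weight of this reading at every grid cell.
        w = np.exp(-((gx - px)**2 + (gy - py)**2) / (2 * sigma**2))
        num += w * c
        den += w
    return num / np.maximum(den, 1e-9)   # kernel-weighted mean concentration

# Hypothetical example: three readings (ppm) taken along a corridor.
pos = [(0.5, 1.0), (1.5, 1.0), (2.5, 1.0)]
ppm = [12.0, 40.0, 18.0]
gmap = kernel_dm(pos, ppm, np.linspace(0, 3, 31), np.linspace(0, 2, 21))
```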

4 Proposed Integrated Sensing System

As part of the fusion of multi-sensor information [27, 28], we propose to augment 3D robotic vision using odometry data and ultrasonic/laser range finder readings, together with gas sensing, and to develop a probabilistic framework for simultaneous localization and mapping [29]. A schematic representation of the multi-sensor robotic platform is depicted in Fig. 1.

Fig. 1 Proposed robotic platform with multi-sensory perception

In our system, we present an end-to-end framework capable of generating a 3D reconstruction of an environment based on images and video captured through a remote platform mounted on a two-wheeled robot. This work is a core part of our system presented in [27]. Our experimental results show that our technique is efficient and robust across a variety of indoor and outdoor scenarios of different scales and sizes. In our work, the Firebird VI robot [30] (Fig. 2) is used, all of whose operations are controlled through the Robot Operating System (ROS) [31]. The framework can work on any robotic platform that supports ROS; Firebird VI was chosen for its low cost and ready availability.

Fig. 2 Firebird VI robotic platform

In the system, an off-the-shelf webcam is used as the 2D image capture device. The camera is calibrated and mounted on a servo on the robot, as shown in Fig. 2. The servo allows the camera to pan and tilt, which in turn helps capture the surrounding environment. Odometer and IMU sensor data are captured simultaneously with the images in a time-synchronized way. The captured information is pushed to a backend server from which the user controls the robot. Odometry and IMU data are used for robot localization along with camera pose estimation [13]. Multi-view geometry [32] is used for creating a 3D map [11] of the environment. The details of the entire process are explained in [33].
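A minimal sketch of this time-synchronized capture, written against the ROS 1 Python API, is shown below. The topic names and the synchronizer tolerance are our assumptions for illustration; the paper does not specify them.

```python
# Hypothetical sketch: time-synchronized logging of camera, odometry
# and IMU messages on a ROS 1 robot. Topic names are assumptions.
import rospy
import message_filters
from sensor_msgs.msg import Image, Imu
from nav_msgs.msg import Odometry

def synced_callback(image, odom, imu):
    # The three messages fall within the synchronizer's time slop, so
    # they can be stored together for offline pose estimation and SfM.
    rospy.loginfo("frame t=%.3f  pose=(%.2f, %.2f)",
                  image.header.stamp.to_sec(),
                  odom.pose.pose.position.x,
                  odom.pose.pose.position.y)

rospy.init_node("synced_capture")
subs = [message_filters.Subscriber("/usb_cam/image_raw", Image),
        message_filters.Subscriber("/odom", Odometry),
        message_filters.Subscriber("/imu/data", Imu)]
sync = message_filters.ApproximateTimeSynchronizer(subs, queue_size=10,
                                                   slop=0.05)
sync.registerCallback(synced_callback)
rospy.spin()
```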

Additionally, for thermal imaging, we present a cost-effective 3D thermal mapping system capable of area or volumetric measurement of heat in a continuous and non-invasive way. Initial work with a handheld smartphone-based system is presented in [34]. A FLIR thermal attachment on a smartphone is mounted on the Firebird VI robotic platform, and the webcam and FLIR attachment pair are calibrated together to produce a three-dimensional environment with thermal annotation on the surface areas. The system is thus capable of measuring the thermal area or volume of any abnormally heated region. Three-dimensional reconstruction is carried out using the methods described in Sects. 3.1 and 3.2, and back-projection is used to overlay thermal data on the surfaces of the reconstructed environment.
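The back-projection step can be sketched as a standard pinhole projection: each reconstructed 3D point is projected into the calibrated thermal image and assigned the temperature of the pixel it lands on. The intrinsics K and extrinsics (R, t) are assumed to come from the webcam/FLIR pair calibration; this is our illustration of the step, not the paper's code.

```python
# Minimal sketch of back-projecting 3D points into a calibrated
# thermal image to paint temperature onto the point cloud.
import numpy as np

def thermal_overlay(points, thermal_img, K, R, t):
    """points: (N,3) world coords; thermal_img: (H,W) temperatures."""
    cam = R @ points.T + t.reshape(3, 1)            # world -> camera frame
    in_front = cam[2] > 0                           # keep points ahead of camera
    pix = K @ (cam[:, in_front] / cam[2, in_front]) # pinhole projection
    u = np.round(pix[0]).astype(int)
    v = np.round(pix[1]).astype(int)
    h, w = thermal_img.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    temps = np.full(points.shape[0], np.nan)
    idx = np.flatnonzero(in_front)[ok]
    temps[idx] = thermal_img[v[ok], u[ok]]          # sample temperature per point
    return temps                                    # NaN where unobserved
```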

For an automatic quality inspection system for the manufacturing industry, we use our proposed robotic platform with a Kinect mounted as a three-dimensional depth sensor. Capture is executed with the Kinect connected to an on-board computing unit, and the captured three-dimensional data is stored as a point cloud. The Kinect is kept at a distance of about 0.8 m, under controlled lighting, for best performance. The component objects come in different shapes and sizes with a variety of measurable attributes, so the implemented algorithms are customized to the shape and geometry of each object. After noise removal and clean-up, the object is segmented from the background by color and shape. We are able to measure different geometric properties of the objects; the Point Cloud Library (PCL) is used extensively at the different steps of our implementation, along the lines of the sketch below.
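The paper's implementation uses PCL in C++; the following Python sketch uses Open3D to show analogous steps under stated assumptions: statistical outlier removal, RANSAC plane removal to isolate the part from its support surface, and extent measurement via an oriented bounding box. The file name and thresholds are placeholders, not values from the paper.

```python
# Analogous Open3D sketch of the PCL-based measurement pipeline:
# denoise, remove the support plane, measure the part's extents.
import numpy as np
import open3d as o3d

cloud = o3d.io.read_point_cloud("component_scan.pcd")    # Kinect capture
cloud, _ = cloud.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Remove the dominant support plane (table/fixture) via RANSAC,
# leaving the component itself.
plane, inliers = cloud.segment_plane(distance_threshold=0.003,
                                     ransac_n=3, num_iterations=500)
part = cloud.select_by_index(inliers, invert=True)

# An oriented bounding box gives length/width/height in metres.
obb = part.get_oriented_bounding_box()
length, width, height = np.sort(obb.extent)[::-1]
print("L=%.1f mm  W=%.1f mm  H=%.1f mm"
      % (1000 * length, 1000 * width, 1000 * height))
```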

We also added an audition system [35] on top of the Firebird VI mobile robot platform, using Time Difference of Arrival (TDOA) information. We used a four-microphone array with a 24-bit analog-to-digital converter (ADC), including acoustic echo cancellation and noise suppression, producing 16 kHz, 24-bit pulse-code-modulated (PCM) audio. The system receives time-series signals and estimates the TDOA between adjacent microphones using cross-correlation, phase transform, and maximum likelihood techniques. We observed that the classical cross-correlation method is easy to implement but is sensitive to noise, and its performance degrades if the reverberation time exceeds 150 ms. PHAT normalizes the cross-power spectrum of the signals; by removing all magnitude content from the cross-spectrum, it minimizes spreading of the correlation peak and yields a very sharp peak.
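A minimal GCC-PHAT sketch for estimating the TDOA between one microphone pair is shown below. The function and parameter names are ours, and the 16 kHz rate matches the array described above; this is a generic implementation of the technique, not the system's code.

```python
# GCC-PHAT: whiten the cross-spectrum so only phase remains, then
# locate the correlation peak to get the delay between two channels.
import numpy as np

def gcc_phat(sig, ref, fs=16000, max_tau=None):
    n = sig.size + ref.size                      # zero-pad to avoid wrap-around
    X = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    X /= np.abs(X) + 1e-12                       # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(X, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / float(fs)
    return tau                                   # seconds; sign gives direction

# DOA for a pair spaced d metres apart (far field, c = 343 m/s):
# theta = arcsin(c * tau / d), per the relation given in Sect. 3.3.
```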

Finally, for acoustic imaging, we use an acoustic array in our system [27] which localizes acoustic sources and provides visual augmentation for the existing machine vision capability. The basic principle is electronic beam steering and beam shaping, in which the beam is steered in both the azimuthal and elevation directions. Using a 2D acoustic array, the ultrasonic waveforms are transmitted with appropriate phase shifts at each array element so that the combined beam is steered in both azimuth and elevation, covering the complete target, while the ultrasonic receivers collect the reflected wave across the two dimensions. Reconstruction is carried out using phase-correction factors.
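The per-element phase shifts for steering a half-wavelength-spaced planar array to a commanded azimuth and elevation can be sketched as follows. The 4 × 4 geometry matches the array described in Sect. 5.2; the routine itself is our illustration of the standard steering rule, not the system's firmware.

```python
# Steering phases for an n x n planar array with lambda/2 spacing:
# phi(ix, iy) = -2*pi*d*(ix*ux + iy*uy), with (ux, uy) the direction
# cosines of the desired beam, so element contributions add in phase.
import numpy as np

def steering_phases(n=4, az_deg=20.0, el_deg=10.0):
    az, el = np.radians(az_deg), np.radians(el_deg)
    ux = np.cos(el) * np.sin(az)   # direction cosine along array x-axis
    uy = np.sin(el)                # direction cosine along array y-axis
    d = 0.5                        # element spacing in wavelengths
    ix, iy = np.meshgrid(np.arange(n), np.arange(n))
    return -2 * np.pi * d * (ix * ux + iy * uy)   # radians per element

print(np.degrees(steering_phases()) % 360)        # 4x4 phase map in degrees
```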

5 Results and Discussion

5.1 Optothermal Vision

The implementation environment for 3D environment creation consisted of the Firebird VI robot shown in Fig. 2 and a backend system with an Intel(R) Xeon(R) E5606 processor running at 2.13 GHz and an NVIDIA Tesla C2050 graphics card. A ZOTAC ZBOXHD-ID11 is mounted on top of the robot. ROS Hydro is installed on Ubuntu 12.04 LTS on all systems. The entire capture task runs on the ZOTAC box mounted on the Firebird VI. Images are captured at 640 × 480 resolution using a Logitech C920 webcam. The 3D model reconstruction is carried out on the backend system because of the limited processing power of the ZOTAC box.

In Fig. 3 we present the result of a 3D reconstruction where the data was captured in a living room. Three sides of the room, with different objects placed in it, were captured. The dimensions of the room are about 13 × 11 feet. The user can guide the robot to go closer and capture frames to produce denser, more accurate points in any required zone.

Fig. 3 3D reconstruction using robot locomotion

For opto-thermal mapping, we present a sample heat measurement to demonstrate the usability of our solution. Figure 4 shows a sample 3D thermal point cloud of a mug containing hot water. The presence of water is not detectable in the RGB image, but the corresponding thermal image shows distinct temperature differences in the hot regions. Testing was performed using 11 images over two iterations. The 3D thermal cloud shows the structure of the mug along with the hot region; the goal is to measure the volume of hot water inside. The mug is segmented using the knowledge of its cylindrical shape, and the segmentation is then used to determine the mug's dimensions. The most strongly heated region is extracted from the temperature data, and its dimensions are calculated from the 3D structure. Our method calculates the volume of hot water as 63.7 cc, against an actual volume of 68 cc.
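Once the cylinder's radius and height have been recovered from the segmented cloud, the volume follows from V = πr²h. The radius and height below are hypothetical values chosen only to show that the computation lands in the right range; the paper does not report the measured dimensions.

```python
# Illustrative volume arithmetic for the mug measurement; the
# radius/height values are hypothetical, not from the paper.
import numpy as np

def cylinder_volume_cc(radius_m, height_m):
    return np.pi * radius_m**2 * height_m * 1e6   # m^3 -> cubic centimetres

# e.g. a 38 mm radius and a 14 mm-deep hot region gives ~63.5 cc,
# the same order as the reported 63.7 cc.
print("%.1f cc" % cylinder_volume_cc(0.038, 0.014))
```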

Fig. 4 Volumetric measurement: top to bottom shows the captured RGB image, thermal image, segmented 3D thermal model, and detected dimensions of the heated region

For 3D vision-based measurement for quality control, we present comparative measurement results for different automobile component parts using the Kinect. The experiments were carried out several times per object to test the stability of the implemented system. The presented results are the averages of the repeated measurements and are given in Table 1. We reached sub-millimeter accuracy; the tolerance of our measurement method is about ±0.5 mm.

Table 1 Comparative result for auto component measurement

5.2 Acoustic Sensing

We varied the frequency, ping duration, source distance, and azimuth as given in Table 2 to study the effect of these parameters on DOA estimation. In Table 2, the parameter 'Pings-Pause' with value '{0.006, 0.003}' indicates a time-series signal with a pause of duration 0.003 s (48 samples) followed by a sinusoidal ping of duration 0.006 s (96 samples). The accuracy of the localization techniques depends on the relative bearing of the acoustic source, and improves greatly for sources placed at 90° w.r.t. the microphone array (i.e., broadside) compared with other bearings. Platform locomotion and prior 3D environmental modeling can then be leveraged to localize and track the source efficiently. The stopping criterion of the mobile robot as it approaches the source uses the range information (i.e., the distance of the microphones from the acoustic source) obtained from the 3D model generated by the low-cost optical camera.
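The excitation patterns in Table 2 can be generated in a few lines of NumPy; the 1 kHz frequency and the repeat count below are example values, while the sample counts follow directly from the 16 kHz rate stated earlier.

```python
# One 'Pings-Pause' pattern from Table 2: a 0.003 s pause (48 samples
# at 16 kHz) followed by a 0.006 s sinusoidal ping (96 samples),
# repeated. f0 and the cycle count are example values.
import numpy as np

fs, f0 = 16000, 1000.0
pause = np.zeros(int(0.003 * fs))                                # 48 samples
ping = np.sin(2 * np.pi * f0 * np.arange(int(0.006 * fs)) / fs)  # 96 samples
signal = np.tile(np.concatenate((pause, ping)), 10)              # ten cycles
```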

Table 2 Different parameters of signals

The performance of each localization algorithm is evaluated by comparing the true DOA with the estimated DOA for all ping durations at different frequencies and distances. It is evident from Fig. 5 that the PHAT algorithm consistently outperforms CC and ML.

Fig. 5 Boxplots of DOAs for CC, PHAT, and ML. Signal frequency 1 kHz, distance 1 m, samples 4800, pause 9800

For acoustic imaging, a fully populated 2D planar array (λ/2 spacing) of ultrasonic transducers was designed in a 4 × 4 grid with electronic beam steering in both azimuth and elevation. A target is acoustically reconstructed at a distance of 5.0 m from the acoustic sensor array (ASA) using a pulsed-CW 40 kHz signal. Simulation results for the performance of such a system are presented in Figs. 6 and 7 below. By performing regression tests on various combinations of parameters, such as the total number of array elements and the frequency of insonification, an optimal set of parameter values was arrived at. Such opto-acoustic 3D machine vision can provide a new sensing dimension when robots are deployed in uncertain environments that are visually occluded by smoke, fumes, dust, or chemical vapors.
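The beam pattern in Fig. 6 can be reproduced in outline by evaluating the array factor of the 4 × 4, λ/2-spaced grid over azimuth; the short sketch below does this for a broadside beam at zero elevation. The evaluation grid and normalization are our choices, made only to illustrate the simulation.

```python
# Azimuth cut of the array factor for a 4x4, lambda/2-spaced planar
# array with a broadside (unsteered) beam at zero elevation.
import numpy as np

n, d = 4, 0.5                          # elements per side, spacing (wavelengths)
az = np.radians(np.linspace(-90, 90, 361))
ix, iy = np.meshgrid(np.arange(n), np.arange(n))
# At zero elevation the inter-element path difference depends only on
# the x-index, so sum exp(j*2*pi*d*ix*sin(az)) over all 16 elements.
af = np.array([np.abs(np.sum(np.exp(2j * np.pi * d * ix * np.sin(a))))
               for a in az])
af_db = 20 * np.log10(af / af.max())   # normalised pattern in dB
```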

Fig. 6 Sample beam pattern of the entire array

Fig. 7 Acoustic shape of a surface. Number of elements 16, frequency of operation 40 kHz, pulse duration 1 ms

6 Conclusion

In this paper, we have first introduced the concept of human-like five-senses computing and outlined how it can be applied to remote inspection and analytics. We have discussed specific scenarios where 3D vision, audio, and thermal sensing can be used for measurement, quality control, and predictive analytics, and have presented results from working prototypes. As the results show, the sensing components yield good outcomes. In the future, the collected data can be fed into machine-learning-backed artificial intelligence engines, which could herald a new era of automation in the remote inspection and predictive analytics space.