1 Introduction

With advances in biomedicine and living conditions, life expectancy is increasing, and human society faces a foreseeable structural shift: population ageing. According to demographic studies by the World Health Organization (2012), the proportion of the population over 65 years of age will rise continuously in the coming decades. Ageing will have significant social consequences. On the one hand, there is a growing demand for daily healthcare services; on the other hand, ageing is accompanied by a shrinking adult workforce, leaving too few people to provide daily assistance. It is therefore necessary to develop ambient intelligence technologies that enable automatic health monitoring systems and improve the living conditions of older adults.

Falls are common among older people. According to statistical data in the literature (Blake et al. 1988; Prudham and Evans 1981; Campbell et al. 1981; Chan et al. 2007), approximately 28–35 % of people over 65 fall at least once each year, and the proportion rises to 32–42 % for those aged 70 and over (Tinetti et al. 1988; Stalenhoef 2002). The frequency of falls increases with age and frailty. A fall often results in devastating physical injury and psychological stress (Noury et al. 2007), and fall-related mortality rises sharply with age. Falls are also a serious issue among hospital patients, and research shows that close to one-third of falls can be prevented (Cameron et al. 2010). Nevertheless, there is still no efficient way to reduce the number of falls, especially for older people or patients living alone. If a fall can be detected promptly and appropriate assistance provided, the traumatic injuries and mortality caused by falls can be reduced. Developing a rapid yet reliable fall detection system is therefore an important topic in health monitoring.

A recent surge of research interest has been devoted to automatic detection of falls and other abnormal behaviors of older adults. Wearable sensors and vision sensors are the most commonly used technologies. Wearable sensors, such as accelerometers (Karantonis et al. 2006; Lai et al. 2010; Bourke et al. 2007), gyroscopes (Bourke and Lyons 2008) and wireless push-buttons (Hori et al. 2004), are usually placed in clothing and can achieve accurate fall detection. However, these solutions interfere with the physical and psychological comfort of older people, who are often reluctant to wear such devices and find them inconvenient. More importantly, if older adults forget to wear them, the alarm system cannot provide monitoring services.

Camera-based vision sensors can provide health monitoring for residents in a non-intrusive fashion. Several authors have explored commercial cameras to capture video of household scenes and extract object features for fall detection (Lee and Chung 2008, 2011). Although existing methods achieve accurate fall detection, precise body extraction and robustness against changing lighting remain challenging issues in the computer vision community. Body pose extraction may be corrupted by cluttered clothing and shadows. Moreover, camera-based methods expose the privacy of older adults in free-living environments. A number of alternative solutions analyze human motion with thermal cameras (Han and Bhanu 2005; Sixsmith et al. 2005). The human body is a natural emitter of infrared radiation, and its temperature normally differs from that of the surroundings. This makes it convenient to separate motion from the background regardless of lighting conditions and of the colors of the body and its surroundings, so this sensing modality can extract meaningful motion information directly. However, thermal cameras are expensive and the resulting data are still difficult to process.

Recently, various studies have exploited Microsoft’s Kinect infrared sensor for activity monitoring. The Kinect device produces a depth map from a pattern of actively emitted infrared light; the value of each pixel in the map is related to the distance between the device and the object being viewed. It has prominent advantages in overcoming the limitations of traditional camera-based sensing mentioned above, since the three-dimensional foreground information of the human body can be extracted directly for motion recognition, without sensing redundant chromatic and facial information. Obdržálek et al. (2012) explore the accuracy and robustness of Kinect pose estimation, demonstrating significant potential for motion capture and body tracking in healthcare applications; however, the Kinect skeleton-fitting algorithm uses a non-anthropometric kinematic model with variable limb lengths. Stone and Skubic (2011) use Kinect devices to obtain gait parameters; the extracted temporal and spatial parameters are effective predictors of falls. Two calibrated Kinect devices with orthogonal fields of view are used to obtain a three-dimensional point cloud of a person, and the extracted gait parameters are highly consistent with ground truth data. Mastorakis and Makris (2012) use a single Kinect device to obtain a 3D bounding box of the body and calculate the velocities of its width, height and depth; a two-step decision tree is then applied to these velocity parameters for detection. However, the parameters of the decision tree depend heavily on the extrinsic parameters of the Kinect and on the training data set. Although these works address elderly care without invading privacy, accurate body extraction with Kinect devices has several limitations. First, the Kinect fails to produce depth images for certain types of clothing and air conditions. Second, the Kinect has a limited field of view of approximately \(60^{\circ }\), so multiple devices are required to cover a larger monitored region. Finally, the Kinect does not support tomographic imaging and cannot effectively recover the entire silhouette of an occluded target.

Radio tomographic imaging (RTI) has recently been investigated in several studies for locating people in surveillance applications (Patwari and Agrawal 2008; Wilson and Patwari 2009, 2010, 2011). The shadow fading of the received signal strength (RSS) is a cumulative attenuation effect that arises when a radio-frequency signal propagates through obstacles or targets of interest. RTI aims at reconstructing the attenuation image caused by targets from this shadow-fading information. RTI is essentially a non-isomorphic computational imaging technology and supports sparse signal reconstruction in a compressive imaging mode, so the state image of the environment can be reconstructed from measurements of far lower dimension. In particular, RTI-based environmental sensing can operate under low visibility and complex temperature fields, such as changing illumination and occlusion by non-metallic objects; it is thus complementary to optical, thermal, infrared and other imaging techniques. In addition, the reconstructed attenuation image is coarse and carries no chromatic information, so concerns about privacy invasion are removed. However, most existing studies focus on object localization and reconstruction methods. Mager et al. (2013) present a two-level array of RF sensor nodes at different heights for body detection and use a hidden Markov model (HMM) to detect falls. However, their method does not support precise pose imaging, and the parameters of the HMM are difficult to determine.

Inspired by these advantages, we explore an RTI-based sensing method for fall detection, especially for older adults or patients living alone. The RSS measurements contain statistical shadow losses caused by the obstruction of the human body, and this shadow loss can be separated from the RSS values. We use a wireless network organized by a group of radio-frequency (RF) sensors to acquire the human pose in the vertical plane, and then use non-negative total variation (TV) minimization to reconstruct a gray image of the body. RTI is a radio imaging technology that responds only to the shadow loss produced by body obstruction. It has promising advantages in overcoming the limitations of traditional camera-based sensing mentioned above, since only the body attenuation information is acquired, without sensing redundant background and chromatic information. Experimental results demonstrate that the proposed method can lead to low-cost, low-power and accurate devices that are more suitable than optical counterparts for wireless sensor networks (WSN).

The structure of this article is as follows: Sect. 2 presents the mathematical formulation of the RTI for human pose acquisition. Section 3 details the non-negative TV minimization for image reconstruction. Section 4 describes the experimental setup and results on actual data. Section 5 gives the conclusion.

2 Measurement model

2.1 Brief review on RTI

The task of RTI is to obtain the image of the shadow-fading distribution in a coverage area. However, shadow fading cannot be acquired directly from physical measurements; it has to be separated from the RSS. In an architecture with a fixed wireless RF network, the shadow fading is approximated by the difference between the online RSS value and a baseline RSS value, where the baseline is the RSS measured in the absence of shadow fading.
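As a concrete illustration of this baseline-subtraction step, the following Python/NumPy sketch separates an approximate shadow loss from raw RSS scans; the array shapes, the 276-link network size and the placeholder data are assumptions for illustration only, not part of the original system.

```python
import numpy as np

# Placeholder data: RSS (in dB) for every link, collected while the area is empty
# (calibration) and during monitoring (online). Shapes are assumptions for illustration.
rng = np.random.default_rng(0)
calibration_scans = rng.normal(-55.0, 1.0, size=(100, 276))   # 100 scans x 276 links
online_scan = rng.normal(-55.0, 1.0, size=276) - 3.0 * (rng.random(276) < 0.1)

baseline = calibration_scans.mean(axis=0)       # baseline RSS per link (no shadow fading)
shadow_loss = baseline - online_scan            # positive values = extra attenuation (dB)
shadow_loss = np.clip(shadow_loss, 0.0, None)   # discard links whose RSS increased
```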

Following the general propagation law of narrow-band RF signal, the RSS from transmitter \(\mathbf t\) to the receiver \(\mathbf r\) has the following model structure in decibels (dB):

$$\begin{aligned} y(\mathbf r ,\mathbf t )=y_{PL}(d_\mathbf{r ,\mathbf t })+y_{SH}(\mathbf r ,\mathbf t )+y_{MP}(\mathbf r ,\mathbf t ), \end{aligned}$$
(1)

where \(y_{PL}(d_\mathbf{r ,\mathbf t })\) is the path-loss component, which depends only on the link distance \(d_\mathbf{r ,\mathbf t }\), \(y_{SH}(\mathbf r , \mathbf t )\) is the shadow-fading component generated by the body’s occlusion, and \(y_{MP}(\mathbf r ,\mathbf t )\) is the multipath loss caused by the environment (Wilson and Patwari 2010). From the perspective of sensing the environmental distribution of shadow fading, \(y_{SH}(\mathbf r , \mathbf t )\) is the component directly associated with shadow fading and can be represented as a projection of the attenuation image. \(y_{MP} (\mathbf r , \mathbf t )\) is an uncertain interference in the measurement of shadow fading and is related to the deployment of the links. \(y_{PL} (d_\mathbf{r , \mathbf t })\) is unrelated to either shadow fading or multipath effects.

Fig. 1

Radio tomographic imaging model for body pose sensing; the monitored area is divided into \(28\times 20\) pixels

Based on the above structure of the RSS, the shadow loss \(y_{SH}(\mathbf r , \mathbf t )\) is the key component guiding the measurement. Given a monitored network region, we can divide it into a pixel array according to the desired imaging accuracy, as shown in Fig. 1; the resolution of the monitored area is determined by this accuracy. For large-scale person localization, a pixel typically represents a region of \(50 \times 50\) cm, whereas local pose imaging requires higher accuracy. We report test parameters for different accuracies in the experimental section. The monitored area in Fig. 1 is divided into \(28\times 20\) pixels, each representing a spatial region of \(7.5 \times 7.5\) cm. If two nodes are deployed around the region and the region is discretized into an image vector of dimension \(\mathbb {R}^{N\times M}\), the shadowing loss \(y_{SH}(\mathbf r , \mathbf t )\) of this link can be approximated as a weighted sum of the attenuation in each pixel:

$$\begin{aligned} y_{SH}({\mathbf r} ,{\mathbf t} )=\sum _{j=1}^{N\times M} M_{{\mathbf {rt}},j}\,x_j, \end{aligned}$$
(2)

where \(x_j\) is the attenuation in pixel \(j\) and \(M_{{\mathbf {rt}},j}\) is the weight of pixel \(j\) for the link \((\mathbf r ,\mathbf t )\). The weights can be approximated by an ellipse model with foci at the locations of the two nodes (Patwari and Agrawal 2008; Wilson and Patwari 2010), simplified as

$$\begin{aligned} M_{{\mathbf {rt}},j}= d^{-1/2}_{{\mathbf r} ,{\mathbf t } }\times \left\{ \begin{array}{ll} 1, \quad \text {if}\; d_{{\mathbf {rt}},j}({\mathbf t} ) + d_{{\mathbf {rt}},j}({\mathbf r} )\le d_{{\mathbf r} ,{\mathbf t }} + \lambda ,\\ 0, \quad \text {otherwise}, \end{array} \right. \end{aligned}$$
(3)

where \(d_{{\mathbf{rt}},j}({\mathbf t })\) and \(d_{{\mathbf{rt}},j}({\mathbf r} )\) are the distances from pixel \(j\) to the two nodes of link \((\mathbf r ,\mathbf t )\) respectively, and \(\lambda\) is a tunable width of the ellipse. The elliptical width parameter \(\lambda\) trades off modeling error against imaging quality; we specify it empirically in the experiments.
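The following Python sketch shows how the ellipse weight of Eq. (3) might be computed for a single link over the pixel grid of Fig. 1; the coordinate frame, the example link and the function name are assumptions, not part of the original implementation.

```python
import numpy as np

def link_weights(tx, rx, pixel_centers, lam):
    """Ellipse weight model of Eq. (3) for one link: one row of the matrix M.

    tx, rx        : (2,) coordinates of the two nodes of the link (cm)
    pixel_centers : (P, 2) coordinates of the pixel centers (cm)
    lam           : ellipse width parameter lambda (cm)
    """
    tx, rx = np.asarray(tx, float), np.asarray(rx, float)
    d = np.linalg.norm(rx - tx)                           # link length d_{r,t}
    d_t = np.linalg.norm(pixel_centers - tx, axis=1)      # pixel-to-node distances
    d_r = np.linalg.norm(pixel_centers - rx, axis=1)
    inside = (d_t + d_r) <= d + lam                       # pixels inside the ellipse
    return inside / np.sqrt(d)                            # d^{-1/2} inside, 0 outside

# Example: the 28 x 20 pixel grid of Fig. 1 (7.5 cm pixels, 210 cm x 150 cm region;
# the coordinate frame is an assumption) and one horizontal link across the region.
pix = 7.5
xs, ys = np.meshgrid(np.arange(28), np.arange(20))
centers = np.stack([(xs.ravel() + 0.5) * pix, (ys.ravel() + 0.5) * pix], axis=1)
row = link_weights(tx=(0.0, 75.0), rx=(210.0, 75.0), pixel_centers=centers, lam=15.0)
print(row.reshape(20, 28).sum())   # total weight contributed by this link
```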

Given an RF sensor network organized by a group of RF nodes, as illustrated in Fig. 1, the RF signals are affected by the occlusion of targets close to the wireless links. The body occlusion can be inferred from the pairwise RSS measurements affected by shadow fading, which can be formulated uniformly as a linear model:

$$\begin{aligned} \mathbf {y} =\mathbf M \mathbf {x} + \mathbf {n}. \end{aligned}$$
(4)

The link fading is thus simplified as a linear combination of the pixel values plus a noise term \(\mathbf {n}\). Here \(\mathbf {y} \in \mathbb {R}^m\) contains the RSS-derived measurements and \(\mathbf M \in \mathbb {R}^{m\times (MN)}\) is the weight (measurement) matrix of the image vector \(\mathbf {x}\); the row of \(\mathbf M\) corresponding to link \((\mathbf r ,\mathbf t )\) holds the weights of the losses in each pixel.

2.2 RTI for body pose sensing

Existing RTI has been widely applied in security and monitoring systems, especially for locating persons in indoor or outdoor areas (Patwari and Agrawal 2008; Wilson and Patwari 2010, 2011). However, little work has focused on sensing the profile or silhouette of a person in a specific region of interest. A region of interest is generally a place where high-risk abnormal activity occurs, such as the kitchen, bathroom and bathtub (Tideiksaar 1988). If the RTI network is deployed in a vertical sensing fashion, the body pose can be inferred from the reconstructed image.

Fig. 2

a Prototype of the sensor network. b Links generated by sensor network

Figure 2 illustrates the proposed sensing method for acquiring the body pose. The images of a normal pose and of a fall usually differ markedly in height, so we deploy the sensor network on the wall in a vertical fashion and capture approximate height images with the RTI method. In this article, the sensor network comprises 24 RF sensors for capturing human pose information: ten sensors on each side of the vertical plane and four sensors on the floor. The lowest sensor in the vertical direction is 15 cm above the ground, and the pitch between adjacent sensors is 15 cm; the pitch between adjacent sensors on the floor is 42 cm. The sensor array can therefore acquire a coarse image of a region 150 cm high and 210 cm wide (refer to Fig. 1 for a placement diagram of the proposed sensing network), as sketched in the code below.
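The node coordinates described above and the ellipse weights of Eq. (3) can then be combined into the measurement matrix \(\mathbf M\) of Eq. (4). The Python sketch below reconstructs the layout from the text; the coordinate frame and the exact offsets of the floor nodes are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

# Node coordinates (cm) reconstructed from the deployment described above (assumed frame):
# 10 nodes on each side, lowest at 15 cm with 15 cm pitch; 4 floor nodes with 42 cm pitch.
left_wall  = [(0.0,   15.0 + 15.0 * k) for k in range(10)]
right_wall = [(210.0, 15.0 + 15.0 * k) for k in range(10)]
floor      = [(42.0 * (k + 1), 0.0) for k in range(4)]
nodes = np.array(left_wall + right_wall + floor)             # 24 RF sensors in total

# Pixel grid of Fig. 1: 28 x 20 pixels of 7.5 cm covering the 210 cm x 150 cm region
pix = 7.5
xs, ys = np.meshgrid(np.arange(28), np.arange(20))
centers = np.stack([(xs.ravel() + 0.5) * pix, (ys.ravel() + 0.5) * pix], axis=1)

def link_weights(tx, rx, lam=15.0):
    """Ellipse weight of Eq. (3), as in the earlier sketch."""
    d = np.linalg.norm(rx - tx)
    inside = (np.linalg.norm(centers - tx, axis=1) +
              np.linalg.norm(centers - rx, axis=1)) <= d + lam
    return inside / np.sqrt(d)

# One row of M per node pair: m = 24 * 23 / 2 = 276 links, 28 * 20 = 560 pixels
M = np.array([link_weights(nodes[i], nodes[j])
              for i, j in combinations(range(len(nodes)), 2)])
print(M.shape)   # (276, 560): far fewer measurements than unknowns, hence ill-posed
```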

Near-field RF transmission technology is now commercially available off-the-shelf, and its cost will continue to drop as microchip technology develops. Our sensor network is built around the smart system-on-chip (SoC) \(CC2430\) from Texas Instruments, which combines an RF transceiver with an industry-standard enhanced \(8051\) micro-controller unit (MCU). After configuration, the \(CC2430\) operates in the 2.4 GHz band using the ZigBee (802.15.4) protocol, which offers a low data rate and low power consumption compared with other wireless protocols. The scanning rate is set to ten scans per second, and the signals from all links are transmitted wirelessly to the data center for processing. Figure 3 presents four typical scenarios of the proposed sensor network for pose sensing and the corresponding approximated ground truth.

Fig. 3

Four typical scenarios of the proposed sensor network for pose sensing and the corresponding ground truth. a Standing; b sitting on the ground; c, d lying on the ground

3 Image reconstruction

The task of RTI is to reconstruct the attenuation image of the coverage area from the RSS values. The image reconstruction task therefore reduces to solving an under-determined linear system:

$$\begin{aligned} \mathbf {x}_{LS} = \arg \underset{\mathbf {x}}{\min } \Vert \mathbf M \mathbf {x} - \mathbf {y} \Vert _2^2. \end{aligned}$$
(5)

However, the literature has shown that the measurement matrix \(\mathbf M\) is ill-posed or rank-deficient and highly sensitive to measurement and modeling noise (Wilson and Patwari 2009, 2010, 2011). Tikhonov regularization has therefore been introduced into the linear model to alleviate the ill-posedness and stabilize the inverse problem (Wilson and Patwari 2009, 2010); it performs well for locating one or two persons over a large-scale coverage area. Kanso and Rabbat (2009) later propose a compressive RTI method for person localization: assuming that human targets in the monitored area are sparsely distributed, they cast the imaging problem as sparse decomposition and use the least absolute shrinkage and selection operator (LASSO) and orthogonal matching pursuit (OMP) for image reconstruction, obtaining good performance for locating one or two persons. However, these reconstruction methods aim to find the locations of targets, which are extremely sparse compared with the environment image. This differs from the ground truth we try to reconstruct, as shown in Fig. 3. We applied Tikhonov regularization, LASSO and OMP to reconstruct the body image; Fig. 4b–d show some typical results, and these methods clearly fail to reconstruct the body image.
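For reference, a minimal Python sketch of a Tikhonov-regularized reconstruction is given below. The identity regularizer, the weight \(\alpha\) and the placeholder data are illustrative assumptions and do not reproduce the exact regularizer used in the cited work.

```python
import numpy as np

def tikhonov_reconstruct(M, y, alpha=1.0):
    """Solve min_x ||Mx - y||^2 + alpha * ||x||^2 in closed form.

    M : (m, p) measurement matrix, y : (m,) shadow-loss measurements.
    A difference-operator regularizer (as in Wilson and Patwari 2010) could replace
    the identity; the identity is kept here for brevity.
    """
    p = M.shape[1]
    return np.linalg.solve(M.T @ M + alpha * np.eye(p), M.T @ y)

# Usage with random placeholder data of the dimensions used in this article
rng = np.random.default_rng(0)
M = rng.random((276, 560))
x_true = np.zeros(560); x_true[100:140] = 1.0          # a crude placeholder "body" patch
y = M @ x_true + 0.05 * rng.standard_normal(276)
x_hat = tikhonov_reconstruct(M, y, alpha=10.0)
print(x_hat.shape)
```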

As shown in Fig. 3, the body-pose image is inherently piecewise constant and sparse, so the gradient of the approximated ground-truth image is sparse. We can therefore use the TV measure of discontinuities instead of measuring the sparsity of the image itself; TV minimization seeks the solution with the sparsest gradient (Chambolle and Lions 1997). Exploiting the piecewise-constant property and sparsity of the body image, we first use standard TV minimization to reconstruct the image, which is denoted as:

$$\begin{aligned} \min \, TV ({\mathbf {x}}) \quad \text {subject to} \quad {\mathbf {Mx}}={\mathbf {y}}, \end{aligned}$$
(6)

and \(TV (\mathbf {x})\) is defined as:

$$\begin{aligned} TV (\mathbf {x})=\sum _{j}\sqrt{\left| D_{Y}x_{j}\right| ^{2}+\left| D_{X}x_{j}\right| ^{2}}, \end{aligned}$$
(7)

where \(D_{Y}\) and \(D_{X}\) are the difference operators in the vertical and horizontal directions respectively. An augmented Lagrangian and alternating direction method yields an efficient algorithm for solving this TV minimization; a detailed description and a well-established algorithm can be found in Li (2010). Figure 4e shows some results on the four typical scenarios. The reconstructed images give a visible representation of the corresponding scenarios, but they still deviate considerably from the approximated ground truth, particularly in the fourth scenario.

A further piece of prior information available from the approximated ground truth is that the pose image should be equal to or greater than zero. We therefore introduce this non-negativity constraint to improve the reconstruction, giving the non-negative TV minimization:

$$\begin{aligned} \min \, TV ({\mathbf {x}}) \quad \text {subject to} \quad {\mathbf {Mx}}={\mathbf {y}} \quad \text {and} \quad {\mathbf {x}}\ge 0. \end{aligned}$$
(8)

By solving the non-negative TV minimization, we obtain the reconstructed images shown in Fig. 4f. The non-negativity constraint makes the reconstructed images sharper and preserves the boundaries more accurately.
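The reconstructions in this article are obtained with the augmented Lagrangian / alternating direction solver referred to above (Li 2010). For readers who want a self-contained starting point, the sketch below instead minimizes a penalized form \(\Vert \mathbf {Mx}-\mathbf {y}\Vert ^2+\mu \, TV(\mathbf {x})\) by projected gradient descent with a smoothed TV term; the penalty weight, step size and smoothing constant are arbitrary assumptions, and the non-negativity of Eq. (8) is enforced by a simple projection.

```python
import numpy as np

def tv_reconstruct(M, y, shape, mu=1.0, nonneg=True, steps=500, lr=1e-3, eps=1e-6):
    """Minimize ||Mx - y||^2 + mu * TV(x) by projected gradient descent.

    shape  : (rows, cols) of the image; x is handled as a flat vector.
    nonneg : if True, project onto x >= 0 after each step (Eq. 8); otherwise Eqs. 6-7.
    TV is the isotropic total variation, smoothed by eps so it is differentiable.
    The fixed step size is kept for simplicity and may need tuning in practice.
    """
    rows, cols = shape
    x = np.zeros(rows * cols)
    for _ in range(steps):
        img = x.reshape(rows, cols)
        dy = np.diff(img, axis=0, append=img[-1:, :])   # vertical differences  D_Y x
        dx = np.diff(img, axis=1, append=img[:, -1:])   # horizontal differences D_X x
        mag = np.sqrt(dy**2 + dx**2 + eps)
        # gradient of the smoothed TV term is -div(grad x / |grad x|)
        py, px = dy / mag, dx / mag
        div = (py - np.roll(py, 1, axis=0)) + (px - np.roll(px, 1, axis=1))
        grad = 2.0 * M.T @ (M @ x - y) + mu * (-div).ravel()
        x -= lr * grad
        if nonneg:
            x = np.maximum(x, 0.0)                      # projection onto x >= 0
    return x.reshape(rows, cols)

# Usage (with M and y built as in Eq. (4)):
# img = tv_reconstruct(M, y, shape=(20, 28), mu=0.5, nonneg=True)
```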

Fig. 4

Comparison of different methods for image reconstruction; the reconstruction resolution is \(28\times 20\) and \(\lambda =15\) cm. a Typical scenarios; b Tikhonov regularization; c LASSO; d OMP; e general TV minimization; f non-negative TV minimization

4 Experimental results

4.1 Experimental setup

To validate the proposed sensing method, experiments were carried out in a laboratory environment. Typical normal activities and falls were collected as evaluation samples, including standing, sitting on the ground and falling. Four volunteers participated in the data collection: one female and three males, with heights ranging from 160 to 180 cm and weights from 48 to 75 kg. Each volunteer imitated the above activities in the effective sensing area, as shown in Fig. 3, and repeated each category of activity 30 times at a self-selected speed and strategy. In total, we obtained 360 samples: 120 simulated falls and 240 normal activities.

4.2 Parametric analysis

Reviewing the entire imaging process, two important variables remain to be analyzed: the link ellipse width \(\lambda\) and the imaging resolution. We first chose four values of \(\lambda\) ranging from 7.5 to 30 cm empirically. We then divided the monitored scene at three resolutions and calculated the normalized mean square error (MSE) between each reconstructed image and the corresponding ground truth:

$$\begin{aligned} MSE =\frac{\sum _{m=1}^{M}\sum _{n=1}^{N}[\mathbf I _{r}(m,n)-\mathbf I _{g}(m,n)]^2}{M\times N}, \end{aligned}$$
(9)

where \(\mathbf I _{r}(m,n)\) and \(\mathbf I _{g}(m,n)\) are the pixels of the reconstructed and ground-truth images respectively, and \(M\) and \(N\) are the numbers of rows and columns in the images.
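The normalized MSE of Eq. (9) translates directly into code; the short sketch below is a plain NumPy version (the function name is assumed).

```python
import numpy as np

def normalized_mse(recon, truth):
    """Normalized MSE of Eq. (9) between a reconstructed and a ground-truth image."""
    recon, truth = np.asarray(recon, float), np.asarray(truth, float)
    return np.sum((recon - truth) ** 2) / recon.size   # recon.size equals M * N
```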

Figure 5 presents the reconstruction results for different values of \(\lambda\) and different imaging resolutions using Tikhonov regularization, LASSO, OMP, general TV minimization and non-negative TV minimization respectively. The images reconstructed with Tikhonov regularization, LASSO and OMP are clearly more distorted than those from general and non-negative TV minimization, and non-negative TV minimization achieves the best reconstruction. The reconstruction accuracy changes only slightly along the imaging-resolution direction, whereas along the ellipse-width direction the performance is similar and best at \(\lambda =15\) cm and \(\lambda =22.5\) cm. We therefore choose the parameters \(\lambda =15\) cm and \(\lambda =30\) cm for the fall detection comparison.

Fig. 5

Reconstruction accuracies with different \(\lambda\) and imaging resolutions using: a Tikhonov regularization, b LASSO, c OMP, d general TV minimization and e non-negative TV minimization

4.3 Recognition method and results

Figure 6 gives the flowchart of the fall detection method developed here. Based on the above analysis of the parameters for the different reconstruction methods, non-negative TV minimization performs best, so in the following experiments we use only non-negative TV minimization to reconstruct the body images. The experiments are then divided into two stages: training-sample selection and testing. In the training-sample selection stage, half of the reconstructed samples are randomly selected as the reference database; the remaining samples are used for testing. In the testing stage, each test sample is matched against the reference samples, and fall activity is distinguished from normal activity by a mean-square-error based nearest-neighbor criterion, as sketched below.
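The following Python sketch illustrates this MSE-based nearest-neighbor matching; the array shapes, label encoding and placeholder data are assumptions for illustration and do not reproduce the exact cross-validation protocol of the experiments.

```python
import numpy as np

def nn_classify(test_images, ref_images, ref_labels):
    """Label each test image with the label of its nearest reference image,
    using the mean square error between images as the distance measure.

    test_images : (n_test, H, W), ref_images : (n_ref, H, W),
    ref_labels  : (n_ref,) array of labels, e.g. 0 = normal pose, 1 = fall.
    """
    t = test_images.reshape(len(test_images), -1).astype(float)
    r = ref_images.reshape(len(ref_images), -1).astype(float)
    # pairwise MSE between every test image and every reference image
    mse = ((t[:, None, :] - r[None, :, :]) ** 2).mean(axis=2)
    return ref_labels[np.argmin(mse, axis=1)]

# Usage with placeholder data: half of the samples as references, the rest as tests
rng = np.random.default_rng(0)
images = rng.random((360, 20, 28))         # reconstructed pose images (placeholder)
labels = np.array([0] * 240 + [1] * 120)   # 240 normal samples, 120 fall samples
perm = rng.permutation(360)
ref, test = perm[:180], perm[180:]
pred = nn_classify(images[test], images[ref], labels[ref])
print((pred == labels[test]).mean())       # accuracy on placeholder data
```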

Fig. 6

Flow chart of fall detection method

Tables 1 and 2 show the fall detection accuracy for different values of \(\lambda\) and different resolutions. We report the average recognition rates for normal poses and falls over 100 cross-validation runs. The proposed method detects normal poses and falls accurately, with a low misclassification rate. The relevant algorithms were implemented in Matlab and run on an Intel Pentium 4 2.8 GHz computer; even at a resolution of \(56 \times 40\), the longest detection time does not exceed \(0.4\) s.

Table 1 Recognition rate with \(\lambda =15\) cm
Table 2 Recognition rate with \(\lambda =30\) cm

4.4 Discussion

Some experimental results of related vision-based methods are listed in Table 3 for comparison. The first, by Lee and Chung (2011), uses a video sensor to detect falls: a weighted subtraction between consecutive difference images and a motion history image over temporal templates is employed to classify falls. However, the feature extraction depends heavily on the illumination conditions. The second, by Sixsmith et al. (2005), proposes the SIMBAD system, which uses a low-cost array of infrared detectors and a neural network to classify falls from extracted vertical velocity information. However, vertical velocity alone may not be sufficient to discriminate a real fall from similar and normal activities, and, more importantly, its extraction is view-dependent. The third, by Mastorakis and Makris (2012), uses the depth map generated by a Kinect to extract a 3D bounding box and measures the velocities of the box’s width, height and depth; a two-step Boolean decision tree is introduced for fall detection, and the experimental results were ideal, with no false positives. However, their method is also view-dependent, and the parameters of the recognition algorithm need to be fully trained.

Table 3 The average experimental results

The primary insight of our method is that dividing the monitored region into distinct sensing pixels and acquiring their attenuation variations is an efficient basis for fall detection; in other words, the variation of each pixel is an effective feature for pose representation, and the output of the RTI network reflects the spatial characteristics of different human poses. The proposed method therefore has two prominent advantages. First, the recovered gray images produced by RTI give stable human pose cues, capturing rich discriminative signatures that improve performance on fall-detection tasks. Second, the proposed method can be flexibly deployed in a distributed, large-scale sensing system or WSN to cover wider scenes. Encouraging experimental results demonstrate the effectiveness and efficiency of our approach.

5 Conclusion

In this article, we present an RTI sensing method for detecting falls among older adults. We use a wireless network organized by a group of RF sensors to acquire the human pose in the vertical plane, and then use non-negative TV minimization to reconstruct a gray image of the body. This work not only provides a low-cost, privacy-preserving sensing method that is insensitive to lighting conditions, but also enables a scalable, easily constructed and energy-saving sensor network for healthcare applications. While the proposed method does not prevent falls or decrease their number, it may provide a sense of comfort and reassurance for older adults: if an emergency occurs, immediate help and care can be made available to them.

An intriguing question for practical application is how to use the proposed method in an indoor nursing home or a medium-sized house. Although the region monitored by the proposed method is limited, the RF sensor network is particularly suitable for sensing human pose in places with high fall risk. Tideiksaar (1988) lists typical places where falls occur easily, such as those with wet floor surfaces (kitchen, bathroom, bathtub and toilet). Existing visible-light and active infrared techniques are unavailable or unstable in these specific places, whereas the proposed RF-based sensing method is not influenced by changing illumination, moist air or furniture obscuration. In addition, a commercially available RF sensor costs little (less than 5 dollars) and will become cheaper as microchip technology develops, so it is easy to add sensors to cover a larger monitored region. Another limitation of this article is that the proposed method supports only a single user. Future work will focus on three-dimensional RTI networks for sensing multiple body poses and reconstructing holographic attenuation images.