1 Introduction

Falls among the elderly have always been an important healthcare issue. Each year, one in every three adults aged 65 years and above falls [1]. In Malaysia, the proportion of the population aged 65 and above increased from 4.3 % in 2007 to 4.5 % in 2009, and the elderly population is expected to reach 7 % by 2020 [2, 3]. As in Western countries, this population grows at a greater rate each year [4]. Since the majority of the elderly live in private homes, falls are a major risk for those who live alone, as injuries from a fall can be life threatening. Reports show that, in 2009, 2.2 million nonfatal fall injuries among older adults in the United States of America were treated in emergency departments and more than 581,000 of these patients were hospitalized [5]. A serious fall at home may lead to death or “post-fall syndrome” if the person cannot call for help immediately after the fall. In light of this, the development of fall detection systems has gained attention in recent years, as such systems can alert paramedic personnel in time after a fall has occurred, preventing post-fall consequences and saving the lives of the elderly.

Fall detection systems available in the market are mostly worn-sensor based [6, 7]: electronic devices that the elderly must wear or carry in a pocket. These wearable fall detectors normally use an accelerometer or a manual help button to detect a fall. However, they have a few drawbacks: the elderly may forget to wear them, and help buttons are useless if the person is unconscious after falling down.

Recent advancements in computer vision technology offer a new solution that overcomes these drawbacks. One of the main advantages of visual-based fall detection is that such a system does not require the person to wear anything, making it less intrusive than wearable sensors. Moreover, a computer vision system provides more information on the behavior of a person than normal wearable sensors do. A visual-based home monitoring system can therefore report falls as well as other activities of daily living that are useful for home healthcare monitoring, such as medication intake, mealtimes, and sleep duration.

This paper is organized as follows. Section 2 reviews visual-based fall detection techniques, with special emphasis on techniques based on human shape variation. Section 3 presents the proposed visual-based fall detection technique, in which a small number of essential points is used to model the human shape, and the methodology for detecting falls by analyzing the point features. Section 4 evaluates the performance of the proposed technique and compares it with conventional fall detection techniques. Finally, Sect. 5 concludes our work and discusses some possible future research directions.

2 Related work

In recent years, owing to the advancements in computer vision technology, considerable work has been done on visual-based fall detection [6]. One of the main challenges for a visual-based fall detection system is to maximize the fall detection rate with minimal computational complexity. Although there are complex algorithms [8, 9] that give very high accuracy in fall detection, these techniques normally require high processing power for real-time video processing, which may not be practical for real-world deployment.

One of the conventional methods of detecting a fall from surveillance video is to analyze the person’s bounding box in a single image [10, 11]. The bounding box method is simple and easy to implement. However, it works effectively only when the surveillance camera is placed sideways or at the same level as the human subject. The accuracy of this technique depends on the relative position of the person within the field of view of the camera, and it can fail due to occluding objects. Tao et al. [12] used the aspect ratio of the bounding box to detect a fall and placed two cameras at different heights in a room to test the accuracy of their system. All simulated falls were detected by the camera placed on the table, while only some of the falls were detected by the camera mounted on the wall.

Some researchers mounted the surveillance camera on the wall, instead of placing it at the same level as the human subject, to obtain a larger field of view and fewer occluding objects [13, 14]. Since the bounding box cannot efficiently discriminate “fall-down” from “fall-like” activities under an oblique camera setting, ellipse features were later introduced with improved results [13, 14]. Rougier et al. [13] and Chen et al. [14] combined human shape analysis with other analyses, namely motion analysis and posture estimation, to detect falls. In their human shape analysis, the person in the video is represented by an ellipse. Compared with the bounding box method, the ellipse-based approaches in [13, 14] give a better representation of the human shape and good fall detection accuracy, but some fall-like activities, such as sitting down or squatting down abruptly in parallel with the camera optical axis, are still detected as falls. Moreover, ellipse features alone are not sufficient to reduce the false alarm rate to an acceptable range [14]. Therefore, a combination of other analyses was used to increase the specificity of the system, at the expense of increased computational complexity [13, 14].

To address this problem, we propose a novel visual-based fall detection technique that is based on human shape analysis. Our proposed technique represents the person in the video sequence by using three points instead of the conventional ellipse or bounding box. From the three points, two lines are formed and features extracted from these lines are used to detect a fall. The main contribution of this paper is a low computational complexity algorithm for human shape analysis to reduce the execution time required to process one frame of video while still maintaining high fall detection accuracy.

3 Our approach

Human shape is one of the simple features used by many fall detection algorithms operating on surveillance video. When a person falls, the human shape changes rapidly, whereas during normal daily activities it changes slowly; this is the principle underlying our proposed technique. Owing to the weaknesses of the bounding box and the approximate ellipse discussed in Sect. 2, we chose to model the human shape by a small number of essential points. Extracting body points from a video frame is simpler than fitting the conventional ellipse, and the point representation used in our approach still provides good information on the orientation and the height proportion of a person. A background subtraction method is used to detect the person in the video sequence. From the detected foreground, three points that represent different regions of the human body, namely the head, body, and legs, are computed. Based on these three points, features such as the change of orientation, the sum of the heights, and the ratio of heights are derived and used to analyze the shape change of the human. Figure 1 shows the general block diagram of the proposed technique.

Fig. 1 General block diagram of our approach

3.1 Human detection

Our background subtraction approach is based on the median filtering method [6]. The background of the scene is modeled and the moving person is detected by computing the difference between each incoming frame and the background model. Compared with background subtraction using a mixture of Gaussians [6], the median filtering method has lower computational complexity while still providing relatively good object detection.
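
A minimal sketch of this step is given below in Python with NumPy; the buffer length and the difference threshold are illustrative assumptions rather than the exact values of our implementation.

```python
import numpy as np

def estimate_background(frame_buffer: np.ndarray) -> np.ndarray:
    """Per-pixel median over a buffer of recent grayscale frames
    (shape: N x H x W) approximates the static background."""
    return np.median(frame_buffer, axis=0)

def foreground_mask(frame: np.ndarray, background: np.ndarray,
                    diff_threshold: float = 30.0) -> np.ndarray:
    """A pixel belongs to the foreground (the moving person) when its
    absolute difference from the background model exceeds the threshold."""
    return np.abs(frame.astype(np.float64) - background) > diff_threshold
```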

3.2 Our proposed three-point human shape representation

The detected foreground is represented by three points, which are the centroids of three different regions of the foreground, as shown in Fig. 2. The bounding box of the foreground blob is first computed and then divided into three portions in the ratio 30:40:30, as depicted in Fig. 2. This ratio is a preliminary estimate for distinguishing the upper, mid-, and lower body parts. Let these regions be \(R1\), \(R2\), and \(R3\). Their heights, \(h_{R1}\), \(h_{R2}\), \(h_{R3}\), and widths, \(w_{R1}\), \(w_{R2}\), \(w_{R3}\), are calculated as:

$$\begin{aligned} h_{Ri}&= \left\{ \begin{array}{ll} (0.4i-0.1i^{2})H &{}\quad \text{ if } H>W\\ H &{}\quad \text{ otherwise } \end{array}\right. \nonumber \\ w_{Ri}&= \left\{ \begin{array}{ll} W &{}\quad \text{ if } H>W\\ (0.4i-0.1i^{2})W &{}\quad \text{ otherwise } \end{array}\right. \end{aligned}$$
(1)

where \(i\) = 1, 2, 3. \(H\) and \(W\) are the height and the width of the bounding box, respectively.
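
For concreteness, a short sketch of Eq. (1) follows; it assumes the bounding box height \(H\) and width \(W\) are already known and returns the (height, width) pair of each region.

```python
def region_dimensions(H: float, W: float):
    """Region heights and widths per Eq. (1): the bounding box is split
    30:40:30 along its longer side (vertically for an upright pose,
    horizontally for a lying pose)."""
    dims = []
    for i in (1, 2, 3):
        frac = 0.4 * i - 0.1 * i ** 2    # yields 0.3, 0.4, 0.3
        if H > W:
            dims.append((frac * H, W))   # (h_Ri, w_Ri), upright pose
        else:
            dims.append((H, frac * W))   # (h_Ri, w_Ri), lying pose
    return dims
```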

Fig. 2 Illustration of the proposed three-point human body shape representation

Since \(h_{R1}\), \(h_{R3}\), \(w_{R1}\), \(w_{R3}\), and the starting and end points of the bounding box are known, the blob pixels that lie within each region can be used to calculate the region centroids. The coordinates of the centroids are computed by:

$$\begin{aligned} g_{Rix}&= \frac{1}{N_{Ri}}\sum \limits _{l=1}^{N_{Ri}} {x_l},\quad i=1,2,3\end{aligned}$$
(2)
$$\begin{aligned} g_{Riy}&= \frac{1}{N_{Ri} }\sum \limits _{l=1}^{N_{Ri}} {y_l},\quad i=1,2,3 \end{aligned}$$
(3)

where \(N_{Ri}\) is the number of foreground pixels in the region Ri. The centroids are points \((g_{R1x},\, g_{R1y}),\, (g_{R2x},\, g_{R2y})\), and \((g_{R3x},\, g_{R3y})\) for the regions \(R1,\, R2\, \text{ and } \,R3\), respectively.
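
The following sketch computes the three centroids for the upright case (\(H > W\)), assuming a binary foreground mask; the lying case is analogous, with the bands taken along the \(x\) axis instead.

```python
import numpy as np

def three_point_centroids(mask: np.ndarray):
    """Centroids (g_Rix, g_Riy) of the foreground pixels in the three
    30:40:30 bands of the bounding box, following Eqs. (2) and (3)."""
    ys, xs = np.nonzero(mask)
    y0 = ys.min()
    H = ys.max() - y0 + 1
    cuts = [y0, y0 + 0.3 * H, y0 + 0.7 * H, y0 + H]   # band edges
    points = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        sel = (ys >= lo) & (ys < hi)
        points.append((xs[sel].mean(), ys[sel].mean()))
    return points   # [P1, P2, P3] as (x, y) pairs
```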

From the three centroids, we form two lines, one from \(P1\) to \(P2\) and another from \(P2\) to \(P3\), as shown in Fig. 2. Since each line represents roughly half of the foreground, any change in the length or orientation of the lines indicates a change in the shape of the person in the image. The length and orientation of each line are therefore computed for shape analysis. The distances, \(D1\) and \(D2\), between the points are given by:

$$\begin{aligned} D1&= \sqrt{\left( {g_{R1x} -g_{R2x} } \right) ^{2}+\left( {g_{R1y} -g_{R2y} } \right) ^{2}}\end{aligned}$$
(4)
$$\begin{aligned} D2&= \sqrt{\left( {g_{R2x} -g_{R3x} } \right) ^{2}+\left( {g_{R2y} -g_{R3y} } \right) ^{2}} \end{aligned}$$
(5)

The orientation of each line is the angle between the line and the horizontal axis \(x\), computed as follows:

$$\begin{aligned} \theta _1&= \arctan \left( {\frac{g_{R1y} -g_{R2y} }{g_{R1x} -g_{R2x} }} \right) \end{aligned}$$
(6)
$$\begin{aligned} \theta _2&= \arctan \left( {\frac{g_{R2y} -g_{R3y} }{g_{R2x} -g_{R3x} }} \right) \end{aligned}$$
(7)

where \(\theta _{1}\) and \(\theta _{2}\) are the orientations of the line formed by \(P1\) and \(P2\) and the line formed by \(P2\) and \(P3\), respectively.
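
The line features of Eqs. (4)–(7) follow directly from the three centroids, as sketched below; atan2 is used in place of a plain arctangent so that vertical lines do not cause a division by zero.

```python
import math

def line_features(P1, P2, P3):
    """Distances D1, D2 (Eqs. 4-5) and orientations theta1, theta2
    in degrees (Eqs. 6-7) for the lines P1-P2 and P2-P3."""
    D1 = math.hypot(P1[0] - P2[0], P1[1] - P2[1])
    D2 = math.hypot(P2[0] - P3[0], P2[1] - P3[1])
    theta1 = math.degrees(math.atan2(P1[1] - P2[1], P1[0] - P2[0]))
    theta2 = math.degrees(math.atan2(P2[1] - P3[1], P2[0] - P3[0]))
    return D1, D2, theta1, theta2
```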

3.3 Fall recognition based on human shape analysis and inactivity period

Our approach to fall recognition is based on the fact that different posture changes, such as a change from standing to sitting and a change from standing to falling, produce different changes in the upper and lower portions of the human body, as shown in Fig. 3. By analyzing the shape change in these two portions of a person, we can distinguish a fall from normal daily activities. For this purpose, we compute the ratio of the distances, \(p = D1/D2\), and the difference between the line orientations, \(\theta _{1}\) and \(\theta _{2}\). Figure 4 shows the pseudocode of the fall recognition algorithm of our proposed technique.

Fig. 3 Example of changes in orientation and shape dimensions in the upper and lower portions of a person during a posture change: a from standing to sitting down, b from standing to falling down

Fig. 4 Pseudocode of the shape analysis algorithm of our proposed technique

Since the shapes of the upper and lower portions of the human body do not change much during most daily activities, the value of the ratio, \(p\), stays close to 1 most of the time. Computing \(p\) over several video sequences of daily activities and simulated falls yields Fig. 5, which shows that the ratio deviates from 1 only during a fall and during a few normal activities such as squatting down, crouching down, and walking. We therefore consider a possible fall whenever the ratio \(p\) changes suddenly from 1 to another value.

Fig. 5 Example of the change in the ratio of the distances, \(p\), during: a falling, b walking, and c crouching-down events

The upper and lower parts of the human body have similar orientations in both standing and lying poses. Since the shape of a falling person changes from standing to lying, the difference between the two line orientations will be small both before and after a fall. Therefore, we check the difference between \(\theta _{1}\) and \(\theta _{2}\) for each frame; \(\theta _{1}\) and \(\theta _{2}\) are considered similar if the difference between them is less than \(10^{\circ }\) (\(\Delta \theta < 10^{\circ }\)).

First, the ratio of the line distances from the previous frame, \(p_{t-1}\), is compared with that of the current frame, \(p_{t}\). If there is no possible fall (\(\Delta p = 0\)) and the line orientations are similar, \(\theta _{1}\) is stored as a reference angle, \(\theta _{r}\), and the sum of \(D1\) and \(D2\) is stored as a length reference, \(D_{r}\). Whenever there is a possible fall (\(\Delta p > 0\)), we search for line orientations \(\theta _{1}\) and \(\theta _{2}\) that are similar to each other at the 10th frame after the possible fall and store them as \(\theta _{N1}\) and \(\theta _{N2}\), respectively. Ten frames are used as the time interval for fast movement in our test videos; a different number of frames can be chosen to match the frame rate of the video sequences. We then compute the difference between each of the most recently stored line orientations and the reference angle, as follows:

$$\begin{aligned} \theta _{D1}&= \left| {\theta _{N1} -\theta _r } \right| \end{aligned}$$
(8)
$$\begin{aligned} \theta _{D2}&= \left| {\theta _{N2} -\theta _r } \right| \end{aligned}$$
(9)

where \(\theta _{D1}\) is the orientation difference between \(\theta _{N1}\) and \(\theta _{r}\), while \(\theta _{D2}\) is the orientation difference between \(\theta _{N2}\) and \(\theta _{r}\). To avoid having to map the two portions of the human body before the fall onto those after the fall, the mean of these two orientation differences, \(\mu _{\theta }\), is computed. A fall is detected if \(\mu _{\theta }\) exceeds \(30^{\circ }\).
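
A minimal sketch of this orientation test follows, using the \(30^{\circ }\) threshold stated above; the variable names are illustrative.

```python
def orientation_fall_test(theta_n1, theta_n2, theta_r,
                          angle_threshold=30.0):
    """Mean deviation of the post-fall line orientations from the
    reference angle (Eqs. 8-9); a fall is flagged when the mean
    exceeds the 30-degree threshold."""
    theta_d1 = abs(theta_n1 - theta_r)   # Eq. (8)
    theta_d2 = abs(theta_n2 - theta_r)   # Eq. (9)
    mu_theta = (theta_d1 + theta_d2) / 2.0
    return mu_theta > angle_threshold
```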

However, in some falls the value of \(\mu _{\theta }\) can be less than or equal to \(30^{\circ }\). In this case, we check the difference between the total line length at the 10th frame after the possible fall and the length reference, \(D_{r}\). Based on our observations, the apparent height of the person reduces drastically during such a fall in the field of view of the camera, as depicted in Fig. 6. The change in the sum of the line lengths after a possible fall, \(D_\mathrm{diff}\), is given by:

$$\begin{aligned} D_\mathrm{diff} =\left| {D_r -\left( {D1+D2} \right) } \right| \end{aligned}$$
(10)

A fall is detected if \(D_\mathrm{diff}\) is larger than 40 % of the length reference, i.e., \(0.40 D_{r}\) pixels. If the line orientations \(\theta _{1}\) and \(\theta _{2}\) are not similar to each other at the 10th frame after the possible fall, the system considers that no fall has occurred. Figure 7 shows some examples of daily activities and simulated falls with their corresponding changes in \(p\), \(\theta _{1}\), and \(\theta _{2}\).
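
The length test of Eq. (10) can be sketched as follows, with the 40 % threshold stated above.

```python
def length_fall_test(D1, D2, D_r, drop_fraction=0.40):
    """Eq. (10): a fall is flagged when the summed line length deviates
    from the reference length D_r by more than 40 % of D_r."""
    D_diff = abs(D_r - (D1 + D2))
    return D_diff > drop_fraction * D_r
```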

Fig. 6 Example of the change in the total distance of the human body during a fall parallel to the camera optical axis

Fig. 7 Examples of daily activities and simulated falls with their corresponding changes in \(p\), \(\theta _{1}\), and \(\theta _{2}\). a Example of detecting a fall where the difference between the line orientations before and after the fall is greater than \(10^{\circ }\), followed by no movement of the person for 5 s. b No possible fall is detected for the sit-down activity, as the ratio of the distances remains 1 throughout the activity. c Similar to (b), no possible fall is detected for the squat-down activity. d Example of a detection error where a fall is reported during a crouch-down activity because the mean orientation difference exceeds the predefined threshold

A fall will end with an inactivity period if the person is immobilized or unconscious after the fall. The last verification of our approach is to check whether there is any movement of the person after a possible fall. A fall is confirmed if the following condition is fulfilled:

  1. The distance moved by the centroid \(P2\) is smaller than or equal to 5 pixels for 5 s. For testing purposes, 5 s was chosen as the duration of the inactivity period in our approach; a longer inactivity period can be used to more reliably confirm that the person is immobilized after a fall. A sketch of this check is given below.
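
The sketch below assumes a list of recent \(P2\) positions and a known frame rate; measuring the total drift of \(P2\) within the window is an illustrative choice for the displacement test.

```python
import math

def is_inactive(p2_history, fps, pixel_tol=5.0, seconds=5.0):
    """Confirm the fall when the mid-body centroid P2 drifts no more
    than pixel_tol pixels over the last `seconds` of frames."""
    n = int(fps * seconds)
    if len(p2_history) < n:
        return False                 # not enough post-fall history yet
    xs = [p[0] for p in p2_history[-n:]]
    ys = [p[1] for p in p2_history[-n:]]
    drift = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    return drift <= pixel_tol
```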

4 Experiment results

Our system was implemented in MATLAB on a PC with an Intel Core i3 2.13 GHz CPU and 4 GB of RAM. All test video data were acquired from an uncalibrated IP camera (D-Link DCS-920) over a Wi-Fi connection in MJPEG format at a resolution of 320 \(\times \) 240.

Our test video data consist of video sequences of 30 daily activities, such as walking, sitting down, crouching down, and squatting down, and 21 simulated falls, such as backward falls, forward falls, sideway falls, and falls due to loss of balance. Our fall video data set is available at http://foe.mmu.edu.my/digitalhome/FallVideo.zip. Figure 8 shows that our proposed technique can track the human correctly in the presence of other moving objects. A minimum area threshold is used to differentiate the human from smaller moving objects such as pets, since the silhouette of a normal adult is always larger than that of a pet.

Fig. 8 Example of correct tracking of the human in the presence of another moving object during: a walking and b falling. In this case, the moving robot on the ground can be treated as a pet

Table 1 shows the experimental results of our proposed fall detection technique. Two fall incidents were not detected because the body of the person remained in a straight line during these falls, as shown in Fig. 9; thus, the ratio of the distances, \(p\), remained 1 (\(\Delta p = 0\)) and the system considered that there was no possible fall. On the other hand, two abrupt crouch-down activities were detected as falls because of the sudden change in the ratio of the distances, \(p\), from 1 to 2, and because the mean orientation difference exceeded the predefined threshold (\(\mu _{\theta }> 30^\circ \)). Overall, our proposed system achieves high accuracy in fall detection, with a detection rate of 90.5 % (19 of 21 falls) and a false alarm rate of 6.7 % (2 of 30 daily activities). The operating characteristics of the proposed technique are tabulated in Table 2. The threshold values for the mean orientation difference, \(\mu _{\theta }\), and the post-fall total length difference, \(D_\mathrm{diff}\), from Sect. 3.3 are defined based on the results in Table 2. Instead of \(40^{\circ }\) or \(50^{\circ }\), \(30^{\circ }\) is chosen as the threshold for \(\mu _{\theta }\) because a smaller angle difference gives a better chance of detecting falls that are almost parallel to the camera optical axis.

Table 1 Fall detection results of our proposed technique
Fig. 9 Example of no change in the ratio of the distances (\(\Delta p = 0\)) during a fall

Table 2 Operating characteristics of our proposed technique

We compare our proposed technique with the bounding box ratio analysis approach [10], the ellipse shape analysis approach [13], and Chen’s approach [14] in terms of time complexity. The time complexity of our proposed technique matches that of the bounding box ratio analysis approach [10] and the ellipse shape analysis approach [13], namely \(\text{ O }(n)\), where \(n\) is the pixel resolution of one video frame. The time complexity of Chen’s approach [14] is \(\text{ O }(n \log n)\). Table 3 compares the total number of primitive operations required to extract the features for shape analysis from one video frame for the approaches with \(\text{ O }(n)\) time complexity. In addition to time complexity, Table 4 compares our proposed technique with four other human fall detection approaches in terms of fall detection rate, false alarm rate, and execution time. The results for the bounding box ratio analysis approach [10] and the ellipse shape analysis approach [13] in Table 4 were obtained through simulation on the same video data used to test our proposed technique, while the results for Chen’s approach [14] and the biomechanics approach [7] were taken from their original papers.

Table 3 Comparison of the total number of primitive operations executed for approaches having \(\text{ O }(n)\) time complexity
Table 4 Comparison of our proposed technique with four human fall detection approaches in terms of fall detection rate, false alarm rate, and execution time

It can be observed from the time complexities and from Table 3 that our proposed technique has lower computational complexity than Chen’s approach [14], which achieves high fall detection accuracy and a low false alarm rate by combining two different analyses, while our technique has nearly the same computational complexity as the bounding box ratio analysis approach [10] and the ellipse shape analysis approach [13]. Despite having the same time complexity as these two shape analysis approaches [10, 13], our proposed method, as shown in Table 4, achieves better performance both in detecting falls and in distinguishing a fall from normal daily activities. Compared with Chen’s approach [14], our proposed technique achieves similar fall detection and false alarm rates at a lower computational cost. Table 4 also shows that the accuracy of the biomechanics approach [7] in detecting typical falls still surpasses that of visual-based human fall detection approaches; furthermore, the biomechanics approach does not suffer from occlusion or varying illumination. However, a visual-based fall detection system can complement existing fall alert products by overcoming their main weakness: the elderly may forget to wear the fall detection sensor.

5 Conclusions

In this work, we presented an improved visual-based fall detection technique with high detection accuracy. Our proposed technique simplifies the representation of the human shape by using three centroids of different regions of the human body instead of the conventional bounding box or ellipse. This representation provides information on the changes in the upper and lower portions of the human body. Experimental results indicate that our proposed technique achieves high accuracy for human fall detection in real-time indoor video sequences.

In our future work, we will look into ways to improve the accuracy of our proposed technique under varying lighting conditions and into methods to cope with occlusion. We will also explore the possibility of reducing the false alarm rate of our system by adding a low-complexity head tracking algorithm to our proposed technique.