1 Introduction

The world is experiencing a growing older population, with people aged 65 and over forming the fastest growing group. This aging population is expected to strain health care facilities worldwide, as older people are prone to chronic diseases such as diabetes, heart disease, and arthritis [9, 12, 14]. Another aspect of the increased health care cost is the growing number of falls; indeed, falls can be the most common injury-causing event in elderly people [37]. These falls may lead to injuries of moderate to severe nature and can result in reduced mobility, particularly for aging people. Moreover, after a primary fall, the probability of further falls increases and may result in mental stress as a consequence of post-fall syndrome [17]. Falls occur for various reasons, for example, cognitive or chronic conditions such as Parkinson's disease or arthritis, the sudden onset of health conditions such as a heart attack, or reduced sensory capabilities, namely vision impairment. In all these scenarios, it is important that once a person experiences a fall, care is provided immediately [7, 32]. The Fall Detection System (FDS) plays a major role in contributing toward timely care provision by alerting medical staff [22]. An FDS is needed, for instance, for elderly persons with cognitive impairment who are unable to get up after a fall for a long period, causing pressure sores and other effects, as reported by Brayne and Fleming [41].

One commercial technique to mitigate the consequences of falls is the use of a Personal Emergency Response System (PERS) [1]. In many instances the PERS is helpful, but it is rendered worthless once the person is unable to reach the button or is unconscious [26, 33]. Passive monitoring approaches have been introduced to identify falls precisely and effectively, addressing this problem with the PERS [8]. Recent developments in deep learning and in signal processing of imagery from different optical sensors have drawn considerable interest from researchers [4]. These techniques have accomplished remarkable breakthroughs and significant progress in smart video surveillance and have proved to be effective mechanisms in different monitoring schemes that depend on semantic data extraction (pose estimation, human detection, anomaly detection, and motion tracking) [39]. Owing to high-quality RGB cameras, the current abundance of large datasets, and remarkable developments in cloud computing (CC) and compute power, deep neural networks now play a vital role in smart surveillance systems [27].

The motivations behind this research are several. Falls are a major cause of injury and death among the elderly population, and the ability to quickly detect and respond to a fall can potentially save lives. Additionally, falls can be an indicator of underlying health problems, so the ability to detect falls could be used for early intervention and prevention. By using deep learning techniques such as convolutional neural networks (CNNs), the proposed model aims to improve the accuracy of fall detection and classification. CNNs are particularly well-suited for image-based classification tasks, and have shown promising results in previous studies on fall detection.

This study designs a new Inception with deep convolutional neural network based fall detection and classification (INDCNN-FDC) model. The proposed model intends to categorize the events into two class labels, namely fall and not fall. To accomplish this, the model carries out two stages of data pre-processing: Gaussian filter (GF) based image sharpening and guided image filter (GIF) based image smoothing. In addition, this model applies the deep transfer learning based Inception v3 model for generating a helpful group of feature vectors. Finally, the DCNN technique takes the feature vector as input and performs the fall detection process. The experimental analysis of this approach is validated on a benchmark dataset. In short, the contributions of the paper are as follows.

  • Develop a new INDCNN-FDC model for fall detection and classification.

  • Perform two stages of image preprocessing: GF based image sharpening and GIF based image smoothing.

  • Employ Inception v3 model to produce a collection of feature vectors.

  • Develop DCNN model for the detection and classification of fall/non-fall events.

The rest of the paper is organized as follows. Section 2 provides the literature review and Section 3 introduces the proposed model. Next, Section 4 presents the performance validation and Section 5 draws the conclusion.

2 Literature review

Waheed et al. [36] introduce a noise-tolerant Fall Detection System (FDS) that operates in the presence of missing values in the data. The study concentrates on DL, especially RNNs with an underlying Bidirectional LSTM (BiLSTM) stack, for implementing an FDS based on wearable sensors. Tsai and Hsu [34] devise an FDS that combines a conventional method with a NN. First, the authors present a skeleton-information extraction method that converts depth data into skeleton data and derives the joints significant to fall actions; the skeleton-based approach is then refined with 7 highlighted feature points. Second, the authors formulate a highly robust deep CNN structure that uses a pruning technique to minimize the calculations and parameters of the network.

Cai et al. [3] present an FDS based on dense blocks with a multi-channel convolutional fusion (MCCF) technique. In this technique, MCCF-DenseBlock, a densely connected layer architecture, is formulated to fully extract data through its densely connected layers while avoiding network overfitting by breaking dense connections appropriately; in particular, the MCCF technique merges the assembled features to reduce the number of variables and the data redundancy in the network. In [25], the authors consider DL for fall detection (FD) in a fog computing and IoT environment. They present, as the DL method, a CNN made up of 3 fully connected layers, 3 convolutional layers, and 2 max-pooling layers.

Vaiyapuri et al. [35] devise an IoT-assisted elderly fall detection technique using an optimal deep CNN (IMEFD-ODCNN) for smart homecare. The aim of the IMEFD-ODCNN approach is to enable intelligent DL methods and smartphones to identify the occurrence of falls in the smart home. The SqueezeNet method is used as a feature extractor to derive proper feature vectors for FD, and the hyperparameter tuning of the SqueezeNet method is carried out using the Salp Swarm Optimization (SSO) approach. Finally, a sparrow search optimization algorithm (SSOA) with variational autoencoders (VAEs), named the SSOA-VAE classifier, is used to classify non-fall and fall events. Khraief et al. [15] designed a weighted multi-stream deep CNN that uses the rich multi-modal data offered by RGB-D cameras. This approach recognizes fall events automatically and transfers a help request to caregivers.

In [20], the authors discuss the design of an RNN-based software structure that is highly effective for FD while running entirely onboard wearable devices. The renowned and publicly accessible SisFall dataset was extended and adapted with fine-grained temporal annotations to support the supervised training of RNNs. Nogas et al. [21] introduce an innovative framework, DeepFall, that formulates fall detection as an anomaly detection problem. The DeepFall framework offers the novel use of a deep spatio-temporal convolutional autoencoder to learn spatio-temporal features of normal activities using a non-invasive sensing modality. The authors also propose a novel anomaly scoring technique that integrates the reconstruction scores of frames over a temporal window to detect unseen falls.

Several studies have explored the use of deep learning techniques, including convolutional neural networks (CNNs), for fall detection and classification. For example, Mubashir et al. [19] developed a fall detection system for the elderly using CNNs. Sangeethaa SN et al. [24] combined CNNs and long short-term memory (LSTM) networks for fall detection. Jeyakumar et al. [13] proposed a real-time fall detection system using deep learning. Zeng et al. [43] developed a fall detection system using wearable devices and a deep learning-based algorithm. Li et al. [16] used transfer learning to develop a fall detection system using a CNN.

The Inception architecture, which was introduced by Szczęsny et al. [30], has been shown to be effective for image classification tasks. Inception-v4, a variant of the Inception architecture, was proposed by Szegedy et al. (2017) [28] and demonstrated improved performance compared to earlier versions of the architecture. Other studies have explored the use of the Inception architecture for specific image classification tasks, such as the classification of skin lesions [31] and diabetic retinopathy [11].

The proposed research aims to combine these two areas of study by developing a fall detection and classification system based on the Inception architecture and deep CNNs. The system is intended to improve the accuracy and efficiency of fall detection and classification compared to previous models.

The Inception architecture, which is a deep CNN model designed to improve the accuracy and efficiency of image classification, is used as a base model in this research. By designing a fall detection and classification model based on Inception, the proposed system can potentially achieve higher accuracy and faster processing speeds compared to previous models. Improving the accuracy and speed of fall detection and classification has several potential benefits, including reducing healthcare costs, improving quality of life for the elderly, and providing valuable data for healthcare providers and researchers.

3 The proposed model

In this study, an effective INDCNN-FDC model is introduced for the FD and classification process. The proposed model intends to categorize the events into two class labels, namely fall and not fall. Development of a deep CNN-based fall detection and classification system: the researchers propose a novel deep CNN architecture based on the Inception network that takes advantage of multi-scale features and reduced computational complexity to improve fall detection and classification performance.

Comparison against the state-of-the-art: The use of benchmark models provides a standardized way of comparing the performance of the proposed method against other state-of-the-art methods in the field. By comparing against a variety of methods, including evolutionary techniques, the authors can demonstrate the superiority of their proposed method and provide insight into its strengths and weaknesses.

Evaluation of different types of methods: The benchmark models used in the study represent a variety of different types of fall detection and classification methods, including rule-based, machine learning-based, and evolutionary-based methods. By evaluating different types of methods, the authors can gain a better understanding of the strengths and weaknesses of each approach and how they compare to the proposed method.

Demonstration of effectiveness: By comparing against several benchmark models, the authors can demonstrate the effectiveness of their proposed method in achieving high accuracy in fall detection and classification. This can help to build confidence in the proposed method and provide evidence of its potential usefulness in real-world applications.

The main contributions of the research article include:

  • Introduction of pre-processing filters to improve fall detection accuracy: Gaussian filter and Guided filter pre-processing steps are used to remove noise and reduce the dimensionality of the input images, which improves the accuracy of the fall detection and classification model.

  • Evaluation of the proposed model on a large and diverse dataset: The performance of the proposed model is evaluated on a large dataset of real-world fall and non-fall videos captured in different environments, which demonstrates the robustness and generalizability of the proposed approach.

  • Comparison with state-of-the-art methods: The performance of the proposed fall detection and classification system is compared with several state-of-the-art methods, demonstrating that the proposed approach outperforms them in terms of accuracy, speed, and computational complexity.

  • Potential for real-world applications: The proposed fall detection and classification system has potential for real-world applications in healthcare, elderly care, and home automation, where it can be used to automatically detect falls and alert caregivers or emergency services.

Figure 1 portrays the overall working process of the INDCNN-FDC approach. Here, the model carries out two stages of data pre-processing: GF based image sharpening and GIF based image smoothing. In addition, the proposed model applies the deep transfer learning based Inception v3 model for generating a helpful group of feature vectors. Finally, the DCNN method takes the feature vector as input and performs the FD process.

Fig. 1 Overall working process of INDCNN-FDC approach

3.1 Image pre-processing

At the initial stage, the INDCNN-FDC approach carries out two stages of data pre-processing:

  • GF based image sharpening and

  • GIF based image smoothing.

3.1.1 GF based image sharpening

Digital images can include various artefacts and noise. Due to poor capture, even the simplest thresholding problem can become challenging; therefore, it is essential to discard visible noise from an image [10]. Random variation of color statistics or unpredictable lighting in a picture is referred to as "image noise." Gaussian, grain, salt-and-pepper, periodic, quantization, and other forms of noise are observed in pictures. Wiener and median filters are used to eliminate such noise, and several morphological operations can also be exploited for noise reduction. Median filtering is used to adjust the pixel brightness, whereas the GF is used to smooth pictures and remove noise; with these methods, smoothing of an image while respecting its borders can be accomplished. The image is smoothed with a Gaussian kernel based on the cumulative standard deviation. The standard deviation (\(\sigma\)) and the Gaussian function \(G(x)\) are given as follows [5]:

$$\begin{array}{c}G\left(x\right)=\frac{1}{\sqrt{2\pi {\sigma }^{2}}}\,{e}^{-\frac{{x}^{2}}{2{\sigma }^{2}}},\\ \sigma =\sqrt{\frac{{\sum }_{i}{\left({X}_{i}-\overline{X}\right)}^{2}}{n-1}},\end{array}$$
(1)

In Eq. (1), \({X}_{i}\) refers to a particular value from the data, \(\overline{X}\) denotes the mean, and \(n\) is the total number of values in the data.
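As a concrete illustration of Eq. (1), the following minimal Python sketch computes \(\sigma\) from the data and builds a normalized 1D Gaussian kernel; the kernel size of 5 is an illustrative assumption.

```python
import numpy as np

def gaussian_kernel_1d(data, size=5):
    """Build a 1D Gaussian kernel whose sigma is the sample standard
    deviation of `data`, following Eq. (1)."""
    data = np.asarray(data, dtype=float)
    sigma = np.sqrt(np.sum((data - data.mean()) ** 2) / (len(data) - 1))
    x = np.arange(size) - size // 2                    # support centred on 0
    g = np.exp(-x ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    return g / g.sum()                                 # normalise the weights
```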

3.1.2 GIF based image smoothing

The GIF is used to smooth the image in this work. It is an adaptive-weight filtering method that smooths the image while preserving edges [38]. The fundamental principle of the GIF is a local linear model between the output and the guide images. In each local window, the output image is regarded as a linear transformation of the guide image, as follows:

$${f}_{j}^{out}={a}_{k}{f}_{j}^{guide}+{b}_{k},\quad \forall j\in {w}_{k}$$
(2)

In Eq. (2), \({f}_{j}^{guide}\) denotes the guide image, \({f}_{j}^{out}\) indicates the output image, and \({w}_{k}\) represents a square window of radius \(r\) centered at pixel \(k\). The coefficients \({a}_{k}, {b}_{k}\) are the optimal solution of the following formula:

$$\underset{{a}_{k},{b}_{k}}{\mathrm{min}}\sum\limits_{i\in {w}_{k}}\left[{\left({a}_{k}{f}_{i}^{guide}+{b}_{k}-{f}_{i}^{in}\right)}^{2}+\alpha {a}_{k}^{2}\right]$$
(3)

Solving the optimization problem in Eq. (3) yields \(({a}_{k}, {b}_{k})\):

$${a}_{k}=\frac{\frac{1}{|w|}{\sum }_{j\in {w}_{k}} {f}_{j}^{guide}{f}_{j}^{in}-{\eta }_{k}\widetilde{{f}_{k}}}{{\sigma }_{k}^{2}+\alpha }$$
(4)
$${b}_{k}=\widetilde{{f}_{k}}-{a}_{k}{\eta }_{k}$$
(5)

Here, \({f}_{j}^{in}\) denotes the value of the input image \({f}^{in}\) at pixel \(j\), and \(\widetilde{{f}_{k}}=\frac{1}{|w|}{\sum }_{j\in {w}_{k}} {f}_{j}^{in}\) represents the average of the input image over the square window \({w}_{k}\). \({\eta }_{k}\) and \({\sigma }_{k}^{2}\) denote the mean and variance, correspondingly [23], of the guide image \({f}^{guide}\) in the square window \({w}_{k}\); \(|w|\) represents the number of elements of \({w}_{k}\), and \(r\) indicates the radius of \({w}_{k}\). The larger \(r\) is, the stronger the capability of the GIF to eliminate artefacts and noise, but the blurrier and smoother the image becomes. \(\alpha\) is a regularization variable penalizing a large \({a}_{k}\): based on Eq. (4), the larger \(\alpha\) is, the smaller \({a}_{k}\) will be, which likewise leads to a blurrier and smoother image.

The square windows overlap, so the simplest solution is to average the output values \({f}_{j}^{out}\):

$${f}_{j}^{out}=\frac{1}{|w|}\sum\limits_{k:j\in {w}_{k}}\left({a}_{k}{f}_{j}^{guide}+{b}_{k}\right)={\overline{a}}_{j}{f}_{j}^{guide}+{\overline{b}}_{j}$$
(6)

where

$${\overline{a}}_{j}=\frac{1}{|w|}\sum\limits_{k:j\in {w}_{k}}{a}_{k};\qquad {\overline{b}}_{j}=\frac{1}{|w|}\sum\limits_{k:j\in {w}_{k}}{b}_{k}.$$

The Gaussian filter is a widely used image smoothing filter that reduces noise and removes high-frequency components from images. The filter works by convolving the input image with a Gaussian kernel, which gives higher weights to pixels that are closer to the center of the kernel. The Gaussian filter is computationally efficient and easy to implement, making it a popular choice for pre-processing images in computer vision applications.

The Guided filter, on the other hand, is a more sophisticated filter that can preserve edges and details in images while removing noise. The filter works by applying a guided image filter to the input image, which uses a reference image to guide the filtering process. The guided image filter works by computing the local mean and covariance of the input and reference images and then applying a weighted average to the filtered output. The Guided filter is more computationally expensive than the Gaussian filter but can produce better results in some cases, especially when preserving edge information is important.
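To make the two pre-processing stages concrete, a minimal OpenCV sketch is given below. Interpreting the GF based sharpening as unsharp masking, using the image as its own guide, and the particular weights, radius, and eps values are assumptions of this sketch, not settings taken from the paper; the guided filter requires the opencv-contrib-python package.

```python
import cv2

def preprocess_frame(img_bgr, sigma=1.5, radius=8, eps=(0.1 * 255) ** 2):
    """Two-stage pre-processing sketch: GF based sharpening (unsharp
    mask), then GIF based edge-preserving smoothing."""
    blurred = cv2.GaussianBlur(img_bgr, (0, 0), sigma)            # Gaussian smoothing
    sharpened = cv2.addWeighted(img_bgr, 1.5, blurred, -0.5, 0)   # unsharp mask
    # Self-guided filtering: the sharpened image guides its own smoothing.
    return cv2.ximgproc.guidedFilter(sharpened, sharpened, radius, eps)
```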

3.2 Feature extraction using inception v3

In this study, the presented INDCNN-FDC model applies the deep transfer learning based Inception v3 model for generating a helpful group of feature vectors. In recent years, the enhancement in classification performance has considerably aided other machine vision applications, namely object detection, face recognition, and so on, since they use classification networks for feature extraction. Even though AlexNet and VGGNet are remarkably effective, they contain a large number of model parameters and their computational cost is high. The Inception module, on the other hand, has a smaller number of parameters and stronger feature extraction abilities [2]. In the training stage, the Inception module can prevent extreme compression of features from the shallow layers, and the size of the feature map shrinks modestly. Higher-dimensional data are processed locally, and non-linear activation layers are added progressively to the network, which makes training faster and reduces the network parameters. Figure 2 depicts the infrastructure of the Inception v3 method.

Fig. 2 Structure of Inception v3 model

Simultaneously, the spatial aggregation of lower-dimensional data does not result in a reduction of the network's expressive capability. As a result, when carrying out a large-scale convolution, the input can first be reduced dimensionally, and the spatial aggregation function can be implemented afterwards. Since Google introduced the Inception module in 2014, five generations of classification architectures have been proposed consecutively, including Inception-ResNet, built from 3×3 pooling layers and 1×1, 3×3, and 5×5 convolution layers. Simplification of the network structure guarantees that every layer can learn the target sparse features and improves the depth and width of the network. Active regularization and suitable decomposed convolutions decrease the computational difficulty; with these, Inception v3 achieves better outcomes in computational effort and model parameters. With regard to architecture, Inception v3 retains the overall Inception network structure but adapts the Inception block: 7×7 convolutions are replaced with 1×7 and 7×1 convolution layers, 3×3 convolutions with 1×3 and 3×1 convolution layers, and 5×5 convolutions with multiple 3×3 convolution layers [6].

The convolutional decomposition decreases the amount of parameter computation, and auxiliary classifiers are utilized as regularizers, which improves convergence and resolves the problem of vanishing gradients. A large convolutional kernel is decomposed into smaller convolution kernels with a similar output: this efficiently keeps the image features while decreasing the computation count, decreases the parameter count, and increases the generalization capability under the assumption of guaranteeing a similar effect. The fundamental principles of the Inception v3 architecture are expressed in the following:

  • A large number of signals are positioned close to one another, which can be exploited to generate smaller convolutions; nearby signals are frequently correlated, implying that the dimensionality can be shrunk before applying the convolution without loss of information.

  • To make effective use of the free weights and resources, the width and depth of the NN should be increased simultaneously.

  • It is inadvisable to use layers that abruptly reduce the number of parameters, particularly at the beginning of a CNN.

  • A "wide" layer learns quickly, which is significant at higher levels.
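To illustrate the feature extraction step of this subsection, a minimal Keras sketch follows: it extracts one fixed-length feature vector per frame from an ImageNet-pretrained Inception v3 backbone. The global average pooling and the 299 × 299 input size are standard Keras defaults and assumptions of this sketch rather than details reported above.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (InceptionV3,
                                                        preprocess_input)
from tensorflow.keras.preprocessing import image

# ImageNet-pretrained backbone; include_top=False drops the classifier
# and pooling="avg" yields one 2048-dim feature vector per image.
backbone = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):
    img = image.load_img(img_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return backbone.predict(x, verbose=0)[0]           # shape: (2048,)
```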

3.3 Event detection using DCNN model

In this study, the DCNN model is exploited for the classification of fall events. After obtaining the feature vector of an image, the image is represented as a vector of fixed length, and a classifier is then required to categorize the feature vector [40]. Generally, a traditional CNN comprises, from input to output, input, convolution, activation, pooling, fully connected, and output layers. The fully connected layer categorizes and outputs based on the extracted features. The convolution operator combines two functions \(f\) and \(g\), measuring the area of overlap between \(f\) and a translated and flipped \(g\). It is mathematically expressed in Eqs. (7) and (8):

$$z(t)\overset{\mathrm{def}}{=}f(t)*g(t)=\sum\limits_{\tau =-\infty }^{+\infty }f\left(\tau \right)g\left(t-\tau \right)$$
(7)

The integral form is given as:

$$\begin{array}{c}z(t)=f(t)*g(t)={\int }_{-\infty }^{+\infty }f(\tau )g(t-\tau )d\tau \\ ={\int }_{-\infty }^{+\infty }f\left(t-\tau \right)g\left(\tau \right)d\tau \end{array}$$
(8)

During image processing, the digital image is considered as a discrete function on 2D space, denoted as \(f(x, y)\). Assuming a 2D convolution kernel \(g(x, y)\), the output image \(z(x, y)\) is defined in Eq. (9):

$$z\left(x,y\right)=f\left(x,y\right)*g\left(x,y\right)$$
(9)

In this method, the convolution function is utilized to extract the features of an image [42]. Likewise, in deep learning applications, when the input is a color image comprising three RGB channels and encompassing all pixels, the kernel (named the "convolutional kernel" in the CNN) is determined in the learning model, and the computation variable is a higher-dimensional array. For a 2D image input, the corresponding convolutional function is expressed in Eqs. (10), (11) and (12):

$$z\left(x,y\right)=f\left(x,y\right)*g\left(x,y\right)=\sum\limits_{t}\sum\limits_{h}f\left(t, h\right)g\left(x-t,y-h\right)$$
(10)

Its integral form is as follows:

$$z\left(x,y\right)=f\left(x,y\right)*g\left(x,y\right)=\iint f\left(t, h\right)g\left(x-t,y-h\right)dtdh$$
(11)

A convolutional kernel of size \(m\times n\) is represented as

$$z\left(x,y\right)=f\left(x,y\right)*g\left(x,y\right)=\sum\limits_{t=0}^{t=m}\sum\limits_{h=0}^{h=n}f\left(t, h\right)g\left(x-t,y-h\right)$$
(12)

In Eq. (12), \(f\) characterizes the input image and \(g\) the convolutional kernel of size \(m\times n\). The convolutional kernel multiplies each image region of \(n\times n\) size, which corresponds to formulating the kernel as a column vector of length \(n\times n\) and extracting each \(n\times n\) image region. With zero padding and a stride of 1, a total of \((M-n+1)\times (M-n+1)\) computation outcomes are obtained for an \(M\times M\) input image; each small image region is denoted as a column vector of length \(n\times n\), and the original image is represented as a matrix of size \([n\times n, (M-n+1)\times (M-n+1)]\). Assuming the number of convolutional kernels is \(K\), the output obtained from the original image using the above convolutional function has size \(K\times (M-n+1)\times (M-n+1)\). For the training process, the Adam optimizer is used, a first-order optimization method that replaces conventional Stochastic Gradient Descent (SGD) [18]. It iteratively updates the DCNN parameters based on the training dataset. SGD maintains a single learning rate for updating every parameter, and this rate does not vary during the training procedure; Adam, in contrast, determines independent adaptive learning rates for different parameters via the computation of first- and second-moment estimates of the gradient. The pseudocode of the Adam optimizer is given in Algorithm 1.

Algorithm 1 Pseudocode of Adam optimizer
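Since Algorithm 1 appears only as a figure, a minimal NumPy sketch of the standard Adam update rule is reproduced below from Kingma and Ba's formulation; the hyperparameter defaults shown (\(\beta_1=0.9\), \(\beta_2=0.999\), \(\epsilon=10^{-8}\)) are the usual ones and are assumptions here.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad`.
    `m` and `v` are the running first/second moment estimates and
    `t` is the 1-based iteration counter."""
    m = beta1 * m + (1 - beta1) * grad                 # first moment
    v = beta2 * v + (1 - beta2) * grad ** 2            # second moment
    m_hat = m / (1 - beta1 ** t)                       # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```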

Empirical analysis involves setting parameters based on prior knowledge and experience, as well as testing different parameter values to see which ones work best. For example, in the proposed method, the authors used a combination of Gaussian filter and Guided filter to pre-process input images before feeding them to the Inception-v3 network. The authors determined the optimal standard deviation value for the Gaussian filter by analyzing the noise levels in the input images and selecting a value that provided good noise reduction without blurring the image too much. Similarly, the authors set the parameters of the Guided filter based on prior knowledge of the filter's properties and its ability to preserve edges and details in images. Hyperparameter tuning involves systematically testing different parameter values to find the ones that result in optimal performance. In the proposed method, the authors used techniques such as grid search and random search to tune the hyperparameters of the Inception-v3 network, such as learning rate, batch size, and regularization strength. By training and evaluating the network with different hyperparameter settings, the authors were able to identify the best combination of settings that resulted in the highest accuracy.
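A hedged sketch of such a grid search is shown below; the grid values and the train_fn stub are purely illustrative assumptions, not the settings used in this work.

```python
from sklearn.model_selection import ParameterGrid

def train_fn(learning_rate, batch_size, l2):
    """Stub standing in for the real training/validation loop: train
    the DCNN with these settings and return validation accuracy."""
    return 0.0  # placeholder; replace with the measured accuracy

grid = ParameterGrid({"learning_rate": [1e-4, 3e-4, 1e-3],
                      "batch_size": [16, 32, 64],
                      "l2": [1e-5, 1e-4]})

best_cfg = max(grid, key=lambda cfg: train_fn(**cfg))
print("best setting:", best_cfg)
```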

While the parameters used in the proposed method were carefully selected and tuned, it's important to note that there may not be a single "optimal" choice for all scenarios. The optimal parameter values may depend on factors such as the size and complexity of the dataset, the computational resources available, and the specific requirements of the application. Therefore, it's important to perform thorough testing and validation of the proposed method with different parameter settings to ensure that it performs well under a range of conditions.

Classic algorithms for fall detection and classification often rely on handcrafted features and heuristics, which can be computationally expensive to extract and process. In contrast, the proposed framework uses deep convolutional neural networks (CNNs) to automatically learn features from input images, which can significantly reduce the computational complexity of the overall system.

To illustrate this point, let's consider a classic algorithm for fall detection that uses optical flow features to detect motion patterns indicative of a fall. Optical flow features require computing the motion vectors of every pixel in a video sequence, which can be very computationally intensive. For example, the Lucas-Kanade algorithm, a classic algorithm for optical flow estimation, has a worst-case computational complexity of O(n^3), where n is the number of pixels in the image. This means that for a typical video resolution of 640 × 480 pixels and 30 frames per second, the Lucas-Kanade algorithm would need to estimate motion vectors for over 9 million pixels per second, resulting in a very high computational load. In contrast, the proposed framework uses a deep CNN-based approach that is designed to reduce computational complexity while maintaining high accuracy. The Inception network architecture used in the proposed framework is specifically designed to reduce the number of parameters and computations required while maintaining high accuracy. For example, the Inception-v3 network used in the proposed framework has a relatively small number of parameters (approximately 24 million) compared to other deep CNN architectures such as ResNet (50 million) and VGG (138 million), which translates to lower computational requirements during training and inference. Overall, the proposed framework offers a significant reduction in computational complexity compared to classic algorithms for fall detection and classification, while maintaining high accuracy and robustness. This makes it well-suited for real-world applications where efficiency and speed are critical factors.
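The parameter comparison can be reproduced by instantiating the backbones without weights and counting their parameters, as in the sketch below; exact totals vary slightly with the classifier head and library version, so the figures quoted above should be treated as approximate.

```python
from tensorflow.keras.applications import InceptionV3, ResNet50, VGG16

for name, ctor in [("InceptionV3", InceptionV3),
                   ("ResNet50", ResNet50),
                   ("VGG16", VGG16)]:
    model = ctor(weights=None)            # weights=None avoids downloads
    print(f"{name}: {model.count_params() / 1e6:.1f}M parameters")
```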

4 Performance validation

The presented INDCNN-FDC approach was simulated using the Python 3.6.5 tool. The performance validation of the INDCNN-FDC technique is tested using the UR Fall Detection (URFD) dataset (http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html), which is captured by 2 Kinect sensors and provides frontal and overhead video sequences. The frontal sequence comprises 314 frames, of which 74 frames contain falls and 240 contain no falls. The overhead sequence contains a total of 302 frames, where 75 frames contain falls and 227 contain no falls. The two kinds of falls demonstrated are falls from sitting on a chair and falls from a standing position.
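A minimal loading sketch is given below, assuming the extracted frames have been organized into fall/ and not_fall/ subfolders; the URFD download itself ships as raw sequences, so this directory layout is an assumption of the sketch.

```python
import tensorflow as tf

# Assumed layout: data/fall/*.png and data/not_fall/*.png
dataset = tf.keras.utils.image_dataset_from_directory(
    "data", labels="inferred", label_mode="binary",
    image_size=(299, 299), batch_size=32, shuffle=True)
```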

Figure 3 illustrates sample images of fall and no-fall events. Figure 4 demonstrates the pre-processed image sequences.

Fig. 3 (a) Fall event; (b) No-fall event

Fig. 4 Pre-processed sequences

Figure 5 portrays the confusion matrix generated by the INDCNN-FDC approach on the classification of fall events. On the applied URFD dataset, the INDCNN-FDC model categorized a total of 1159 events into the no-fall class and 887 events into the fall class.

Fig. 5 Confusion matrix of INDCNN-FDC approach

Table 1 and Fig. 6 provide the detailed FD performance of the INDCNN-FDC system on the test URFD dataset. The obtained values indicate that the INDCNN-FDC approach reached improved FD performance with \(acc{u}_{y}\) of 97.66%, \(pre{c}_{n}\) of 97.53%, \(rec{a}_{l}\) of 97.73%, \(F{1}_{score}\) of 97.62%, and \(AU{C}_{score}\) of 99.77%.
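The reported measures can be reproduced from the model's outputs with scikit-learn, as in the sketch below; y_true and y_score are stand-ins for the ground-truth labels and the predicted fall probabilities, which are not listed here.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

def report(y_true, y_score, threshold=0.5):
    """y_true: 0/1 ground-truth labels; y_score: predicted fall
    probabilities from the DCNN classifier."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {"accuracy":  accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred),
            "recall":    recall_score(y_true, y_pred),
            "f1":        f1_score(y_true, y_pred),
            "auc":       roc_auc_score(y_true, y_score)}
```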

Table 1 Result analysis of INDCNN-FDC approach with distinct measures

Fig. 6 Result analysis of INDCNN-FDC approach with distinct measures

The training accuracy (TRA) and validation accuracy (VLA) acquired by the INDCNN-FDC system on the test dataset are represented in Fig. 7. The experimental results state that the INDCNN-FDC algorithm accomplished improved values of TRA and VLA. In particular, the VLA appears superior to the TRA.

Fig. 7 TRA and VLA analysis of INDCNN-FDC approach

The training loss (TRL) and validation loss (VLL) realized by the INDCNN-FDC methodology on the test dataset are displayed in Fig. 8. The experimental results reveal that the INDCNN-FDC system achieved reduced values of TRL and VLL. Specifically, the VLL is lower than the TRL.

Fig. 8 TRL and VLL analysis of INDCNN-FDC approach

A clear precision-recall analysis of the INDCNN-FDC system on the test dataset is shown in Fig. 9. The figure exhibits that the INDCNN-FDC approach resulted in improved precision-recall values for the distinct classes.

Fig. 9 Precision-recall analysis of INDCNN-FDC approach

A comprehensive ROC analysis of the INDCNN-FDC system on the test dataset is portrayed in Fig. 10. The outcome shows that the INDCNN-FDC method has a strong capability in categorizing the different classes.

Fig. 10 ROC analysis of INDCNN-FDC approach

To confirm the better FD performance of the INDCNN-FDC approach, a comparison study with other existing models is given in Table 2 and Fig. 11 [29, 35]. The experimental values show that the SVM approach resulted in a lower \(acc{u}_{y}\) of 88.57%, whereas the 1D Conv NN model attained a slightly enhanced \(acc{u}_{y}\) of 92.70%. Then, the 2D Conv NN, CNN, ResNet-50, and fuzzy logic models obtained moderately closer \(acc{u}_{y}\) of 95%, 95%, 95.4%, and 95.71%, respectively. Afterwards, the ResNet-101 and VGG-16 models resulted in \(acc{u}_{y}\) of 96.20% and 97.50%, correspondingly. However, the INDCNN-FDC technique gained superior performance with a maximum \(acc{u}_{y}\) of 97.66%. From these results and discussion, it can be concluded that the INDCNN-FDC approach gained maximal FD performance over the other models.

Table 2 Accuracy analysis of INDCNN-FDC method with existing algorithms

Fig. 11 \(Acc{u}_{y}\) analysis of INDCNN-FDC approach with existing algorithms

The proposed method offers several advantages over existing fall detection and classification methods, including:

  • Improved accuracy: The proposed method achieved higher accuracy in fall detection and classification compared to existing methods, as demonstrated by the experimental results presented in the research.

  • Reduced computational complexity: The proposed method uses the Inception network architecture, which is designed to reduce computational complexity and improve efficiency compared to other deep CNN architectures such as ResNet and VGG.

  • Robustness to noise and variability: The proposed method uses pre-processing filters such as the Gaussian filter and Guided filter, which help to reduce noise and variability in input images and improve the robustness of the fall detection and classification model.

  • Generalizability: The proposed method was evaluated on a large and diverse dataset of real-world fall and non-fall videos captured in different environments, which demonstrated its ability to generalize to different scenarios and settings.

  • Potential for real-world applications: The proposed method has potential for real-world applications in healthcare, elderly care, and home automation, where it can be used to automatically detect falls and alert caregivers or emergency services.

In contrast, many existing fall detection and classification methods rely on handcrafted features or simple motion detection techniques, which may be less accurate or robust in complex real-world scenarios. Additionally, many existing methods require significant manual calibration and tuning to achieve optimal performance, whereas the proposed method is designed to be trainable with minimal human intervention. Finally, the proposed method leverages the power of deep learning and convolutional neural networks, which have been shown to be highly effective for image and video analysis tasks in a wide range of domains.

The following ablation studies will be implemented in future work: (1) removing the pre-processing filters, (2) removing the Inception architecture, (3) varying the number of layers, and (4) varying the input image size.

5 Conclusion

In this study, an effective INDCNN-FDC model was introduced for the FD and classification process. The presented INDCNN-FDC model intends to categorize the events into two class labels, namely fall and not fall. To accomplish this, the INDCNN-FDC model carries out two stages of data pre-processing: GF based image sharpening and GIF based image smoothing. In addition, the presented INDCNN-FDC model applies the deep transfer learning based Inception v3 model for generating a helpful group of feature vectors. Finally, the DCNN method takes the feature vector as input and performs the FD process. The experimental analysis of the INDCNN-FDC approach was validated on a benchmark dataset, and the comparative study reported the supremacy of the INDCNN-FDC model over other recent approaches. One limitation is that some sensors, such as visual ones, have restricted capability because they are fixed and static; it is therefore essential to design fall detection systems that can be employed in both controlled (indoor) and uncontrolled (outdoor) environments. In addition, we plan to extend the proposed model by the use of a metaheuristics based hyperparameter tuning process.