1 Introduction

Malaria, typhoid, pneumonia, and dengue diseases can have an important effect on a person’s health and could lead to death if they are not treated properly [9]. These diseases are classified as Airborne, Waterborne, and Vector-Borne. Droplets of microorganisms that are discharged into the air by cough, sneezing, or talking cause airborne diseases [15]. The pathogens in question may be viruses, bacteria, or fungi. Tuberculosis, influenza, and smallpox are just a few of the diseases that can be transmitted through the air. Pathogenic bacteria are the most prevalent source of waterborne infections, which are transferred primarily through unclean fresh water. Bathing, washing, drinking, and eating food that has been infected can transmit infections. Some of the most serious dangers to human health are Malaria, dengue fever, and West Nile virus are examples of diseases transmitted by mosquitoes. Insects infected with the disease can spread it to humans, as can direct transmission between humans [10].

Weather conditions must be just right for germs to survive, reproduce, and spread. Unpredictable weather conditions have a significant impact on the environment’s resources. That’s why it’s important to focus on changes in the temperature or weather. Viruses like malaria and cholera have been linked to severe rains, for example. This cyclicity is found in acute infectious disorders and may be found in human infectious diseases generally. There is a seasonal window of incidence for each acute infectious disease, and these windows can range significantly among geographic areas and from other diseases within the same region. When epidemics occur, the seasonal fluctuation in infectious disease transmission is an essential factor, but it is not the only factor. Variations in seasons are regarded as “seasonal forcing” in Ecology of infectious diseases and epidemic modelling, respectively [12].

The seasonal drivers have been environmental factors, particularly climate conditions. This may be because seasonal disease incidence tends to follow a similar pattern. Temperature and rainfall are classic examples of abiotic environmental drivers that influence transmissions through their effects on parasites or hosts, but various examples incorporate seasonal non-climatic abiotic environments including the salinity of the water that might influence water-borne pathogens. Pathogen survival might be affected by environmental conditions when moving between hosts. For instance, droplet-transmitted diseases may have short transition windows (e.g., a few minutes) or longer transition windows (like several weeks) (e.g., parasites with life stages of environments). Other than infections, environmental factors can impact host vulnerability to infections or the dynamics of vector populations [11].

In this research, the main objective is to predict seasonal diseases based on the proposed ML model. For the evaluation, a dataset was collected from different regions around the Madurai district in Tamilnadu. Based on the features from the dataset, the ANOVA technique is used for the feature selection, and prediction and a hybrid random forest with the XG-Boost technique is used. A brief introduction about seasonal diseases was presented in Section 1 and the remaining part of this paper was divided into the areas listed below. Section 2 represents a discussion of the related works and their applications, Section 3 represents the overview of the suggested model, Section 4 explains the experimental analysis, and the conclusion is presented in Section 5.

1.1 Contribution and significance of the work

The contribution of this research includes proposing a hybrid machine-learning model for predicting seasonal ailments such as dengue fever, malaria, pneumonia, and typhoid are examples of diseases based data from the Madurai district in real time. The model utilizes Antlion Optimization Algorithm for feature selection and Random Forest integrated with the XG-Boost technique for classification, achieving high accuracy, precision, recall, specificity, and f1-score, outperforming other compared models.

2 Related works

A group forecasting model based on bragging rights [17] was used to anticipate the amount of illness episodes determined by previous events. Log transformation and z-score transformations were used in the preprocessing. A specific disease dataset was used to demonstrate the applicability of this ensemble model. This model was tested for dengue, tuberculosis, chickenpox, and food poisoning among other infections. In [8], a comparative analysis of machine learning classifiers’ performance was presented based on disease prediction. Classifiers such as Naive Bayes and decision models, and RF were utilized in the development of the disease prediction system. The symptoms from the obtained datasets for 41 different diseases were used to evaluate this disease prediction algorithm.

AI approaches were employed in [4] to forecast diseases using a feature selection method. Classifiers such as decision trees, SVM, random forest, and XG-Boost algorithms were utilized for the classification of diseases like Malaria, Jaundice, COVID, Pneumonia, Dengue, and Typhoid. Dengue fever/dengue hemorrhagic fever (DF/DHF) has been a problem in Rajasthan for the past decade, and in [2] a model was proposed to predict the prevalence of DF/DHF in 2011. A SARIMA model was employed to conduct statistical analysis. Research shows that adding a forecasting model to a current disease control program can have a significant influence on reducing disease incidence.

To better understand seasonal influenza, a model created and analyzed an epidemiological model known as Seasonal SIR (Seasonal SIR) [1]. To better understand how vector-borne diseases propagate, this model takes into account temperature fluctuations and a variety of human interaction networks. The Seasonal SIR model has been shown to accurately replicate outbreak dynamics observed in the actual world. More accurate forecasts of disease outbreaks may now be made, allowing for the implementation of effective control measures to keep epidemics under control.

LASSO and RF regressions with LSTM, a DRNN was proposed in [13] and compared to forecast dengue incidence on a weekly basis in 790 Brazilian cities. Predictors were based on multivariate time series, and time series from related regions were employed for capturing the spatial components of disease transmissions. Predicting future dengue outbreaks in towns of varying sizes was made easier with the use of an LSTM RRN model. An LSTM model developed in [7] shown to be successful in predicting epidemic disease-infected areas and intensity.

In [5], a PSO-ANN-based system was proposed. The ANN method optimized using the PSO methodology in this model. Dengue patients were identified using a PSO-optimized ANN method. This diagnostic approach was found to be an effective and powerful tool for detecting dengue disease early and more accurately. A diagnostic model for predicting dengue sickness, developed by the same author in [6] using three widely used machine learning approaches (Decision Tree, Artificial Neural Network, and Naive Bayes), was examined.

In the prediction problem of an infectious epidemic, a new scalable feature selection approach and a seasonal adjustment-based predictive modelling framework were proposed. [18]. More stable and accurate epidemic predictions against both long and short-term variations in search engine logs were accomplished by evaluating each component separately. A machine learning strategy based purely on human genome data was reported in [3] for the predictions of dengue disease severity depends on the genome information of humans. The SVM algorithm was used for finding the classification subset of optimal loci and ANN was utilized for classifying patients into those with mild or serious dengue fever.

Previous research in disease prediction using machine learning has shown promising results in terms of accuracy and efficiency. However, some limitations and knowledge gaps need to be addressed. Some of the benefits and drawbacks of relevant research.

2.1 Benefits

Improved disease, Early warning systems, Real-time data utilization.

2.2 Drawbacks

Variability in datasets, Limited scalability, Lack of interpretability.

This publication addresses some of the current trends and knowledge gaps in disease prediction research. Specifically, it proposes a hybrid machine learning model that incorporates the Antlion Optimization Algorithm for feature selection and Random Forest integrated with XG-Boost for classification, achieving high accuracy and performance. The research also highlights the rigorous analysis of the dataset selection process to ensure the reliability and representativeness of the real-time data from different regions around the Madurai district. Furthermore, the proposed model has the potential for broader applications beyond the Madurai district and can be extended to incorporate climatic and weather changes for disease prediction. This addresses the current trend of incorporating environmental factors in disease prediction models, considering the impact of climate change on disease dynamics.

3 Comprehensive experimental flowchart

The experimental flowchart Fig. 1 provides a comprehensive overview of the proposed strategy for predicting seasonality diseases using a hybrid machine learning model, from data collection to model evaluation and optimization.

Fig. 1
figure 1

Experimental flowchart

4 Proposed methodology

The proposed model aims to predict and classify different diseases, including dengue, malaria, pneumonia, typhoid, kala-azar, Japanese encephalitis, measles, and normal fever and cold infections, using real-time data gathered from numerous sources around the Madurai district between 2019 and 2020. The suggested model is a hybrid machine-learning approach that consists of two main steps: feature selection and classification. A critical phase is choosing features in machine learning, as it involves identifying the most relevant features from the dataset that contribute the most to prediction accuracy. In this model, the Antlion Optimization Algorithm (ALO) is utilized for feature selection. ALO is a metaheuristic technique by the hunting behavior of antlions, and it is employed to choose the best qualities from the dataset that are most informative for disease prediction. After feature selection, the Random Forest (RF) algorithm, integrated with the XG-Boost technique, is used for classification. Multiplexing is the technique of combining multiple signals into a single transmission channel to efficiently utilize the available bandwidth. Time-division multiplexing (TDM) involves dividing the transmission channel into discrete time slots, with each time slot dedicated to a different signal. The signals are interleaved in time and transmitted sequentially. Frequency-division multiplexing (FDM) involves dividing the available bandwidth into multiple frequency bands, with each band dedicated to a different signal. The signals are transmitted simultaneously over their respective frequency bands. TDM and FDM are commonly used in telecommunications and networking to enable multiple signals to be transmitted over a single channel, allowing for efficient use of resources and increased capacity., RF is a robust ensemble learning technique which integrates numerous decision trees and accurate prediction model. XG-Boost is a popular gradient-boosting technique that enhances the performance of decision tree-based models by addressing issues like overfitting and improving prediction accuracy.

Once the classification is done, the results are compared with the original data for validation. This involves evaluating the efficacy of the proposed model in relation to performance metrics, which are commonly used performance metrics in classification tasks. This comparison helps to assess the efficiency of the proposed method in predicting and classifying seasonality diseases based on the live data obtained from the Madurai region in (Fig. 2).

Fig. 2
figure 2

Workflow of the proposed model

5 Feature selection using ALO

As the name implies, this is a method of searching that emulates the hunting strategy used by the antlions in common. ALO has a significant potential to avoid local optima stagnations due to its usage of the roulette wheel and random walk. In ALO, the antlion’s random selection and the ant’s random movement around it assure searching for space, while the adaptative lowering boundaries of antlion traps check that the search region is being used.

The discrete optimization problem that decimal coding cannot directly address is the question of feature selection. For all individuals in the binary arrays, the binary coding version translates the population value of probability into the value of 1 or 0 for all binary vectors. Consequently, all ants in ALO are encoded using the binary coding method. A 0 or a 1 can only represent a discrete binary condition’s one dimension. Variables change from 1 to 0 or vice versa when traveling across a dimension. Assume that each ant is a continuous algorithm to implement binary modes for ALO. Binary ALO updates ants by switching between “0” and “1,” which is the primary difference between binary and standard ALO. With the probabilities given in Eq. 1, it was assumed that the new ant’s location would be either 0 or 1, which was updated by a condition.

$${S}_j^k=\left\{\begin{array}{c}1\ if\ rdm<\left|\tanh \left({GZ}_j^k\right)\right|\\ {}0\kern6.25em otherwise\end{array}\right.$$
(1)

Hyperbolic tangent functions are represented by tanh, while the representation of an ant’s location using binary code \({S}_j^k\). Eq. 1 represents a function that generates a binary output (1 or 0) based on a comparison between a random variable rdm and the absolute value of the hyperbolic tangent of \({GZ}_j^k\), where \({GZ}_j^k\) is likely a calculated value based on inputs or parameters. The specific use and interpretation of this function would depend on the context in which it is being used.

Due to elitism, which allows them to keep the best solutions at any stage of optimizations, swarm intelligence algorithms have a substantial advantage over traditional binary coding. Many parent solutions are required for crossover, and the child solutions are generated from across the entire population. It was a comparison of the two binary solutions generated by the random motion. The best antlion was kept as an elite in each iteration. Each ant on the repetition must be influenced by the movement of the strongest and fittest of the antlions. As a result, according to Eq. 2, it was assumed that ants travel between the elite and antlions following the roulette wheel.

$${Ant}_j^k= Crosover\left({S}_P^k,{S}_C^k\right)$$
(2)

Equation (2) represents the crossover operation in the Antlion Optimization Algorithm (ALO), where “Ant” refers to the ants in the algorithm, “j” denotes the ant index, and “k” denotes the iteration number or the current generation of the algorithm. \({S}_P^k\)represents the position of the parent ant in the k-th iteration, and \({S}_C^k\) represents the position of the child ant generated through crossover in the same iteration.

Wrapper-based approaches require a lot of training and testing for a single assessment of the provided solutions. This means that the method of the search must be carefully chosen. This work used the ALO for searching the feature spaces for a combination of the optimal feature that maximized classification accuracy. Prey positions in the Antlion optimizer are randomly chosen to represent a particular feature combination. Additionally, Antlion tries to search for its prey by setting up n random positions for each of its ants. Using a roulette wheel method, an antlion is chosen for hunting by the antlion method, and randomly walks ants around this ant, as well as randomly walks the best/elite antlions. A given ant adjusts its location based on its last two random treks. If the fitness level of an ant exceeds that of an antlion, then the antlion consumes it and moves into the ant’s place. This procedure is applied iteratively in the hopes of reaching a satisfactory conclusion after some iterations. It’s worth noting that the ant’s random walk is currently constrained by its exploration rate. To facilitate extensive search, it is usual practice to reduce the exploration rate as optimization continues. As many qualities as there are in the supplied data collection, a continuous-valued vector represents each solution. It is restricted to a range of [0,1] for the solution vector. The binary form of approach with a continuous value is the threshold during the solution fitness function evaluation using Eq. (3).

$${y}_{ij}=\left\{\begin{array}{c}0\kern0.75em if\left({x}_{ij}<0.5\right)\\ {}1\kern1.75em Otherwise\end{array}\right.$$
(3)

Here, xij, i is the solution in dimension j, and yij is the solution vector x in discrete representation [19].

figure a

6 Integration of random Forest with XG-boost

Regarding classifications and regressions tree (CART), the random forest was a strong ensembled learning approach. Regression and classification can be applied to it because it has been widely utilized and has a good performance. The theory of statistical learning called bootstraps resampling is used for extracting several samples from the real sample, modeling decision trees for every bootstrapped sample, and hence integrating the prediction of several decision trees for averaging the final prediction outputs. Put-back samples and random changes in the integration of predictors are used to boost the diversity of decision trees. Regression trees (RT) could be useful for this work, which was to predict regressions. It was determined at every branching point of RT that the mean square error between samples was calculated for each leaf node. The regression tree will stop growing if the minimum mean square error at each leaf node is used as the branching condition. This can be done until there are no more characteristics were present or the total mean square error was optimal. The count of regression trees (N estimators) and the count of the node’s random variables are two important custom factors (Max depth). To reduce the number of errors that takes place during data processing, these parameters must be improved.

The RF algorithm’s modeling approach is as follows: extract N from the initial data collection estimations and training sets are generated using bootstrap sampling. Approximately two-thirds of the original information set is used for training. For each random forest training process bootstrap sample, only around a third will be picked. Out-of-bag data is the name given to this portion of the data.

To create a “forest,” N estimators’ total regression trees were built, but these trees are not pruned. No attribute is picked as an internal node for branching in each tree, but a random Max depth attribute is chosen from a pool of these characteristics to serve as a branching node. Thus, the RF approach improves the combined regression analysis model’s extended forecasting capability by increasing the difference between the regression models by building various training sets. To build multi-RM designs, a RM of {t1(x), t2(x), …, tk(x)} was created by n-time model training. Then, using the regression tree predictions of the N estimators, determine the new sample’s value using a simple average technique. The formula for regression decision is,

$${\hat{f}}_{rf}^K(x)=\frac{1}{K}\sum\nolimits_{k=1}^K{t}_i(x)$$
(4)

here \({\hat{f}}_{rf}^K(x)\) denotes the model of mixed regression, ti is a regression framework with an individual decision tree, and K denotes the no of regression trees.

The RF method integrates different tree decisions and supports improving accuracy. To enhance accuracy, the proposed model employs RF and extreme gradient boosting (XG-Boost). For randomization, it employs bagging. Bagging is an abbreviation in Bootstrap aggregation it enhances the method’s reliability and precision.

$$Bagging=\frac{1}{B}\sum\nolimits_{b=1}^B fb\left({x}^{\prime}\right)$$
(5)

here xwas the predicted value for the unseen sample, was the total trees, b = 1,2,3… B; and fb = Train the DT fb on Xb, Yb [16].

XG-Boost has recently been widely used in numerous research fields, and it has produced favorable outcomes in a variety of machine-learning competitions. XG-Boost differs from GBDT in some aspects (Gradient Boosted Decision Trees). To begin with, the GBDT technique simply uses a first-order Taylor expansion, but the XG-Boost approach increases the loss functions with the Taylor expansion of the second-order. Second, normalization is used by the objective function for preventing overfitting and reducing the technique’s complexity. Third, XG-Boost was very flexible, enabling users to define the objective of optimizations and assessment categories. Nonetheless, the XG-Boost classifier can handle uneven training data effectively by setting class weights and employing AUC as the evaluation parameter.

In summary, XG-Boost is the scalable and adaptable tree structures optimization model capable of managing sparse data, increasing the speed of the algorithm, and reducing processing time and memory for data of large-scale. The XG-Boost algorithm could be formalized as,

Given an n-sample training dataset T = {(x1, y1), (x2, y2), …, (xn, yn)}, xi ∈ Rm, yi ∈ R, the objective function could be expressed by,

$$obj\left(\theta \right)=\sum\nolimits_i^nl\left({y}_i,{\hat{y}}_i\right)+\sum\nolimits_{t=1}^T\varOmega \left({f}_t\right)$$
(6)

Here, \(l\left({y}_i,{\hat{y}}_i\right)\) is the difference between the target yi and the prediction \({\hat{y}}_i\) and ft is the prediction score of the three. Based on the Taylor expansions of the objective functions, the estimated loss function may be computed:

$${L}^{(t)}\simeq \sum\nolimits_{i=1}^k\left[l\left({y}_i,{\hat{y}}^{\left(t-1\right)}\right)+{g}_i{f}_t\left({x}_i\right)+\frac{1}{2}{h}_i{f}_t^2\left({x}_i\right)\right]+\varOmega \left({f}_t\right)$$
(7)

Here, \({g}_i={\partial}_{{\hat{y}}^{\left(t-1\right)}}l\left({y}_i,{\hat{y}}^{\left(t-1\right)}\right)\) represents every sample’s initial derivative and \({g}_i={\partial^2}_{{\hat{y}}^{\left(t-1\right)}}l\left({y}_i,{\hat{y}}^{\left(t-1\right)}\right)\) signifies every sample’s second derivative, and the loss function only requires the initial and second derivatives of every element of information [14].

To generate model predictions XG-Boost is the most used approach. The main disadvantage of the RF was its slowness. So, to eliminate the problem of speed, XG-Boost was integrated with it to achieve the best outcomes for the proposed research. XG-Boost was a technique utilized in classification approaches to enhance prediction results and minimize errors by lowering bias.

7 Experimental analysis

The suggested framework was tested with MATLAB Simulink 2019a. The studies were done out using a PC equipped with an Intel Core i7–10,700 CPU running at 2.9 and 4.8 GHz, 8 GB of RAM, and a 64-bit version of Windows 10. Initially, the dataset was preprocessed and the features were selected using the ALO. The information set is divided between 75% training and 25% testing.

8 Dataset description

This information was obtained in real time from different regions around the Madurai district between 2019 and 2020. The dataset varies between January to December. 21,574 samples were collected, which includes 10,879 female samples and 10,695 male samples. The ages of the people range from one to 89. District, geographical area, Month, year, age, gender, illness, death, nausea, bleeds, dividing fever, headache, high temperature fever, mild headaches, bloodstream vomiting, eye pain, nose bleeding, gum bleeding, discomfort in the abdomen, rashes on the skin, muscle pain, joint pain, pain in the body, intense bleeding, dehydration, which organ damage, respiration problem, swelling glands, and tiredness are all variables to consider are among the 29 characteristics in the information set. Out of these 29 features, district, region, and death features are ignored and the remaining 26 features were selected for analysis. The dataset was extracted in the following table (Tables 1, 2, 3 and 4).

Table 1 The seasonal influences on illness [11]
Table 2 Dataset details about the number of cases
Table 3 Infection rate with number of cases
Table 4 Performance analysis comparison of proposed model

Based on the number of cases infected each month, most of the cases are infected from Malaria infection. This malaria infection covers 52% of the infection from the total of 21,574 cases. Then dengue infection covers 23% of the infection the overall cases. Both these malaria and dengue infections were the most infected disease each month. Next to malaria and dengue, pneumonia infection covers 16% of infection, typhoid with 5%, kala-azar with 2%, and Japanese encephalitis and measles each covers 1% separately.

8.1 Performance metrics

The performance metrics are the evaluation metrics used in this research for performance analysis.

$$\textrm{Accuracy}=\frac{TP+ TN}{TP+ FP+ TN+ FN}\%$$
(8)
$$\textrm{Precision}=\frac{TP}{TP+ FP}$$
(9)
$$\textrm{Recall}=\frac{TP}{TP+ FN}$$
(10)
$$Specificity=\frac{TN}{TN+ FP}$$
(11)
$$\textrm{F}-\textrm{Score}=\frac{2 TP}{2 TP+ FP+ TN}$$
(12)
  • TP: It is the number of correct classifications of normal classes is referred to as the real positive value.

  • FP: It is a misleading positive value resulting from an inaccurate classification in typical classes.

  • TN: It was the total true categorization in anomalous classes that was real false case.

  • FN: It was an improper classification in anomalous classes that resulted in the incorrect negative value.

The proposed design was implemented for prediction of a various set of diseases from the given dataset. Based on the training of the model, the model was evaluated with the test data. A disease like Japanese encephalitis was predicted by the proposed model. The difference between the actual and predicted data was compared as shown in Fig. 3. As shown in the figure, the suggested design has close performance related to the actual data.

Fig. 3
figure 3

Proposed model’s predicted results for Japanese encephalitis

The Measles disease was predicted by using the evaluation dataset, in the suggested approach. The difference between the actual and predicted data was compared as shown in Fig. 4. As shown in the figure, the suggested design has close performance related to the actual data.

Fig. 4
figure 4

Proposed model’s predicted results for measles

As shown in Figs. 5 and 6, the diseases like Kala Azar and Typhoid were predicted by using the evaluation dataset, in the suggested approach. The difference between the actual and predicted data was compared. As shown in the figure, the proposed model has close performance related to the actual data in both these predictions.

Fig. 5
figure 5

Proposed model’s predicted results for Kala Azar

Fig. 6
figure 6

Proposed model’s predicted results for typhoid

As shown in Figs. 7 and 8, the diseases like Pneumonia and Dengue were predicted by using the evaluation dataset, in the suggested approach. The difference between the actual and predicted data was compared. As shown in the figure, the proposed model has close performance related to the actual data in both these predictions.

Fig. 7
figure 7

Proposed model’s predicted results for pneumonia

Fig. 8
figure 8

Proposed model’s predicted results for dengue

In this research, Malaria is the most infected disease that occurred throughout the year annually. The proposed model has predicted the malaria disease from the test dataset with close performance to the actual data as shown in Fig. 9.

Fig. 9
figure 9

Proposed model’s predicted results for malaria

The suggested model was evaluated with the performance metrics like the correctness, exactness, completeness, selectivity, and harmonic mean of precision and recall are being referred. As per the experimental analysis, the suggested method has achieved 96.17% accuracy, which is 1.2% to 5.9% improved compared to ACO-ANN, PSO-RF, WO-RF, and ANOVA-SVM models. The precision score of the proposed design was 93.95%, which is 1.7% to 4.8% better than other designs. The recall score of the proposed model was 95.86%, which is 2.5% to 6% enhanced than the other compared models. The specificity of the proposed model was 93.23%, which is 0.9% to 4.3% better than other compared models. The f-score of the proposed design was 96.22%, which is 1.08% to 5.3% improved than other designs. Based on the comparison, the proposed method has better results all the other compared models in terms of all the parameters. The ANOVA-SVM has a close performance related to the proposed model and the least efficiency was achieved by the ACO-ANN model (Fig. 10).

Fig. 10
figure 10

Graphical plot for the comparison of performance analysis

The sensitivity results of the proposed method in Fig. 11 showed in terms of high correctness, exactness, completeness, selectivity, and harmonic mean of precision and recall are being referred and to outperforming other compared models. The Antlion Optimization Algorithm for feature selection and Random Forest integrated with XG-Boost for classification contributed to the toughness and efficacy of the model. The proposed method demonstrated promising sensitivity results, highlighting its potential for accurate seasonality disease prediction.

Fig. 11
figure 11

Graphical plot for the comparison of sensitivity analysis

9 Conclusion

In conclusion, the proposed hybrid machine learning model for predicting seasonality diseases based on real-time data from different regions around the Madurai district has shown promising results. The use of the Antlion Algorithm and Random Forest integrated with XG-Boost for classification has resulted in high performance analysis. Compared to other models such as ACO-ANN, PSO-RF, WO-RF, and ANOVA-SVM, the proposed model has demonstrated superior performance across all parameters. The industrial benefits of this research are significant. By accurately predicting seasonality diseases, such as dengue, malaria, pneumonia, typhoid, and others, based on real-time data, it can enable proactive measures to be taken to prevent or control outbreaks on time. This can lead to improved public health management, reduced healthcare costs, and more efficient allocation of resources. Additionally, the suggested model can be improved further by integrating DL-based techniques to potentially achieve even higher accuracy and efficiency in disease prediction. Furthermore, the proposed model has the potential for broader applications beyond the Madurai district, as it can be adapted to other regions and extended to incorporate climatic and weather changes for disease prediction. This could contribute to early warning systems for disease outbreaks, aiding in proactive planning and preparedness measures. Overall, the proposed model has significant industrial benefits and holds promise for improving the efficiency of disease prediction and management in real-world scenarios.