Introduction

The Internet, covering a broad area of our daily lives, is an indispensable element. Many individuals use it for a variety of purposes, including shopping, bill payment, banking, and communication. Users suffer security issues as a result of increased usage, as well as in identifying theft, hacking phishing, and other cybercrimes. The most prevalent cybercrime assault is phishing. It is characterised as a social engineering technique used to trick customers into visiting phoney websites to steal sensitive details of customers such as bank details. People often fall for the information included in phishing emails and websites due to a lack of awareness, which is utilised by the attacker as a way of penetrating the user's privacy and obtaining critical information. This occurs when an attacker creates a phishing website that is so similar to legal websites that it is impossible for certain users to tell the difference. Sending an email with links to bogus websites that are identical to actual websites is one of the most prevalent strategies employed by fraudsters. They appear to be legitimate pages when they are opened, regarding details of bank account or check account regarding details [1].

In recent years, cybercrime has become a worry for many companies and academics. Phishing is a sort of cybercrime that is often regarded as one of the most dangerous. In phishing, the attackers steal the user's credentials and information by impersonating legitimate emails or websites. Because it impacts a large number of Internet users and companies, this form of assault has become a problem. In phishing, an attacker impersonates a certain organization’s LW and distributes it to victims via fraudulent means. Bogus websites are opened by clicking on links in the emails [2].

Phishers are constantly refining and improving their methods for creating false websites that appear to be authentic. The problem is figuring out which bogus websites are being utilised in phishing scams. To avoid these assaults and safeguard user privacy and security, new and updated solutions must be developed. Phishing is a type of cybercrime that uses social engineering and technology tactics to defraud people. Its goal is to hurt consumers by stealing personal information, passwords, and bank account information. Phishing attempts often use social engineering to trick the target into clicking on a spoofed link that takes them to a false web page that looks just like the real one. As a result, instead of being forwarded to the specified website, motivated for more secure. A firewall and anti-virus software alone will not protect from an online phishing assault. Users lose millions of dollars each year as a result of this type of attack. From APWG’s most recent data on phishing events, the number of complaints of mishaps reached 138,328 in the fourth quarter of 2018. Then, in the first half of 2020, it climbed by 15% to 162,155, indicating a 15% rise over the previous year. Furthermore, in 2019, phishing assaults were the most common web attacks. According to intelligence study, these assaults are predicted to continue to rise. Several machine learning algorithms have been presented to detect phishing websites automatically [3].

Related Work

Abdelfettah and Hassan suggested increasing the performance in identifying web pages and predicting phishing websites by adopting the GA approach. Although the performance of the GA-based URL detector was improved, the prediction time was quite long when dealing with a large number of URLs [4].

Rao and Pais used page elements such as logo, favicon, scripts, and styles. The technique used a server to update the page characteristics, which slowed down the detection system’s speed [5].

Aljofey et al. proposed a CNN-based detection algorithm for detecting the phishing page. To discover URLs, a sequential pattern is employed. According to previous studies, CNN performs better when fetching pictures rather than text [6].

AlEroud and Karabatis developed a generative adversarial network to get around a discovery system. A KNN organized system can detect the impression of an unfavourable network [7].

Geo et al., for example, introduced a detective ensemble model that blended two models to build a new classifier that outperformed each model alone. Furthermore, a combined feature selection technique of phishing website detection that focussed on improving phishing site features was introduced to increase detection [8].

Jain and Gupta discussed about phishing website and suggested about malicious URLs, identified using both NB and SVM algorithms. Both SVM and NB are sluggish learners, who do not remember their past outcomes. As a result, the URL detector's efficiency may be diminished [9].

Purbay and Kumar observed multiple machine learning approaches to categorise URLs. They matched the execution of several types of machine learning algorithms. However, there were no comments concerning the algorithms' retrieval capabilities [10].

Gandotra and Gupta discussed multiple categorisation techniques to detect dangerous URLs. The results of the studies showed that the system performed better than other machine learning approaches. However, it has limitations when it comes to managing massive amounts of data [11].

Basit et al. proposed a unique ML ensemble technique for detecting phishing assaults. This model was created using the results of three different classifier combinations to improve detection. However, because phishing attacks are extremely risky and have devastating consequences for people and companies, the detection rate of phishing websites must be improved. As a result, implementing accurate, effective, and up-to-date phishing detection tools to protect against the phisher’s adaptive approaches is critical. Other techniques were used to improve the accuracy of PW detection systems. In comparison to other recent methodologies, the results of these research showed that the recommended approaches improved significantly. However, there is still room for improvement in terms of detection accuracy [12].

Hung Le et al. organised a URL detector, based on deep learning. The authors claimed that the approach can extract information from URLs. Deep learning algorithms take longer to achieve results. It also parses the URL and compares it to the library to provide an output [13].

Hong et al. extracted URLs crawler from data repositories. Phishing websites were identified using a lexical characteristics technique. The crawler-based dataset was used to assess performance. As a result, there is no guarantee that the URL detector will work with real-time URLs [14].

Kumar et al. organised ML-based URL for shortlisted dataset. A lexical feature technique was also used to distinguish between dangerous and authentic websites. The authors used an older dataset, which might lower the detector's performance when using real-time URLs [15].

The current work proposes an optimal ensemble classification approach for detecting PWs. Training, feature optimisation, and testing are the three primary steps in this process. The classifiers (RF, J48, KNN, and naive Bayes) were first trained, and no optimisation strategy was used in this stage. In the second stage, a hybrid features selection approach is utilised to optimise these classifiers that may be used to improve the classifiers’ overall accuracy. Following that, depending on their ranking, optimised classifiers were used as the stacking ensemble technique basis classifiers. Finally, a test dataset is created by gathering new websites, which is then utilised to predict the websites’ eventual class designation. The following is the study’s structure. “Materials and Methods” provides associated literature. In “Conclusion”, the approach and materials were not employed. In the next section, the results of the current study's experiments are reported. The findings are described and compared with related literature in the same section. The results and recommendations are reported in the final section, which summarises the current study's conclusion.

Materials and Methods

Data Preparation

The dataset from UCI repository for phishing websites was utilised to perform experiments and assess the efficacy of the suggested strategy in this study. We used the publicly available informative index for the execution and testing of our machine learning computation, which provides the following assets that may be used as contributions for model structure: a collection of site URLs for (11,430, 89) locations. Each example contains 89 site limits and a class name that indicates if the site is phishing or not (1 or − 1). There are 5715 phishing sites in the sample Fig. 1.

Fig. 1
figure 1

Graphical representation of phishing websites attributes by histogram

Methods

Random Forest

Random forest uses a supervised learning method to detect phishing websites. In this experiment, random forest predicted an ensemble classifier which integrated different weak decision trees. This algorithm is organised to enhance the average accuracy in various experiments. It depends on the majority of voting for strong prediction in various predictions [16].

Naïve Bayes

The Bayes’ uses naive Bayes classifiers to detect phishing websites. Naïve Bayes algorithms always support an object likelihood and do not predict different ideas for each pair as dependent on others. Naïve Bayes well predicted by rapid machine learning algorithms [17].

J48

Phishing data overfitting is a problem encounter, in which decision trees provide balance to solve the problem. In this situation, the ID3 algorithm experiences data overfitting. The difficulty with decision trees is that they separate the data into pure sets. Pruning is directed to detect data overfitting in J48's extension. J48 is an ID 3-based machine learning decision tree classification technique. It is quite useful for categorising and continually examining data. J48 always generates new patteren in each experiment with correct observation [18].

KNN

There is no such thing as the best classifier; it all relies on the situation and the type of data or problem at hand. Because it does not generalise across data in advance, kNN is sluggish when there are a number of observations. Instead, it reads the historical database each time a prediction is needed, as mentioned. The qualities of KNN are that it is a non-parametric technique and a sluggish learning algorithm. Because it simply saves the stage, the algorithm requires practically no time to consider. After that, the saved data is utilised as a new observation point. There is no such thing as the optimal classifier; it is always dependent on the situation and the type of data or problem at hand. Because it does not generalise across data in advance, kNN is sluggish when there are a lot of observations. Instead, it reads the historical database each time a prediction is needed, as mentioned [19].

Proposed Method

The proposed approach was as follows: a cross fold was applied to the dataset with ten folds ranging from 1 to 10. Heat matrix feature optimisation is achieved as a result of the resultant subsets. The phishing websites were then classified using four different classifiers: J48, KNN, NB and RF, naive Bayes, J48, and KNN. The enhanced accurate rate, F-measure and precision were used to determine the best classifier for detecting phishing websites in Fig. 2.

Fig. 2
figure 2

Graphical representation of the proposed model for phishing website attributes

Following the identification of key traits, the ML systems could be trained to detect whether or not the site was real. As one rises, the other rises with it, and the closer one gets to 1, the stronger does the bond become. Overlaying a data-driven “paint by numbers” canvas on top of a picture creates a heat map in Fig. 3. The dataset was subjected to a set of classifiers [20].

Fig. 3
figure 3

Graphical representation of phishing websites 89 features by heatmap

The proposed stacking ML ensemble model for improved phishing website detection is shown and explained in this part.

The experiments are divided into three stages: training, rating, and testing. The next sub-sections go through these processes in further detail. The classifiers (RF, NB, J48, and KNN) are trained without optimisation in the training stage. After that, the goal is to first gain a general idea of the classification performance before optimising it, and then to figure out which PW properties are the most useful. To improve the above-mentioned classifiers, the Chi-square, extra tree, and heatmap are utilised. The stacking approach was used to assemble the optimised classifiers and form an ensemble classifier in the ranking step, and the stacking ensemble method was used to boost the overall accuracy of the recommended model by selecting the ideal values of model parameters. The efficacy of each strategy was determined by comparing the outcomes of each method separately. We employed random forest, naive Bayes, J48, and KNN. The strategy that was the most effective in terms of improving and developing the identification of phishing sites was established. For each combination, the accuracy, precision, and F-measure values were determined.

Result and Discussion

In the studies, the tenfold cross-validation approach was employed to validate the models and minimise prediction uncertainty. Using this method, the training dataset was divided into ten subgroups. Each of these subgroups has to be assessed in each of the remaining nine subsets. Each evaluation subgroup was utilised once in each of the ten repetitions. Figure 4 depicts the performance of the four classifiers (RF, NB, KNN, and J48) as well as the derived average accuracy score of 97.024%, 93.171%, 94.446%, and 96.726%, respectively.

Fig. 4
figure 4

Representation accuracy of RF, NB, KNN, and J48 classifiers for phishing websites attributes

In this experiment, we consider the genuine value near to accuracy. The precision is restricted and always assigned the same value in the same experiment for other processes. In the studies, the tenfold cross-validation approach was employed to validate the models and minimise prediction uncertainty. Figure 5 depicts the performance of the four classifiers (RF, NB, KNN, and J48) as well as the estimated average accuracy score (96.582, 94.695, 94.519 and 94.985).

Fig. 5
figure 5

Representation precision of RF, NB, KNN, and J48 classifiers for phishing websites attributes

In this experiment, we examine the true positive values. The recall always shows goodness of that model and search and identify the genuine position values. In the studies, the tenfold cross-validation approach was employed to validate the models and minimise prediction uncertainty. Figure 6 depicts the performance of the four classifiers (RF, NB, KNN, and J48) as well as the estimated average recall score (98.084, 96.601, 96.393 and 96.730).

Fig. 6
figure 6

Representation recall of RF, NB, KNN, and J48 classifiers for phishing websites attributes

Optimisation by Hybrid Features Selection Methods

In the second phase, the dataset with the selected features were obtained by Chi-square, extra tree and heatmap methods, then, on each of the subsets, four classifiers (Fig. 7).

Fig. 7
figure 7

Representation heatmap for 20 selected important features for phishing websites attributes

Correlate heat_map generates boundary for good relationship of two variables. The boundary limit assigned by 1, − 1 and 0 always have very weak or no relationship. The attributes’ boundary variables go to a positive direction, indicating strength association between attributes [20].

Chi-square

Chi-square creates a relation between the observed and predicted frequencies of collected events or variables. Chi-square is a valuable tool for assessing category differences, especially ones that are nominal in character [21]. Table 1 shows 20 important features of a good score.

Table 1 Representation of Chi-square for phishing websites attributes

Extra Tree Features Selection Method

[0.00417966 0.00464072 0.00366814 0.00175792 0.00289009 0.0046448 0.00391138 0.00215644 0.008263 0.01136815 0.01407321 0.00750137 0.0043768 0.01343768 0.00148478 0.01369747 0.00198299 0.21401142 0.04441774 0.63753626].

In this experiment, we applied extra tree classifier as ensemble learning technique. Extra tree does well in generating decision tree by de-correlated decision tree as aggregrates of trees outcomes [22]. In Fig. 8, the important phishing website attributes are selected.

Fig. 8
figure 8

Representation of extra tree features selection for important features for phishing website attributes

The previous experiment employed a tenfold cross-validation approach to assess the accuracy, precision, and recall; however in this phase, we used Chi-square, additional tree, and heatmap to calculate the feature intensity and significance. The findings of hybrid performance of feature selection approaches lower the uncertainty of prediction. Each evaluation subgroup was utilised once in each of the ten repetitions. Figure 9 displays the performance of the four classifiers (RF, NB, KNN, and J48) as well as the estimated average accuracy score (96.744%, 93.633%, 97.014% and 96.897%).

Fig. 9
figure 9

Representation classification accuracy after using hybrid features selection methods

During the first and second phases of the experiment, the values for accuracy, precision, and recall were calculated, but no viable classifier that provided continuous results was found. We employed these classifiers in a single unit as a stacking ensemble approach (RF + NB + KNN + J48) to forecast phishing websites, since some algorithms produce high values in phase-1 and then compute low values after features selection.

For the same dataset, the accuracy results were compared to suggested hybrid feature selection approaches and stacking ensemble methods. In phase three, the ensemble stacking approach was proven to be more accurate than the other four classifiers in Table 2. Finally, the acquired accuracy was 99.7% with a tenfold increase, which outperformed the other classifier techniques in the previous phase.

Table 2 The accuracy of the optimised stacking ensemble method

Conclusion

An efficient stacking ensemble model is proposed in this work to detect phishing sites. Using Chi-square, additional tree, and heatmap, the optimisation approach was utilised to determine the optimal parameter values of multiple classifier learning algorithms. The suggested model is made up of three steps. Several classifier learning methods, including RF, NB, J48, and KNN, were learned in the phase-1, training stage, without employing hybrid feature selection methods. Phase 2 is used to optimise these classifiers by picking the hybrid feature selection method values and calculating the accuracy, precision, and recall score; however if there are any imbalances, the following phase is employed. Certain classifiers were utilised as foundation classifiers for the stacking ensemble approach in phase 3. The best ensemble approaches were these classifiers (RF, NB, J48 and KNN). Finally, all methods were tested using the same dataset as the class’s test dataset (legitimate or phishing). With the suggested optimised stacking ensemble approach, phase 3 obtains sufficient findings and a superior performance compared to previous ML-based detection methods. The level of precision attained was 99.7%. A statistical study was undertaken to establish that the gained improvements were statistically significant. Furthermore, the findings revealed that the proposed approaches were more accurate than previous research that employed the same phishing dataset. More light detection methods will be more accurate with IoT surroundings, as a guideline for future investigations. It is also a good idea to use deep learning algorithms to analyse and increase the detection rate of PWs, as well as to use more phishing datasets.