1 Introduction

With advances in information technology (IT), a growing number of digital services are provided over the Internet, ranging from financial services to gaming applications [1]. Financial services, social media, and online gaming are among the most engaging web-based services, with a large and growing audience. The enormous number of users of these services signifies their success and acceptability in modern society [2]. This high level of acceptance, however, comes with cybersecurity issues such as privacy disclosure, identity theft, and phishing [3].

In recent times, criminals have developed and hosted fake websites to steal sensitive information such as credit card details, passwords, and usernames from unsuspecting users for illegal activities or transactions. This is a form of phishing attack [4]. It is a critical cybersecurity issue plaguing cyberspace, with severely damaging after-effects on Internet users and businesses [5, 6].

Vrbančič et al. [7] described phishing as an extensive fraud that occurs when a malicious website acts, looks, and feels almost identical to a legitimate website, with the ultimate goal of obtaining the victim’s sensitive data. Over the years, countermeasures have been put in place to detect phishing websites, such as maintaining blacklist repositories, educating users about cybersecurity, applying the Google PageRank method, and developing machine learning (ML) models [8]. According to [9], phishing detection mechanisms fall into three categories: (1) machine learning; (2) heuristic; and (3) list-based methods.

However, phishing websites are increasingly capable of avoiding detection because attackers keep evolving their techniques and finding ways to evade these conventional countermeasures [10, 11]. ML methods, which are the focus of this study, have produced models with relatively high detection accuracy. However, these models often suffer from a high false alarm rate (FAR), and their detection performance still leaves room for improvement [2, 12].

Attackers are unrelenting in conducting subtle social engineering activities that often lead to theft of private information, identity theft, financial loss, and customers’ loss of trust in previously attacked organizations such as banks and e-commerce ventures [13]. Phishing websites are also increasingly able to evade detection [10]. Consequently, there is a pressing need for robust, up-to-date Phishing Website Detection Models (PWDMs) that can effectively and efficiently distinguish legitimate websites from phishing websites and thereby abate this nefarious activity.

Hence, this study proposes novel ForestPA-based meta-learning models for the detection of phishing websites. ForestPA uses a weight assignment strategy and a weight increment strategy to build highly efficient decision trees by exploiting the strength of all non-class attributes in a given dataset.

Specifically, the following are contributions of this study to the body of knowledge:

  (1) The use of more recent and comprehensive featured phishing website data as input data, i.e. the UCI phishing website dataset, for the development of PWDMs;

  (2) Implementation of ForestPA algorithm for detecting both legitimate and phishing websites;

  (3) Implementation of Bagging and Boosting Meta-learners for improving ForestPA performance; and

  (4) An empirical comparison of the proposed PWDMs with existing state-of-the-art phishing methods.

In addition, this study intends to answer the following research questions:

  (1) How effective is the ForestPA algorithm in detecting phishing and legitimate websites?

  (2) How effective are the meta-learners (Bagged-ForestPA and Boosted-ForestPA) in detecting phishing and legitimate websites?

  (3) How well do the proposed PWDMs perform compared with existing state-of-the-art methods?

The remainder of this paper is organized as follows. The ‘Related Works’ section presents a critical review of existing related methods. The ‘Method’ section provides details about the phishing website dataset, the implemented algorithms, the experimental framework, and the performance evaluation metrics used to test the proposed phishing website detection models. The ‘Experimental Results’ section reports the performance of all developed and tested models in a step-wise manner (i.e. ForestPA-PWDM, Bagged-ForestPA-PWDM, Boosted-ForestPA-PWDM); the performance of each model is discussed individually, with some summary visualizations provided. The ‘Discussion’ section discusses the performance of the proposed methods comprehensively and presents a comparative analysis against the existing methods reviewed in the ‘Related Works’ section. Lastly, the ‘Conclusion’ section answers the research questions of this study and identifies future work.

2 Related Works

A critical review of related existing studies on phishing attacks is essential to establish the importance and significance of this study. Zamir et al. [14] presented a machine learning-based method for detecting phishing websites using the same dataset as this study. The study conducted various experiments using information gain, gain ratio, Relief-F, principal component analysis (PCA), and recursive feature elimination as feature selection algorithms. It also made use of support vector machine (SVM), Naïve Bayes, Random Forest (RF), K-nearest neighbour (kNN), Bagging, and Neural Network (NN) machine learning algorithms. These algorithms were used for finding optimal features and developing individual machine learning models, and were also combined in two (2) different Stacking methods, namely (RF + NN + Bagging) and (kNN + RF + Bagging). The performances of all developed models were evaluated using accuracy, recall, and precision metrics. The proposed method (i.e. PCA feature selection with Stacking (RF + NN + Bagging)) produced the best accuracy of 97.4%.

Abdulrahaman et al. [2] presented PWDMs based on Random Forest with a wrapper feature selection method. The study reported a decision table PWDM with 93.24% accuracy and a false positive rate (FPR, also referred to as the false alarm rate, FAR) of 0.75, a sequential minimal optimization (SMO) support vector classifier PWDM with 93.81% accuracy and 0.066 FPR, a Naïve Bayes (NB) PWDM with 92.98% accuracy and 0.076 FPR, and the proposed wrapper-based Random Forest PWDM with 97.259% accuracy and 0.03 FPR. The main limitation of the study is that it compares an ensemble model against single-classifier models. Random Forest is an ensemble of decision trees and is expected to outperform single classifiers. In addition, some of the models had high FPR values.

Ali and Ahmed [3] hybridized Deep Neural Networks (DNN) with genetic algorithm (GA)-based feature selection and weighting methods. The DNN model achieved an accuracy of 88.77% with a true positive rate (TPR) of 85.83%. The proposed DNN with GA-based feature weighting achieved 91.13% accuracy and 90.79% TPR. The limitation of that study is that the accuracy and TPR scores are relatively low, which suggests a high FPR.

Zabihimayvan and Doran [9] presented a machine learning-based method for detecting phishing websites. Central to that work is the use of the fuzzy rough set (FRS) algorithm for feature selection as a data preprocessing step for enhancing phishing website detection models. The selected features were used to generate a subset of the original phishing website dataset, which served as input to three machine learning algorithms: (1) Multi-layered Perceptron (MLP); (2) Random Forest; and (3) SMO. The performance of FRS was evaluated using the F-measure metric and compared against other feature selection algorithms, namely (1) Information Gain (IG); (2) Correlated Feature Set (CFS); and (3) a hybridized decision tree and Wrapper method (DW). The experiments were conducted on three benchmark phishing datasets and further tested on 14,000 website samples. The best variation of the FRS model (the FRS algorithm used in conjunction with the Random Forest classification method) achieved a 95% F-measure value.

Ferreira et al. [4] implemented the MLP for developing a PWDM. Their proposed PWDM produced a reported accuracy of 87.61%. In the same vein, Vrbančič et al. [7] used a swarm intelligence approach (an evolutionary algorithm) for finding optimal parameter settings of a deep learning neural network (TDLBA). The proposed model was fitted and evaluated on the UCI phishing dataset. TDLBA produced an accuracy of 96.5%. The performance of that model was evaluated using only the accuracy measure. This limits the study, as accuracy is neither the most appropriate nor the only measure for evaluating a classification model when the data are highly imbalanced.

Subasi et al. [13] implemented PWDMs based on Artificial Neural Network (ANN), Classification and Regression Trees (CART), and Rotation Forest (RoF), respectively. In their experimental results, ANN had an accuracy of 96.91% with an AUC-ROC score of 0.995, CART had 95.79% accuracy with an AUC-ROC score of 0.981, and RoF had 96.79% accuracy with an AUC-ROC score of 0.994. Although these models perform relatively well in terms of accuracy and AUC-ROC, their FPR, which would help determine their viability, was not reported.

In summary, existing studies have applied various ML, deep learning (DL), evolutionary, and feature selection techniques to develop viable PWDMs. However, the problem of high FAR persists. In addition, meta-learners have been shown to be effective for classification tasks, as they reduce variance and bias in the classification process [15]. Consequently, this study proposes novel ForestPA-based models (ForestPA-PWDM, Bagged-ForestPA-PWDM, and AdaB-ForestPA-PWDM) for detecting phishing websites.

3 Method

3.1 Dataset

There are existing standard datasets for conducting ML experiments for the development of phishing website detection models, although some researchers choose to crawl the Internet and compile their own lists of legitimate and phishing websites. In this study, we use the standard phishing website dataset created by [16]. The dataset is available on the UCI data repository (https://archive.ics.uci.edu/ml/machine-learning-databases/00327/) for the purpose of developing ML-based phishing website detection models. The dataset contains comprehensive features cutting across four (4) categories [11]: (1) address bar-based features, (2) abnormal-based features, (3) HTML and JavaScript-based features, and (4) domain-based features. Each category contributes a different number of independent features; in addition, the availability of statistical reports on a URL from reputable organizations was made into a feature. The details of the dataset used in this study are provided in Table 1.

Table 1 Description of Studied Phishing Website Dataset

As shown in Table 1, the dataset consists of 31 attributes, of which one (1) is the class variable (label). The dataset has a total of 11,055 instances distributed between two class labels: “-1”, representing legitimate website instances, and “1”, representing phishing website instances. Phishing website instances constitute the majority but do not dominate the data distribution, as legitimate website instances make up over 44% of the data. Table 2 presents the attributes of the phishing website dataset.

Table 2 Phishing Website Data Attributes

3.2 Implemented Algorithms

This study proposes and implements three (3) novel phishing website detection models (PWDMs) using the dataset discussed in the previous sub-section. The ForestPA algorithm was implemented on its own, and enhanced versions of the same algorithm were produced using meta-learning methods. The sole aim of enhancing ForestPA with meta-learners is to improve its performance. Thus, the three proposed phishing detection models were developed using the following algorithms: (1) ForestPA, (2) AdaBoost, and (3) Bagging.

As described by Zhou et al. [18], the ForestPA algorithm promotes strong diversity by taking weight-related concerns into consideration, including but not limited to a weight assignment strategy and a weight increment strategy. It builds a set of highly accurate decision trees by exploiting the strength of all non-class attributes available in the given dataset. ForestPA was previously used for developing intrusion detection systems (IDS), a core component of network security, in [17], and was also used together with other heuristic techniques in [18].

Algorithmically, ForestPA randomly updates the weights of attributes that appear in the latest tree within a Weight-Range (WR) which is defined as

$$ \mathrm{WR}^{\lambda} = \begin{cases} \left[ 0.0000,\ e^{-\frac{1}{\lambda}} \right], & \lambda = 1 \\ \left[ e^{-\frac{1}{\lambda - 1}} + \rho,\ e^{-\frac{1}{\lambda}} \right], & \lambda > 1 \end{cases} $$
(1)

where \( \lambda \) represents the attribute level and \( \rho \) ensures that the WRs of different levels are non-overlapping. To address the negative effect of retaining the weights of attributes that are absent from the latest tree, ForestPA systematically increments the weights of attributes that have not been tested in subsequent trees. For example, suppose an attribute \( A_{i} \) is tested at level \( \lambda \) of the \( T_{j-1} \)-th tree, which has height \( \eta \), and its weight is \( w_{i} \). The weight increment value \( \sigma_{i} \) of \( A_{i} \) is then calculated as:

$$ \sigma_{i} = \frac{1.0 - w_{i}}{\left( \eta + 1 \right) - \lambda} $$
(2)

As such, the ForestPA is a viable method for producing reliable and robust ML models. Hence, ForestPA was used in this study to build a phishing detection model (ForestPA-PWDM).
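As an illustration, Eqs. (1) and (2) can be sketched in Python as shown below. This is a minimal sketch and not the ForestPA implementation used in this study: the function names and the default value of \( \rho \) are assumptions made for the example, and attribute levels are taken to be 1-based.

```python
import math
from typing import Tuple


def weight_range(level: int, rho: float = 0.0001) -> Tuple[float, float]:
    """Weight range WR^lambda of Eq. (1) for a 1-based attribute level.

    rho is a small offset (assumed value) that keeps the ranges of
    consecutive levels from overlapping.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    upper = math.exp(-1.0 / level)
    if level == 1:
        return (0.0, upper)
    lower = math.exp(-1.0 / (level - 1)) + rho
    return (lower, upper)


def weight_increment(weight: float, level: int, height: int) -> float:
    """Weight increment sigma_i of Eq. (2): (1 - w_i) / ((eta + 1) - lambda)."""
    if not 1 <= level <= height:
        raise ValueError("level must lie between 1 and the tree height")
    return (1.0 - weight) / ((height + 1) - level)
```

Under these assumptions, the range of level 2 starts just above the upper bound of level 1, and an attribute with weight 0.4 tested at level 2 of a tree of height 4 receives an increment of 0.6 / 3 = 0.2.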

In addition, AdaBoost is a meta-learning method that sequentially applies a weak single classifier to repeatedly re-weighted versions of the training data. As described in [19], AdaBoost takes a weighted majority vote at the end of its training phase to make its final decision, integrating all the weak hypotheses produced by the weak classifiers into one final hypothesis. AdaBoost was originally developed for binary classification, which justifies its selection for detecting phishing websites.

Algorithm 1 AdaBoost.M1 algorithm

In this study, an extended version of the AdaBoost meta-learner (AdaBoost.M1) was considered, as used by [20]. The AdaBoost.M1 algorithm, outlined in Algorithm 1, was used to develop an enhanced variation of the ForestPA model (AdaB-ForestPA-PWDM).
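The re-weighting and voting steps of AdaBoost.M1 can be sketched as follows. This is a minimal illustration and not the WEKA implementation used in this study: a simple decision stump stands in for ForestPA as the base learner, and the names `train_stump` and `adaboost_m1`, the early-stopping rules, and the toy data in the note below are assumptions.

```python
import math
from collections import defaultdict


def train_stump(X, y, w):
    """Weighted decision stump (stand-in base learner, not ForestPA).

    Searches every feature, threshold, and pair of distinct labels and keeps
    the rule with the lowest weighted error. Assumes at least two classes.
    """
    labels = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for left in labels:
                for right in labels:
                    if left == right:
                        continue
                    err = sum(wi for row, yi, wi in zip(X, y, w)
                              if (left if row[j] <= thr else right) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, left, right)
    _, bj, bthr, bleft, bright = best
    return lambda row: bleft if row[bj] <= bthr else bright


def adaboost_m1(X, y, rounds=10, base_learner=train_stump):
    """Return a predict(row) function built with AdaBoost.M1."""
    m = len(X)
    w = [1.0 / m] * m                      # uniform initial distribution
    ensemble = []                          # (vote weight, hypothesis) pairs
    for _ in range(rounds):
        h = base_learner(X, y, w)
        miss = [h(row) != yi for row, yi in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, miss) if bad)
        if err >= 0.5:                     # weak learner no better than chance
            break
        if err == 0.0:                     # perfect learner: keep it and stop
            ensemble.append((math.log(1.0 / 1e-10), h))
            break
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), h))
        # Shrink the weights of correctly classified instances, then normalise.
        w = [wi * (1.0 if bad else beta) for wi, bad in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    if not ensemble:
        raise ValueError("base learner never achieved weighted error below 0.5")

    def predict(row):
        votes = defaultdict(float)
        for alpha, h in ensemble:
            votes[h(row)] += alpha         # weighted majority vote
        return max(votes, key=votes.get)

    return predict
```

For instance, on the hypothetical one-dimensional data `X = [[0], [1], [2], [3], [4], [5]]` with labels `y = [1, 1, -1, -1, 1, 1]`, no single stump separates the classes, whereas `adaboost_m1(X, y, rounds=3)` combines three stumps whose weighted vote classifies all six points correctly.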

The bagging meta-learner is a method whose base learners, during the training phase, learn from different subsets extracted from the original dataset, with a separate model fitted on each subset [21]. By aggregating all the developed models, bagging reduces the variance of the resulting model while keeping its bias from increasing. According to [22], the bagging meta-learner randomly resamples the original dataset, develops multiple base classifiers by fitting models on the resampled subsets, and then aggregates the models into a single model for making predictions. The Bagging meta-learner is presented in Algorithm 2.

Algorithm 2 Bagging meta-learner

Accordingly, this study proposes an enhanced ForestPA based on bagging meta-learner for phishing website detection model (Bagged-ForestPA-PWDM). Bagged-ForestPA-PWDM creates multiple ForestPA models on random subsets of the selected dataset and then aggregates the same models to produce a final model for the detection of phishing websites.
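The resample, fit, and aggregate steps of bagging can be sketched as follows. This is a minimal illustration and not the WEKA implementation used in this study: a one-nearest-neighbour classifier stands in for ForestPA as the base learner, and the function names, the number of models, and the random seed are assumptions.

```python
import random
from collections import Counter


def one_nearest_neighbour(X, y):
    """Stand-in base learner (not ForestPA): predict the label of the closest point."""
    def predict(row):
        def squared_distance(other):
            return sum((a - b) ** 2 for a, b in zip(row, other))
        nearest = min(range(len(X)), key=lambda k: squared_distance(X[k]))
        return y[nearest]
    return predict


def bagging(X, y, base_learner, n_models=10, seed=0):
    """Fit n_models base learners on bootstrap samples and vote on predictions."""
    rng = random.Random(seed)
    m = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw m indices with replacement.
        idx = [rng.randrange(m) for _ in range(m)]
        models.append(base_learner([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # Aggregate by plain majority vote over the fitted models.
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

An odd number of models avoids tied votes in the two-class setting considered here.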

3.3 Experimental Framework

Using the three (3) machine learning algorithms discussed above, three predictive models were developed by fitting the algorithms on the aforementioned dataset. Model development follows the dataset selection and algorithm selection phases, and the N-fold cross-validation method was adopted for it in this study.

In this phase, the proposed PWDMs are trained and evaluated as presented in Fig. 1. The proposed models were trained and tested using the N-fold cross-validation method (in this case, N = 10). N-fold cross-validation divides a given dataset into N partitions, trains on N − 1 partitions, and then tests the ensuing model on the remaining partition. This process is repeated N times until every partition has been used for both training and testing. At the end of the iterations, the results are aggregated, and the models are evaluated mostly using weighted or average metric values.
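The partitioning described above can be sketched as follows. This is a minimal, unstratified sketch and not the WEKA procedure used in this study; the function names, the shuffling seed, and the choice to average per-fold accuracy are assumptions, and the number of samples is assumed to be at least the number of folds.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into n_folds nearly equal partitions.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def cross_validate(X, y, fit, n_folds=10, seed=0):
    """Mean test accuracy over folds; fit(X_train, y_train) returns predict(row)."""
    scores = []
    for train, test in k_fold_indices(len(X), n_folds, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

With 11,055 instances and N = 10, this split yields five test partitions of 1,106 instances and five of 1,105, and every instance is used for testing exactly once.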

Fig. 1 Experimental Framework

According to the experimental framework (see Fig. 1), the proposed PWDMs are implemented in the Phishing Detection Models module, and the ensuing models are evaluated in the Model Development module. Tenfold cross-validation is used for fitting each proposed PWDM on the phishing data. The performances of the ensuing models on the test data were assessed using the selected evaluation metrics. Finally, a comparative performance analysis of the developed PWDMs is carried out, together with a comparison against existing state-of-the-art methods.

The proposed PWDMs were implemented using the WEKA data mining tool. The respective parameter settings of the proposed models are presented in Table 3.

Table 3 Parameter setting of the proposed PWDMs

3.4 Performance Evaluation Metrics

Following the model development stage, the developed models are evaluated. This section presents the performance evaluation metrics used for measuring the efficacy of the proposed PWDMs. In accordance with existing and related studies, accuracy, TP rate, FP rate, Precision (P), Recall (R), F-measure, ROC, and Cohen’s Kappa values were used for evaluating the performances of the PWDMs [20, 23, 24]. The metrics are defined as follows:

  (1) Accuracy: the proportion of all websites (phishing and legitimate) that are correctly classified. Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, with phishing taken as the positive class.

    $$ {\text{Accuracy}} = \frac{TP + TN}{TP + FP + FN + TN} $$
    (3)

  (2) Recall: the proportion of actual phishing websites that are correctly classified as phishing.

    $$ {\text{Recall}} = \frac{TP}{TP + FN} $$
    (4)

  (3) Precision: the proportion of predicted phishing websites that are actually phishing websites.

    $$ {\text{Precision}} = \frac{TP}{TP + FP} $$
    (5)

  (4) F-measure: the harmonic mean of precision and recall. Its best value is 1 and its worst value is 0.

    $$ F{\text{-Measure}} = \frac{2 \times {\text{Precision}} \times {\text{Recall}}}{{\text{Precision}} + {\text{Recall}}} $$
    (6)

  (5) Cohen’s Kappa: a chance-corrected measure calculated by subtracting the agreement expected by chance, \( \Pr(e) \), from the observed agreement, \( \Pr(a) \), and dividing by the maximum possible chance-corrected agreement, \( 1 - \Pr(e) \). A value greater than 0 means that the classifier is doing better than chance.

    $$ \kappa = \frac{\Pr \left( a \right) - \Pr \left( e \right)}{1 - \Pr \left( e \right)} $$
    (7)

In addition, the confusion matrix [25], shown in Table 4, was used for evaluating the performances of the PWDMs, together with metrics derived from it, such as the true positive rate (TP rate) and the false positive rate (FP rate).

Table 4 Confusion Matrix
  (6) True Positive (TP) rate: the rate at which actual phishing website instances are correctly classified as phishing.

    $$ {\text{TP rate}} = \frac{TP}{TP + FN} $$
    (8)

  (7) False Positive (FP) rate: the rate at which legitimate websites are incorrectly classified as phishing.

    $$ {\text{FP rate}} = \frac{FP}{FP + TN} $$
    (9)

  (8) Receiver Operating Characteristic (ROC) curve: plots the FP rate on the X-axis against the TP rate on the Y-axis. It is not susceptible to majority class bias and does not ignore the minority class during evaluation.
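The confusion-matrix-based metrics in Eqs. (3) to (9) can be computed as sketched below. This is a minimal sketch with phishing taken as the positive class; the function name is an assumption, the chance agreement \( \Pr(e) \) is computed from the row and column totals of the confusion matrix, and all denominators are assumed to be non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics of Eqs. (3)-(9) from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n                      # Eq. (3), also Pr(a)
    recall = tp / (tp + fn)                       # Eq. (4), same as TP rate, Eq. (8)
    precision = tp / (tp + fp)                    # Eq. (5)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (6)
    fp_rate = fp / (fp + tn)                      # Eq. (9)
    # Chance agreement: product of actual and predicted totals for each class.
    pr_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - pr_e) / (1 - pr_e)        # Eq. (7)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "kappa": kappa,
    }
```

For hypothetical counts TP = 90, FP = 5, FN = 10, and TN = 95, the sketch gives an accuracy of 0.925, a recall of 0.9, an FP rate of 0.05, and a kappa of 0.85.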

4 Experimental Results

With the proposed framework implemented, the results are reported for each developed model, starting with ForestPA-PWDM and followed by its two enhanced variations (i.e., AdaB-ForestPA-PWDM and Bagged-ForestPA-PWDM). For the ForestPA model, Tables 5 and 6 present the performance scores and the corresponding confusion matrix, respectively.

Table 5 ForestPA-PWDM evaluation scores
Table 6 ForestPA-PWDM Confusion Matrix

From Table 5, the ForestPA-PWDM produced an accuracy of 96.26% with a TP rate of 0.973 and an FP rate of 0.050. The model performed far better than chance, with a kappa score of 0.92. Also, the F-measure of 0.967 (from a recall of 0.973 and a precision of 0.961) and the ROC score of 0.994 strongly indicate that the ForestPA-PWDM has strong predictive ability for both the majority (phishing) and minority (legitimate) classes without bias. Moreover, the confusion matrix (Table 6) shows that ForestPA-PWDM correctly classified 5989 of the 6159 phishing websites and 4653 of the 4898 legitimate websites.

As previously mentioned, this study sought to improve the performance of ForestPA-PWDM. The evaluation scores and confusion matrix of Bagged-ForestPA-PWDM are presented in Tables 7 and 8, respectively. Bagged-ForestPA-PWDM is the implementation of the Bagging meta-learner algorithm with ForestPA as its base learner.

Table 7 Bagged-ForestPA-PWDM evaluation scores
Table 8 Bagged-ForestPA-PWDM Confusion Matrix

From Table 7, the Bagged-ForestPA-PWDM produced an accuracy of 96.58% with a TP rate of 0.978 and an FP rate of 0.049. The model performed far better than chance, with a kappa score of 0.93. Also, the F-measure of 0.97 (from a recall of 0.978 and a precision of 0.962) and the ROC score of 0.995 strongly indicate that the Bagged-ForestPA-PWDM has a stronger predictive capability for correctly classifying both the majority (phishing) and minority (legitimate) classes without bias. Moreover, the confusion matrix in Table 8 shows that Bagged-ForestPA-PWDM correctly classified 6019 of the 6159 phishing websites and 4658 of the 4898 legitimate websites.

Lastly, the results of AdaB-ForestPA-PWDM are discussed. AdaB-ForestPA-PWDM is the implementation of the AdaBoost.M1 algorithm with ForestPA as its base learner. The evaluation scores and confusion matrix of the model are presented in Tables 9 and 10, respectively.

Table 9 AdaB-ForestPA-PWDM evaluation scores
Table 10 AdaB-ForestPA-PWDM Confusion Matrix

The AdaB-ForestPA-PWDM showed excellent predictive strength, with an accuracy of 97.40% and a ROC score of 0.996. These scores indicate the strong classification ability of the model with respect to both classes without bias. The kappa score of 0.9473 also signifies that the predictive strength of this model is not due to chance but to what the model learned from the input data. The TP rate of 0.981 shows the strength of the model in detecting phishing websites; likewise, the FP rate of 0.035 reflects the ability of the model to substantially reduce false notifications of legitimate websites as phishing websites. Also, the F-measure of 0.974 (from a precision of 0.973 and a recall of 0.974) supports the high predictive capability of AdaB-ForestPA-PWDM in ascertaining whether a website is legitimate or phishing.

5 Discussion

In this section, the reported results are discussed. The discussion compares the performance of the models of this study among themselves and against the existing methods reviewed in the ‘Related Works’ section. Table 11 provides a comparative analysis of this study and the existing methods.

Table 11 Comparative Analysis of Existing Phishing Website Detection Model

Table 11 presents a comparative analysis of the PWDMs developed in this study and the reviewed existing methods. The ForestPA algorithm was implemented both as a single classifier and in two (2) enhanced variations, using (1) Bagging and (2) Boosting, and the resulting PWDMs were evaluated and reported. Before comparing against existing methods, it is important to compare the PWDMs of this study with one another. While the ForestPA-PWDM is not a sub-standard model on its own, the two (2) meta-learner approaches clearly produced better models across all the performance evaluation metrics, as depicted in Figs. 2 and 3.

Fig. 2 Accuracies of the developed PWDMs of this study

Fig. 3 Pictorial representation of some performance evaluation scores of all models

It is noteworthy that, in the light of this study, machine learning algorithms are highly competent in ascertaining whether a website is legitimate or phishing. This study suggests that a simple implementation of an appropriate machine learning algorithm for a well-defined problem can be better than implementing complex and/or hybridized algorithms. Often, deep learning is not the most appropriate choice for a problem. Phishing website detection is such a case: simple machine learning algorithms outperform deep learning methods here (as seen in Table 11), because deep learning methods perform best on big, high-dimensional data with tens of thousands of instances. Also, the complex model developed by [14] through hybridization of various feature selection techniques and stand-alone machine learning algorithms produced an accuracy of 97% (values of other performance metrics were not reported), which competes closely with the 97.404% accuracy of AdaB-ForestPA-PWDM. However, the computational cost of the high-performing method of [14] puts it at a disadvantage against the AdaB-ForestPA-PWDM. Moreover, the need to parametrize each of the four (4) algorithms used in the model of [14] is yet another drawback compared to this study’s AdaB-ForestPA-PWDM.

The simple implementation of ForestPA for PWDM (ForestPA-PWDM), which produced an accuracy of 96.26%, a TPR of 0.963, an FPR of 0.04, and a ROC of 0.994, outperformed various existing phishing website detection methods that implemented deep learning, such as the method of [4] with 87.61% accuracy, the DNN implementation of [3] with 88.77% accuracy and a TPR of 0.858, and the DNN with GA-based feature weighting of [3], which had 91.13% accuracy and a TPR of 0.908. This evidence reinforces the high predictive capability of machine learning and the often-inappropriate use of deep learning algorithms. Also, ForestPA-PWDM outperformed a number of machine learning implementations, such as the CART PWDM of [13] with 95.79% accuracy and a ROC of 0.981; the decision table of [2] with 93.24% accuracy and 0.75 FPR, SMO with 93.804% accuracy and a ROC of 0.936, and Naïve Bayes with 92.98% accuracy and 0.076 FPR; and the logistic regression PWDM of [7] with 94.01% accuracy. This evidence supports the superior predictive ability of ForestPA-PWDM over these existing machine learning phishing website detection methods and also provides an answer to the first and third research questions of this study.

In addition, the application of FRS feature selection with the Random Forest classification algorithm by [9] produced an F-measure of 95% (i.e. 0.95), which is outperformed by the simple ForestPA-PWDM implementation as well as its enhanced variations. While other existing methods, such as the Rotation Forest PWDM of [13] with 96.79% accuracy and a ROC of 0.994, the TDLBA/TDLHBA of [7] with 96.5% accuracy, and the wrapper-based Random Forest PWDM of [2] with 97.25% accuracy, outperformed the ForestPA-PWDM of this study, the improved PWDMs, particularly the AdaB-ForestPA-PWDM, outperformed these existing methods with an accuracy of 97.404%, a TPR of 0.974, an FPR of 0.028, and a ROC of 0.996. This evidence also provides an answer to the second and third research questions of this study.

6 Conclusion

Based on the results and discussion sections, this research work provides answers to its research questions. In response to the first question, the ForestPA algorithm was implemented and used to fit the ForestPA-PWDM. The resulting model was able to detect phishing and legitimate websites with a ROC of 0.994, an accuracy of 96.26%, and an FPR of 0.04. This indicates that the ForestPA algorithm effectively detects both website types with very high accuracy, without bias toward the majority class, and with a very low false alarm rate.

In answer to the second research question, the bagging meta-learner improved ForestPA and was effective in detecting legitimate and phishing websites. Implementing the Bagging meta-learner with ForestPA as base learner produced the Bagged-ForestPA-PWDM, which performed better than the ForestPA-PWDM. The effectiveness of the Bagged-ForestPA-PWDM is seen in its accuracy of 96.581%, TPR of 0.966, FPR of 0.037, and ROC of 0.995, all better than the ForestPA-PWDM performance.

Also in response to the second research question, this study revealed that, as good as both the ForestPA-PWDM and Bagged-ForestPA-PWDM are, the Boosting meta-learner provides superior performance. For the purpose of detecting phishing and legitimate websites, the Boosting meta-learner (AdaB-ForestPA-PWDM) improved upon the ForestPA implementation by increasing its accuracy to 97.404%, TPR to 0.974, and ROC to 0.9966, while further reducing the FPR to 0.028, which suggests that false alarm notifications would be rare in real-time use.

Lastly, the answer to the third research question is provided extensively in the discussion section. In brief, the phishing website detection models of this study outperformed various existing methods. The ForestPA-PWDM outperformed more than half of the existing methods, while the AdaB-ForestPA-PWDM outperformed all of the existing methods considered. Thus, the development and deployment of the PWDMs of this study as software for real-time attack detection is considered important future work. In addition, hybridizing the methods with high-performing feature selection methods is worthwhile future work.