1 Introduction

As a fundamental component of daily social activities, the Internet has proliferated far beyond its developers’ original goals, and its users are therefore constantly exposed to online threats. Such threats may lead to the loss of important financial and personal data and to identity theft in e-commerce (Mohammad et al. 2014b). Among the various types of threats, phishing is a form of deception in e-commerce that attempts to steal users’ confidential information by impersonating a target website. Typically, in a phishing attack, the images and contents of the fraudulent website resemble those of the legitimate website (Basnet et al. 2008; Gupta and Shukla 2015). Finding a solution that identifies all phishing websites is challenging because of the complexity of the phishing procedure and the continual development of new attack techniques. Generally, phishing detection methodologies can be divided into two categories: intelligent and traditional schemes. The intelligent methods (such as genetic algorithm, particle swarm optimization, harmony search, ant colony optimization, etc.) are inspired by natural phenomena and have decision-making ability (Mohammad et al. 2014b). While the decision-making process in such intelligent algorithms is based upon training with suitable data, the traditional approaches (such as rule-based methods, white-list, black-list, hash-list and extended black-list) require no training. In addition, traditional algorithms operate implicitly and require no classification, which leads to less execution time.

Modern browsers and anti-phishing toolbars such as Firefox and Netcraft generally use black-list databases, i.e., comprehensive lists of fraudulent websites, to deal with phishing attacks. Accordingly, when a URL is requested through the browser, the system queries the database for the URL and, if the entry exists, the webpage is blocked. Such methods are inadequate as sole solutions, because phishers can pass through some filters using fake addresses. As a result, these traditional methods are improved by integrating them with other solutions to decrease the risk of vulnerabilities (Abdelhamid et al. 2014). A number of studies (Aburrous et al. 2008; Gupta and Shukla 2015; Mohammad et al. 2014b) have introduced feature-based methods for distinguishing legitimate websites from fraudulent ones. The features are further used as the basic knowledge of meta-heuristic algorithms or neural networks (Aburrous et al. 2010). Some of the features include an IP address within the URL, spelling errors, and an abnormal DNS record.

Meta-heuristic algorithms are higher-level procedures designed to find, generate, or select a heuristic (partial search algorithm) that may provide a good solution to an optimization problem. Over the past five decades, many algorithms have been developed to solve engineering optimization problems. Most of the developed algorithms are based on linear or nonlinear programming approaches. However, some complex problems cannot be solved using either a linear or a nonlinear programming method. For instance, if the problem contains more than one local optimum, such a method must be restarted from different initial points. Meta-heuristic alternatives are able to find optimal or near-optimal solutions to complex problems through a combination of randomness and rules, high speed, etc. (Lee and Geem 2005). A general classification of meta-heuristics based on their operational procedure is shown in Fig. 1. Some studies have used a dataset to classify the features that are effective in phishing detection (Hamid and Abawajy 2011; Mohammad et al. 2012; Montazer and ArabYarmohammadi 2013), and others have proposed heuristic algorithms to detect phishing websites. One of the best ways to detect fraudulent websites is to identify the websites’ properties and to model phishing websites based on their characteristics. In this regard, there are various methods for modeling dynamic systems (Qiu et al. 2017; Wei et al. 2017). Phishing websites can be modeled by their properties, which could reduce the computational cost.

Fig. 1 General classification of meta-heuristic methods

To improve the accuracy and efficiency of the phishing detection mechanism, this paper proposes a detection solution based on a nonlinear regression method. We used a dataset from the UCI database (Mohammad et al. 2015). The dataset consists of 11055 website instances (rows) and 31 columns (30 features and one class label). We used two feature selection methods, namely decision tree (DT) and the wrapper. The feature selection techniques were used to remove irrelevant attributes and to reduce the training time. After feature selection, a nonlinear regression (NR) model is proposed, and a modified harmony search (MHS) is then used to find the optimal parameters of the model. The nonlinear regression based on harmony search (NR-MHS) and the support vector machine (SVM) are used to predict fraudulent websites. The results show that the meta-heuristic algorithm performs better than some other heuristic algorithms.

The rest of this paper is organized as follows. Section 2 reviews the literature on previous works. In Sect. 3, the proposed phishing detection method is presented. The experimental results and discussion are given in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Literature review

In this section, some related studies on phishing detection are reviewed. Mohammad et al. (2014b) used an artificial neural network to detect phishing websites. The applied neural network consists of 17 input neurons, corresponding to the number of selected features. Their work indicated that the hidden layer can include one or more neurons. Furthermore, 80% of the data was used for training and 20% for testing. A testing accuracy of 92.48% was obtained in 500 epochs. Hamid and Abawajy (2011) used a hybrid feature selection method to detect phishing e-mails. Seven features were used for prediction, and a detection accuracy of about 93% was obtained. Montazer and ArabYarmohammadi (2013) prepared questionnaires to assess experts’ viewpoints on the degree of importance of each feature in Iranian e-banking. In their research, 40% of the questionnaires were returned and the results were averaged. After gathering the respondent data, they used exploratory factor analysis to determine the critical indicators that were effective for phishing detection in the Iranian e-banking system. The average feature scores fell in the range between 5 and 8, which corresponds to “Medium” and “Much” importance. Some features were selected as the more important factors among all 28 features. The selected features were: the server form handler (SFH), distinguished names certificate (DN), disabling right click, using hexadecimal character codes and abnormal cookie. In Pandey and Ravi (2012), data and text mining methods were applied to detect phishing e-mails. The dataset used in Pandey and Ravi (2012) consists of 2500 phishing and legitimate e-mails. Text mining was used to select 23 features from the e-mail body. Then, the t-statistic method was used to choose the most important features.
They used the multi-layer perceptron (MLP), decision tree, SVM, group method of data handling, genetic programming and logistic regression for classification. As shown in their results, the MLP achieves better accuracy than the other methods, namely 98.12%. Given the prevalence of social media networks such as Twitter, Jeong et al. (2016) used a 2-phase clustering algorithm called PDT (phishing detector for Twitter) to detect phishers, scammers and spammers. The features adopted in this research were divided into three groups: tweet features, user features and URL features. They obtained a variable accuracy for phishing detection (between 0.88 and 0.99). The variable accuracy is the most important weakness of this approach. Forwarding-based features were used in Cao et al. (2016) to detect malicious URLs in online social networks. The authors of Cao et al. (2016) used the Bayes net, J48 and random forest to detect phishing URLs. As shown in their results, the average accuracy reached 83.21%.

However, most of the above-mentioned methods suffer from some restrictions, which include: a lack of robustness to changes in phishing tricks, an inability to detect phishing websites with constant accuracy, the lack of a comprehensive dataset, difficulty in recognizing phishers before they have attracted users, and the inefficiency of list-based methods for new phishing websites.

3 Proposed phishing detection method

Websites are presumed to contain plenty of information, from which reasonable sets of features can be extracted (Aburrous et al. 2010; Mohammad et al. 2014b; Montazer and ArabYarmohammadi 2013). However, an excessive number of features can lead to inaccurate decisions by straining resources and thereby degrading detection performance. For example, the required CPU time (runtime) increases with the number of features (Wang et al. 2014). The features are therefore analyzed and evaluated with the DT and wrapper approaches. Finally, in the proposed phishing detection algorithm, a modified HS and the SVM techniques are used to detect and predict phishing websites.

3.1 Phishing dataset

The phishing dataset used in this research is taken from the UCI datasets (Mohammad et al. 2015) and comprises 31 columns and 11055 rows. It contains 30 features (see Table 1), each taking the value \(-\,1\) (Phishing), 0 (Suspicious) or 1 (Legitimate). The last column of the dataset holds the class label of each sample, with phishing denoted by \(-\,1\) and legitimate by 1. Hence, each row represents a legitimate or a phishing website. The detailed description of the features can be found in Mohammad et al. (2012, 2014a, b).

Table 1 The features of the UCI dataset

3.2 Feature selection method

Before our phishing detection approach is employed, the DT and wrapper methods are applied in two phases to gain a clear understanding of the feature set and to remove noisy features from the dataset. DT is applied in the first phase. In this approach, if eliminating the nodes in the sub-tree does not affect the root, the feature located at the root is considered an important feature (Fig. 2a, b). When the most important feature is found, it is removed from the DT list and the next most important feature takes its place at the root (Fig. 2c, d). This procedure continues until the accuracy of the DT decreases significantly.

Fig. 2 Feature selection with DT method. a Decision tree. b Eliminate feature 3. c Eliminate feature 1. d Eliminate feature 3

In the second phase, the wrapper procedure with the genetic algorithm (GA) search method is implemented to select the best feature subset (Rodrigues et al. 2014). Wrapper methods treat the classification algorithm as a black box: the classifier is used as an evaluator for feature subset selection, and heuristic search methods are employed to find the optimal subsets for the classifier (Song et al. 2017). Here, the wrapper method is performed with GA using the DT classifier as the black box. The original features are fed into the GA, which searches for the feature subset with the highest training accuracy as measured by the DT.

Initially, the dataset is divided into 10 segments (folds), of which the GA selects 9 folds as the training set and 1 fold as the test set. At each iteration, the accuracy on the selected folds is evaluated through the DT. The procedure continues until the best sets are chosen for training. Figure 3 shows the procedure of the proposed wrapper method. The wrapper method of feature selection is implemented in the Waikato Environment for Knowledge Analysis (WEKA) (Hall et al. 2009).
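The 10-fold partition described above can be illustrated with a minimal Python sketch. It is not WEKA's implementation: the function name `k_fold_indices` is hypothetical, and the folds here are plain contiguous index ranges, whereas a real tool may shuffle or stratify the rows first.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Hypothetical helper for illustration only; folds differ in size by at
    most one row when n_rows is not divisible by k.
    """
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # Every row outside the current test fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


# Example with the dataset size used in this paper (11055 rows).
folds = list(k_fold_indices(11055, k=10))
```

Each row appears in exactly one test fold, so every instance is used for testing once and for training nine times.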

Fig. 3 Feature selection with wrapper method

3.3 Proposed regression model

In this paper, we propose a nonlinear regression (NR) based on HS to detect phishing websites using the extracted features. The nonlinear regression attempts to find the functional relationship between the inputs and outputs (Fil et al. 2016). Here, the coefficients of the nonlinear regression are estimated by a modified HS (MHS). The proposed MHS is designed to minimize the mean-square-error (MSE) between the predicted and target outputs. The following model is used in the proposed approach as the cost function.

$$\begin{aligned} F({r})=\hbox {Sign}\left( \sum _{i=1}^N {\alpha _i x_i } +\sum _{t=1}^N {\sum _{j=t+1}^N {\alpha _{tj} x_t x_j +\beta } } \right) \end{aligned}$$
(1)

where r denotes the row, N is the number of selected features, \(\alpha \) is a harmony, \(\beta \) is a random number in \([-\,1, 1]\) and x denotes the input vector, which represents one website instance and includes 20 features. The sign function of a real number x is defined as follows.

$$\begin{aligned} sign(\hbox {x})=\left\{ {\begin{array}{l@{\quad }l} 1 &{} x>0 \\ 0 &{} x=0 \\ -\,1 &{} x<0 \\ \end{array}} \right. \end{aligned}$$
(2)

Finally, the MSE is calculated over the rows (vectors) of the dataset matrix as follows.

$$\begin{aligned} \hbox {MSE}=\frac{\sum \nolimits _{r=1}^M {({F}({r})-{F}({s}))^{2}} }{M} \end{aligned}$$
(3)

where F(s) represents the desired output, M denotes the number of rows and F(r) is obtained from Eq. (1).
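Equations (1) to (3) can be sketched in Python as follows. This is an illustrative reading rather than the authors' MATLAB code: the function names, the dictionary layout of the pairwise coefficients, and the choice to add \(\beta \) once as a bias term are assumptions.

```python
from itertools import combinations


def sign(value):
    """Sign function of Eq. (2): returns 1, 0 or -1."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def nr_predict(x, alpha_linear, alpha_pair, beta):
    """Evaluate Eq. (1) for one row x of N feature values.

    alpha_linear: N coefficients of the linear terms.
    alpha_pair: dict mapping index pairs (t, j), t < j, to coefficients.
    beta: bias term, added once (an assumed reading of Eq. (1)).
    """
    total = sum(a * xi for a, xi in zip(alpha_linear, x))
    for t, j in combinations(range(len(x)), 2):
        total += alpha_pair[(t, j)] * x[t] * x[j]
    return sign(total + beta)


def mse(rows, targets, alpha_linear, alpha_pair, beta):
    """Mean squared error of Eq. (3) over the M rows."""
    errors = [
        (nr_predict(x, alpha_linear, alpha_pair, beta) - target) ** 2
        for x, target in zip(rows, targets)
    ]
    return sum(errors) / len(errors)


# Toy example with N = 3 features (values are made up for illustration).
rows = [[1, -1, 1], [-1, -1, 0]]
targets = [1, -1]
alpha_linear = [0.5, 0.2, 0.3]
alpha_pair = {(0, 1): 0.1, (0, 2): -0.2, (1, 2): 0.4}
cost = mse(rows, targets, alpha_linear, alpha_pair, beta=0.0)
```

Because both the prediction and the target take values in \(\{-1, 0, 1\}\), each misclassified phishing or legitimate row contributes a squared error of 4 to the sum.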

3.4 Main procedure of the proposed phishing detection approach

In this article, the nonlinear regression based on modified harmony search and the SVM classification are used for phishing detection. The methods are described in detail below.

3.4.1 Nonlinear regression based on harmony search

As mentioned earlier, nonlinear regression is a regression analysis that uses a combination of the independent variables to solve nonlinear problems. Most studies use optimization algorithms and neural networks to obtain the best weights for the NR model (He et al. 2016; Satapathy et al. 2012). In this study, harmony search is used to estimate the best weights for the NR. Harmony search is a meta-heuristic algorithm for optimization problems that is inspired by the process of musical performance (Ameli et al. 2016; Wang et al. 2016). In this algorithm, a solution vector is analogous to a harmony in music, and searching for the solution vector resembles the process used by an orchestra (looking for the best harmony among all available modes of playing) (Manjarres et al. 2013). The advantage of HS in comparison with other meta-heuristic algorithms is its stochastic search based on the pitch adjustment rate and the harmony memory consideration rate (Kalivarapu et al. 2016). Figure 4 illustrates the flowchart of the modified harmony search. To increase the accuracy of the traditional HS and to enable it to escape from local optima, a modified HS for phishing detection is proposed in this paper as follows.

Fig. 4 Flowchart of the proposed modified harmony search

  • Step 1 Initialize the parameters of HS: HMCR, PAR, HMS and BW, where HMCR is the probability of selecting a new harmony element from the harmony memory, PAR is the probability of adjusting the new harmony by adding a small random value in \([-\,1, 1]\), HMS is the size of the harmony memory (set to 30 in this work) and BW is the bandwidth of the decision value (in \([-\,1, 1]\)).

  • Step 2 Initialize the harmony memory (HM) as a random matrix with values in the range \([-\,1, 1]\).

    $$\begin{aligned} \hbox {HM}=\left( {{\begin{array}{c@{\quad }c@{\quad }c} {\alpha _{1,1} }&{} \ldots &{} {\alpha _{1,M} } \\ \vdots &{} \ddots &{} \vdots \\ {\alpha _{\mathrm{HMS},1} }&{} \cdots &{} {\alpha _{\mathrm{HMS},M} } \\ \end{array} }} \right) \end{aligned}$$
    (4)
Fig. 5 New harmony generation with \(\hbox {GNH}= 3\)

  • Step 3 Generate a new harmony vector \((\alpha _\mathrm{new}^{\prime } )\). With probability HMCR, an element of \(\alpha _\mathrm{new}^{\prime } \) is chosen from HM: \(\alpha _{new,i}^{\prime } \in \{\alpha _{1,i} ,\alpha _{2,i} ,\ldots ,\alpha _{\mathrm{HMS},i} \}\); with probability \(1-\hbox {HMCR}\), it is set to a random number in \([-\,1, 1]\). If the element is chosen from HM, then with probability PAR a random number (DELTA) in \([-\,1, 1]\) is added to it \((\alpha ^{{\prime }}_\mathrm{new} =\alpha _\mathrm{new}^{\prime } +\hbox {DELTA})\). In the common harmony search, the decision is made separately for each element of the new harmony and the number of selected elements in the new harmony is constant (equal to 1), but in the proposed harmony search the number of newly generated harmony elements changes among 1, 3, 5 and 7 in each iteration. Figure 5 illustrates an example of generating a new harmony, where the generated new harmony (GNH) parameter is set to 3. The PAR is also changed dynamically in each iteration (Naik et al. 2016).

    $$\begin{aligned} \hbox {PAR}=\hbox {PAR}_{\min } +\frac{(\hbox {PAR}_{\max } -\hbox {PAR}_{\min } )}{\hbox {max}\,\hbox {Iteration}}\times \hbox {current}\,\hbox {Iteration} \end{aligned}$$
    (5)
  • Step 4 Replace the worst vector in the HM by the new vector, if the new vector is better than the worst one.

  • Step 5 Repeat Steps 3–4 until a termination criterion is met.
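The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names are hypothetical, the PAR schedule is the usual linear increase from \(\hbox {PAR}_{\min }\) to \(\hbox {PAR}_{\max }\), the GNH value is assumed to cycle through 1, 3, 5 and 7, the new harmony is assumed to start from a stored harmony with GNH of its elements re-improvised, and values are clamped to \([-\,1, 1]\).

```python
import random


def dynamic_par(iteration, max_iterations, par_min=0.1, par_max=0.5):
    """Linearly increase PAR from par_min to par_max over the run."""
    return par_min + (par_max - par_min) * iteration / max_iterations


def modified_harmony_search(cost, dim, hms=30, hmcr=0.995,
                            max_iterations=2000, seed=0):
    """Minimize cost(vector) over [-1, 1]^dim with a harmony-search variant."""
    rng = random.Random(seed)
    # Step 2: random harmony memory with values in [-1, 1].
    memory = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hms)]
    costs = [cost(h) for h in memory]
    gnh_cycle = (1, 3, 5, 7)  # assumed schedule for the GNH parameter

    for it in range(1, max_iterations + 1):
        par = dynamic_par(it, max_iterations)
        gnh = min(gnh_cycle[it % len(gnh_cycle)], dim)
        # Assumption: copy a stored harmony and re-improvise gnh elements.
        new = list(rng.choice(memory))
        for i in rng.sample(range(dim), gnh):
            if rng.random() < hmcr:
                # Memory consideration: take element i from a stored harmony.
                new[i] = rng.choice(memory)[i]
                if rng.random() < par:
                    # Pitch adjustment with DELTA drawn from [-1, 1].
                    new[i] += rng.uniform(-1, 1)
            else:
                new[i] = rng.uniform(-1, 1)
            new[i] = max(-1.0, min(1.0, new[i]))  # assumed clamping to bounds
        # Step 4: replace the worst harmony if the new one is better.
        new_cost = cost(new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost

    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


# Usage on a toy cost (sum of squares); in the paper the cost is the MSE.
best_vector, best_cost = modified_harmony_search(
    lambda v: sum(x * x for x in v), dim=5, max_iterations=500)
```

Since only the worst harmony is ever replaced, the best cost stored in memory can never get worse from one iteration to the next.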

3.4.2 Support vector machine (SVM)

Support vector machine (SVM) is a supervised learning method that analyzes data for classification and regression (Cai et al. 2003). In linearly separable cases, SVM constructs a hyperplane to separate two different classes. The hyperplane is constructed by finding the vector w and parameter b that minimize \(\Vert w\Vert ^{2}\) and satisfy the following conditions, given the training data \(\{({x}_{1}, {y}_{1}), ({x}_{2}, {y}_{2}), {\ldots }, ({x}_{{n}}, {y}_{{n}})\}\):

$$\begin{aligned}&\hbox {Minimize}:\frac{1}{2}w^{T}\cdot {w} \end{aligned}$$
(6)
$$\begin{aligned}&\hbox {Subject to}:y_i (w^{T}\cdot {x}_i +{b})\ge 1 \end{aligned}$$
(7)

where w is the weight vector, x is the input vector, y is the class label and b represents the bias term. To deal with cases that may not be separable due to noisy data, the soft margin SVM is proposed in Xia et al. (2016). The SVM changes into the following model when the data are non-separable due to noise.

$$\begin{aligned}&\hbox {Minimize}:\frac{1}{2}w^{T}\cdot {w}+{C}\sum _{i=1}^n {\xi _i } \end{aligned}$$
(8)
$$\begin{aligned}&\hbox {Subject to}:y_i ({w}^{T}\cdot {x}_{i} +{b})\ge 1-\xi _{i} ,\quad i=1,2,3,\ldots ,n \end{aligned}$$
(9)

where \(C\ge 0\) is a parameter that controls the amount of training error and the \(\xi _i \) are nonnegative slack variables that measure how far misclassified or margin-violating samples fall short of the margin. In this work, the value of C is chosen by trial and error; the best value found for this parameter is 1.
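To make the role of the slack variables concrete, the following Python sketch evaluates the soft-margin objective of Eq. (8) for a given w and b, using the smallest slacks that satisfy Eq. (9). It only evaluates the objective and does not solve the optimization problem (which LIBSVM does in this work); the function name and the toy data are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, c=1.0):
    """Return (objective, slacks) for the soft-margin SVM of Eqs. (8)-(9)."""
    slacks = []
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Smallest nonnegative slack satisfying y (w.x + b) >= 1 - slack.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + c * sum(slacks)
    return objective, slacks


# Toy example: the third sample lies inside the margin and needs slack 0.5.
objective, slacks = soft_margin_objective(
    w=[1.0, 0.0], b=0.0,
    samples=[[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]],
    labels=[1, -1, 1],
    c=1.0,
)
```

A larger C penalizes the slack terms more heavily, trading a wider margin for fewer training errors.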

Remark 1

According to the above equations, x and y are the feature vectors and class labels, respectively. In the first step, LIBSVM tries to find the optimal w vector, which must satisfy the main condition (Eq. 8); after that, w is substituted into Eq. 9 to obtain the test and train accuracies. The Karush–Kuhn–Tucker (KKT) conditions are the first-order necessary conditions for a solution of the nonlinear convex optimization problem (Jahn 2017). The KKT conditions can be investigated to guarantee the feasibility of the proposed algorithm.

4 Simulation results

To evaluate the performance of our approach, we conducted simulation analyses as described in this section. The feature selection and phishing detection methods were implemented in WEKA 3.6.0 and MATLAB R2014a, respectively. All simulation runs were performed on a 2.00 GHz processor with 6 GB of random access memory.

4.1 Results of NR-MHS and SVM

Both the SVM and HS methods were initialized with a random population in the range \((-\,1, 1)\). The benchmark dataset used in this paper was selected from the UCI database; it consists of 11055 rows and 31 columns and is described in three references (Mohammad et al. 2012, 2014a, b). Table 2 lists partial details of the dataset.

Table 2 Description of the dataset (Mohammad et al. 2012)

In the feature selection, two methods, the wrapper method (with genetic search and 10-fold cross-validation) and the DT algorithm, are compared with each other. As seen in Figs. 6 and 7, the 20 most important features are chosen among the 30 features.

Fig. 6 Feature selection with decision tree

Fig. 7 Feature selection with wrapper

Figure 6 depicts the features which have been selected using the DT method. Each value on top of the bars represents the accuracy of the decision tree when the specific feature is chosen as the root. For example, "SSL_finalstate" is the most important feature for two reasons: (1) it is placed in the root when the DT is plotted for all features, and (2) the elimination of the other features does not significantly affect the accuracy of the DT. As shown in Fig. 7, the blue bars represent the selected features and the red bars show unselected features. The number on the top of the bar shows the merit of each feature. The accuracy of feature selection methods is evaluated based on the values of precision and recall.

Precision is defined as the ratio of the number of instances correctly labeled as phishing to the total number of instances labeled as phishing.

$$\begin{aligned} \hbox {Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \end{aligned}$$
(10)

where TP denotes the number of instances that are correctly labeled as phishing webpages and FP is the number of instances that are incorrectly labeled as phishing webpages.

Recall is defined as the ratio of the number of instances correctly labeled as phishing to the total number of actual phishing instances, i.e., the correctly labeled ones plus the phishing websites that are misidentified as legitimate.

$$\begin{aligned} \hbox {Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \end{aligned}$$
(11)

where FN is the number of phishing instances that are incorrectly labeled as legitimate webpages.

The F-measure, the harmonic mean of precision and recall, is also used as a measure of the test accuracy.

$$\begin{aligned} F\hbox {-measure}=2\times \frac{\hbox {Precision}\cdot \hbox {Recall}}{\hbox {Precision}+\hbox {Recall}} \end{aligned}$$
(12)
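Equations (10) to (12) can be computed directly from the predicted and actual labels, as in the following Python sketch. The function name is hypothetical, phishing is taken as the positive class with label \(-\,1\) as in the dataset, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f(predicted, actual, positive=-1):
    """Compute precision, recall and F-measure for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    # Assumed convention: a ratio with an empty denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Toy example: TP = 2, FP = 1, FN = 1.
precision, recall, f_measure = precision_recall_f(
    predicted=[-1, -1, 1, 1, -1],
    actual=[-1, 1, -1, 1, -1],
)
```

In this toy example, precision, recall and F-measure are all 2/3, since one legitimate page is flagged as phishing and one phishing page is missed.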

Tables 3 and 4 list the results obtained for the DT and wrapper methods. It can be seen that the F-measure of the wrapper method is better than that of the DT approach.

Table 3 Decision tree outputs
Table 4 Wrapper outputs
Fig. 8 Accuracy of DT in different number of features

Fig. 9 Best-cost and mean costs of HS

As listed in Tables 3 and 4, the accuracy of the wrapper method is equal to 96.3% with 20 selected features. According to the results, adding or removing one or more items from the feature list decreases the accuracy. Hence, it can be concluded that feature selection using the wrapper method performs better than DT.

Figure 8 shows the accuracy of the DT evaluator for different numbers of features. In this figure, the red point marks the highest accuracy obtained by the DT, which occurs when the feature subset contains 20 members; this case gives a better result than the other subsets.

Fig. 10 Details of best-cost and mean-cost

Once the best subset of features is extracted, the NR based on the modified harmony search is trained on the data to predict phishing websites with high accuracy. The accuracy of the two methods, NR and SVM, is then calculated using Eq. (13). The NR model outputs and a comparison with the SVM performance are given in Table 5.

$$\begin{aligned} \hbox {Accuracy}\, (\hbox {train and test}) =\frac{N_{({\mathrm{predict}}={\mathrm{real}})} }{N_T }\times 100 \end{aligned}$$
(13)

where \(N_{({\mathrm{predict}}={\textit{real}})} \) and \(N_T \) are the number of instances whose predicted class label equals the desired class and the total number of instances, respectively.
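As a small worked example of Eq. (13), the accuracy can be computed in Python as follows; the function name and the toy labels are illustrative only.

```python
def accuracy(predicted, actual):
    """Percentage of instances whose predicted label equals the real label."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual) * 100


# Three of the four toy predictions match, giving 75%.
train_accuracy = accuracy([1, -1, 1, 1], [1, -1, -1, 1])
```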

Table 5 Accuracy of HS and SVM algorithms
Table 6 Comparison of different methods with our proposed method

In the HS, the initial parameters are set to \(\hbox {HMS}= 30\), \(\hbox {maxIteration}=25000\), \(\hbox {PAR}_{\mathrm{min}}=0.1, \hbox {PAR}_{\mathrm{max}}=0.5\) and \(\hbox {HMCR}=0.995\). The PAR is changed dynamically in each iteration.

Figure 9 shows the best-cost and mean-cost of the harmony search. The best-cost is the cost of the best harmony, and the mean-cost is the average cost of all harmonies. According to this figure, the best-cost and mean-cost curves do not coincide. This is illustrated in Fig. 10, which is a magnified view of Fig. 9.

The SVM algorithm is implemented in MATLAB using the LIBSVM library (Li et al. 2016). LIBSVM is a free library which provides four basic kernels and implements a tool named "Cross-validation and Grid-search" to approximate the appropriate penalty parameter (C). In this library, the svmtrain function is used to train on the data and svmpredict is used to compute the accuracy on the testing and training data. The dataset is divided into three partitions; two-thirds of the dataset is used for training and one-third for testing. The members of the partitions are selected randomly. The results in Table 5 confirm that the NR-based HS algorithm achieves better accuracy than the SVM in both the train and test phases.
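The random two-thirds/one-third partition described above can be sketched in Python as follows. This is not the authors' MATLAB code; the function name, the fixed seed and the rounding of the split point are assumptions made for illustration.

```python
import random


def train_test_split(rows, labels, train_fraction=2 / 3, seed=0):
    """Randomly split rows and labels into a training and a testing part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # random membership of partitions
    cut = round(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


# Toy example with 9 rows: 6 go to training and 3 to testing.
rows = [[i, -i] for i in range(9)]
labels = [1, -1, 1, -1, 1, -1, 1, -1, 1]
(train_rows, train_labels), (test_rows, test_labels) = train_test_split(rows, labels)
```

Fixing the seed keeps the partition reproducible, so that different classifiers can be compared on the same training and testing rows.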

4.2 Comparative analysis

To verify the efficiency of the proposed method, some related studies are examined for comparison. Mohammad et al. (2014b) used 17 features, which were evaluated with a self-structuring neural network. They achieved 92.48% test accuracy and 93.45% train accuracy. Hamid and Abawajy (2011) examined 7 features generated using a hybrid-feature scheme as an indicator to identify the best classification method for phishing e-mail detection. They compared the accuracy of 4 classification methods on 3 datasets. The classification methods include Bayes net (BN), decision tree, AdaBoost and random forest (RF). As shown in their results, AdaBoost, RF and BN perform best on dataset 1, dataset 2 and dataset 3, respectively. Bottazzi et al. (2015) presented a novel framework for phishing detection on mobile devices. The features used in their research are gathered from the URL and HTML source of websites. They used 4 classification methods to assess the accuracy of the framework. As shown in their results, the J48 method performed better than the other algorithms. Here, the proposed detection method, which is based on MHS and SVM, is compared with the above-mentioned methods. The dataset used in our proposed method is similar to that used in Mohammad et al. (2012). Table 6 confirms that the proposed method for phishing detection is more efficient than some of the previously mentioned methods.

5 Conclusion

In this paper, the detection of phishing websites is investigated using feature selection methods and meta-heuristic algorithms. First, the more efficient features are selected from the available dataset by applying the feature selection methods. The dataset consists of 30 features, 20 of which are selected and used by two phishing detection methods. The detection methods are the modified harmony search based nonlinear regression and SVM. As shown in the results, the meta-heuristic algorithm achieves better accuracy than the heuristic algorithms. Applying the meta-heuristic algorithm in phishing detection methods has not previously been analyzed. The main results of this research are fourfold, as follows.

  1. The decision tree (DT) is used as an evaluator for different numbers of features. As a result, the feature subset containing 20 members is better than the others.

  2. The DT and wrapper methods are used to select the most important features. The wrapper method performs better than the DT method. In both approaches, adding or removing one or more items from the feature list decreases the accuracy.

  3. Two algorithms (NR-HS and SVM) are employed to detect phishing websites. The nonlinear regression is used as the cost function for HS. This study compares the NR-HS and SVM algorithms and finds that NR-HS is more precise than SVM.

  4. In the proposed NR-HS approach, the pitch adjustment rate (PAR) and generated new harmony (GNH) parameters are changed in each iteration, and these variations yield better accuracy than the traditional method with constant PAR and GNH.

In addition, there are several lines of research arising from this work which should be pursued. Firstly, it will be interesting to consider parallel memory for HS to reduce the runtime. Secondly, the reliability analysis (Qiu et al. 2017a) of the proposed HS can be investigated in future work. The third interesting suggestion is working on HS parameters. Parameters (PAR and HMCR) of the proposed method can be calculated intelligently using heuristic and meta-heuristic algorithms.