Introduction

Today, modern technology and the ease of communication have made users vulnerable to various attacks by fraudsters and criminals, and this fraud costs billions of dollars worldwide every year. Because of the harmful effects of fraudulent websites, several techniques have been developed to detect and investigate them. However, these methods have limited functionality, and keeping up with the growth and diversity of fraudulent sites is challenging. Fraudulent websites often masquerade as legitimate online data sources, goods, properties, and services [1].

Phishing is a method of stealing sensitive information from users, and phishing sites are used to lure users away from legitimate sites. Attackers use them to impersonate an online service's website and steal sensitive information for money and reputation. Phishing works by imitating legitimate web pages and tricking online users into providing confidential information. The term "phishing" derives from "fishing" for sensitive data. Phishing is one of the most potent and destructive attacks used to deceive users. Stolen details such as passwords and credit card data can then be used to compromise personal information [2, 3].

Figure 1 shows the basic structure of website fraud detection, including the basic types of phishing techniques and attacks. Features extracted from the URLs are then used to classify them as legitimate or phishing, and fraudulent websites are finally detected with the ML framework.

Fig. 1 Basic Architecture of Website Phishing Detection

A recent Anti-Phishing Working Group (APWG) report showed that APWG members detected 250,000 phishing attacks between 2015 and 2016 using 195,475 domains. Current phishing detection methods are divided into three main groups: blacklisting and whitelisting practices, URL-based systems, and systems based on web content attributes, including visual comparison of web pages. Concerns about phishing spread through spyware and email scams have led non-profit industry groups to work against impersonation and fraud. Phishing is a severe problem because of its widespread disruption of targeted industries such as payment services, financial institutions, and email providers. Phishing crimes are estimated to cost the US economy between $61 million and $3 billion in direct economic losses annually [4, 5].

This paper proposes an integrated ML framework for fraudulent website detection, in which ANN, SVM, RF, and K-NN algorithms are used to detect phishing websites accurately and to classify URLs as legitimate or phishing. Publicly available phishing website data from the UC Irvine (UCI) ML repository are used for training and testing.

Literature Survey

One study investigates channel gain feedback falsification (CGFF) attacks by malicious users in non-orthogonal multiple access (NOMA) systems, which are adopted to improve spectral efficiency, and defines these attacks as a new malicious threat. Even slight falsification of the channel feedback significantly reduces NOMA performance, so detecting malicious users is an essential task in NOMA [6]. A long short-term memory (LSTM)-based method has been proposed to detect attacks on vehicle platoons. Two attack models are considered, correlated and uncorrelated; when an attack occurs, a compromised platoon member uses attack patterns against the rest of the platoon. Such malicious attacks on the Cooperative Adaptive Cruise Control (CACC) scheme have a high potential to disrupt driving comfort, traffic flow, and fuel economy [7].

Another technique has been proposed to maintain low power consumption while detecting malicious attacks in wake-up radio (WUR) mode, together with operational procedures for responding to such attacks. Malicious attacks can trick the WUR receiver into needlessly waking up the main radio, which would otherwise remain in sleep mode [8]. KAYO has been designed and implemented to distinguish malicious mobile web pages from benign ones using static page attributes, such as the number of iframes and of known invalid phone numbers. However, mobile web pages differ considerably from desktop web pages in content, layout, and functionality [9].

Another work presents phishing attacks that steal user data and download and install malicious software. Attackers can create phishing emails that appear to come from legitimate users and are difficult to detect, and they use social media sites and emails to trick users into disclosing information. Such attacks are a form of social engineering [10]. Another approach efficiently crawls all items of a website through a K-nearest neighbor (kNN) location-based service (LBS) interface. Crawling algorithms are developed for both two-dimensional and higher-dimensional spaces, and theoretical analysis expresses the crawling overhead as a function of the dimensionality and the number of objects crawled [11].

In one approach, various tests are carried out in detection mode with classification algorithms to verify the performance of the proposed convolutional neural network (CNN) and deep neural network (DNN) models. Although many techniques have been presented to detect malicious websites, achieving satisfactory, proven results remains challenging [12]. ML algorithms have also been proposed to detect malicious websites that try to obtain personal information while posing as legitimate sites; these algorithms detect suspicious information hidden in high traffic volumes [13]. MalJPEG is proposed as the first ML-based solution tuned to detect unknown malicious JPEG images transparently and efficiently. It systematically extracts ten simple, discriminative features from the JPEG file structure to distinguish benign from malicious JPEG images, since some images carry malicious payloads that perform malicious operations [14].

An artificial intelligence (AI)-based meta-learner has been evaluated on a dataset of phishing websites and achieves high detection accuracy with a false positive rate below 0.028; its motivation is that the consequences of phishing attacks are often dangerous and devastating [15]. A systematic review of software-based phishing detection schemes proposes a taxonomy of phishing detection covering datasets, detection features, detection techniques, and evaluation indicators [16]. The multiobjective evolution/random forest (MOE/RF) approach offers a new phishing attack detection method based on a revised multiobjective evolutionary optimization algorithm. The MOE/RF model is designed to detect phishing sites accurately while reducing false positives [17].

A featureless method uses the normalized compression distance (NCD) to detect phishing websites. NCD assesses the similarity of two websites by compressing them together, which eliminates the need for feature extraction; as a parameter-free similarity measure, it removes the dependence on website feature sets [18]. Another work implements a deep learning (DL)-based phishing detector as a browser plug-in that determines in real time whether a user is at risk of phishing while viewing a web page and displays a warning message. The harm of phishing goes beyond the attackers' illegal gain, including stolen personal information and the loss of trust in legitimate websites and financial institutions [19]. In a multilayer stacking ensemble learning method, the predictions of each layer's estimators are fed to the estimators of the subsequent layer. The models were evaluated on the UCI (D1), Mendeley 2018 (D2), and Mendeley 2020 (D3, D4) datasets, motivated by the fact that phishing uses fake websites or impersonates legitimate ones to trick online users into revealing sensitive data [20].
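The NCD idea can be illustrated with any off-the-shelf compressor. The Python sketch below is not the implementation of [18]; it assumes zlib (level 9) as the compressor C, uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and compares hypothetical HTML snippets (a login page, a near-copy that posts to another host, and an unrelated page).

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (the compressor C)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical page snippets, repeated to give the compressor enough data.
legit = b"<html><form action='/login'><input name='user'><input name='pass'></form>" * 20
clone = legit.replace(b"/login", b"http://evil.example/collect")
other = b"<html><p>Weekly recipes: soup, bread, salad and pie.</p></html>" * 20

print(round(ncd(legit, clone), 3), round(ncd(legit, other), 3))
```

The clone shares almost all of its content with the login page, so its distance is small, while the unrelated page shares almost nothing; real compressors can return values slightly above 1 for very dissimilar inputs.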

A novel approach uses particle swarm optimization (PSO) to weight the different website features effectively and increase the detection accuracy for phishing websites; introducing a PSO-based feature weighting improves phishing web detection [21]. A neuro-fuzzy framework (Fi-NFN) has been designed to detect phishing websites using uniform resource locator (URL) and network traffic features. Built on fog computing (FC), a paradigm introduced by Cisco, the approach provides an anti-phishing model that tracks and protects fog users. Existing protections for fog users against phishing, however, rely on expensive generic hardware routines that work against many different attacks [22]. OFS-NN combines an optimal feature selection (OFS) method with NNs to build an effective phishing website detection model, because training data containing many useless, low-impact features cause NN models to overfit [23].

One study uses ML and DL techniques to analyze URLs and compare approaches for detecting phishing websites. It notes that most modern phishing detection solutions use canonical homepages without a login form as their legitimate class. In addition, the base model is trained on an older dataset and tested on newer URLs, using datasets from different years [24]. Another approach simplifies the feature extraction process by considering URLs and domain names and reduces processing overhead by parsing HTML, DOM, and URL-based features. Its data comprise 12,134 non-phishing and 20,614 phishing samples encoded according to 11 predefined attributes [25]. Another work uses a dataset of 11,000 websites, obtained from a public dataset repository, in which phishing and legitimate URLs are represented as attribute vectors. After pre-processing, multiple ML algorithms are applied to block phishing URLs and protect users. Various studies have addressed the prevention, detection, and awareness of phishing attacks; however, a complete and adequate solution to the problem is still needed [26].

Two methods based on generative adversarial networks (GANs) have been proposed. These methods model both phishing and legitimate websites to reflect real-world websites. Synthetic data are generated from 10 publicly available phishing datasets with an adversarial autoencoder (AAE) and a Wasserstein GAN (WGAN) to capture the properties of real-world datasets [27]. The domains classifier based on risky websites (DOCRIW) framework relies on two essential techniques to identify domains that host potentially fraudulent or malicious content. The first is a pre-constructed knowledge base containing information on risky websites. The second is a binary classifier that labels a website as malicious or non-malicious based on its domain [28].

Another paper applies ML techniques to URL patterns and proposes a new linguistic URL classification approach. The system uses natural language processing, word vector representations, and n-gram models of blacklisted words as salient features. Blacklist-based defenses, however, are ineffective against unknown attacks; most attacks are launched from malicious URLs, and attackers trick users into clicking on them [29]. A survey provides a detailed analysis of malicious URL detection techniques and a structured understanding of ML-based detection, formulating malicious URL detection as an ML task. It also reviews the literature addressing different aspects of the problem and categorizes contributions (feature representation, algorithm design, etc.). However, defenses could be better, and detecting newly created malicious URLs remains a complex task [30].

Phishing Approach Detection

Phishing is a social engineering technique: a message that looks legitimate tricks users into clicking on malicious links that contain malware. Criminals use this technique to obtain sensitive information such as credit card numbers and login credentials. On this basis, phishing techniques and phishing types can be defined (Tables 1, 2).

Table 1 Assessment study of literature
Table 2 Category of Model based on contribution and disadvantage
Types of phishing attacks

Figure 2 shows the main types of phishing attack. Although they differ in method, all of them aim to trick the target into revealing personal information.

Fig. 2 Architecture types of phishing attack

Table 3 analyzes the types of phishing attack. In spoofing attacks, for example, attackers forge phone numbers or email domains so that their messages appear more legitimate and credible.

Table 3 Analysis of the types of phishing attack
Phishing website detection approaches

This section introduces several anti-phishing methods that can detect and prevent phishing attacks. These methods are divided into five groups based on the techniques they use.

Figure 3 presents an analysis of phishing website detection techniques, which are classified into five groups of approaches.

Fig. 3 Analysis of the website phishing detection techniques

Table 4 shows that phishing website detection techniques can be built from signatures of known phishing websites, which help distinguish phishing websites from legitimate ones. These detection techniques are divided into five categories.
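As a concrete illustration of the simplest, list-based family (the blacklisting and whitelisting practices mentioned in the introduction), the Python sketch below looks up a URL's host name in two sets. The list contents, the helper names `hostname` and `list_based_verdict`, and the exact-match policy are hypothetical assumptions for illustration, not the method of any surveyed paper.

```python
from urllib.parse import urlparse

# Hypothetical lists; a deployed system would load curated, regularly updated feeds.
WHITELIST = {"example.com", "bank.example"}
BLACKLIST = {"bank-example-login.test", "paypa1.test"}

def hostname(url: str) -> str:
    """Lower-cased host name of a URL, with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def list_based_verdict(url: str) -> str:
    """Return 'phishing', 'legitimate', or 'unknown' by exact host lookup."""
    host = hostname(url)
    if host in BLACKLIST:
        return "phishing"
    if host in WHITELIST:
        return "legitimate"
    return "unknown"  # list-based methods cannot judge sites they have never seen

print(list_based_verdict("https://www.bank.example/login"))
print(list_based_verdict("http://new-phish-site.test/verify"))
```

A newly registered phishing domain falls through to `'unknown'`, which illustrates why list-based methods struggle with fresh attacks and why feature-based ML classifiers are considered later.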

Table 4 Primary conduct Phishing detection technique approach

Table 5 compares the performance of phishing detection models for Internet fraud together with the datasets they use, to show the patterns between datasets and models. Identification can also be performed on a systematic sample of the dataset.

Table 5 Performance comparison with Phishing dataset

Machine Learning (ML) Approaches for Fraudulent Website Detection

In this section, we first collect the website fraud detection dataset from the UCI ML repository and then apply four ML algorithms, SVM, ANN, RF, and K-NN, to detect phishing websites, as outlined in Fig. 4. Each algorithm takes URL features as input and classifies a URL as legitimate or phishing. The UCI dataset of phishing sites is used for both training and testing to assess the models' ability to predict outcomes, and the methods are described below.
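The evaluation pipeline described here (the four classifiers compared under tenfold cross-validation, as in the Results section) can be sketched with scikit-learn. This is a hedged illustration, not the authors' code: because the UCI file is not bundled, the sketch uses `make_classification` to generate a synthetic stand-in whose features are discretized to {-1, 0, 1}, and every hyperparameter (16 hidden units, RBF kernel, 100 trees, k = 5) is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def compare_classifiers(X, y, seed=0):
    """Mean tenfold cross-validated accuracy of ANN, SVM, RF, and K-NN."""
    models = {
        "ANN": make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=seed)),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
            for name, model in models.items()}

# Synthetic stand-in for the UCI data: 30 features discretized to {-1, 0, 1}.
X_raw, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=0)
X = np.digitize(X_raw, bins=[-0.5, 0.5]) - 1
results = compare_classifiers(X, y)
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Scaling sits inside each pipeline so that it is refit on every training fold and no test-fold statistics leak; RF is left unscaled because tree splits do not depend on feature scale.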

Fig. 4 An Overview of Proposed Model in Website Fraudulent Detection

UCI (ML) Dataset

The dataset used in this work is collected from the UCI ML repository. Phishing is a form of identity theft in which a malicious website impersonates a legitimate website to obtain sensitive information such as passwords, account details, and credit card numbers. Potential phishing sites can be identified by distinguishing them from legitimate sites. This dataset is used to identify ten critical features of phishing websites and to learn website fraud detection.
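Assuming the widely used UCI Phishing Websites dataset, each website is encoded as a vector of features taking the values 1 (legitimate), 0 (suspicious), and -1 (phishing). The Python sketch below shows how a few URL-based features of this kind could be computed from a raw URL. The feature names and thresholds (for example, URLs shorter than 54 characters counted as legitimate and longer than 75 as phishing) follow rules commonly described for that dataset, but treat them as assumptions; the sub-domain rule is simplified because it does not strip country-code top-level domains, and the helper names are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

LEGIT, SUSPICIOUS, PHISHING = 1, 0, -1

def is_ip(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def url_features(url: str) -> dict:
    """Encode a few URL-based features as 1 (legitimate), 0 (suspicious), or -1 (phishing)."""
    host = (urlparse(url).hostname or "").lower()
    bare = host[4:] if host.startswith("www.") else host
    dots = bare.count(".")  # simplified: the country-code TLD is not removed
    n = len(url)
    return {
        "having_ip_address": PHISHING if is_ip(host) else LEGIT,
        "url_length": LEGIT if n < 54 else (SUSPICIOUS if n <= 75 else PHISHING),
        "having_at_symbol": PHISHING if "@" in url else LEGIT,
        # a '//' after the scheme suggests a redirect to another site
        "double_slash_redirecting": PHISHING if url.rfind("//") > 7 else LEGIT,
        "prefix_suffix": PHISHING if "-" in host else LEGIT,
        "having_sub_domain": LEGIT if dots <= 1 else (SUSPICIOUS if dots == 2 else PHISHING),
    }

print(url_features("https://www.bank.example/login"))
print(url_features("http://192.168.0.1/secure-login@verify//account"))
```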

Artificial Neural Networks

In this sense, an ANN is a network of interconnected nodes (neurons) inspired by biological neural networks. Each neuron receives input from the preceding layer, computes a weighted sum of its inputs, and applies a transfer (activation) function to produce a nonlinear output. The weights are randomly initialized at the beginning of training and gradually adjusted using gradient descent toward an optimal solution. Each layer transforms the information it receives from the previous one. The power of NNs comes from the nonlinearity of the hidden nodes: without it, a stack of layers computes only a linear function, so introducing nonlinearity into the network is essential for learning complex functions. URL feature vectors serve as the classifier's inputs, and the classifier is then trained through forward and backward propagation.

a. Forward propagation

The phishing dataset is fed into the ANN after the best features have been selected. The data are split into training and test sets: the training set of phishing websites is used to obtain the optimal ANN structure, and the test set is used to evaluate the overall performance of the phishing website detection model.

The activation function is applied to the weighted contributions from the input layer, which yields the hidden-layer output matrix for the hidden neurons with randomly generated weights and biases. Let \({y}_{j}\) (\(j = 1, \dots, m\)) be the input data points, \({z}_{i}\) the weight vector of hidden neuron i, \({a}_{i}\) its bias, G the number of hidden neurons, m the number of input points, and d the activation function.

$${\mathbf{G}}_{\mathrm{sum}}=\begin{bmatrix} d\left({z}_{1}\cdot {y}_{1}+{a}_{1}\right) & \cdots & d\left({z}_{G}\cdot {y}_{1}+{a}_{G}\right)\\ \vdots & \ddots & \vdots \\ d\left({z}_{1}\cdot {y}_{m}+{a}_{1}\right) & \cdots & d\left({z}_{G}\cdot {y}_{m}+{a}_{G}\right)\end{bmatrix}$$
(1)

The output layer of the ANN combines the hidden-layer outputs through output weights. Equation 2 gives the output \({P}_{j}\) for each input point \({y}_{j}\), where i indexes the G hidden neurons, \({z}_{i}\) is the randomly generated weight vector of hidden neuron i, \({a}_{i}\) is its randomly generated bias, d is the activation function, \({\beta }_{i}\) is the weight connecting hidden neuron i to the output, and m is the number of input points.

$${P}_{j}=\sum_{i=1}^{G}{\beta }_{i}\,d\left({z}_{i}\cdot {y}_{j}+{a}_{i}\right),\quad j=1,2,\dots ,m$$
(2)
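Equations 1 and 2 have the form of a single-hidden-layer network with random hidden weights, as in extreme learning machines. The NumPy sketch below computes the hidden-layer output matrix of Eq. 1 and the outputs of Eq. 2 for a small random example. The sigmoid activation, the sizes, and all random values are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation, used here as the activation function d."""
    return 1.0 / (1.0 + np.exp(-x))

def hidden_matrix(Y, Z, a, d=sigmoid):
    """Eq. 1: entry (j, i) is d(z_i . y_j + a_i).

    Y: (m, n) input points, Z: (G, n) hidden weights, a: (G,) hidden biases.
    Returns an (m, G) matrix with one row per input point."""
    return d(Y @ Z.T + a)

def forward(Y, Z, a, beta, d=sigmoid):
    """Eq. 2: P_j = sum_i beta_i * d(z_i . y_j + a_i) for every input point y_j."""
    return hidden_matrix(Y, Z, a, d) @ beta

rng = np.random.default_rng(0)
m, n, n_hidden = 5, 4, 3                  # input points, features, hidden neurons (G)
Y = rng.choice([-1, 0, 1], size=(m, n))   # UCI-style feature values
Z = rng.normal(size=(n_hidden, n))        # randomly generated hidden weights
a = rng.normal(size=n_hidden)             # randomly generated hidden biases
beta = rng.normal(size=n_hidden)          # output weights
P = forward(Y, Z, a, beta)
print(P)
```

In extreme learning machines the output weights are usually fitted by least squares, `np.linalg.pinv(G) @ targets`; the text above does not state how β is chosen, so that step is left out.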
b. Backward propagation

The ANN then predicts a label for each data point in the training set. If the prediction for a data point is correct, the weights remain unchanged; otherwise they are updated. The squared error over the data points is then calculated.

Equation 3 gives the network output, where \({\widehat{c}}_{i}^{l}\) is the output of neuron i for data point l and d is the activation function.

$${\widehat{c}}_{i}^{l}=d\left({\beta }_{i}-{a}_{i}\right)$$
(3)

Equation 4 gives the squared error \({F}_{l}\) for data point l, where \({c}_{i}^{l}\) is the target (true) value.

$${F}_{l}=\frac{1}{2}\sum_{i}{\left({c}_{i}^{l}-{\widehat{c}}_{i}^{l}\right)}^{2}$$
(4)

Equations 5 and 6 update the weights so that subsequent predictions are more accurate, where \({z}_{i}\) is the weight vector, \(\gamma\) is the learning rate, and \({c}_{l}-{\widehat{c}}_{l}\) is the prediction error for data point l, which is zero (leaving the ANN unchanged) when the prediction is correct.

$${z}_{i}\leftarrow {z}_{i}+\Delta {z}_{i}$$
(5)
$$\Delta {z}_{i}=\gamma \left({c}_{l}-{\widehat{c}}_{l}\right)$$
(6)
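The update rule of Eqs. 5 and 6 leaves the weights unchanged whenever a prediction is correct. The sketch below trains a single threshold unit this way on hypothetical toy data and evaluates the squared error of Eq. 4. One assumption departs from Eq. 6 as written: following the standard perceptron (delta) rule, the error is multiplied by the input vector y, and a bias term is learned alongside z.

```python
import numpy as np

def predict(z, bias, y):
    """Threshold unit: 1 (phishing) if z . y + bias > 0, else 0 (legitimate)."""
    return 1 if z @ y + bias > 0 else 0

def squared_error(c, c_hat):
    """Eq. 4: F = 1/2 * sum (c - c_hat)^2."""
    c, c_hat = np.asarray(c, float), np.asarray(c_hat, float)
    return 0.5 * np.sum((c - c_hat) ** 2)

def train(Y, c, gamma=0.1, epochs=20):
    """Delta-rule training (Eqs. 5 and 6, with the input y included in the update).

    The weights change only when a prediction is wrong, since then c - c_hat != 0."""
    z = np.zeros(Y.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for y, target in zip(Y, c):
            err = target - predict(z, bias, y)
            z = z + gamma * err * y          # z <- z + delta z
            bias += gamma * err
    return z, bias

# Hypothetical, linearly separable toy data: phishing iff the first feature is 1.
Y = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
c = np.array([1, 1, 0, 0])
z, bias = train(Y, c)
preds = [predict(z, bias, y) for y in Y]
print(preds, squared_error(c, preds))
```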

In summary, the classifier performs forward and backward propagation. In the forward pass, the activation function is applied to the contributions of the input layer to obtain the output matrix of the hidden-layer neurons, and the output layer combines these outputs through the output weights. In the backward pass, the weights are updated on the data points to improve prediction accuracy.

Support Vector Machine (SVM)

SVM can classify both linear and nonlinear data. Given the original training data, a nonlinear mapping transforms them into a higher dimension, where an optimal linear hyperplane separates the two classes; SVMs can be used for both classification and numerical prediction. The simplest form of SVM is a binary classifier for linearly separable classes. Otherwise, an appropriate kernel function maps the data into a higher dimension in which a linear discriminant can be applied. When the classes cannot be separated perfectly even with a kernel, slack variables are introduced, and the goal is to minimize the SVM's error rate.

Equation 7 is the soft-margin primal problem over the input feature vectors; the parameter e controls the size of the margin and hence the model's bias-variance trade-off. Here z is the weight vector, a is the scalar bias, \({\xi }_{j}\) is the non-negative slack variable of input vector j, and m is the number of input vectors.

$$\min_{z,a,\xi }\;\frac{1}{2}{\Vert z\Vert }^{2}+e\sum_{j=1}^{m}{\xi }_{j}$$
(7)
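Eq. 7 can also be minimized directly. Assuming the standard soft-margin constraints, each slack takes its smallest feasible value ξ_j = max(0, 1 - y_j(z·x_j + a)) with labels y_j in {-1, +1}, which gives the hinge-loss form used below. This sketch uses plain subgradient descent on hypothetical 2-D data, a swapped-in solver chosen for brevity; practical SVM libraries usually solve the dual of Eq. 8 instead. The learning rate and epoch count are assumptions.

```python
import numpy as np

def primal_objective(z, a, X, y, e):
    """Eq. 7 with each slack at its smallest feasible value,
    xi_j = max(0, 1 - y_j * (z . x_j + a))."""
    xi = np.maximum(0.0, 1.0 - y * (X @ z + a))
    return 0.5 * z @ z + e * xi.sum()

def train_linear_svm(X, y, e=1.0, lr=0.01, epochs=500):
    """Full-batch subgradient descent on Eq. 7 (labels y in {-1, +1})."""
    z = np.zeros(X.shape[1])
    a = 0.0
    for _ in range(epochs):
        active = y * (X @ z + a) < 1              # samples with a positive slack
        grad_z = z - e * (y[active][:, None] * X[active]).sum(axis=0)
        grad_a = -e * y[active].sum()
        z = z - lr * grad_z
        a = a - lr * grad_a
    return z, a

# Hypothetical, linearly separable 2-D data: +1 = phishing, -1 = legitimate.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
z, a = train_linear_svm(X, y)
print(np.sign(X @ z + a), primal_objective(z, a, X, y, e=1.0))
```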

Eq. 8 is the dual problem, derived from the Lagrangian using the Karush–Kuhn–Tucker conditions and substituting the resulting values back. Here \(\alpha\) is the vector of Lagrange multipliers, b and c index the training samples, \({y}_{b}\) and \({y}_{c}\) are their class labels, l is the kernel function, and \({d}_{b}\) and \({d}_{c}\) are the samples mapped into the higher-dimensional feature space.

$$\max_{\alpha }\;z\left(\alpha \right)=\sum_{b=1}^{m}{\alpha }_{b}-\frac{1}{2}\sum_{b=1}^{m}\sum_{c=1}^{m}{y}_{b}\,{y}_{c}\,{\alpha }_{b}\,{\alpha }_{c}\,l\left({d}_{b},{d}_{c}\right)$$
(8)

Eq. 9 expresses the same dual equivalently in vector form, as a minimization, where the superscript S denotes the transpose, v is the matrix with entries \({v}_{bc}={y}_{b}{y}_{c}\,l\left({d}_{b},{d}_{c}\right)\), and f is the all-ones vector.

$$\min_{\alpha }\;\frac{1}{2}\,{\alpha }^{S}v\,\alpha -{f}^{S}\alpha$$
(9)

Because the classes may not be perfectly separable even in the kernel-induced space, the slack variables and the parameter e govern the trade-off between the margin, which depends on the input feature vectors, and the model's variance. The dual values can then be substituted to obtain the equivalent vector representation of the dual used in later calculations.
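The equivalence between Eq. 8 and Eq. 9 can be checked numerically: with v_bc = y_b y_c l(d_b, d_c) and f the all-ones vector, the vector form equals the negated double-sum form, so maximizing one is the same as minimizing the other. The sketch assumes an RBF kernel and random, hypothetical values of α; it checks the algebra only and does not enforce the dual constraints or solve the optimization.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """l(d_b, d_c) = exp(-gamma * ||d_b - d_c||^2) for all pairs of rows."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def dual_objective_sums(alpha, y, K):
    """Eq. 8 written with explicit double sums."""
    m = len(alpha)
    quad = sum(y[b] * y[c] * alpha[b] * alpha[c] * K[b, c]
               for b in range(m) for c in range(m))
    return alpha.sum() - 0.5 * quad

def dual_objective_vector(alpha, y, K):
    """Eq. 9: 0.5 * alpha^S v alpha - f^S alpha, with v_bc = y_b y_c K_bc and f = ones."""
    v = np.outer(y, y) * K
    f = np.ones_like(alpha)
    return 0.5 * alpha @ v @ alpha - f @ alpha

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = np.array([1, 1, 1, -1, -1, -1])
alpha = rng.uniform(0, 1, size=6)
K = rbf_kernel(X, X)
print(dual_objective_sums(alpha, y, K), -dual_objective_vector(alpha, y, K))
```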

Random Forest (RF)

In this section, bagging is combined with random attribute selection to generate an RF. The base learners are decision trees, which test an input attribute at the root and split the data into progressively smaller subsets. RF follows an ensemble learning approach and uses the combined trees to improve performance: each tree is grown on a random subset of the data, and a voting mechanism aggregates their predictions. The accuracy of RF depends on the degree of dependence between the classifiers and the strength of the individual classifiers. RF does not need cross-validation or a separate test set to obtain an unbiased estimate of the test set error, because each tree can be evaluated on the out-of-bag samples that were not used to grow it.

Eq. 10 defines the generalization (test set) error \({JE}^{*}\) of the RF, where A is the predictor vector, B is the class label, \({P}_{A,B}\) is the probability over the (A, B) space, and mh is the margin function.

$${JE}^{*}={P}_{A,B}\left(\mathrm{mh}\left(A,B\right)<0\right)$$
(10)

In Eq. 11, the margin function measures how much the average vote for the correct class exceeds the largest average vote for any other class, where \({\mathrm{av}}_{k}\) is the average over the trees, \({h}_{k}\) is the k-th tree-structured classifier, I is the indicator function, and j ranges over the incorrect classes.

$$\mathrm{mh}\left(A,B\right)={\mathrm{av}}_{k}\,I\left({h}_{k}\left(A\right)=B\right)-\max_{j\ne B}\,{\mathrm{av}}_{k}\,I\left({h}_{k}\left(A\right)=j\right)$$
(11)

In Eq. 12, the strength R of the RF is the expected value of the margin function, where \({E}_{A,B}\) denotes the expectation over (A, B).

$$R={E}_{A,B}\left(\mathrm{mh}\left(A,B\right)\right)$$
(12)
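Eqs. 10 to 12 can be estimated from the votes of the individual trees. The sketch below computes the margin of Eq. 11 for each sample, estimates the generalization error of Eq. 10 as the fraction of samples with a negative margin, and estimates the strength of Eq. 12 as the mean margin. The vote matrix is hypothetical; in Breiman's formulation these quantities are usually estimated from out-of-bag votes.

```python
import numpy as np

def margins(votes, labels, n_classes):
    """Eq. 11 for every sample.

    votes: (n_trees, N) class predicted by each tree h_k for each sample A.
    labels: (N,) true class B of each sample."""
    cols = np.arange(votes.shape[1])
    # share[c, s] = av_k I(h_k(A_s) = c), the fraction of trees voting class c
    share = np.stack([(votes == c).mean(axis=0) for c in range(n_classes)])
    correct = share[labels, cols]
    wrong = share.copy()
    wrong[labels, cols] = -np.inf            # exclude the true class from the max
    return correct - wrong.max(axis=0)

def generalization_error_estimate(votes, labels, n_classes):
    """Eq. 10, estimated as the fraction of samples with a negative margin."""
    return float(np.mean(margins(votes, labels, n_classes) < 0))

def strength(votes, labels, n_classes):
    """Eq. 12, estimated as the mean margin."""
    return float(np.mean(margins(votes, labels, n_classes)))

# Hypothetical votes of 5 trees on 4 samples (0 = legitimate, 1 = phishing).
votes = np.array([[1, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 0, 0, 1],
                  [1, 0, 0, 1]])
labels = np.array([1, 0, 1, 1])
print(margins(votes, labels, 2), generalization_error_estimate(votes, labels, 2),
      strength(votes, labels, 2))
```

Here the third sample is misclassified by the majority (margin -0.6), so one of four samples has a negative margin and the estimated error is 0.25.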

Eq. 13 bounds the generalization error of the ensemble classifier as a function of the mean correlation between the base classifiers and their average strength, where \({JE}^{*}\) is the test set error and \(\bar{\rho }\) is the mean correlation.

$${JE}^{*}\le \bar{\rho }\left(1-{R}^{2}\right)/{R}^{2}$$
(13)

In summary, the accuracy of the RF is determined by the dependence between the classifiers and the strength of the individual classifiers, and a reasonable estimate of the test set error can be obtained without a separate test set. The generalization error of the ensemble is bounded by a function of the average correlation between the base classifiers and their average strength.
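The bound of Eq. 13 is simple arithmetic once the mean correlation and the strength are known. The helper below, with a hypothetical name, evaluates it for hypothetical values and shows that lowering the correlation between trees, or raising their strength, tightens the bound.

```python
def rf_error_bound(mean_correlation: float, strength: float) -> float:
    """Eq. 13: JE* <= rho_bar * (1 - R^2) / R^2, valid only for positive strength R."""
    if strength <= 0:
        raise ValueError("the bound only applies when the strength R is positive")
    return mean_correlation * (1 - strength ** 2) / strength ** 2

for rho in (0.1, 0.3):
    print(rho, rf_error_bound(rho, 0.5))
```

The bound is loose in practice and can exceed 1 (for example, a mean correlation of 0.5 with R = 0.5 gives 1.5), in which case it carries no information.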

K-nearest Neighbor (K-NN)

In K-NN, distance-based comparisons assign equal weight to each attribute, so noisy or irrelevant attributes can cause errors. Editing and pruning techniques address the problems of useless and noisy data tuples, respectively. Each tuple is represented as a point in n-dimensional space, and the optimal number of neighbors k is determined experimentally.

K-NN is a lazy classifier: it stores the entire training dataset and compares each new tuple with all stored tuples at classification time. The Euclidean, Manhattan, and Minkowski distance functions, given in Eqs. 14, 15 and 16, respectively, can be used for this comparison, where n is the number of attributes, u indexes the attributes, \({c}_{u}\) and \({d}_{u}\) are the values of attribute u for the two tuples being compared, and \(p\ge 1\) is the order of the Minkowski distance.

$$\sqrt{\sum_{u=1}^{n}{\left({c}_{u}-{d}_{u}\right)}^{2}}$$
(14)
$$\sum_{u=1}^{n}\left|{c}_{u}-{d}_{u}\right|$$
(15)
$${\left(\sum_{u=1}^{n}{\left|{c}_{u}-{d}_{u}\right|}^{p}\right)}^{1/p}$$
(16)

Because K-NN is lazy, the distance to every stored training tuple must be computed for each classification, which makes it slow on large datasets; any of the three distance functions above can be used in this step.
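The three distance functions of Eqs. 14 to 16 and the lazy K-NN vote can be sketched in a few lines of pure Python. The toy training tuples, the labels, k = 3, and the default Minkowski order p = 3 are hypothetical choices for illustration.

```python
import math
from collections import Counter

def euclidean(c, d):
    """Eq. 14."""
    return math.sqrt(sum((cu - du) ** 2 for cu, du in zip(c, d)))

def manhattan(c, d):
    """Eq. 15."""
    return sum(abs(cu - du) for cu, du in zip(c, d))

def minkowski(c, d, p=3):
    """Eq. 16; p = 1 gives Manhattan and p = 2 gives Euclidean."""
    return sum(abs(cu - du) ** p for cu, du in zip(c, d)) ** (1.0 / p)

def knn_predict(train, labels, query, k=3, dist=euclidean):
    """Lazy classification: rank every stored tuple by distance, then vote among the k nearest."""
    order = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-attribute tuples.
train = [(1, 1), (1, 0), (0, 1), (-1, -1), (-1, 0), (0, -1)]
labels = ["phishing"] * 3 + ["legitimate"] * 3
print(knn_predict(train, labels, (1, 1)), knn_predict(train, labels, (-1, -1), dist=manhattan))
```

Another distance can be passed to `knn_predict`, for example `dist=lambda c, d: minkowski(c, d, p=4)`; choosing an odd k avoids ties in a two-class problem.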

Result and Discussion

This section evaluates the models' performance in detecting phishing websites on the dataset published in the UCI ML repository. Each ML method is tested in terms of precision, recall, F-measure, sensitivity, specificity, and accuracy. Classification is evaluated with tenfold cross-validation, and the TP, FP, TN, and FN counts are recorded for each method.
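The metrics listed above follow directly from the TP, FP, TN, and FN counts. The helper below treats phishing as the positive class (an assumption about the convention) and is shown with hypothetical counts, not the values reported in Table 6.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (phishing = positive class)."""
    sensitivity = tp / (tp + fn)          # also called recall or true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "recall": sensitivity,
            "accuracy": accuracy, "f_measure": f_measure}

# Hypothetical counts for one classifier on 200 test websites.
for name, value in confusion_metrics(tp=80, fp=10, tn=90, fn=20).items():
    print(f"{name}: {value:.3f}")
```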

Table 6 presents the results of the ML techniques ANN, k-NN, SVM, and RF used as classifiers for phishing detection, with each method evaluated by its TP, FP, TN, and FN values.

Table 6 Evaluation of matrix

Table 7 uses confusion matrices to estimate the efficacy of the ML methods for detecting website fraud, comparing actual and predicted phishing detection values as percentages.

Table 7 A performance evaluation of phishing website detection

Sensitivity

Fig. 5 shows that ANN and SVM achieve lower sensitivity than RF, which achieves the highest sensitivity of 79%.

Fig. 5 Analysis of Performance in Sensitivity

Specificity

Figure 6 shows the specificity analysis: ANN and SVM reach 65% and 69%, respectively, while RF achieves the highest specificity of 81%.

Fig. 6 Analysis of Performance in Specificity

Accuracy

Fig. 7 shows that RF achieves the highest accuracy, 93%, while ANN and SVM obtain lower accuracies of 71% and 81%, respectively.

Fig. 7 Analysis of Accuracy Performance

Table 8 compares the precision, recall, and F-measure of different models. In this comparison, the PSO- and CBA-based methods reach accuracies of 92% and 95.8%, respectively, and another URL-based model reaches the highest value, 98.99%, with the F-measure.

Table 8 Comparison of precision, recall, and F-measure model

Conclusion

In this paper, classification techniques are used to automatically assign websites to predefined class values (legitimate or phishing) based on their features and class variables. Phishing sites can be detected with ML-based techniques that gather information to help organize websites. The damage can be further mitigated by developing targeted anti-phishing programs and technologies and by educating the public on spotting and identifying fraudulent phishing websites. The evaluation metrics, namely precision, F1 measure, sensitivity, specificity, accuracy, and recall, can be improved through the choice of algorithm. In this regard, the assessment achieved 91% accuracy in sensitivity and specificity. In the precision and recall comparison, the PSO- and CBA-based models reached 92% and 95.8% accuracy, and another URL model reached a higher value of 98.99% with the F-measure. Future research can be extended to generate broader network results and to protect individual privacy.