1 Introduction

The majority of human-related activities in the world can be represented as networks and graphs, where links signify associations between entities. Information networks are pervasive in the real world: the World Wide Web, protein-protein interaction networks, airline-transport networks, author-citation networks, real-world SNs, and so on.

LP in SNs is a particularly challenging problem because of their highly dynamic nature. SNs tend to expand and transform quickly over time through the inclusion and exclusion of nodes and/or edges. Expert systems adopt LP approaches to forecast missing edges in an existing network, and new or fading edges in future networks, thereby generating data and retrieving information. LP algorithms can help identify fake or fraudulent links. However, some links that merely appear unexpected or surprising can be mistakenly classified as false links, and removing them without caution can produce a distorted understanding of the system's architecture and behavior. An important question in the original environment can usually be mapped back to the network's LP problem, and vice versa.

Initially, researchers studied networks in terms of connectedness (interaction) between node pairs, node-pair connectivity as triangle closure, and similarity of interaction between node pairs. Later, the same was treated as closeness of network nodes (CN), giving rise to the "Link Prediction problem".

Recent developments in ML have generated a fresh wave of AI applications that offer considerable benefits to a number of sectors. Recent successes in AI are largely the result of ML advancements that build models using internal representations. These include SVMs, DL, RL, RFs, and PGMs. Some models are challenging to understand despite their good performance. There is often a trade-off between an ML model's performance, such as its expected accuracy, and its level of explainability. The most explainable models, like decision trees, are commonly less accurate, while the most accurate ones, such as DL [1], offer higher accuracy but lower interpretability.

The intention of an XAI system is to boost the understandability of its behavior by providing explanations. To develop potent and more interpretable AI systems, it is recommended that XAI systems be capable of describing their knowledge, skills, ongoing actions, future plans, and the most relevant information they consider. Every explanation, whether comprehensive or incomplete, is contextual and relies on factors such as the task at hand, user expertise, and the expectations placed on an AI-based system. Interpretability and explainability are therefore domain dependent and cannot be determined universally, independently of the domain.

Table 1 contains the list of abbreviations and symbols used in the article along with their description.

Table 1 Table of abbreviations/symbols used in the article

1.1 Motivation and research gaps

LP and XAI have each been topics of interest among researchers; however, the use of XAI in LP has seldom been observed. Figure 1 shows the number of Google searches on LP and XAI separately over the past five years, taken from Google Trends. Many emerging LP techniques fail to provide explainability of their results. The key motivation for conducting this research is to establish a novel approach to LP that combines XAI for clear and understandable decision-making in complicated networks.

Fig. 1

Graph showing worldwide Google searches on Link Prediction and Explainable AI over the past 5 years, taken from Google Trends

Studies carried out in [2,3,4] discussed only LP, similarity measures, ML approaches, and challenges, whereas [5, 6] discussed taxonomy, summaries, and research directions in XAI. No existing literature implements XAI in LP. This article addresses this major gap in the existing literature and provides a method to implement XAI with similarity metrics.

1.2 Contributions

Our previous works [7,8,9,10] lack some key properties of LP that we contribute in this article. The following key contributions make this work more thorough and in-depth than previous studies:

  • A comprehensive exploration of the phases of LP is conducted, which also provides a basic idea of various evaluation metrics and their usage in LP.

  • A generic taxonomy covering Similarity-Based and Learning-Based LP techniques is provided, along with their limitations and utilities presented year-wise.

  • The evolution of LP methods relative to network types from 2013 to 2023 is depicted.

  • The inclusion of XAI in LP is a novel aspect of this survey. A taxonomy of XAI tools for LP is presented together with a case study of their use.

  • The challenges that could arise during the adoption of XAI tools and methodologies for LP are discussed.

The strength of this survey is that it makes it simple for readers to gain insight into the considerations made for LP and XAI.

1.3 Research methodology

We adopted a basic methodology to conduct the survey, as represented in Fig. 2. The steps comprise selecting prime Scopus database libraries like Wiley, Elsevier, Springer, and Blackwell, and then searching for research articles related to LP. The literature was searched for the years 2013 to 2023 using the keywords "Link Prediction", "Similarity Metrics", "Machine Learning", and "Explainable Artificial Intelligence". After obtaining the search results, we filtered them. The filtering and pre-processing of the literature was purely title restricted: articles containing "Link Prediction" in their title were then selected manually. Figure 3 presents a graphical overview of the number of papers published on the LP strategies stated above in the ScienceDirect database.

Fig. 2

Pictorial representation of the research methodology conducted for the proposed study

Fig. 3

Percentage of research articles available on ScienceDirect on Link Prediction with Similarity Metrics, Machine Learning and Explainable Artificial Intelligence

After the preprocessing was performed, we studied the literature and summarized it in an exhaustive literature review, noting its gaps. Further, we discussed the phases of LP and proposed a taxonomy of LP comprising Similarity-based and Learning-based methods. Finally, we conducted an experimental exercise on the proposed method and obtained its results.

This paper is organized into the following sections: Section 2 provides a literature summary; Section 3 provides an overview of the phases of solving an LP problem; Section 4 presents the experiments and results; Section 5 offers a discussion, with limitations and open challenges covered in Sections 6 and 7 respectively; Section 8 concludes with future work.

2 Literature review

After studying various works on LP and XAI, an exhaustive literature summary was generated (as shown in Table 2). It covers research works from 2013 to 2023, providing a general idea of the LP methods and XAI tools used. XAI makes use of KGs as a tool, an area that still needs much exposure.

Table 2 Exhaustive tabular survey of the related literature studied

3 Phases of link prediction

The major steps performed in an LP problem are data collection, network representation (optional), LP method application, and performance evaluation and/or model explanation, as shown in Fig. 4.

Fig. 4

Phases of a Link Prediction problem, starting from data collection, followed by network representation, application of the LP method, and evaluation of the ML model or its explainability

3.1 Data collection

Data collection is primarily performed in two ways: 1) downloading existing datasets from data repositories and libraries like SNAP, Kaggle, GitHub, and others; and 2) constructing a dataset.

Data collection, data cleaning, and data labelling are the three essential steps in the dataset construction process. Data gathering involves finding datasets for ML model training, with two main approaches: Data Generation, used when the available training dataset is small, and Data Augmentation, which adds recently acquired external data to existing datasets. Data generation involves crowdsourcing, a business model that connects large groups of people online to complete activities, and synthetic data, manufactured by a machine to enlarge the training data or supply future data updates.

3.2 Network representation (NR)

NR encompasses various techniques for representing networks, each with a unique approach. The adjacency matrix is often used to represent the network as a graph and underpins similarity measures. Embedding-based methods represent network node properties or linkages, converting nodes, links, and their characteristics into a vector space while preserving graph structure. PGMs represent graph probability distributions to capture complex probabilistic connections, where nodes represent random variables and edges represent probabilistic links between variables. Finally, GNNs, often applied to KGs, are effective at understanding massive, dynamic graph datasets with billions of elements, especially complex network architectures.
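To make this concrete, the following is a minimal sketch (our illustration, not drawn from any surveyed work) of two of these representations: an adjacency matrix built with networkx, and a crude spectral embedding derived from it with numpy. The toy graph is an assumption for demonstration only.

```python
# A minimal sketch of adjacency-matrix and embedding-style representations.
import networkx as nx
import numpy as np

# Toy undirected network; in practice this would be the observed SN.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)])

# Dense adjacency matrix A, where A[i, j] = 1 iff edge (i, j) exists.
A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()))
print(A)

# A crude spectral "embedding": the top-k eigenvectors of A give each
# node a k-dimensional vector while preserving coarse graph structure.
eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
k = 2
embedding = eigvecs[:, -k:]            # one k-dim vector per node
print(embedding)
```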

3.3 Link prediction methods

LP methods can be broken down into two primary categories: those based on similarities, i.e., Similarity Metrics, and those based on learning, i.e., Learning-Based Methods. The first type computes the likelihood of a link existing between a pair of nodes based on an assigned similarity score. Either the nodes' attributes or the network's topology can play a role in calculating the similarity score. There are three distinct types of learning-based approaches. Figure 5(a) depicts the workflow of an LP problem adopting similarity metrics, whereas Fig. 5(b) adopts learning-based techniques.

Fig. 5

(a): A generic flowchart depicting process of solving Link Prediction problem using Similarity Metrics. (b): A generic flowchart depicting process of solving Link Prediction problem using Learning Based approach

3.3.1 Similarity metrics

Similarity-based algorithms first determine the probable strength of a connection between two nodes based on their resemblance, then select the "L" links with the highest similarities. The similarity score of two unconnected nodes is calculated from the network topology. Scores can be local, global, or quasi-local. Local scores detect similar node pairings using local information only. Global methods consider the entire network architecture; they benefit from global information at the cost of computational complexity. Quasi-local similarities balance the two approaches: they need more data than local indices but less time than global ones. Researchers use several similarity metrics to tackle LP problems, including neighbors, dataset similarity/dissimilarity, node closeness, and degree of connectivity. Studies carried out in [3, 4] have discussed various widely adopted similarity metrics.
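As an illustration, the sketch below computes widely used local scores on a toy graph with networkx's built-in link-prediction functions; the graph and the candidate node pair are assumptions for demonstration only.

```python
# A minimal sketch of the local similarity scores discussed above.
import networkx as nx

G = nx.karate_club_graph()   # classic toy social network
pair = [(0, 33)]             # one candidate (non-connected) node pair

cn = len(list(nx.common_neighbors(G, 0, 33)))        # Common Neighbours
jc = next(nx.jaccard_coefficient(G, pair))[2]        # Jaccard Coefficient
aa = next(nx.adamic_adar_index(G, pair))[2]          # Adamic-Adar
ra = next(nx.resource_allocation_index(G, pair))[2]  # Resource Allocation
pa = next(nx.preferential_attachment(G, pair))[2]    # Preferential Attachment

print(cn, jc, aa, ra, pa)
```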

3.3.2 Learning based methods

Learning-based techniques incorporate network architectural and non-architectural information into ML frameworks, letting the techniques determine the likelihood of an edge between two nearby nodes. LP uses supervised learning methods including SVM, RF, KNN, Naive Bayes, Ensemble Learning, Logistic Regression, Radial Basis Function networks, and others. Representation learning strategies can be classified as MF-based, Deep Neural Network-based, or Path- and Walk-based, according to the models' loss function and decoder function (graph similarity metrics) [83].
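A minimal sketch of this workflow is given below: hand-crafted topological features for candidate node pairs are fed to an off-the-shelf supervised classifier. The feature values and labels are illustrative placeholders, not data from any surveyed study.

```python
# A minimal sketch of the learning-based view of LP.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [common_neighbours, jaccard, adamic_adar] for one node pair.
X = np.array([[3, 0.30, 1.2],
              [0, 0.00, 0.0],
              [5, 0.45, 2.1],
              [1, 0.08, 0.3]])
y = np.array([1, 0, 1, 0])   # 1 = link exists, 0 = no link

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict_proba([[2, 0.25, 0.9]]))  # P(no link), P(link)
```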

3.4 Performance evaluation

The performance of LP methods is commonly evaluated using popular metrics like Accuracy, Precision, F1 Score, Recall, the Receiver Operating Characteristic (ROC) curve, AUC, HR@k, and MRR. Various authors have used some uncommon metrics for performance evaluation, which are stated in Table 3.

Table 3 Some uncommon evaluation metrics with their network/usage types and references

Table 3 is generated from the literature studied in Table 2. It gives a clear idea of the uses of evaluation metrics in terms of network and/or system types, which may help future users choose which metric to use in their work depending on the network/system type. CD is one of the most well-known problems related to LP and uses MAE, NMI, ARI, and Average COND, whereas for multilayer complex networks, TPR, Sensitivity, Specificity, and MCC are used. For DR methods such as MF and embedding, RMSE and PCC are used.
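For reference, the common metrics listed above can be computed with scikit-learn as sketched below, assuming y_true holds ground-truth link labels and y_score holds a model's predicted link probabilities (both illustrative here).

```python
# A minimal sketch of the common LP evaluation metrics.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                     # ground-truth labels
y_score = [0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.3, 0.7]     # predicted P(link)
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]      # thresholded labels

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
fpr, tpr, _ = roc_curve(y_true, y_score)   # points for plotting the ROC curve
```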

3.5 Explainable artificial intelligence

AI whose decisions can be explained by a human expert is referred to as XAI. It contrasts with the idea of the "black box" in ML, where even the inventors of the AI cannot explain why it made a particular decision. XAI implements the social right to explanation.

Generally, XAI is classified into two categories: 1) Global, which provides a general explanation of the model and is based on universal operational principles, and 2) Local, which explains the rules that produce each individual prediction.

Figure 6 presents XAI techniques applied locally, globally, or both, along with explainable tools that improve LP result interpretation and user comprehension. Explainability strategies in ML are varied. Permutation Importance compares a model evaluated on the original data to one evaluated on randomly rearranged feature values to determine feature importance. Partial Dependence Plots help discover key features and understand their interactions by showing the relationship between a target variable and input features. Accumulated Local Effects is used for non-linear models with complicated input-output interactions, while Morris Sensitivity Analysis evaluates the dominance of input features. Global Interpretation via Recursive Partitioning explains complex model behavior with decision trees. Anchors explain model workings, while the Contrastive Explanation Method compares predictions to similar examples to find minimal input changes that affect predictions. Counterfactual Instances verify model stability and accuracy. Model behavior is explained by Integrated Gradients and LIME. Shapley values determine feature influence, Scalable Bayesian Rule Lists provide interpretable if-then rules, and the Explainable Boosting Machine makes accurate, feature-selective predictions using Boosting and Generalized Additive Models.
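As one concrete example from this list, the sketch below applies Permutation Importance with scikit-learn on a synthetic task; the dataset and model are illustrative assumptions, not part of the reported experiment.

```python
# A minimal sketch of Permutation Importance, one global XAI technique.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn; the drop in score estimates its importance.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance = {imp:.3f}")
```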

Fig. 6

Classification of XAI technologies based on their local, global, or combined local and global applicability, and XAI techniques commonly used with Link Prediction to interpret the prediction results of an LP model

4 Experiments and results

This section explains the experiment conducted to obtain an explainable LP. We base our conclusions on accuracy and the ROC curve. The experiment was conducted on a workstation with an Intel Core i7-4770 2.2 GHz CPU, 16 GB memory, and the Windows 10 Pro operating system. It was implemented in Python with the libraries pandas, numpy, networkx, scikit-learn, and seaborn.

4.1 Dataset and evaluation metrics

The dataset chosen for this research is the Facebook-Social-Network-Analysis dataset, used to predict future friend recommendations. It consists of three fields: Node 1, Node 2, and Connection, representing the "from" node, the "to" node, and the connection type respectively. Connection is a Boolean value: 1 for a connected node pair and 0 for an unconnected one. Table 4 provides the general statistics of the dataset, which was taken from https://github.com/abcom-mltutorials/Facebook-Social-Network-Analysis.
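A minimal loading sketch is shown below, assuming the edge list is stored as a CSV with the three fields above; the file name is an illustrative assumption, not a detail confirmed by the repository.

```python
# A hedged sketch of loading the edge-list dataset with pandas.
import pandas as pd

df = pd.read_csv("facebook_social_network.csv")   # hypothetical file name
df.columns = ["Node 1", "Node 2", "Connection"]   # from-node, to-node, label
print(df["Connection"].value_counts())            # 1 = connected, 0 = not
```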

Table 4 General statistics of data

To evaluate the performance of the method, we opted for Accuracy and the ROC curve. Accuracy is measured as:

$$Accuracy=\frac{Correct\ Predictions}{Total\ Predictions}$$
(1)

4.2 Method

The proposed approach incorporates various similarity measures as parameters, utilizes the RF classifier as the ML model, and employs LIME as the XAI technique. Figure 7 presents the complete systematic architecture of the proposed approach.

Fig. 7

A systematic architecture of the proposed methodology

First, the Facebook data was taken from the GitHub repository. The data was then preprocessed and similarity scores were computed, after which we created a dataframe consisting of the similarity scores and nodes. This dataframe was used to compute correlations and was split into train and test data. After deciding the features and target, the RF classifier was trained on the train set and predictions were made on the test set. Lastly, the performance was evaluated and the results were interpreted using LIME.

4.3 Results

4.3.1 Preprocessing and parameters

To preprocess the collected data, the columns were sorted to obtain each pair of nodes as an "edge" tuple. The similarity metrics CN, AA, RA, JC, and PA were then calculated; the scores for the first 10 rows are given in Table 5. The data generated in Table 5 is used to train and test the classifier (Fig. 8).
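The following sketch outlines this step: the graph is built from the connected pairs, and each candidate pair is scored with the five metrics using networkx. The helper name and column names are assumptions matching Table 5's layout.

```python
# A hedged sketch of the feature-construction step described above.
import networkx as nx
import pandas as pd

def score_pairs(df):
    """Return df with CN, JC, RA, AA, PA columns appended per node pair."""
    G = nx.Graph()
    G.add_nodes_from(pd.unique(df[["Node 1", "Node 2"]].values.ravel()))
    G.add_edges_from(df.loc[df["Connection"] == 1, ["Node 1", "Node 2"]].values)

    pairs = list(df[["Node 1", "Node 2"]].itertuples(index=False, name=None))
    feats = pd.DataFrame({
        "CN": [len(list(nx.common_neighbors(G, u, v))) for u, v in pairs],
        "JC": [p for _, _, p in nx.jaccard_coefficient(G, pairs)],
        "RA": [p for _, _, p in nx.resource_allocation_index(G, pairs)],
        "AA": [p for _, _, p in nx.adamic_adar_index(G, pairs)],
        "PA": [p for _, _, p in nx.preferential_attachment(G, pairs)],
    })
    return pd.concat([df.reset_index(drop=True), feats], axis=1)
```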

Table 5 Computed Similarity Score of first 10 edges (for reference)
Fig. 8

Pearson correlation computed over the similarity scores, nodes, and Connection (data in Table 5)

The strength of the linear association between two continuous variables in the data of Table 5 is computed by:

$$\textrm{Pearson}\ \textrm{Correlation}\ (r)=\frac{\sum \left({x}_i-\overline{x}\right)\left({y}_i-\overline{y}\right)}{\sqrt{\sum {\left({x}_i-\overline{x}\right)}^2\sum {\left({y}_i-\overline{y}\right)}^2}}$$
(2)

The new data consists of five feature variables (as shown in Table 5): CN, JC, RA, AA, and PA, and one target variable, "connectivity": exists (1) or does not exist (0).
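The correlation matrix in Fig. 8 can be reproduced as sketched below; a tiny stand-in dataframe replaces the real scored data here so that the snippet runs on its own.

```python
# A minimal sketch of Eq. (2) in practice: pandas computes the pairwise
# Pearson correlation directly, and seaborn renders the heatmap of Fig. 8.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in for the scored dataframe produced by score_pairs() above.
scored_df = pd.DataFrame({
    "CN": [3, 0, 5, 1], "JC": [0.3, 0.0, 0.45, 0.08],
    "RA": [0.9, 0.0, 1.6, 0.2], "AA": [1.2, 0.0, 2.1, 0.3],
    "PA": [42, 6, 88, 12], "Connection": [1, 0, 1, 0]})

corr = scored_df.corr(method="pearson")    # Eq. (2) for every column pair
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()
```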

Data splitting impacts the performance and generalization of the model. We implemented a random splitting technique by randomly shuffling the preprocessed dataframe and subsequently dividing it into a training set and a test set in the ratio 67:33. Table 6 provides a sample from the dataset used for training and testing. The RF classifier model was built using ensemble learning.
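A hedged sketch of this split-and-train step is given below, assuming scored_df is the feature dataframe produced earlier; the random seed is arbitrary.

```python
# A minimal sketch of the 67:33 random split and RF training.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = scored_df[["CN", "JC", "RA", "AA", "PA"]]   # features
y = scored_df["Connection"]                      # target: link exists or not

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, shuffle=True, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
```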

Table 6 Sample from created dataframe for training and testing purpose

The computed similarity metrics are treated as the primary parameters (features) fed as input to LIME; no parameters were set empirically. The choice of parameters depends entirely on the choice of ML model, XAI tool, and the type of results to interpret. For this research, similarity metrics were chosen as parameters because they support the interpretation of results, backing the existence or non-existence of links through the values they generate for a specific node.

4.4 Experimental analysis

We tested our method via accuracy and the ROC curve. The computed accuracy was 0.6678, and Fig. 9 shows the plotted ROC curve. Predictions were interpreted using LIME, as shown in Fig. 10(a) for the data at index 6985. LIME predicts with 91% confidence that the connection does not exist (the actual connection does not exist, as shown in Table 7 (row 1, column 2)). The parameters RA and CN increase the probability of non-existence. Similarly, Fig. 10(b) and (c) present the LIME results for index values 9864 and 10,256 respectively, which predict the non-existence of a connection (the actual connections also do not exist, as shown in Table 7 (rows 2 and 3, column 2)).
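Explanations like those in Fig. 10 can be generated along the following lines, reusing the trained model and split from the previous sketch; the instance selected here is illustrative, since exact row indices depend on the shuffle.

```python
# A hedged sketch of producing a LIME explanation for one test instance.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=["CN", "JC", "RA", "AA", "PA"],
    class_names=["not exist", "exist"],
    mode="classification")

row = X_test.iloc[0].values          # one held-out node pair to explain
exp = explainer.explain_instance(row, rf.predict_proba, num_features=5)
print(exp.as_list())                 # per-feature contribution to the prediction
```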

Fig. 9

Plotted ROC curve with ROC AUC Score for the obtained predictions

Fig. 10

LIME results for (a) index = 6985, (b) index = 9864, and (c) index = 10,256

Table 7 Reference for Connection value for LIME Interpreter

5 Discussion

After studying various works on LP, it is observed that LP methods are labelled as Similarity-Based and Learning-Based methods (shown in Fig. 11). Learning-based methods are further classified into Algorithmic, DR, and Probabilistic methods.

Fig. 11

A generic taxonomy of Link Prediction techniques which involves Similarity Based and Learning Based methods

The algorithmic approach to LP involves employing classification techniques and meta-heuristics. This entails extracting features from network data and utilizing them as inputs for training ML models. By discerning patterns and relationships within the network data, these models strive to predict links between nodes. Concurrently, DR serves as a method to transform larger datasets into more manageable forms while preserving crucial information. Applied to classification and regression challenges, it aids in obtaining more accurate predictive models for LP. Methods combining DR with LP include MF and embedding-based techniques. Additionally, probabilistic LP utilizes statistical models like ERGM, SBM, or latent space models to estimate the likelihood of node connections. Maximum-likelihood-based LP estimates the statistical model's parameters so as to maximize the likelihood of the observed data, encompassing network structures and other attributes.

Figure 12 shows the evolution of Similarity-based (left branch) and ML-based (right branch) LP approaches used from 2013 to 2023, with their limitations and utilities. This figure helps novices select and integrate Similarity-based and ML-based approaches on the basis of their complementary features to increase the effectiveness of LP methods.

Fig. 12

Major Limitations and utilities of Link Prediction observed from year 2013 to 2023

Networks belong to various categories depending on their structure (multi-layer, multigraph, simple, complex, bipartite), nature (heterogeneous, homogeneous), attributes (node attributes), and direction (unidirectional, bidirectional, undirected). Because graphs differ in structure, nature, attributes, and direction, no single type of LP method is applicable to all graphs. Figure 13 shows, year-wise, the categories of graphs and the LP methods applied to them in chronological order. With the help of this figure, it can be identified which types of graphs and LP methods were deployed from 2013 to 2023.

Fig. 13

Evolution of LP methodology with reference to network type from year 2013 to 2023

The application of XAI to LP is our main innovation. XAI reduces the cost of mistakes, finds their causes, and improves model efficiency by characterizing errors and decreasing biased predictions. The specific requirements for implementing XAI in Python vary depending on the techniques and libraries. The minimum requirements include Python ML libraries like scikit-learn, TensorFlow, or PyTorch; trained ML models; interpretability libraries like SHAP, LIME, or InterpretML; preprocessed data; and the right XAI approach. The final steps are documentation and visualization.
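As a minimal illustration of these requirements, the sketch below applies SHAP (one of the listed interpretability libraries) to the RF model from Section 4; this is an indicative sketch, not part of the reported experiment.

```python
# A hedged SHAP sketch, reusing rf and X_test from the earlier sketches.
import shap

explainer = shap.TreeExplainer(rf)            # fast, tree-specific explainer
shap_values = explainer.shap_values(X_test)   # per-feature contributions
# Output layout differs across shap versions (list per class vs. 3-D array).
shap.summary_plot(shap_values, X_test)        # global feature-impact view
```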

As AI technology grows in complexity, its algorithms become hard to grasp and analyze, though researchers continue to create and improve methodologies. The need to simplify the model in many XAI algorithms makes performance prediction challenging. For more complex models, current explainability methods may not account for all factors that influence a decision, limiting their usefulness. Creating ethically sound and well-explained XAI algorithms is the goal of ongoing research.

The proposed approach preserves several properties: effectiveness, robustness, interpretability, and user-friendliness. It excels in effectiveness by making predictions on a large dataset; the method is simple and fosters seamless interaction between the method and its users. Its innovation lies in incorporating XAI into the LP domain.

Our approach incorporates a diverse array of similarity measures, ensuring its adaptability to various graph types and guaranteeing its robustness across diverse datasets.

6 Limitations

While studying the available literature, some limitations were found, which are specified below:

  • Searching the literature by keyword returned some irrelevant articles, which required manual filtering and excessive time.

  • Various articles did not explicitly mention the ML models used. Wherever the models were not specified, we did not mention a model name in our literature survey (Table 2).

  • Many researchers did not clearly specify the drawbacks of their methods. Mostly the drawbacks were drawn from the results only.

  • Wherever factors like node attributes, weights, and network properties could have been used but were not utilized, we did not mention them explicitly in our article.

7 Open challenges for research

7.1 Challenges in LP

Scalability, complexity, and computational expense are a few problems faced by LP that have been quoted by other authors. However, some problems continue to go unreported:

  • Dynamicity: Different types of network dynamism exist, including nodes and edges being added and deleted at the next timestamp. Existing LP techniques handle only one or two types of dynamicity; no technique covers them all.

  • Generalization of network: Each network has unique nodes and linkages and thus should be structured accordingly. Currently, there is no comprehensive, universal LP solution available for networks.

  • Timestamp missing: Some datasets lack timestamps for link or node formation across the network for a time period 't'. In such networks, separating training and testing datasets is difficult, because some linked node pairs may be randomly assigned to the training set and others to the testing set. In this scenario, CN-based methods are unreliable.

  • Imbalance in dataset: SN datasets consist mostly of the majority (negative) class with few examples of the minority (positive) class. Unsupervised learning algorithms are indifferent to class distributions; therefore, they cannot balance data and focus on class boundaries. This problem can be addressed with ensemble methods and data sampling.

7.2 Challenges in XAI

  • Blackbox resemblance: Experts have trouble understanding many ML algorithms' decisions. Black-box solutions that produce incomprehensible judgements may cause legal, ethical, and operational difficulties. Black-box machines cannot be checked or audited before implementation, making behavioural assurances problematic. It is difficult to determine why an ML system made a bad judgement, or how to rectify it.

  • Biasness: Keeping an AI programme from learning biased or unfair perspectives is difficult. Possible gaps in the training data, model, and objective function cause this challenge. For ethical and fair AI systems, these biases must be addressed and mitigated.

  • Fairness of results: XAI struggles to assess the fairness of AI systems. This difficulty arises because perceptions of fairness vary with context and with the ML algorithm's input.

  • Safety issues: AI reliability is assessed by examining its decision-making process. The fundamental generalisation step in statistical learning theory requires organisations to draw conclusions about unseen data to fill gaps, making this task difficult.

8 Conclusion and future work

This paper offers an exhaustive literature review of the LP problem and XAI, accompanied by a thorough analysis of LP, its distinct phases, and the problem-solving techniques employed. The prime objective of this study is to establish a generalized concept of XAI and explore its applicability to LP. Among the myriad XAI tools and methods available, the experimental exercise focuses on LIME, as it sheds light on the interpretation of link existence or absence between pairs of nodes. The experimental exercise conducted on Facebook, a real-world SN, demonstrates the potential for significant accuracy improvements using various similarity measures and interpretation of results using LIME.

For future work, we will explore more emerging ML-, DL-, and ANN-based LP methodologies. We plan to extend our study by incorporating various datasets to broaden the scope of our analysis. Additionally, we aim to enhance our method's effectiveness by considering node attributes and conducting comparisons with existing methods for a more comprehensive evaluation.