1 Introduction

The majority of human-related activities in the world can be represented as networks and graphs, where links signify associations between entities. Information networks are pervasive in the real world: the World Wide Web, protein-protein interaction networks, airline-transport networks, author-citation networks, real-world SNs, and so on.

LP in SNs is a particularly challenging problem because of their highly dynamic nature. SNs tend to expand and transform quickly over time through the inclusion and exclusion of nodes and/or edges. Expert systems adopt LP approaches to forecast missing edges in an existing network, and new or fading edges in future networks, thereby generating data and retrieving information. LP algorithms can help identify fake or fraudulent links. However, some links that merely appear unexpected or surprising can be mistakenly classified as false links, and removing them without caution can produce a distorted understanding of the system's architecture and behavior. An important question in the original environment can usually be mapped back to the network's LP problem, and vice versa.

Initially, researchers studied networks in terms of connectedness (interaction) between node pairs, node-pair connectivity as triangle closure, and similarity of interaction between node pairs. Later, the same was treated as closeness of network nodes (CN), giving rise to the "Link Prediction problem".

Recent developments in ML have generated a fresh wave of AI applications that offer considerable benefits to a number of sectors. Recent successes in AI are largely the result of ML advancements that build models using internal representations. These include SVMs, DL, RL, RFs, and PGMs. Some models are challenging to understand despite their good performance. There is often a trade-off between an ML model's performance, such as its expected accuracy, and its level of explainability. The most explainable models, like decision trees, are commonly less accurate, while the most accurate ones, such as DL [1], offer higher accuracy but lower interpretability.

The intention of an XAI system is to boost the understandability of its behavior by providing explanations. To develop potent and more interpretable AI systems, it is recommended that XAI systems be capable of describing their knowledge, skills, ongoing actions, future plans, and the most relevant information they consider. Every explanation, whether comprehensive or incomplete, is contextual and relies on factors such as the task at hand, user expertise, and the expectations placed on an AI-based system. Interpretability and explainability are therefore domain dependent and cannot be determined universally, independently of the domain.

Table 1 contains the list of abbreviations and symbols used in the article along with their description.

Table 1 Table of abbreviations/symbols used in the article

1.1 Motivation and research gaps

LP and XAI have each been topics of interest among researchers; however, the use of XAI in LP has seldom been observed. Figure 1 shows the number of Google searches on LP and XAI separately over the past five years, taken from Google Trends. Many emerging LP techniques fail to provide explainability of their results. The key motivation for conducting this research is to establish a novel approach to LP that combines XAI for clear and understandable decision-making in complicated networks.

Fig. 1

Graph showing worldwide Google searches on Link Prediction and Explainable AI over the past 5 years, taken from Google Trends

Studies carried out in [2,3,4] discussed only LP, similarity measures, ML approaches, and challenges, whereas [5, 6] discussed taxonomy, summaries, and research directions in XAI. No existing literature implements XAI in LP. This article addresses this major gap in the existing literature and provides a method to implement XAI with similarity metrics.

1.2 Contributions

Our previous works [7,8,9,10] lack some key properties of LP that we contribute in this article. The following key contributions make this work more thorough and in-depth than previous studies:

  • A comprehensive exploration of the phases of LP is conducted, which also provides a basic idea of various evaluation metrics and their usage in LP.

  • A generic taxonomy covering Similarity-Based and Learning-Based LP techniques is provided, along with their limitations and utilities presented year-wise.

  • The evolution of LP methods relative to network types from 2013 to 2023 is depicted.

  • The inclusion of XAI in LP is a novel aspect of this survey. A taxonomy of XAI tools for LP is presented together with a case study of their use.

  • The challenges that could arise during the adoption of XAI tools and methodologies for LP are discussed.

The strength of this survey is that it makes it simple for readers to gain insight into the considerations made for LP and XAI.

1.3 Research methodology

We adopted a basic methodology to conduct the survey, as represented in Fig. 2. The steps comprise selecting prime Scopus database libraries like Wiley, Elsevier, Springer, and Blackwell, and then searching for research articles related to LP. The literature was searched for the years 2013 to 2023 using the keywords "Link Prediction", "Similarity Metrics", "Machine Learning", and "Explainable Artificial Intelligence". After obtaining the search results, we filtered them. The filtering and pre-processing of the literature was purely title restricted: articles containing "Link Prediction" in their title were then selected manually. Figure 3 presents a graphical overview of the number of papers published on the LP strategies stated above in the ScienceDirect database.

Fig. 2

Pictorial representation of the research methodology conducted for the proposed study

Fig. 3

Percentage of research articles available on ScienceDirect on Link Prediction with Similarity Metrics, Machine Learning and Explainable Artificial Intelligence

After the preprocessing was performed, we studied the literature and summarized it in an exhaustive literature review, noting its gaps. Further, we discussed the phases of LP and proposed a taxonomy of LP comprising Similarity-based and Learning-based methods. Finally, we conducted an experimental exercise on the proposed method and obtained its results.

This paper is organized into the following sections: Section 2 provides a literature summary; Section 3 provides an overview of the phases of solving an LP problem; Section 4 presents the experiments and results; Section 5 offers a discussion, with limitations and open challenges covered in Sections 6 and 7 respectively; Section 8 concludes with future work.

2 Literature review

After studying various works on LP and XAI, an exhaustive literature summary was generated (as shown in Table 2). It covers research works from 2013 to 2023, providing a general idea of the LP methods and XAI tools used. XAI makes use of KGs as a tool, an area that still needs much exposure.

Table 2 Exhaustive tabular survey of the related literature studied

3 Phases of link prediction

The major steps performed in an LP problem are data collection, network representation (optional), LP method application, and performance evaluation and/or model explanation, as shown in Fig. 4.

Fig. 4

Phases of a Link Prediction problem, starting from data collection, followed by network representation, application of the LP method, and evaluation of the ML model or its explainability

3.1 Data collection

Data collection is primarily performed in two ways: 1) downloading existing datasets from data repositories and libraries like SNAP, Kaggle, GitHub, and others; and 2) constructing a dataset.

Data collection, data cleaning, and data labelling are the three essential steps in the dataset construction process. Data gathering involves finding datasets for ML model training, with two main approaches: Data Generation, used when the available training dataset is small, and Data Augmentation, which adds recently acquired external data to existing datasets. Data generation involves crowdsourcing, a business model that connects large groups of people online to complete activities, and synthetic data, manufactured by a machine to enlarge the training data or supply future data updates.

3.2 Network representation (NR)

NR encompasses various techniques for representing networks, each with a unique approach. The adjacency matrix is often used to represent the network as a graph and underpins similarity measures. Embedding-based methods represent network node properties or linkages, converting nodes, links, and their characteristics into a vector space while preserving graph structure. PGMs represent graph probability distributions to capture complex probabilistic connections, where nodes represent random variables and edges represent probabilistic links between variables. Finally, GNNs, often applied to KGs, are effective at understanding massive, dynamic graph datasets with billions of elements, especially complex network architectures.
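To make this concrete, the following is a minimal sketch (our illustration, not drawn from any surveyed work) of two of these representations: an adjacency matrix built with networkx, and a crude spectral embedding derived from it with numpy. The toy graph is an assumption for demonstration only.

```python
# A minimal sketch of adjacency-matrix and embedding-style representations.
import networkx as nx
import numpy as np

# Toy undirected network; in practice this would be the observed SN.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)])

# Dense adjacency matrix A, where A[i, j] = 1 iff edge (i, j) exists.
A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()))
print(A)

# A crude spectral "embedding": the top-k eigenvectors of A give each
# node a k-dimensional vector while preserving coarse graph structure.
eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
k = 2
embedding = eigvecs[:, -k:]            # one k-dim vector per node
print(embedding)
```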

3.3 Link prediction methods

LP methods can be broken down into two primary categories: those based on similarities, i.e., Similarity Metrics, and those based on learning, i.e., Learning-Based Methods. The first type computes the likelihood of a link existing between a pair of nodes based on an assigned similarity score. Either the nodes' attributes or the network's topology can play a role in calculating the similarity score. There are three distinct types of learning-based approaches. Figure 5(a) depicts the workflow of an LP problem adopting similarity metrics, whereas Fig. 5(b) adopts learning-based techniques.

Fig. 5

(a): A generic flowchart depicting process of solving Link Prediction problem using Similarity Metrics. (b): A generic flowchart depicting process of solving Link Prediction problem using Learning Based approach

3.3.1 Similarity metrics

Similarity-based algorithms first determine the probable strength of a connection between two nodes based on their resemblance, then select the "L" links with the highest similarities. The similarity score of two unconnected nodes is calculated from the network topology. Scores can be local, global, or quasi-local. Local scores detect similar node pairings using local information only. Global methods consider the entire network architecture; they benefit from global information at the cost of computational complexity. Quasi-local similarities balance the two approaches: they need more data than local indices but less time than global ones. Researchers use several similarity metrics to tackle LP problems, including neighbors, dataset similarity/dissimilarity, node closeness, and degree of connectivity. Studies carried out in [3, 4] have discussed various widely adopted similarity metrics.
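As an illustration, the sketch below computes widely used local scores on a toy graph with networkx's built-in link-prediction functions; the graph and the candidate node pair are assumptions for demonstration only.

```python
# A minimal sketch of the local similarity scores discussed above.
import networkx as nx

G = nx.karate_club_graph()   # classic toy social network
pair = [(0, 33)]             # one candidate (non-connected) node pair

cn = len(list(nx.common_neighbors(G, 0, 33)))        # Common Neighbours
jc = next(nx.jaccard_coefficient(G, pair))[2]        # Jaccard Coefficient
aa = next(nx.adamic_adar_index(G, pair))[2]          # Adamic-Adar
ra = next(nx.resource_allocation_index(G, pair))[2]  # Resource Allocation
pa = next(nx.preferential_attachment(G, pair))[2]    # Preferential Attachment

print(cn, jc, aa, ra, pa)
```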

3.3.2 Learning based methods

Learning-based techniques incorporate network architectural and non-architectural information into ML frameworks, letting the techniques determine the likelihood of an edge between two nearby nodes. LP uses supervised learning methods including SVM, RF, KNN, Naive Bayes, Ensemble Learning, Logistic Regression, Radial Basis Function networks, and others. Representation learning strategies can be classified as MF-based, Deep Neural Network-based, or Path- and Walk-based, according to the models' loss function and decoder function (graph similarity metrics) [83].
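A minimal sketch of this workflow is given below: hand-crafted topological features for candidate node pairs are fed to an off-the-shelf supervised classifier. The feature values and labels are illustrative placeholders, not data from any surveyed study.

```python
# A minimal sketch of the learning-based view of LP.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [common_neighbours, jaccard, adamic_adar] for one node pair.
X = np.array([[3, 0.30, 1.2],
              [0, 0.00, 0.0],
              [5, 0.45, 2.1],
              [1, 0.08, 0.3]])
y = np.array([1, 0, 1, 0])   # 1 = link exists, 0 = no link

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict_proba([[2, 0.25, 0.9]]))  # P(no link), P(link)
```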

3.4 Performance evaluation

The performance of LP methods is commonly evaluated using popular metrics like Accuracy, Precision, F1 Score, Recall, the Receiver Operating Characteristic (ROC) curve, AUC, HR@k, and MRR. Various authors have used some uncommon metrics for performance evaluation, which are stated in Table 3.

Table 3 Some uncommon evaluation metrics with their network/usage types and references

Table 3 is generated from the literature studied in Table 2. It gives a clear idea of the uses of evaluation metrics in terms of network and/or system types, which may help future users choose which metric to use in their work depending on the network/system type. CD is one of the most well-known problems related to LP and uses MAE, NMI, ARI, and Average COND, whereas for multilayer complex networks, TPR, Sensitivity, Specificity, and MCC are used. For DR methods such as MF and embedding, RMSE and PCC are used.
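For reference, the common metrics listed above can be computed with scikit-learn as sketched below, assuming y_true holds ground-truth link labels and y_score holds a model's predicted link probabilities (both illustrative here).

```python
# A minimal sketch of the common LP evaluation metrics.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                     # ground-truth labels
y_score = [0.9, 0.2, 0.6, 0.8, 0.4, 0.1, 0.3, 0.7]     # predicted P(link)
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]      # thresholded labels

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
fpr, tpr, _ = roc_curve(y_true, y_score)   # points for plotting the ROC curve
```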

3.5 Explainable artificial intelligence

AI whose decisions can be explained by a human expert is referred to as XAI. It contrasts with the idea of the "black box" in ML, where even the inventors of the AI cannot explain why it made a particular decision. XAI implements the social right to explanation.

Generally, XAI is classified into two categories: 1) Global, which provides a general explanation of the model and is based on universal operational principles, and 2) Local, which explains the rules that produce each individual prediction.

Figure 6 presents XAI techniques applied locally, globally, or both, along with explainable tools that improve LP result interpretation and user comprehension. Explainability strategies in ML are varied. Permutation Importance compares a model evaluated on the original data to one evaluated on randomly rearranged feature values to determine feature importance. Partial Dependence Plots help discover key features and understand their interactions by showing the relationship between a target variable and input features. Accumulated Local Effects is used for non-linear models with complicated input-output interactions, while Morris Sensitivity Analysis evaluates the dominance of input features. Global Interpretation via Recursive Partitioning explains complex model behavior with decision trees. Anchors explain model workings, while the Contrastive Explanation Method compares predictions to similar examples to find minimal input changes that affect predictions. Counterfactual Instances verify model stability and accuracy. Model behavior is explained by Integrated Gradients and LIME. Shapley values determine feature influence, Scalable Bayesian Rule Lists provide interpretable if-then rules, and the Explainable Boosting Machine makes accurate, feature-selective predictions using Boosting and Generalized Additive Models.
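As one concrete example from this list, the sketch below applies Permutation Importance with scikit-learn on a synthetic task; the dataset and model are illustrative assumptions, not part of the reported experiment.

```python
# A minimal sketch of Permutation Importance, one global XAI technique.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn; the drop in score estimates its importance.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance = {imp:.3f}")
```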

Fig. 6

Classification of XAI technologies based on their local, global, or combined local and global applicability, and XAI techniques commonly used with Link Prediction to interpret the prediction results of an LP model

4 Experiments and results

This section explains the experiment conducted to obtain an explainable LP. We base our conclusions on accuracy and the ROC curve. The experiment was conducted on a workstation with an Intel Core i7-4770 2.2 GHz CPU, 16 GB memory, and the Windows 10 Pro operating system. It was implemented in Python with the libraries pandas, numpy, networkx, scikit-learn, and seaborn.

4.1 Dataset and evaluation metrics

The dataset chosen for this research is the Facebook-Social-Network-Analysis dataset, used to predict future friend recommendations. It consists of three fields: Node 1, Node 2, and Connection, representing the "from" node, the "to" node, and the connection type respectively. Connection is a Boolean value: 1 for a connected node pair and 0 for an unconnected one. Table 4 provides the general statistics of the dataset, which was taken from https://github.com/abcom-mltutorials/Facebook-Social-Network-Analysis.
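A minimal loading sketch is shown below, assuming the edge list is stored as a CSV with the three fields above; the file name is an illustrative assumption, not a detail confirmed by the repository.

```python
# A hedged sketch of loading the edge-list dataset with pandas.
import pandas as pd

df = pd.read_csv("facebook_social_network.csv")   # hypothetical file name
df.columns = ["Node 1", "Node 2", "Connection"]   # from-node, to-node, label
print(df["Connection"].value_counts())            # 1 = connected, 0 = not
```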

Table 4 General statistics of data

To evaluate the performance of the method, we opted for Accuracy and the ROC curve. Accuracy is measured as:

$$Accuracy=\frac{Correct\ Predictions}{Total\ Predictions}$$
(1)

4.2 Method

The proposed approach incorporates various similarity measures as parameters, utilizes the RF classifier as the ML model, and employs LIME as the XAI technique. Figure 7 presents the complete systematic architecture of the proposed approach.

Fig. 7

A systematic architecture of the proposed methodology

First, the Facebook data was taken from the GitHub repository. The data was then preprocessed and similarity scores were computed, after which we created a dataframe consisting of the similarity scores and nodes. This dataframe was used to compute correlations and was split into train and test data. After deciding the features and target, the RF classifier was trained on the train set and predictions were made on the test set. Lastly, the performance was evaluated and the results were interpreted using LIME.

4.3 Results

4.3.1 Preprocessing and parameters

To preprocess the collected data, the columns were sorted to obtain each pair of nodes as an "edge" tuple. The similarity metrics CN, AA, RA, JC, and PA were then calculated; the scores for the first 10 rows are given in Table 5. The data generated in Table 5 is used to train and test the classifier (Fig. 8).
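The following sketch outlines this step: the graph is built from the connected pairs, and each candidate pair is scored with the five metrics using networkx. The helper name and column names are assumptions matching Table 5's layout.

```python
# A hedged sketch of the feature-construction step described above.
import networkx as nx
import pandas as pd

def score_pairs(df):
    """Return df with CN, JC, RA, AA, PA columns appended per node pair."""
    G = nx.Graph()
    G.add_nodes_from(pd.unique(df[["Node 1", "Node 2"]].values.ravel()))
    G.add_edges_from(df.loc[df["Connection"] == 1, ["Node 1", "Node 2"]].values)

    pairs = list(df[["Node 1", "Node 2"]].itertuples(index=False, name=None))
    feats = pd.DataFrame({
        "CN": [len(list(nx.common_neighbors(G, u, v))) for u, v in pairs],
        "JC": [p for _, _, p in nx.jaccard_coefficient(G, pairs)],
        "RA": [p for _, _, p in nx.resource_allocation_index(G, pairs)],
        "AA": [p for _, _, p in nx.adamic_adar_index(G, pairs)],
        "PA": [p for _, _, p in nx.preferential_attachment(G, pairs)],
    })
    return pd.concat([df.reset_index(drop=True), feats], axis=1)
```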

Table 5 Computed Similarity Score of first 10 edges (for reference)
Fig. 8

Pearson correlation computed over the similarity scores, nodes, and Connection (data in Table 5)

The strength of the linear association between two continuous variables in the data of Table 5 is computed by:

$$\textrm{Pearson}\ \textrm{Correlation}\ (r)=\frac{\sum \left({x}_i-\overline{x}\right)\left({y}_i-\overline{y}\right)}{\sqrt{\sum {\left({x}_i-\overline{x}\right)}^2\sum {\left({y}_i-\overline{y}\right)}^2}}$$
(2)

The new data consists of five feature variables (as shown in Table 5): CN, JC, RA, AA, and PA, and one target variable, "connectivity": exists (1) or does not exist (0).
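The correlation matrix in Fig. 8 can be reproduced as sketched below; a tiny stand-in dataframe replaces the real scored data here so that the snippet runs on its own.

```python
# A minimal sketch of Eq. (2) in practice: pandas computes the pairwise
# Pearson correlation directly, and seaborn renders the heatmap of Fig. 8.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Stand-in for the scored dataframe produced by score_pairs() above.
scored_df = pd.DataFrame({
    "CN": [3, 0, 5, 1], "JC": [0.3, 0.0, 0.45, 0.08],
    "RA": [0.9, 0.0, 1.6, 0.2], "AA": [1.2, 0.0, 2.1, 0.3],
    "PA": [42, 6, 88, 12], "Connection": [1, 0, 1, 0]})

corr = scored_df.corr(method="pearson")    # Eq. (2) for every column pair
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()
```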

Data splitting impacts the performance and generalization of the model. We implemented a random splitting technique by randomly shuffling the preprocessed dataframe and subsequently dividing it into a training set and a test set in the ratio 67:33. Table 6 provides a sample from the dataset used for training and testing. The RF classifier model was built using ensemble learning.
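A hedged sketch of this split-and-train step is given below, assuming scored_df is the feature dataframe produced earlier; the random seed is arbitrary.

```python
# A minimal sketch of the 67:33 random split and RF training.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = scored_df[["CN", "JC", "RA", "AA", "PA"]]   # features
y = scored_df["Connection"]                      # target: link exists or not

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, shuffle=True, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
```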

Table 6 Sample from created dataframe for training and testing purpose

The computed similarity metrics are treated as the primary parameters (features) fed as input to LIME; no parameters were set empirically. The choice of parameters depends entirely on the choice of ML model, XAI tool, and the type of results to interpret. For this research, similarity metrics were chosen as parameters because they support the interpretation of results, backing the existence or non-existence of links through the values they generate for a specific node.

4.4 Experimental analysis

We tested our method via accuracy and the ROC curve. The computed accuracy was 0.6678, and Fig. 9 shows the plotted ROC curve. Predictions were interpreted using LIME, as shown in Fig. 10(a) for the data at index 6985. LIME predicts with 91% confidence that the connection does not exist (the actual connection does not exist, as shown in Table 7 (row 1, column 2)). The parameters RA and CN increase the probability of non-existence. Similarly, Fig. 10(b) and (c) present the LIME results for index values 9864 and 10,256 respectively, which predict the non-existence of a connection (the actual connections also do not exist, as shown in Table 7 (rows 2 and 3, column 2)).
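Explanations like those in Fig. 10 can be generated along the following lines, reusing the trained model and split from the previous sketch; the instance selected here is illustrative, since exact row indices depend on the shuffle.

```python
# A hedged sketch of producing a LIME explanation for one test instance.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=["CN", "JC", "RA", "AA", "PA"],
    class_names=["not exist", "exist"],
    mode="classification")

row = X_test.iloc[0].values          # one held-out node pair to explain
exp = explainer.explain_instance(row, rf.predict_proba, num_features=5)
print(exp.as_list())                 # per-feature contribution to the prediction
```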

Fig. 9

Plotted ROC curve with ROC AUC Score for the obtained predictions

Fig. 10

LIME results for (a) index = 6985, (b) index = 9864, and (c) index = 10,256

Table 7 Reference for Connection value for LIME Interpreter

5 Discussion

After studying various works on LP, it is observed that LP methods are labelled as Similarity-Based and Learning-Based methods (shown in Fig. 11). Learning-based methods are further classified into Algorithmic, DR, and Probabilistic methods.

Fig. 11

A generic taxonomy of Link Prediction techniques which involves Similarity Based and Learning Based methods

The algorithmic approach to LP involves employing classification techniques and meta-heuristics. This entails extracting features from network data and utilizing them as inputs for training ML models. By discerning patterns and relationships within the network data, these models strive to predict links between nodes. Concurrently, DR serves as a method to transform larger datasets into more manageable forms while preserving crucial information. Applied to classification and regression challenges, it aids in obtaining more accurate predictive models for LP. Methods combining DR with LP include MF and embedding-based techniques. Additionally, probabilistic LP utilizes statistical models like ERGM, SBM, or latent space models to estimate the likelihood of node connections. Maximum-likelihood-based LP estimates the statistical model's parameters so as to maximize the likelihood of the observed data, encompassing network structures and other attributes.

Figure 12 shows the evolution of Similarity-based (left branch) and ML-based (right branch) LP approaches used from 2013 to 2023, with their limitations and utilities. This figure helps novices select and integrate Similarity-based and ML-based approaches on the basis of their complementary features to increase the effectiveness of LP methods.

Fig. 12

Major Limitations and utilities of Link Prediction observed from year 2013 to 2023

Networks belong to various categories depending on their structure (multi-layer, multigraph, simple, complex, bipartite), nature (heterogeneous, homogeneous), attributes (node attributes), and direction (unidirectional, bidirectional, undirected). Because graphs differ in structure, nature, attributes, and direction, no single type of LP method is applicable to all graphs. Figure 13 shows, year-wise, the categories of graphs and the LP methods applied to them in chronological order. With the help of this figure, it can be identified which types of graphs and LP methods were deployed from 2013 to 2023.

Fig. 13

Evolution of LP methodology with reference to network type from year 2013 to 2023

The application of XAI to LP is our main innovation. XAI reduces the cost of mistakes, finds their causes, and improves model efficiency by characterizing errors and decreasing biased predictions. The specific requirements for implementing XAI in Python vary depending on the techniques and libraries. The minimum requirements include Python ML libraries like scikit-learn, TensorFlow, or PyTorch; trained ML models; interpretability libraries like SHAP, LIME, or InterpretML; preprocessed data; and the right XAI approach. The final steps are documentation and visualization.
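As a minimal illustration of these requirements, the sketch below applies SHAP (one of the listed interpretability libraries) to the RF model from Section 4; this is an indicative sketch, not part of the reported experiment.

```python
# A hedged SHAP sketch, reusing rf and X_test from the earlier sketches.
import shap

explainer = shap.TreeExplainer(rf)            # fast, tree-specific explainer
shap_values = explainer.shap_values(X_test)   # per-feature contributions
# Output layout differs across shap versions (list per class vs. 3-D array).
shap.summary_plot(shap_values, X_test)        # global feature-impact view
```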

As AI technology grows in complexity, its algorithms become hard to grasp and analyze, though researchers continue to create and improve methodologies. The need to simplify the model in many XAI algorithms makes performance prediction challenging. For more complex models, current explainability methods may not account for all factors that influence a decision, limiting their usefulness. Creating ethically sound and well-explained XAI algorithms is the goal of ongoing research.

The proposed approach preserves several properties: effectiveness, robustness, interpretability, and user-friendliness. It excels in effectiveness by making predictions on a large dataset; the method is simple and fosters seamless interaction between the method and its users. Its innovation lies in incorporating XAI into the LP domain.

Our approach incorporates a diverse array of similarity measures, ensuring its adaptability to various graph types and guaranteeing its robustness across diverse datasets.

6 Limitations

While studying the available literature, some limitations were found, which are specified below:

  • Searching the literature by keyword returned some irrelevant articles, which required manual filtering and excessive time.

  • Various articles did not explicitly mention the ML models used. Wherever the models were not specified, we did not mention a model name in our literature survey (Table 2).

  • Many researchers did not clearly specify the drawbacks of their methods. Mostly the drawbacks were drawn from the results only.

  • Wherever factors like node attributes, weights, and network properties could have been used but were not utilized, we did not mention them explicitly in our article.

7 Open challenges for research

7.1 Challenges in LP

Scalability, complexity, and computational expense are a few problems faced by LP that have been quoted by other authors. However, some problems continue to go unreported:

  • Dynamicity: Different types of network dynamism exist, including nodes and edges being added and deleted at the next timestamp. Existing LP techniques handle only one or two types of dynamicity; no technique covers them all.

  • Generalization of network: Each network has unique nodes and linkages and thus should be structured accordingly. Currently, there is no comprehensive, universal LP solution available for networks.

  • Timestamp missing: Some datasets lack timestamps for link or node formation across the network for a time period 't'. In such networks, separating training and testing datasets is difficult, because some linked node pairs may be randomly assigned to the training set and others to the testing set. In this scenario, CN-based methods are unreliable.

  • Imbalance in dataset: SN datasets consist mostly of the majority (negative) class with few examples of the minority (positive) class. Unsupervised learning algorithms are indifferent to class distributions; therefore, they cannot balance data and focus on class boundaries. This problem can be addressed with ensemble methods and data sampling.

7.2 Challenges in XAI

  • Blackbox resemblance: Experts have trouble understanding many ML algorithms' decisions. Black-box solutions that produce incomprehensible judgements may cause legal, ethical, and operational difficulties. Black-box machines cannot be checked or audited before implementation, making behavioural assurances problematic. It is difficult to determine why an ML system made a bad judgement, or how to rectify it.

  • Biasness: Keeping an AI programme from learning biased or unfair perspectives is difficult. Possible gaps in the training data, model, and objective function cause this challenge. For ethical and fair AI systems, these biases must be addressed and mitigated.

  • Fairness of results: XAI struggles to assess the fairness of AI systems. This difficulty arises because perceptions of fairness vary with context and with the ML algorithm's input.

  • Safety issues: AI reliability is assessed by examining its decision-making process. The fundamental generalisation step in statistical learning theory requires organisations to draw conclusions about unseen data to fill gaps, making this task difficult.

8 Conclusion and future work

This paper offers an exhaustive literature review of the LP problem and XAI, accompanied by a thorough analysis of LP, its distinct phases, and the problem-solving techniques employed. The prime objective of this study is to establish a generalized concept of XAI and explore its applicability to LP. Among the myriad XAI tools and methods available, the experimental exercise focuses on LIME, as it sheds light on the interpretation of link existence or absence between pairs of nodes. The experimental exercise conducted on Facebook, a real-world SN, demonstrates the potential for significant accuracy improvements using various similarity measures and interpretation of results using LIME.

For future work, we will explore more emerging ML-, DL-, and ANN-based LP methodologies. We plan to extend our study by incorporating various datasets to broaden the scope of our analysis. Additionally, we aim to enhance our method's effectiveness by considering node attributes and conducting comparisons with existing methods for a more comprehensive evaluation.