An effective keyword search co-occurrence multi-layer graph mining approach

Bolorunduro, Janet Oluwasola; Zou, Zhaonian; Bah, Mohamed Jaward

doi:10.1007/s10994-024-06528-9

Published: 02 April 2024

Volume 113, pages 5773–5806, (2024)

Machine Learning

Abstract

Graph mining is a collection of tools and methods used to analyze real-world graphs, predict how the structure and properties of a given graph affect various applications, and build models that generate graphs closely resembling the real-world graphs of interest. However, some graph mining approaches face scalability and dynamic-graph challenges that limit their practical application. One prominent method in machine learning and data mining is graph embedding, also known as network representation learning, in which representative methods encode complicated graph structures into embeddings using specific pre-defined metrics. Co-occurrence graphs and keyword searches are the foundation of search engine optimization for diverse real-world applications. Current work on keyword search on graphs is based on pre-established information retrieval search criteria and does not provide semantic linkages. Moreover, recent co-occurrence and keyword search methods work effectively only on single-layer graphs rather than on multi-layer graphs. Meanwhile, graph neural networks have been widely used in recent years as a branch of graph models because of their excellent performance. This paper proposes an Effective Keyword Search Co-occurrence Multi-Layer Graph mining method that employs two core approaches: Multi-layer Graph Embedding and Graph Neural Networks. We conducted extensive tests using benchmarks on real-world data sets. The experimental findings show that the proposed method, enhanced with the regularization approach, performs substantially better, with a 10% increase in precision, recall, and F1-score.

1 Introduction

Language is the medium humans use for communication, and computers can now comprehend it; Search Engine Optimization (SEO) builds on this by optimizing websites to increase their visibility in the natural rankings of Google and other search engines. It can model how individuals acquire and discover information on practically any topic. Keyword search determines the relevance of words, queries, and phrases to a website and its pages so that the user can find the page that best answers their query in real-world applications, a notion known as search intent (see Fig. 1 for more details).

One of the most valuable uses of pattern recognition (PR), machine learning (ML), artificial intelligence (AI), social computing (SC), and recommender systems (RS) is to help make informed decisions and provide a more realistic representation of the multiple relations that characterize an entity in a system. However, optimizing content or generating candidate searches for search engines is possible only if what people are searching for, and what they want to see, can be accessed easily (Han et al., 2022; Aggarwal, 2016). Another approach to finding co-occurrence (CO) patterns comes from corpus linguistics and statistical analyses, in which Extensible Markup Language (XML) and graph structures in hypertext corpora are used to extract specific data attributes.

Co-occurrence networks, sometimes called semantic networks (Segev, 2021), are graphical methods for resolving ambiguity and analyzing text, including potential relationships among entities, concepts, and organisms such as bacteria (Freilich et al., 2010), by means of a graph visualization. A co-occurrence network is a collection of terms that are connected because they occur together in a certain text, concept, or structure. Such networks are created by linking words according to a set of co-occurrence strategies and by drawing on the format of scientific communication, co-citation analysis, multinomial models, and graph neural networks (Han et al., 2022; Aggarwal, 2016; Yang et al., 2021; Garg, 2021); these techniques have improved significantly but still have flaws.

There is great interest in keyword search over relational databases (Yang et al., 2021; Garg, 2021; Bast et al., 2016), where the primary means of data access is the Structured Query Language (SQL). Because using SQL requires knowing the relational schema well, accessing large volumes of relational data has become more challenging for prospective users. Graphs, in this setting also known as social graphs, are used in social media for information organization, structure, storage, and retrieval, as well as for node classification, link prediction, clustering, and visualization (Cai et al., 2018; Goyal & Ferrara, 2018). Graph clustering groups the nodes of a graph into clusters using the graph structure or node attributes. Numerous works (Ma et al., 2021) follow the node distribution approach, in which the resulting node representations can be passed to traditional clustering algorithms. Search engines such as Google still represent an influential and trustworthy resource for discovering practical website information.
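The notion of a co-occurrence network described above, in which terms are linked because they occur together in a text, can be illustrated with a minimal sketch. The function name `cooccurrence_edges`, the whitespace tokenization, and the choice of a whole document as the co-occurrence window are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how often each unordered pair of distinct terms shares a document.

    Assumption: terms are lowercase whitespace-separated tokens and the
    co-occurrence window is the whole document.
    """
    counts = Counter()
    for doc in documents:
        # Sort the distinct terms so that each pair has one canonical order.
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy corpus.
docs = [
    "graph mining keyword search",
    "keyword search engine",
    "graph embedding",
]
edges = cooccurrence_edges(docs)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Each key of the resulting counter is an edge of the co-occurrence graph and its value is the edge weight. A real pipeline would normally add stop-word removal and a sliding window, which are omitted here.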

Early search engines relied primarily on the contextual relevance between the user query and the indexed pages to return pertinent pages to the user. Information retrieval (IR) techniques were implemented directly in the retrieval and ranking algorithms. Conventional IR presumes that the fundamental unit of information is a document and that a large collection of documents is available to form the text database. Researchers have also used IR to extract knowledge from structured data for community identification and search. A list of keywords, sometimes referred to as terms, is the most widely used query format. Information in text is unstructured, whereas data in databases is highly structured and kept in relational tables; thus, retrieving information from text differs from retrieving data from databases with SQL queries. Because text retrieval lacks a structured query language like SQL, the primary focus has been on retrieval itself and on related activities that can increase its accuracy or efficiency, and the IR community has paid little attention to real-world data applications such as false news.

Keyword research is the first and most crucial step in any search engine optimization strategy (Yang et al., 2021; Garg, 2021). The most popular approach to the keyword search problem is Graph-Based Keyword Search (GBKS), which identifies a set of closely linked nodes in the graph that match the keywords of a query (Bhalotia et al., 2002; Kacholia et al., 2005; He et al., 2007). BANKS-I (Bhalotia et al., 2002) considers the shortest routes from a tree’s root to the nodes that contain keywords, BANKS-II (Kacholia et al., 2005) suggests using a forward search to approximate a solution, and BLINKS (He et al., 2007) improves on BANKS-II by identifying the set of all distinct subtrees with the best scores. These retrieval techniques are centered on nodes, whereas semantic relationships (Wang et al., 2008) can link keyword queries to formal queries. Classical manual reading for information extraction and knowledge acquisition, by contrast, cannot keep up with the needs of the complex data age.
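The general idea behind graph-based keyword search, finding a node that is closely linked to nodes matching every query keyword, can be sketched as follows. This is a deliberately simplified illustration on an unweighted, undirected graph and is not the BANKS or BLINKS algorithm; the helper names `nearest_keyword_distance` and `best_root`, the scoring by summed hop distance, and the toy graph are all assumptions.

```python
from collections import deque


def nearest_keyword_distance(adj, sources):
    """Multi-source BFS: hop distance from each reachable node to its nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def best_root(adj, labels, keywords):
    """Return (node, cost) minimizing the summed distance to every keyword, or None."""
    per_keyword = []
    for kw in keywords:
        sources = [n for n in adj if kw in labels.get(n, set())]
        per_keyword.append(nearest_keyword_distance(adj, sources))
    best = None
    for node in adj:
        # A candidate root must be able to reach a match for every keyword.
        if all(node in d for d in per_keyword):
            cost = sum(d[node] for d in per_keyword)
            if best is None or cost < best[1]:
                best = (node, cost)
    return best


# Hypothetical star graph: node "c" is linked to three keyword-bearing nodes.
adj = {"a": ["c"], "b": ["c"], "c": ["a", "b", "d"], "d": ["c"]}
labels = {"a": {"graph"}, "b": {"search"}, "d": {"mining"}}
result = best_root(adj, labels, ["graph", "search", "mining"])
print(result)
```

The returned node plays the role of the root of an answer tree whose branches lead to the keyword nodes. Ranking by summed hop distance is only one possible score; the cited systems use their own edge weights and ranking functions.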

Researchers in machine learning (ML) and graph mining have drawn on various branches of artificial intelligence, including recommendation systems, computer vision, and natural language processing, to solve standard problems through graph-based machine learning. In conventional ML, researchers have worked on alternative clustering problems on graphs, and comparing the similarity of objects of the same kind is crucial in many applications (Han et al., 2022; Aggarwal, 2016). A sustainable cluster is a collection of nodes in a multiplex network that are concurrently connected to one another across all of the distinct layers (Baxter et al., 2016). Moreover, sustainability corresponds to several paths connecting the same pair of nodes in the feasible cluster, each of which exists on a different layer of the multiplex. Therefore, understanding fundamental search co-occurrence correlations through multi-layer graph representations is an essential methodology, from literature to intelligence analysis (Fig. 2).

Multiple layers are a feature of realistic systems. Multi-layer graphs (MLGs), which are widely accepted as models of such systems (Boccaletti et al., 2014; Kivelä et al., 2014; Kumar et al., 2020), differ from single-layer graphs (SLGs) in their multi-relational structure: entities can have different types of relationships between them, and this inter-relational structure offers a range of resources for decision-making. When modeling several real-world relations among the same group of people, for example, MLGs provide an expressive method in which layers represent various online and offline relations (e.g., following, co-authorship, and co-working relations). Academia and the business community have used keyword research to help users maximize network resources, and Label Propagation (LP) (Nickel et al., 2015; Alimadadi et al., 2019), Random Walks (RW) (Bojchevski et al., 2018; Valdeolivas et al., 2019), E-Commerce Recommendation (E-CR) (Aggarwal, 2016), Multi-layer Graph Embedding (MLGE) (Rossi et al., 2021; Makarov et al., 2021), and Deep Neural Networks have been well studied for forecasting relational links between entities and for keyword search on multi-layer graphs, so as to represent complex relationships accurately (Wu et al., 2020; Perozzi et al., 2014). However, in common usage, MLG representations of various vertices and edges are combined with keyword search methods to find relevant components in a network system. Current methods focus on specific kinds of multi-layer graphs, such as multiplex and heterogeneous structures of interconnected complex systems.

At the same time, most existing approaches have their merits and demerits. Despite challenges such as memory cost and time complexity, graph embedding, also known as network representation learning (Grover & Leskovec, 2016; Hamilton et al., 2017), offers an effective solution by changing the representation form and mapping nodes into a low-dimensional space, maintaining consistency and enhancing the understanding of network entities. The increasing availability of complex networks with billions of vertices and edges has significantly advanced network analysis. Multi-layer Graph Embedding (MLGE) attempts to describe vertices and edges in a vector space while preserving the structure of the graph and the information within and across layers, thereby overcoming the representation and analysis challenges of complex networks.

Diverse techniques have been proposed to learn graph representations. Graph Neural Networks (GNNs) (Battaglia et al., 2018), among the best-known of these and recently advanced by Google, extend popular networks such as RNNs and CNNs to graph-structured data (Scarselli et al., 2008; Duvenaud et al., 2015; Niepert et al., 2016; Defferrard et al., 2016). One line of study builds neural networks as RNN variants that operate on graphs; Li et al. (2015) extended the GNN model by introducing a new RNN algorithm into the original GNN model. Another body of work that has attracted rapidly growing attention is Graph Convolutional Networks (GCNs) (Kipf & Welling, 2016), which are based on spectral graph theory, initiated by Bruna et al. (2013) and then extended by Defferrard et al. (2016) with fast localized convolutions. Most neural networks go deep to achieve strong performance. Recent GNNs for node classification on graphs, however, are unable to achieve high performance on a variety of data sets because they are shallow networks and tend to concentrate on node-wise scores.
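To make the GCN idea concrete, the layer-wise propagation rule of Kipf & Welling (2016), $H' = \sigma(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W)$ with $\hat{A}=A+I$ and $\hat{D}$ the degree matrix of $\hat{A}$, can be sketched in plain Python. This is a single forward pass with ReLU as $\sigma$ and hand-picked weights; it is not the model proposed in this paper, and the names `matmul` and `gcn_layer` as well as the toy inputs are assumptions for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One GCN forward pass: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, x) for x in row] for row in hidden]


# Hypothetical toy input: two connected nodes, one feature, two output units.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
weights = [[1.0, -1.0]]
output = gcn_layer(adj, features, weights)
print(output)
```

In this toy case both nodes end up with the average of the two input features in the first output unit, which shows how one layer mixes information between neighbors. Stacking several such layers, with learned weights, gives the deeper networks discussed above.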

GNNs are becoming popular in multi-layer learning (Wu et al., 2020; Hamilton, 2020). However, prior methods have yet to thoroughly investigate these graph interactions, since they have not combined information from several links concurrently. To address this problem, researchers have proposed a multi-omics data analysis that embeds multiple kinds of knowledge into graph neural networks (Xiao et al., 2023). To exploit the benefits of structural diversity and deep GNN architectures, Feng et al. (2023) proposed a two-stage GNN pipeline with a novel space that aims to deliver high performance. In contrast, other work utilizes transferable deep GNN models in a block-wise manner, and Liang et al. (2021) and He et al. (2021) use the multilevel embedding framework MILE and a distributed multilevel framework (DistMILE) for scalable graph embedding. Our proposed effective keyword search co-occurrence multi-layer graph mining approach (EKSCOMLGs) builds associations based on multi-layer graph embedding and on graph neural networks that draw on multiple kinds of knowledge for mining network features. Its fundamental goal is to learn co-occurrence relations in real-world data sets.

Figure 3a considers a scenario in which a community contains researchers, and recommendations of individuals who have never cooperated appear more valuable. Suppose ten researchers are skilled in different fields, and assume there is a talent hunt for a project requiring expertise in Mathematics, Architecture, and Computer Analysis. Since a graph can serve as a pictorial illustration, a social graph mapping based on co-membership can model the bringing together of information from two or more people who belong to the same community of researchers but to different expertise groups (G). In Fig. 3a, red ($\alpha$) represents researchers skilled in Mathematics (G1), green ($\beta$) represents researchers skilled in Architecture (G2), and purple ($\gamma$) represents researchers skilled in Computer Analysis (G3).

An example graph neural network representation is shown in Fig. 3b, where the circles indicate nodes and the edges represent the weights or information passed between them; some layers may be hidden. The structural role of a node is indicated by its color: red, green, purple, or gold. When there are few hidden layers, the network is called a shallow neural network; when there are many, it is called a deep neural network. For proper execution, there must be a mutual linkage or interest between nodes and edges in Fig. 1.

We are particularly interested in two research questions: (1) What is the relatedness between nodes and edges within the same community type or across different community types in real-world data? (2) Does the proposed model perform better than existing methods? To address these questions, we draw on search engine optimization (SEO) based on content information properties, using elements of Multi-layer Graph Embedding (Rossi et al., 2021; Makarov et al., 2021) and Graph Neural Networks, which have yielded helpful information (Wu et al., 2020; Hamilton, 2020). However, effective keyword search co-occurrence multi-layer graph mining (EKSCOMLG) is an NP-complete problem. The proposed EKSCOMLGs are therefore driven by enhanced multi-layer graph embedding and graph neural networks, which could transform practical keyword search co-occurrence tasks in real-world applications by fully utilizing the network’s capabilities to enhance user experience. The contributions of this paper are as follows:

An effective keyword search co-occurrence multi-layer graph mining approach is proposed. The proposed method is built on multi-layer graph embedding and graph neural networks with highly adaptive real-world processes to build intelligent solutions.
We performed extensive experiments using four evaluation metrics on distinct data sets against other benchmark methods. Our proposed model shows improved performance and offers the advantage of providing links that guide the classification process, which enhances existing techniques by examining and learning co-occurrence relations, social association, deformity prediction, and recommendation.

The remainder of the manuscript is organized as follows: Sect. 2 presents the preliminaries and problem definition, Sect. 3 describes the materials and methods, Sect. 4 describes the experiments, Sect. 5 presents the results and discussion, Sect. 6 reviews related work, and Sect. 7 concludes.

2 Preliminaries and problem definition

This section introduces the preliminaries, including the definitions and notations used (Table 1), followed by the problem definition. A real-world network can be represented as a graph with directed or undirected edges. For a graph G, the node set is denoted by N and the edge set by E; thus $G=(N, E)$, where N is the vertex (node) set of size $n=|N|$ and E is the edge set of size $m=|E|$. Note that the nodes are written as $N_u=\{u_1,u_2,\ldots,u_n\}$ and $N_v=\{v_1,v_2,\ldots,v_n\}$, and the set of edges between these vertices as $E=\{e_{11},e_{12},\ldots,e_{nn}\}$, where $e_{uv}=(u_i,v_j)\in E$ and $1\le i,j\le n$.

Another way to describe a graph G is by an adjacency matrix A with $A(u,v)=1$ if $(u,v)\in E$ and $A(u,v)=0$ otherwise. If $A(u,v)\ne A(v,u)$ for some pair of nodes, G is a directed network; if the graph is undirected, A is symmetric, i.e., $A(u,v)=A(v,u)$ for all nodes $u,v\in N$. If A(u, v) is weighted by $w(u,v)\in W$, then $G=(N, E, W)$ is a weighted network; otherwise, it is an unweighted network. Simple graphs can be enriched with additional information to obtain attributed graphs, multi-relational graphs (Hamilton, 2020), and multi-layer graphs (Kivelä et al., 2014).
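The adjacency-matrix description above can be illustrated with a short sketch that builds A from an edge list and uses the symmetry of A to tell undirected from directed graphs. The function names `adjacency_matrix` and `is_symmetric`, the 0-based node indices, and the toy edge list are assumptions for illustration.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 adjacency matrix from a list of (u, v) index pairs."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            # An undirected edge is stored in both directions.
            A[v][u] = 1
    return A


def is_symmetric(A):
    """True when A(u, v) == A(v, u) for all node pairs."""
    n = len(A)
    return all(A[u][v] == A[v][u] for u in range(n) for v in range(n))


# Hypothetical path graph on three nodes: 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
undirected = adjacency_matrix(3, edges)
directed = adjacency_matrix(3, edges, directed=True)
print(is_symmetric(undirected), is_symmetric(directed))
```

A weighted network would store $w(u,v)$ in place of the 1 entries; that variant is omitted here for brevity.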

Definition 2.1

Attributed graphs extend simple graphs by adding node attributes X and edge attributes $X^e$. For example, $X\in R^{n \times d}$ represents a node feature matrix, and $X^{e}\in R^{m\times c}$ represents an edge feature matrix, with $x^e_{u_i,v_j}\in R^c$ representing the feature vector of an edge $e_{uv}$.

Definition 2.2

Multi-relational graphs extend basic graphs with edges that carry several kinds of relations $\tau$; that is, $e_{uv}=(u_i,v_j)\in E$ becomes $e_{uv}=(u_i, \tau , v_j)\in E$. One adjacency matrix $A^{\tau }$ exists for each relation type, and the complete graph can be represented as an adjacency tensor $A\in R^{n\times r\times n}$, where r is the number of relation types. Heterogeneous and multiplex graphs are two sub-types of multi-relational graphs.
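As a small illustration of Definition 2.2, the adjacency tensor can be stored as a nested list indexed by source node, relation, and target node, taking r to be the number of relation types. The names `adjacency_tensor` and `relation_slice`, the relation labels, and the toy edges are assumptions for illustration.

```python
def adjacency_tensor(n, relations, typed_edges):
    """Build an n x r x n tensor T with T[u][k][v] = 1 for each edge (u, relations[k], v)."""
    index = {tau: k for k, tau in enumerate(relations)}
    T = [[[0] * n for _ in relations] for _ in range(n)]
    for u, tau, v in typed_edges:
        T[u][index[tau]][v] = 1
    return T


def relation_slice(T, k):
    """Extract the n x n adjacency matrix of the k-th relation type."""
    return [row[k] for row in T]


# Hypothetical graph with three nodes and two relation types.
relations = ["follows", "coauthor"]
typed_edges = [(0, "follows", 1), (1, "coauthor", 2), (2, "coauthor", 1)]
T = adjacency_tensor(3, relations, typed_edges)
follows = relation_slice(T, 0)
coauthor = relation_slice(T, 1)
print(follows)
print(coauthor)
```

Each slice is the per-relation adjacency matrix $A^{\tau}$. In this toy example the "follows" slice is asymmetric (a directed relation) while the "coauthor" slice is symmetric.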

Table 1 List of notations
