
1 Introduction

In a social network, each user has a set of labels, called user attributes, that describe their characteristics. For certain types of attributes, however, the structure is not flat but hierarchical. Most existing methods [4, 5] focus on single-level attribute inference, which causes several problems for hierarchical structures, as shown in Fig. 1: even if the same method is applied to every level separately, the attributes inferred for the same user at different levels may conflict, attributes at the same level may be indeterminate, and the results for a certain layer may be missing.

Fig. 1. Problems of labeling in real social networks

In this paper, we propose a multi-level inference model named IWM to solve the problems mentioned above. The model infers hierarchical attributes for unknown users by collecting attributes from nearby users under a maximum entropy random walk. Meanwhile, we propose a correction method based on the predefined hierarchy of attributes to revise the results. Finally, we conduct experiments on real datasets to validate the effectiveness of our method.

The rest of the paper is organized as follows. Section 2 defines the problem. Section 3 proposes the multilevel inference model. The algorithm is given in Sect. 4. The experimental results and analysis are presented in Sect. 5. Related work is introduced in Sect. 6. Finally, we conclude the paper in Sect. 7.

2 Problem Definition

2.1 Semantic Tree

The semantic tree T is a predefined structure that describes the semantic hierarchy among user attributes. We use \(T_g\) to denote the set of user attributes at the gth layer of T.

2.2 Labeled Graph

A labeled graph is a simple undirected graph, denoted as \(G=(V,E,T,L)\), where V is the set of vertices and E is the set of edges. T is the semantic tree of the attributes in G. L is a function mapping V to the Cartesian product of the attribute layers of T, defined as \(L:V \rightarrow T_1 \times T_2 \times \cdots \times T_m\), where m is the depth of T.

Problem Statement: Given a labeled graph \(G=(V,E,T,L)\) and a labeled vertex set \(V_s \subset V\), where \(V_s\) is the set of vertices with complete attributes, every vertex \(v_s \in V_s\) satisfies \(L(v_s) = \{l_1,l_2,\cdots ,l_m\}\) with \(l_1 \in T_1, l_2 \in T_2, \cdots , l_m \in T_m \). The input of the problem is \(L(v_s)\) for every vertex \(v_s \in V_s\), and the output is \(L(v_u)\) for every vertex \(v_u \in V_u\), where \(V_u=V-V_s\).
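To make these definitions concrete, the following is a minimal sketch of how the semantic tree and the labeled graph could be represented in Python (the language of our implementation); the class and field names are illustrative assumptions, not part of the formal model.

```python
from collections import defaultdict

class SemanticTree:
    """Predefined attribute hierarchy T (Sect. 2.1)."""
    def __init__(self):
        self.parent = {}                    # attribute -> parent attribute
        self.children = defaultdict(list)   # attribute -> child attributes
        self.layers = defaultdict(list)     # g -> attribute set T_g

    def add(self, attr, parent=None, g=1):
        self.parent[attr] = parent
        if parent is not None:
            self.children[parent].append(attr)
        self.layers[g].append(attr)

class LabeledGraph:
    """Simple undirected labeled graph G = (V, E, T, L) (Sect. 2.2)."""
    def __init__(self, tree):
        self.tree = tree
        self.neighbors = defaultdict(set)   # adjacency lists encoding E
        # L: vertex -> {attribute l_x: weight w_x(v)}
        self.L = defaultdict(dict)

    def add_edge(self, u, v):
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)
```

Under this layout, a fully labeled vertex \(v_s \in V_s\) would carry weight 1.0 on its true attribute at each layer and 0.0 elsewhere, which is the input state the inference starts from.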

3 Attribute Inference Model

Our attribute inference model consists of two parts. The first part is the information propagation model: based on maximum entropy theory and one-step random walks, vertices in \(V_s\) spread their own attributes to other vertices layer by layer. The second part is a correction model based on the semantic tree, which realizes mutual correction between the attributes of different layers. Both models are described in detail below.

3.1 Information Propagation Model

The information propagation model is an extension of the model proposed in [7]. The main idea is that the higher the entropy of a vertex, the greater the uncertainty about its own user attributes, and thus the more information it should collect. The attributes at the gth layer of a vertex \(v_j\) can be represented by \(L_g(v_j)=\{(l_x,w_x(v_j)) \mid l_x\in T_g\}\), where \(w_x(v_j)\) is the weight of attribute \(l_x\) at \(v_j\). The entropy \(H_g(v_j)\) of \(v_j\)'s gth layer can then be calculated as below.

$$\begin{aligned} H_g(v_j)=-\sum _{l_x\in T_g}w_x(v_j) \times \ln w_x(v_j) \end{aligned}$$
(1)

If \(v_i\) is a neighbor of \(v_j\), then the transition probability \(P_g(v_i,v_j)\) from \(v_i\) to \(v_j\) at gth layer is computed as follows.

$$\begin{aligned} P_g(v_i,v_j)=\frac{H_g(v_j)}{\sum _{v_j \in N(v_i)}H_g (v_j)} \end{aligned}$$
(2)

where \(N(v_i)\) is the set of neighbors of \(v_i\).

Next, we use the following equation to normalize the attribute probabilities collected from different vertices.

$$\begin{aligned} w_x(v_j)=\frac{\sum _{v_i\in N(v_j)}P_g(v_i,v_j)\times w_x(v_i)}{\sum _{l_y\in T_g}\sum _{v_i\in N(v_j)}P_g(v_i,v_j)\times w_y (v_i)} \end{aligned}$$
(3)

\(L_g(v_j)\) is then updated through the new \(w_x(v_j)\). In this way, attribute information spreads hierarchically through the graph.
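As an illustration, here is a minimal sketch of one propagation round over a single layer, assuming the `LabeledGraph` layout sketched in Sect. 2; the helper names, the guards against empty neighborhoods, and the synchronous update after a full sweep are our own reading, not a definitive implementation.

```python
import math

def entropy(weights, layer_attrs):
    """Eq. (1): entropy of a vertex's attribute distribution at one layer."""
    return -sum(w * math.log(w)
                for w in (weights.get(x, 0.0) for x in layer_attrs) if w > 0)

def propagate_layer(graph, layer_attrs, V_u):
    """One round of Eqs. (2)-(3) for every unknown vertex at one layer."""
    H = {v: entropy(graph.L[v], layer_attrs) for v in graph.neighbors}
    updated = {}
    for v_j in V_u:
        collected = {}
        for v_i in graph.neighbors[v_j]:
            # Eq. (2): P_g(v_i, v_j) = H_g(v_j) / sum of H_g over v_i's neighbors
            denom = sum(H.get(u, 0.0) for u in graph.neighbors[v_i]) or 1.0
            p = H.get(v_j, 0.0) / denom
            for x in layer_attrs:
                collected[x] = collected.get(x, 0.0) + p * graph.L[v_i].get(x, 0.0)
        # Eq. (3): normalize the collected mass over all attributes of the layer
        total = sum(collected.values()) or 1.0
        updated[v_j] = {x: s / total for x, s in collected.items()}
    for v_j, w in updated.items():          # synchronous write-back after the sweep
        graph.L[v_j].update(w)
```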

3.2 Attribute Correction Model

The formal definitions of the concepts involved in this section are given below.

Definition 1

Define the following relationships in the semantic tree:

(1) If \(x_2\) is a child node of \(x_1\), then \(x_1\) and \(x_2\) have a relationship called \(Child(x_1,x_2)\).

(2) \(x_1\) and \(x_2\) have a descendant relationship called \(Descendant(x_1,x_2)\) if \(Child(x_1,x_2)\vee \exists x_3 (Child(x_1,x_3)\wedge Descendant(x_3,x_2))\).

(3) If \(x_2\) is a brother node of \(x_1\), then \(x_1\) and \(x_2\) have a relationship called \(Brother(x_1,x_2)\).

Definition 2

(Descendant node set). For a node \(x_1\), its descendant node set is defined as \(DesSet(x_1)=\{x|Descendant(x_1,x)\}\).

Definition 3

(Brother node set). For a node \(x_1\), its brother node set is defined as \(BroSet(x_1)=\{x|Brother(x_1,x)\}\).

For an attribute \(l_x\) in a middle layer of the semantic tree, its existence depends on both its parent Parent(x) and its descendant set DesSet(x), so \(w_x(v_j)\) can be corrected by Eq. (4).

$$\begin{aligned} w_x(v_j)=w_{Parent(x)}(v_j)\times \frac{(1-\alpha )\times w_x(v_j)+\alpha \times \sum _{y\in DesSet(x)}w_y(v_j)}{\sum _{z}(1-\alpha )\times w_z(v_j)+\alpha \times \sum _{y\in DesSet(z)}w_y(v_j)} \end{aligned}$$
(4)

where \(z\in BroSet(x)\) and \(\alpha \) represents the correction strength. When \(\alpha \) is large, the result leans toward the hierarchy of the semantic tree; otherwise, it leans toward the information collected by propagation.

The remaining case is that the attributes at the last layer (leaf nodes) have no children, so they are corrected as follows.

$$\begin{aligned} w_x(v_j)=w_{Parent(x)}(v_j)\times \frac{w_x(v_j)}{\sum _{z\in BroSet(x)}w_z(v_j)} \end{aligned}$$
(5)
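The following is a sketch of this cross-level correction under the same data layout. Following Definition 3, \(BroSet(x)\) here excludes x itself; the guard for an empty denominator and the treatment of top-layer attributes (parent weight 1.0) are our own assumptions.

```python
def des_set(tree, x):
    """DesSet(x) from Definition 2: all descendants of x in the semantic tree."""
    out, stack = [], list(tree.children.get(x, []))
    while stack:
        y = stack.pop()
        out.append(y)
        stack.extend(tree.children.get(y, []))
    return out

def correct(graph, v_j, x, alpha):
    """Eqs. (4)-(5): revise w_x(v_j) with parent, descendant, and brother weights."""
    tree, w = graph.tree, graph.L[v_j]
    parent = tree.parent.get(x)
    w_parent = w.get(parent, 1.0) if parent is not None else 1.0
    bro_set = [z for z in tree.children.get(parent, []) if z != x]

    if tree.children.get(x):                       # middle layer: Eq. (4)
        def blend(a):  # (1 - alpha) * own weight + alpha * descendant mass
            return ((1 - alpha) * w.get(a, 0.0)
                    + alpha * sum(w.get(y, 0.0) for y in des_set(tree, a)))
        denom = sum(blend(z) for z in bro_set) or 1.0
        return w_parent * blend(x) / denom
    else:                                          # leaf layer: Eq. (5)
        denom = sum(w.get(z, 0.0) for z in bro_set) or 1.0
        return w_parent * w.get(x, 0.0) / denom
```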

4 Attribute Inference Algorithm

4.1 Algorithm Description

The detailed steps of the algorithm are shown in Algorithm 1. First, we use Eq. (1) to calculate the entropy \(H_g(v_u)\) for all \(v_u\in V_u\) layer by layer (lines 1 to 3). Lines 4 to 9 perform the hierarchical inference. After the information of all layers has been collected, the correction is performed by Eq. (4) or Eq. (5) (lines 10 to 11).

Algorithm 1

The algorithm terminates when convergence is reached. The convergence condition is given by the following equation.

$$\begin{aligned} \sum _{v_u \in V_u} \sum _{l_x \in T} |diff(w_x(v_u))| \le |V_u| \times |T| \times \sigma \end{aligned}$$
(6)

where \(diff(w_x (v_u))\) is the change in \(w_x(v_u)\) after one execution of the inference algorithm, and \(\sigma \) is a threshold that controls the number of iterations.
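Putting the pieces together, the following is a compact sketch paraphrasing the outer loop of Algorithm 1 with the stopping rule of Eq. (6); `max_iter` is our own safety cap, not part of the algorithm.

```python
def iwm_infer(graph, V_u, alpha=0.5, sigma=1e-4, max_iter=100):
    """Propagate layer by layer, correct across layers, stop by Eq. (6)."""
    attrs = [x for g in sorted(graph.tree.layers) for x in graph.tree.layers[g]]
    budget = len(V_u) * len(attrs) * sigma          # |V_u| * |T| * sigma
    for _ in range(max_iter):
        old = {v: dict(graph.L[v]) for v in V_u}
        for g in sorted(graph.tree.layers):         # hierarchical inference (lines 4-9)
            propagate_layer(graph, graph.tree.layers[g], V_u)
        for v in V_u:                               # cross-level correction (lines 10-11)
            graph.L[v] = {x: correct(graph, v, x, alpha) for x in attrs}
        # Eq. (6): total absolute change over all unknown vertices and attributes
        diff = sum(abs(graph.L[v].get(x, 0.0) - old[v].get(x, 0.0))
                   for v in V_u for x in attrs)
        if diff <= budget:
            break
    return {v: graph.L[v] for v in V_u}
```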

4.2 Time Complexity

We assume that the labeled graph G has n vertices and p attributes, and that the semantic tree has m layers. The time complexity of information propagation is \(O(m|V_u|+mnd+pnd)=O(mnd+pnd)\), where d is the average degree of the vertices in G. After that, every attribute of every user must be corrected, which takes O(pn). To sum up, the total time complexity of our algorithm for one iteration is \(O(mnd+pnd)\), since the O(pn) correction cost is dominated by the O(pnd) propagation cost.

5 Experiment

The experiments are performed on a Windows 10 PC with an Intel Core i5 CPU and 8 GB of memory. Our algorithms are implemented in Python 3.7. The default parameter values in the experiments are \(\alpha =0.5\) and \(\sigma =0.0001\).

5.1 Experimental Settings

Dataset. We study the performance of all methods on the DBLP dataset. DBLP is a computer science literature database; each author is a vertex, and the author's research fields serve as the attributes to be inferred. We extract 63 representative attributes and predefine a 4-layer semantic tree.

Baselines and Evaluation Metrics. We compare our method IWM with three classic attribute inference baselines which are SVM, Community Detection (CD) [6] and Traditional Random Walk (TRW) [7].

We use five commonly used metrics to make a comprehensive evaluation of the inference results. These metrics are calculated as shown below.

$$\begin{aligned} Precision = \frac{\sum _{l \in T}|\{v_u|v_u\in V_u\wedge l\in Predict(v_u)\cap Real(v_u)\}|}{\sum _{l \in T}|\{v_u|v_u\in V_u\wedge l\in Predict(v_u)\}|}\end{aligned}$$
(7)
$$\begin{aligned} Recall = \frac{\sum _{l \in T}|\{v_u|v_u\in V_u\wedge l\in Predict(v_u)\cap Real(v_u)\}|}{\sum _{l \in T}|\{v_u|v_u\in V_u\wedge l\in Real(v_u)\}|} \end{aligned}$$
(8)
$$\begin{aligned} F_1 = \frac{2\times Precision \times Recall}{Precision + Recall} \end{aligned}$$
(9)
$$\begin{aligned} Accuracy = \frac{1}{|V_u|} \times |\{v_u|v_u\in V_u \wedge Predict(v_u)=Real(v_u)\}|\end{aligned}$$
(10)
$$\begin{aligned} Jaccard = \frac{1}{|V_u|} \times \sum _{v_u \in V_u} \frac{|Predict(v_u) \cap Real(v_u)|}{|Predict(v_u) \cup Real(v_u)|} \end{aligned}$$
(11)

where \(Predict(v_u)\) and \(Real(v_u)\) respectively represent the inferred attribute set and the real attribute set of \(v_u\). For all metrics, larger values mean better performance.
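For completeness, the following is a direct transcription of Eqs. (7)-(11), assuming `predict` and `real` are hypothetical dicts mapping each unknown vertex to its attribute set.

```python
def evaluate(predict, real):
    """Micro precision/recall/F1 (Eqs. 7-9), exact-match accuracy (Eq. 10),
    and mean Jaccard similarity (Eq. 11) over the unknown vertices."""
    tp = sum(len(predict[v] & real[v]) for v in real)   # correctly inferred labels
    p_den = sum(len(predict[v]) for v in real)          # all inferred labels
    r_den = sum(len(real[v]) for v in real)             # all true labels
    precision = tp / p_den if p_den else 0.0
    recall = tp / r_den if r_den else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(predict[v] == real[v] for v in real) / len(real)
    jaccard = (sum(len(predict[v] & real[v]) / (len(predict[v] | real[v]) or 1)
                   for v in real) / len(real))
    return precision, recall, f1, accuracy, jaccard
```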

5.2 Results and Analysis

Exp1-Impact of Vertex Size. We conduct the first experiment on coauthor networks with 5,000, 10,000, 20,000, and 40,000 vertices. The proportion of unknown vertices is \(30\%\) (Table 1).

Table 1. Inference performance on different vertex size.

Our method clearly shows the best performance on the different evaluation metrics. For example, in the network with 20,000 vertices, our model improves over the strongest baseline by \(22.2\%\), \(35.1\%\), \(16.3\%\), and \(6.3\%\) on Precision, F1, Accuracy, and the Jaccard index, respectively. In terms of Recall, our method has no obvious advantage over TRW.

Exp2-Impact of the Proportion of Unknown Vertices. In Exp2, the vertex scale of the network is 20,000, and we set the proportion of unlabeled vertices to \(10\%\), \(20\%\), \(30\%\), and \(50\%\), respectively.

Fig. 2. Inference performance on different proportions of unknown vertices

The results show that, as the proportion of unknown vertices increases, our method degrades much more slowly than the other methods. Notably, the five evaluation metrics of our method are \(71.77\%\), \(72.17\%\), \(71.96\%\), \(64.21\%\), and \(65.43\%\) even when \(50\%\) of the vertices lack attributes, which shows its great value in practical applications.

Exp3-Real Case Study. In Table 2 we present partial results of the experiment, which give a clear comparison between our method and TRW. We use these examples to demonstrate the effectiveness of our method.

Table 2. Comparison of inference results by TRW and IWM.

For Chris Stolte, IWM complements the missing information that TRW cannot infer. For Marcel Kyas, our method corrects the erroneous information on Layer 3 and obtains the correct result. For William Deitrick, TRW causes an indeterminacy problem on Layer 2, while IWM selects the more relevant attributes. However, for V. Dhanalakshmi, whose neighborhood has a special structure, most of the collected information is interference, so even IWM cannot make the correct inference.

6 Related Work

There has been increasing interest in the inference of single-level user attributes over the last several years.

First, [1, 11] are based on resource content and utilize the user's text content for inference. [3] constructs a social-behavior-attribute network and designs a vote distribution algorithm to perform inference. There are also methods based on the analysis of graph structure, such as Local Community Detection [6] and Label Propagation [12]. [10] discovers the correlation between item recommendation and attribute inference and uses an Adaptive Graph Convolutional Network to join these two tasks. However, none of these methods explores the relationships in the attribute hierarchy, which greatly reduces their effectiveness on our multilevel problem.

Another approach builds classifiers and treats the inference problem as a multilevel classification problem. [2] trains a binary classifier for each attribute. [8] trains a multi-class classifier for each parent node in the hierarchy. [9] trains a classifier for each layer of the hierarchical structure and combines it with [8] to resolve inconsistencies. However, classifier-based approaches place high requirements on data quality, which makes the construction of the classifiers complicated, and the amount of computation for training is huge.

7 Conclusion

In this paper, we study the multilevel user attribute inference problem. We first define the problem and propose the concepts of the semantic tree and the labeled graph. We then present a new method to solve the problem: an information propagation model that collects attributes for preliminary inference, and an attribute correction model that conducts cross-level correction. Experimental results on real-world datasets demonstrate the superior performance of our new method. In future work, we will extend our method to multi-category attributes and further optimize the algorithm to reduce its running time.