1 Introduction

In recent years, with the rapid development of social networking sites (e.g., Twitter, Facebook, Flickr and YouTube), people have access to a large number of real-world networks. Network analysis has attracted considerable attention from researchers in various fields, including sociology [1], biology [2], ecology [3] and engineering [4]. A network can be represented as a set of nodes with connections between them. Most existing research focuses on monodimensional networks, that is, networks composed of a single type of relationship (i.e., dimension). However, many real-world networks are multidimensional: more than one type of relationship (i.e., dimension) exists between nodes. For example, in a social network, various types of relationships such as friend, colleague, schoolmate, romance, neighbor and family may coexist between people.

One fundamental task of network analysis is to explore network structure, that is, to find structural regularities in networks, including assortative structure, disassortative structure [5] and mixture structure [6]. Structural regularity exploration must determine how to partition the nodes of a network into different groups (group partition) and how many groups the network contains (group number). In recent decades, a large number of methods have been proposed for group partition or group number determination. Most of them focus on monodimensional networks, and only a small number address multidimensional networks. The main reason may be that methods for monodimensional networks cannot be applied directly to multidimensional networks.

A great challenge for structural regularity exploration in a multidimensional network is how to aggregate the information of each dimension of the network. Recently, several methods have been proposed for community detection in multidimensional networks. For example, Zhu and Li proposed a unified network transferred from multiplex networks for community detection [7]. Tang and Liu proposed a two-phase strategy for community detection in multidimensional networks, called principal modularity maximization (PMM) [8]. Boden et al. introduced a multilayer coherent subgraph (MLCS) model and a best-first search algorithm, MiMAG, to mine coherent subgraphs in multilayer graphs [9]. Mucha et al. presented an extended modularity framework to detect the community structure of arbitrary multiplex networks [10]. All these methods face a dilemma: they must pre-assume the structure type (e.g., the community structure), but this information is usually unavailable before the structures have been found. Recently, several statistical inference-based models have been proposed for network analysis whose parameters can be used for structure exploration. DuBois presented a generative method for modeling multidimensional networks [11]. Sinkkonen presented an infinite mixture model for multi-relational networks [12]. Xu et al. proposed a multi-relational Gaussian process model to learn multi-relational networks [13]. Although these models are able to handle multidimensional networks, they usually perform poorly at structural regularity exploration on many real-world networks.

In order to explore structural regularities in multidimensional networks without pre-assuming which type of structure they have, in this paper we propose a novel feature aggregation method based on a mixture model and Bayesian theory, called the multidimensional Bayesian mixture (MBM) model. To obtain the MBM model, we first extend Newman's mixture model (NMM) [5], a model for structural regularity exploration in monodimensional networks, to multidimensional networks, yielding the multidimensional Newman's mixture model (MNM), and then extend the MNM model using Bayesian theory. Furthermore, to automatically determine the group number of multidimensional networks, we extend the MBM model using Bayesian nonparametric theory [14] into a new model, called the multidimensional Bayesian nonparametric mixture (MBNPM) model. Experiments conducted on a number of synthetic and real multidimensional networks show that the MBM model outperforms other related models on most networks and that the MBNPM model is comparable to the MBM model. As far as we know, the MBNPM model is one of the earliest models that can determine not only the group partition but also the group number of multidimensional networks.

The remainder of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 presents the MBM model and a Markov chain Monte Carlo (MCMC) method for approximate inference. Section 4 presents the MBNPM model and an MCMC method for approximate inference. Experiments are presented in Sect. 5. Section 6 draws conclusions.

2 Related work

2.1 Community detection in multidimensional networks

Community detection in multidimensional networks needs to aggregate the information of all dimensions. According to how the information is aggregated, existing methods for community detection in multidimensional networks fall into two categories: network aggregation and feature aggregation. The network aggregation methods first aggregate all dimensions into a unified network and then employ existing community detection methods on the unified network. Battiston et al. [15] presented a simple unified network in which two nodes are connected if at least one connection exists between them in any of the simplex networks. The method is easy to understand, but it clearly loses most of the multiplex information. Tang et al. [16] described another simple unified network, called network integration, obtained by averaging the interactions among nodes over the dimensions of a multidimensional network. The limitation of this method is that it treats every dimension as equivalent, which ignores the differences between dimensions. To address this limitation, Zhu and Li [7] proposed a new unified network that takes both the importance of the simplex networks and node similarity into consideration. Berlingerio et al. [17] presented a mapping function based on node neighborhoods to transform a multidimensional network into a monodimensional one. Wu et al. [18] proposed a co-ranking framework that makes full use of the mutual influence between relations and nodes to transform multi-relational networks into single-relational networks.

The feature aggregation methods first extract features or clusters from each dimension and then aggregate them for structure detection. Tang and Liu [8] proposed a principal modularity maximization method that handles community detection in multidimensional networks via a two-phase strategy: the first phase extracts structural features from each dimension and the second phase concatenates these features to detect the representative communities. Boden et al. [9] first introduced a multilayer coherent subgraph model to cluster nodes that are densely connected in a subset of the graph layers and then used a best-first search algorithm to select the most interesting, non-redundant final clusters. The focus of this method is to identify meaningful groups rather than to provide a group partition for all nodes. Mucha et al. [10] defined interslice connections as connections across different dimensions and extended modularity to simultaneously model interslice and intraslice connections for community detection in multiscale or multiplex networks. This method aims at providing a partition for each dimension of the network rather than for the multidimensional network as a whole. Carchiolo et al. [19] further presented a method to improve the efficiency of the extended modularity. However, both the network aggregation and the feature aggregation methods focus on networks with community structure.

2.2 Statistical inference-based models for structural regularity exploration

Several statistical inference-based models have been proposed for handling networks; they fall into two categories: (1) models specialized for structural regularity exploration and (2) models designed for general network analysis. The models in the first category concentrate on exploring structural regularities in monodimensional networks. Newman and Leicht [5] first introduced a general definition of network structure, namely a set of nodes that have similar patterns of connection to others, and presented a probabilistic mixture model to explore structural regularities in complex networks. Shen et al. [20] proposed a stochastic block-based model to reveal structural regularities in complex networks. Based on Shen's model, Chai et al. [6] proposed a popularity–productivity stochastic block model for general structure detection by introducing two random variables, popularity and productivity, and then extended it to identify structural regularities in text-associated networks. Chen et al. [21] presented a signed probabilistic mixture model to identify network structure in signed networks; the model can identify assortative structure in a signed network with only positive links and disassortative structure in a signed network with only negative links. The main limitations of these models are that they cannot handle multidimensional networks directly and that they need a pre-specified group number.

The models in the second category concentrate on network analysis, and their parameters can be used for structure exploration and link prediction. DuBois and Smyth [11] presented a generative method for modeling dyadic events, which can be treated as multidimensional networks since each event is denoted by a triplet (sender, recipient, type). Khoshneshin and Street [22] proposed a graphical model that uses latent feature networks (LFN) as a framework for multi-relational network analysis. These two network analysis models also require the group number to be specified. To allow a model to determine the group number automatically, researchers have recently turned to Bayesian nonparametric models. Kemp et al. [23] proposed an infinite relational model to learn systems of concepts in relational datasets, where object-feature data can be profitably viewed as a relation between two sets of entities. Sinkkonen et al. [12] presented a simple infinite model for general multi-relational data, where co-occurrences within a single categorical variable can be viewed as multiple relations between two sets of entities. Xu et al. [13] proposed a multi-relational Gaussian process model to learn multi-relational data. Although these models are able to handle multidimensional networks, they usually perform poorly at structural regularity exploration on many real-world networks.

3 Multidimensional Bayesian mixture (MBM) model

Generally, a multidimensional network with D dimensions and N nodes can be represented as \(A = \{ A^{(1)} , \ldots ,A^{(d)} , \ldots ,A^{(D)} \}\). For the dth dimension, \(A_{ij}^{(d)} = 1\) \((1 \le i,\,j \le N)\) if there is a link from node i to node j, and \(A_{ij}^{(d)} = 0\) otherwise. We use \(L_{i}^{(d)}\) to denote the out links of node i in the dth dimension (i.e., the set of edges incident to node i in an undirected network). In this section, we first extend the NMM model to the MNM model for handling multidimensional networks and then derive the MBM model from the MNM model using Bayesian theory. For convenience, we describe them in detail only for directed networks.

3.1 Multidimensional Newman’s mixture (MNM) model

A directed monodimensional network with K groups (K is a predefined number) can be generated by the NMM model with two parameters \(\pi_{k}\) and \(\theta_{kj}\), where \(\pi_{k}\) denotes the probability of a node falling into group k (\(k \in \{ 1,2, \ldots ,K\}\)) and \(\theta_{kj}\) denotes the probability that a link from a node in group k connects to node j, subject to the normalization constraint \(\sum\nolimits_{j = 1}^{N} {\theta_{kj} } = 1\). The vector \(\theta_{k}\) represents the characteristic pattern with which nodes in group k link to other nodes; according to \(\theta_{k}\), nodes connecting to other nodes in similar patterns are grouped together. A network is generated in the following way: for each node i and its links \(L_{i}\), (1) node i falls into a group \(z_{i}\) with probability \(\pi_{{z_{i} }}\) and (2) each link \(A_{ij} \in L_{i}\) selects node j with probability \(\theta_{{z_{i} j}}\). The probability of a network A with N nodes can be written as

$$p(A,z|\pi ,\theta ) = p(A|z,\theta ) \cdot p(z|\pi ) = \prod\limits_{i = 1}^{N} {\left( {\pi_{{z_{i} }} \cdot \prod\limits_{j = 1}^{{L_{i} }} {\theta_{{z_{i} j}}^{{A_{ij} }} } } \right)}$$
(1)

where \(z_{i} \in \{ 1, \ldots ,K\}\) is a hidden variable that needs to be inferred. The logarithm of Eq. (1) is as follows:

$$\ln p(A,z|\pi ,\theta ) = \sum\limits_{i = 1}^{N} {\left( {\ln \pi_{{z_{i} }} + \sum\limits_{j = 1}^{{L_{i} }} {A_{ij} \ln \theta_{{z_{i} j}} } } \right)}$$
(2)

Because of the hidden variables \(z_{i} ,i = 1, \ldots ,N\), the parameters \(\pi_{k}\) and \(\theta_{kj}\) cannot be estimated directly by maximum likelihood estimation. An expectation–maximization (EM) [24] algorithm is used to estimate them. In the E step, the expectation of Eq. (2) is:

$$\bar{L} = \sum\limits_{i,k} {q_{ik} \left[ {\ln \pi_{k} + \sum\limits_{j} {A_{ij} \ln \theta_{kj} } } \right]}$$
(3)

with

$$q_{ik} = \frac{{\pi_{k} \prod\nolimits_{j} {\theta_{kj}^{{A_{ij} }} } }}{{\sum\nolimits_{k} {\pi_{k} \prod\nolimits_{j} {\theta_{kj}^{{A_{ij} }} } } }} ,$$
(4)

where \(q_{ik}\) denotes the probability of node i belonging to group k. In the M step, the parameters \(\pi\) and \(\theta\) can be re-estimated by optimizing Eq. (3) as

$$\pi_{k} = \frac{{\sum\nolimits_{i} {q_{ik} } }}{N} ,$$
(5)
$$\theta_{kj} = \frac{{\sum\nolimits_{i} {A_{ij} q_{ik} } }}{{\sum\nolimits_{j} {\sum\nolimits_{i} {A_{ij} q_{ik} } } }} .$$
(6)
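To make the E and M steps concrete, the following is a minimal sketch of the EM procedure for one directed dimension (Eqs. 3–6), assuming a dense 0/1 adjacency matrix; the function name and the log-domain stabilization are our additions, not part of the original formulation.

```python
import numpy as np

def nmm_em(A, K, n_iter=100, seed=0, eps=1e-12):
    """EM for the NMM model on one directed dimension (Eqs. 3-6)."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    pi = np.full(K, 1.0 / K)                      # group probabilities pi_k
    theta = rng.dirichlet(np.ones(N), size=K)     # theta[k, j], each row sums to 1
    for _ in range(n_iter):
        # E step (Eq. 4), computed in the log domain for numerical stability
        log_q = np.log(pi + eps) + A @ np.log(theta + eps).T   # shape (N, K)
        log_q -= log_q.max(axis=1, keepdims=True)
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)
        # M step (Eqs. 5 and 6)
        pi = q.sum(axis=0) / N
        theta = q.T @ A                           # theta[k, j] proportional to sum_i A_ij q_ik
        theta /= theta.sum(axis=1, keepdims=True) + eps
    return pi, theta, q
```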

The parameter \(\theta_{kj}\) represents the link preference, or feature, of group k connecting to node j. In a community structure, nodes link mostly within their own group, so each group has larger \(\theta_{kj}\) for nodes inside the group than for other nodes; in a bipartite structure, nodes link mostly across groups, so each group has larger \(\theta_{kj}\) for nodes in the opposite group; in a mixture of community structure and bipartite structure, some groups have larger \(\theta_{kj}\) for nodes within the group and other groups have larger \(\theta_{kj}\) for nodes in the opposite group. So we use

$$M = \left[ {\begin{array}{*{20}c} {\theta_{11} } & \ldots & {\theta_{k1} } & \ldots & {\theta_{K1} } \\ \ldots & {} & \ldots & {} & \ldots \\ {\theta_{1j} } & \ldots & {\theta_{kj} } & \ldots & {\theta_{Kj} } \\ \ldots & {} & \ldots & {} & \ldots \\ {\theta_{1N} } & \ldots & {\theta_{kN} } & \ldots & {\theta_{KN} } \\ \end{array} } \right]$$

as the network structural features.

The above model extracts structural features from a monodimensional network. For multidimensional networks, we follow the PMM model [8] and aggregate the structural features of all dimensions as

$$M = [M^{(1)} , \ldots ,M^{(d)} , \ldots ,M^{(D)} ] ,$$
(7)

and perform k-means on M to get the final node partition.
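As an illustration of this aggregation step (Eq. 7), here is a minimal sketch that concatenates the per-dimension feature matrices and clusters the nodes; using scikit-learn's KMeans is our choice and not prescribed by the model.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregate_and_partition(feature_mats, K, seed=0):
    """Concatenate per-dimension N x K feature matrices (Eq. 7), then cluster nodes."""
    M = np.hstack(feature_mats)     # N x (K * D) aggregated structural features
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(M)
```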

In the NMM model, node i may not fall into group k because the product \(\prod\nolimits_{j} {\theta_{kj}^{{A_{ij} }} }\) in Eq. (4) can be zero, in which case node i is wrongly assigned to another group. In particular, when \(q_{i1} = \cdots = q_{iK} = 0\), node i cannot fall into any group, and the NMM model fails to handle the network. To avoid this zero-probability problem, we extend the MNM model using Bayesian theory.

3.2 Multidimensional Bayesian mixture (MBM) model

In a Bayesian setting, the model parameters are themselves random variables, and prior distributions are placed over them. The predominant choice is conjugate priors. The conjugate prior for the parameters \(\pi\) and \(\theta_{k}\) is the Dirichlet distribution

$$\pi \,\sim\,{\text{Dirichlet}}(\alpha ) ,$$
(8)
$$\theta_{k} \,\sim\,{\text{Dirichlet}}(\beta ) ,$$
(9)

where \(\alpha\) and \(\beta\) are the Dirichlet hyperparameters.

The generative process for a network is summarized as follows:

1. Draw \(\pi\) from \({\text{Dirichlet}}(\alpha )\);

2. For each group k in K groups: draw \(\theta_{k}\) from \({\text{Dirichlet}}(\beta )\);

3. For each node i:

   (a) draw a latent group \(z_{i}\) from \(\pi\);

   (b) for each link in \(L_{i}\): draw an end node j from \(\theta_{{z_{i} }}\).
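Read as sampling code, the generative process above looks as follows; this is a sketch under our own assumptions (symmetric hyperparameters, node out-degrees supplied as data), not the authors' implementation.

```python
import numpy as np

def generate_mbm_network(N, K, degrees, alpha=0.5, beta=0.5, seed=0):
    """Sample one directed dimension from the MBM generative process."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.full(K, alpha))               # step 1
    theta = rng.dirichlet(np.full(N, beta), size=K)     # step 2: theta[k] over end nodes
    z = rng.choice(K, size=N, p=pi)                     # step 3(a)
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        for _ in range(degrees[i]):                     # step 3(b): one end node per link
            j = rng.choice(N, p=theta[z[i]])
            A[i, j] = 1                                 # repeated draws collapse to one link
    return A, z
```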

Then, the probability of a network A with N nodes is (refer to Eq. 1):

$$p(A,z|\alpha ,\beta ) = \iint {p(A,z|\pi ,\theta )p(\pi |\alpha )p(\theta |\beta )\,{\text{d}}\pi \,{\text{d}}\theta }$$
(10)

Due to the conjugacy between the Dirichlet and Multinomial distributions, Eq. (10) can be simplified as:

$$p(A,z|\alpha ,\beta ) = p(A|z,\beta )p(z|\alpha )$$
(11)

with

$$p(z|\alpha ) = \int {p(z|\pi )} p(\pi |\alpha ){\text{d}}\pi$$
(12)
$$p(A|z,\beta ) = \int {p(A|z,\theta )} p(\theta |\beta ){\text{d}}\theta.$$
(13)
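Both integrals have closed forms by the standard Dirichlet–multinomial identity. Assuming symmetric hyperparameters and writing \(n_{k}\) for the number of nodes in group k, \(m_{k}^{j}\) for the number of out links from nodes in group k to node j and \(m_{k} = \sum\nolimits_{j} {m_{k}^{j} }\), they evaluate to (our derivation, consistent with the sampling ratio in Eq. 14 below):

$$p(z|\alpha ) = \frac{{\Gamma (K\alpha )}}{{\Gamma (N + K\alpha )}}\prod\limits_{k = 1}^{K} {\frac{{\Gamma (n_{k} + \alpha )}}{{\Gamma (\alpha )}}} ,\quad p(A|z,\beta ) = \prod\limits_{k = 1}^{K} {\frac{{\Gamma (N\beta )}}{{\Gamma (m_{k} + N\beta )}}\prod\limits_{j = 1}^{N} {\frac{{\Gamma (m_{k}^{j} + \beta )}}{{\Gamma (\beta )}}} }$$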

3.3 Inference

The latent variable z of the Bayesian model (Eq. 11) cannot be inferred exactly, but it can be inferred approximately by a wide variety of algorithms such as Laplace approximation, variational approximation and Markov chain Monte Carlo (MCMC). In this study, we follow Palla et al. [25] and use MCMC for parameter estimation, iteratively performing posterior inference over the latent variable z by Gibbs and slice sampling. The sampler iterates as follows:

3.3.1 Sampling z

Gibbs sampling [26] is used to obtain samples of z. For each node i, given the group assignments of all other nodes, the probability that the omitted node i chooses group k is as follows:

$$p(z_{i} = k|z_{ - i} ,A) \propto \prod\limits_{j = 1}^{{L_{i} }} {\frac{{m_{k, - i}^{j} + \beta }}{{m_{k, - i} + N\beta + j - 1}}} \cdot \frac{{n_{k} + \alpha }}{{N + K\alpha }}$$
(14)

where \(n_{k}\) denotes the number of nodes belonging to group k, \(m_{k, - i}\) denotes the number of out links from nodes in group k excluding node i, and \(m_{k, - i}^{j}\) denotes the number of such links ending at node j. This probability, which has the same meaning as \(q_{ik}\) (see Eq. 4), avoids the zero problem since \(\frac{{m_{k, - i}^{j} + \beta }}{{m_{k, - i} + N\beta + j - 1}} \ge \frac{\beta }{{m_{k, - i} + N\beta + j - 1}} > 0\).
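The update of Eq. (14) can be implemented as a Gibbs sweep over the nodes; the following is a hedged sketch in which the count bookkeeping and the sequential offset \(t = j - 1\) over a node's link list are our reading of the equation.

```python
import numpy as np

def gibbs_sweep_z(A, z, K, alpha, beta, rng):
    """One Gibbs sweep over node-to-group assignments (Eq. 14)."""
    N = A.shape[0]
    n = np.bincount(z, minlength=K).astype(float)    # n_k: nodes per group
    mj = np.zeros((K, N))                            # m_k^j: links from group k to node j
    for i in range(N):
        mj[z[i]] += A[i]
    m = mj.sum(axis=1)                               # m_k: total out links per group
    for i in rng.permutation(N):
        links = np.flatnonzero(A[i])
        n[z[i]] -= 1; mj[z[i]] -= A[i]; m[z[i]] -= A[i].sum()   # remove node i's counts
        logp = np.log(n + alpha)                     # prior factor; its denominator is constant in k
        for t, j in enumerate(links):                # product over node i's links
            logp += np.log(mj[:, j] + beta) - np.log(m + N * beta + t)
        p = np.exp(logp - logp.max())
        z[i] = rng.choice(K, p=p / p.sum())
        n[z[i]] += 1; mj[z[i]] += A[i]; m[z[i]] += A[i].sum()   # add node i back
    return z
```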

3.3.2 Hyperparameters

We use slice sampling [27] for the hyperparameters \(\alpha\) and \(\beta\) and restrict them to (0, 1).
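Because the hyperparameters are restricted to (0, 1), slice sampling reduces to the simple shrinkage procedure below; this is a minimal sketch, where log_post stands for the log posterior of \(\alpha\) or \(\beta\) with all other variables fixed (our placeholder, not the authors' code).

```python
import numpy as np

def slice_sample_unit(x, log_post, rng, max_shrink=100):
    """One slice-sampling update of a scalar hyperparameter constrained to (0, 1)."""
    log_y = log_post(x) + np.log(rng.random())   # slice level under the log density
    lo, hi = 0.0, 1.0                            # the (0, 1) constraint is the bracket
    for _ in range(max_shrink):
        x_new = rng.uniform(lo, hi)
        if log_post(x_new) > log_y:
            return x_new                         # accepted: x_new lies on the slice
        if x_new < x:                            # rejected: shrink the bracket toward x
            lo = x_new
        else:
            hi = x_new
    return x                                     # fall back to the current value
```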

The count \(m_{k}^{j}\) denotes the number of out links from nodes falling into group k to node j. It plays a similar role to \(\theta_{kj}\) in the NMM model. So we use

$$M = \left[ {\begin{array}{*{20}c} {m_{1}^{1} } & \ldots & {m_{k}^{1} } & \ldots & {m_{K}^{1} } \\ \ldots & {} & \ldots & {} & \ldots \\ {m_{1}^{j} } & \ldots & {m_{k}^{j} } & \ldots & {m_{K}^{j} } \\ \ldots & {} & \ldots & {} & \ldots \\ {m_{1}^{N} } & \ldots & {m_{k}^{N} } & \ldots & {m_{K}^{N} } \\ \end{array} } \right]$$

as network structural features.

Algorithm 1 shows the procedure in full. For each dimension, the initial group of each node is assigned randomly, and samples are collected after the log-likelihood reaches a stationary state. Although 20–30 iterations often appear sufficient for convergence, we run 2000 iterations for all networks in the following experiments. For each dimension, the sample with the highest log-likelihood is taken as the structural feature.

[Algorithm 1: MCMC inference for the MBM model]

The MBM model can also be extended to handle undirected networks, like the NMM model. The only difference between directed and undirected networks is that links between two nodes are symmetric in undirected networks. Therefore, in an undirected network, when node i in group \(z_{i}\) connects to node j, node j in group \(z_{j}\) also connects to node i. We use \(\varphi_{{z_{i} j}}\) to denote the probability that a node in group \(z_{i}\) chooses to connect to node j. \(A_{ij} \in L_{i}\) is generated according to \({\text{Multinomial}}(\varphi_{{z_{i} j}} \varphi_{{z_{j} i}} )\), where \(\varphi_{{z_{i} j}} \varphi_{{z_{j} i}}\) denotes the probability of an edge between nodes i and j, given that nodes i and j are in groups \(z_{i}\) and \(z_{j}\), respectively. This probability is normalized by the constraint \(\sum\nolimits_{i = 1}^{N} {\sum\nolimits_{j = 1}^{N} {\varphi_{{z_{i} j}} \varphi_{{z_{j} i}} } } = \left[ {\sum\nolimits_{i = 1}^{N} {\varphi_{{z_{j} i}} } } \right] \cdot \left[ {\sum\nolimits_{j = 1}^{N} {\varphi_{{z_{i} j}} } } \right] = 1\). Since both \(z_{i}\) and \(z_{j}\) vary from 1 to K, \(\sum\nolimits_{i = 1}^{N} {\varphi_{{z_{j} i}} } = \sum\nolimits_{j = 1}^{N} {\varphi_{{z_{i} j}} }\), and thus \(\sum\nolimits_{j = 1}^{N} {\varphi_{{z_{i} j}} } = 1\), exactly as in the directed case. The remainder of the generative process follows as before.

4 Multidimensional Bayesian nonparametric mixture (MBNPM) model

To allow the MBM model to determine the group number automatically, we use a common Bayesian nonparametric prior, the Chinese restaurant process (CRP), as the nonparametric distribution over latent groups. The CRP, a vivid metaphor for building a partition of nodes, assigns a new node (i.e., a new customer entering the restaurant) to a new group (i.e., table) or to an existing group (table) with the probability

$$p(z_{i} = k|z_{1} , \ldots ,z_{i - 1} ) = \left\{ {\begin{array}{*{20}l} {\frac{\alpha }{i - 1 + \alpha }} \hfill & {k{\text{ is a new group}}} \hfill \\ {\frac{{n_{k} }}{i - 1 + \alpha }} \hfill & { \, n_{k} > 0} \hfill \\ \end{array} } \right. ,$$
(15)

where \(n_{k}\) denotes the number of customers already assigned to table k and \(\alpha\) is a hyperparameter. The characteristics of the CRP are as follows: (1) nodes tend to fall into popular groups, making the popular groups more popular, and (2) new nodes always have a chance to fall into new groups. These characteristics allow the group number K to approach infinity, while only a finite number of groups are used to generate the observed network.
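Equation (15) is easy to simulate directly; the following is a minimal sketch of drawing a partition from the CRP.

```python
import numpy as np

def crp_partition(N, alpha, rng):
    """Sample a partition of N nodes from the Chinese restaurant process (Eq. 15)."""
    z = np.zeros(N, dtype=int)
    counts = []                                           # occupancy n_k of each group
    for i in range(N):
        probs = np.array(counts + [alpha], dtype=float)   # existing groups, then a new one
        probs /= i + alpha                                # the (i - 1 + alpha) term, 0-based here
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                              # open a new group
        else:
            counts[k] += 1
        z[i] = k
    return z
```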

In MBNPM, the generative process for a network is summarized as follows:

1. Draw \(\pi\) from \({\text{CRP}}(\alpha )\);

2. For each group k in the infinite set of groups: draw \(\theta_{k}\) from \({\text{Dirichlet}}(\beta )\);

3. For each node i:

   (a) draw a latent group \(z_{i}\) from \(\pi\);

   (b) for each link in \(L_{i}\): draw an end node j from \(\theta_{{z_{i} }}\).

Then, the probability of a network A (refer to Eq. 11) is:

$$p(A,z,K|\alpha ,\beta ) = p(A|z,\beta )p_{\rm CRP} (z,K|\alpha ).$$
(16)

The hyperparameter \(\alpha\), which governs the prior over \(\pi\), affects the number of groups. Although the group number is potentially infinite, the CRP yields a highly uneven distribution over groups and, for an appropriately small \(\alpha\), ensures that the number of groups K is much smaller than the number of nodes N. The hyperparameter \(\beta\), the prior over \(\theta\), describes the degree distribution of a node within groups.

In the inference process, the probability of the omitted node i choosing group k becomes (refer to Eq. 14):

$$p(z_{i} = k|z_{ - i} ,A) \propto \prod\limits_{j = 1}^{{L_{i} }} {\frac{{m_{k, - i}^{j} + \beta }}{{m_{k, - i} + N\beta + j - 1}}} \cdot \frac{{F(n_{k} ,\alpha )}}{{N + K\alpha }}$$
(17)

where \(F(n_{k} ,\alpha ) = n_{k}\) if \(n_{k} > 0\), and \(F(0,\alpha ) = \alpha\) otherwise, meaning that a new group is generated.

Algorithm 2 shows the procedure in full. The initialization of K can be any positive integer. For the aggregated structural features M, we adopt the density peaks (DP)-based method [28] instead of k-means to automatically determine the group number and group partition.

[Algorithm 2: MCMC inference for the MBNPM model]
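For reference, here is a hedged sketch of the density peaks decision rule of [28] applied to the rows of M: each node receives a local density \(\rho\) and a distance \(\delta\) to its nearest denser node, nodes with large \(\rho \delta\) serve as group centers, and every other node inherits the label of its nearest denser neighbor. The percentile rule for the cutoff dc and passing the number of centers as an argument are our simplifications; the original method reads the centers off a decision graph.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def density_peaks(M, n_centers, dc=None):
    """Density peaks clustering (sketch) on the rows of the feature matrix M."""
    D = squareform(pdist(M))                       # pairwise distances between nodes
    if dc is None:
        dc = np.percentile(D[D > 0], 2)            # rule-of-thumb cutoff distance
    rho = (D < dc).sum(axis=1) - 1                 # local density: neighbors within dc
    N = len(rho)
    delta = np.zeros(N)
    parent = np.zeros(N, dtype=int)
    order = np.argsort(-rho)                       # nodes from densest to sparsest
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i], parent[i] = D[i].max(), i    # densest node: no denser neighbor
            continue
        denser = order[:rank]
        j = denser[np.argmin(D[i, denser])]        # nearest node of higher density
        delta[i], parent[i] = D[i, j], j
    centers = np.argsort(-(rho * delta))[:n_centers]
    labels = np.full(N, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:                                # assign down the density ordering
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels
```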

5 Experiments

To investigate the effectiveness of the MBM and MBNPM models for structural regularity exploration in multidimensional networks, we compare them with other state-of-the-art methods on nine synthetic and real multidimensional networks, including a synthetic network and a real network with community structure, a synthetic network with tripartite structure, a real network with bipartite structure, and a synthetic network and four real networks with mixture structure. The nine networks are all multidimensional networks with gold standard structures that we could collect from other studies. The detailed information of these networks is shown in Table 1.

Table 1 Detailed information of the nine networks used in our study

To test the performance of the MBM and MBNPM models, we compare them with seven models for multidimensional networks, including three network aggregation methods (Unify-Infomap (UI), Unify-OSLOM (UO) and Unify-Louvain (UL), proposed in [7]), a feature aggregation method (PMM, proposed in [8]), two statistical inference-based models (the marginal product mixture model (MPMM) proposed in [11] and the simple relational component model (SRCM) proposed in [12]) and the MNM model. The source code of the PMM model has been released by its authors; we use the released code and implement the UI, UO, UL, MPMM, SRCM and MNM models ourselves. For the PMM, MNM, MPMM and MBM models, which require a pre-specified group number, we adopt the gold standard number. As the PMM, MNM and MBM models may converge to local optima, we run each model ten times and take the solution giving the highest evaluation.

The performance of all models is measured by the normalized mutual information (NMI) [29], which is widely used for evaluating structure detection:

$$P_{nmi} (G,G^{\prime } ) = \frac{{2MI(G,G^{\prime } )}}{{H(G) + H(G^{\prime } )}} ,$$
(18)

where \(G = (G_{1} ,G_{2} , \ldots ,G_{K} )\) are the defined groups in a network, \(G^{\prime } = (G_{1}^{\prime } ,G_{2}^{\prime } , \ldots ,G_{K}^{\prime } )\) are the groups detected by an algorithm, \(H(G)\) and \(H(G^{\prime } )\) are the entropies of G and \(G^{\prime }\), and \(MI(G,G^{\prime } )\) is the mutual information between them. A high \(P_{nmi}\) means a good detection; in particular, \(P_{nmi} = 1\) means a perfect detection.
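For completeness, Eq. (18) can be computed from the contingency table of the two partitions; a minimal sketch assuming non-negative integer labels (scikit-learn's normalized_mutual_info_score yields the same quantity up to the choice of normalization):

```python
import numpy as np

def nmi(g, g_pred):
    """Normalized mutual information between two partitions (Eq. 18)."""
    g, g_pred = np.asarray(g), np.asarray(g_pred)
    n = len(g)
    ct = np.zeros((g.max() + 1, g_pred.max() + 1))   # contingency table of the labels
    for a, b in zip(g, g_pred):
        ct[a, b] += 1
    p = ct / n
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()
    hx = -(px[px > 0] * np.log(px[px > 0])).sum()
    hy = -(py[py > 0] * np.log(py[py > 0])).sum()
    return 2 * mi / (hx + hy)
```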

Before the comparison, we analyze the convergence of the Bayesian mixture model and the nonparametric Bayesian mixture model on monodimensional networks. Figure 1 illustrates the two models on the four dimensions of Syn-com shown in Fig. 2a–d (described in the following section). As can be seen, the two models converge to a stationary state within a few iterations on all networks, which means they can quickly explore the structural features. Compared with the nonparametric Bayesian mixture model, the Bayesian mixture model shows more stable log-likelihoods on the corresponding four dimensions, which means it explores clearer structural features when the group number is pre-specified. In addition, both models show more stable log-likelihoods on the first dimension than on the latter three, as the first dimension is composed of clearer structure.

Fig. 1 The log-likelihood of the Bayesian mixture model and the nonparametric Bayesian mixture model on the four dimensions of Syn-com. a–d Bayesian mixture model; e–h nonparametric Bayesian mixture model

Fig. 2 The adjacency matrices of the multidimensional networks. a–d Syn-com network; e–h Syn-tri network; i–l Syn-mix network

5.1 Multidimensional networks with assortative structure

Syn-com and Ckm, two multidimensional networks with community structure, are used to test the capability of our models for assortative structure detection. Syn-com is a common undirected synthetic dataset for community detection in multidimensional networks [7, 8]. The network has three groups of 50, 100 and 200 nodes, respectively, and is composed of four dimensions. For each dimension, edges within groups are randomly generated with an interaction probability, and noise edges randomly connecting two nodes are generated with low probability. As shown in Fig. 2a–d, each dimension exhibits part of the communities and the four dimensions together exhibit all of them. Ckm [30] is a real undirected multidimensional network concerning the adoption of a new drug by 246 physicians. The network contains three dimensions, generated from three sociometric questions posed to each doctor: To whom did he most often turn for advice and information? With whom did he most often discuss his cases in the course of an ordinary week? Who were the friends, among his colleagues, whom he saw most often socially? All physicians come from four towns in Illinois: Peoria, Bloomington, Quincy and Galesburg, which form a community structure. It is a sparse multidimensional network without any edge noise.

Table 2 shows the \(P_{nmi}\) results of applying the nine models to the nine networks, where the numbers in bold are the best \(P_{nmi}\) values on each network. As can be seen, on the Syn-com network both the MBM and MBNPM models achieve a perfect \(P_{nmi}\) of 1, that is, all nodes are partitioned into the correct groups; this matches the MNM model and outperforms the other models. On the Ckm network, the MBM achieves a \(P_{nmi}\) of 0.9385 and the MBNPM a \(P_{nmi}\) of 0.7568, which are inferior to the UI model but outperform the other models. We further analyze the UI model, which achieves \(P_{nmi}\) values of 0 and 0.9628 on the Syn-com and Ckm networks, respectively. UI is based on multilevel compression of random walks, which is sensitive to network noise; Syn-com carries edge noise in each dimension while Ckm has none, so the large difference between the two networks is understandable. Overall, the MBM and MBNPM models show performance comparable to the other models on the multidimensional networks with assortative structure.

Table 2 \(P_{nmi}\) values identified by nine models on the nine multidimensional networks

5.2 Multidimensional networks with disassortative structure

Syn-tri and Elite, two multidimensional networks with multipartite structure, are used to test the capability of our models for disassortative structure detection. Syn-tri is an undirected synthetic multidimensional network with tripartite structure. The network has 150 nodes, equally divided into three groups, and is composed of four dimensions, three of which exhibit a bipartite structure and one a tripartite structure, as shown in Fig. 2e–h. Elite [31] is a real undirected multidimensional network describing the relations between 3810 persons and 937 administrative bodies in the Dutch government. The network contains four dimensions characterizing four major relations: chairman, vice-chairman, treasurer and lid/member. It is a bipartite network with two groups: persons and administrative bodies. As shown in Table 2, on the Syn-tri network both the MBM and MBNPM models achieve a perfect \(P_{nmi}\) of 1, matching the MNM model and outperforming the other models; on the Elite network, the MBM achieves a \(P_{nmi}\) of 0.3467 and the MBNPM a \(P_{nmi}\) of 0.1248, which exceed the other models by at least 0.3144 and 0.0925, respectively. The reason why the UI, UO, UL, PMM and MPMM models are inferior to the MBM and MBNPM models is that they pre-assume community structure on these two networks, whereas the MBM and MBNPM models do not.

5.3 Multidimensional networks with mixture structure

In addition to assortative and disassortative structure, other types of structure also exist in multidimensional networks. Syn-mix, Lazega [32], Gd99 [33], Bay [34] and Email [35], five multidimensional networks with mixture structure, are used to test the capability of our models for mixture structure detection. Syn-mix is an undirected synthetic multidimensional network with a mixture of community structure and bipartite structure. It contains 150 nodes, equally divided into five groups, two of which form a bipartite structure while the other three form a community structure. The network is composed of four dimensions, as shown in Fig. 2i–l. Lazega is a real directed multidimensional network collected from a study of corporate law partnership among 71 attorneys, carried out in a Northeastern US corporate law firm. The network is composed of three dimensions, a strong-coworker network, an advice network and a friendship network, generated respectively from the following three questionnaires: Would you go through this list and check the names of those with whom you have worked? Think back over the past year, to whom did you go for basic professional advice? Would you go through this list, and check the names of those you socialize with outside work? All attorneys are divided into two statuses, partner and associate, forming a mixture structure. Gd99 is a real undirected network characterizing 234 characters and their relations in the long-running German soap opera "Lindenstrasse." The network contains four dimensions describing four relations: family, business, friendship and partner. All characters are labeled by eight colors: yellow, green, blue, pink, white, orange, gray and black. Bay is a real directed food web network describing a 125-component budget of the carbon exchanges occurring during the wet and dry seasons in Florida Bay. Nodes represent major components of the ecosystem, and edges represent transfers of material or energy between them. All 125 nodes are categorized into seven species groups: primary producers, invertebrates, fishes, birds, reptiles, mammals and detrital compartments. Email is a real undirected network extracted from the emails of the authors of [35]. It is described by four different types of relations: authorship, recipient, copy and sameday. All emails are categorized into 13 groups: administration, archives, articles, corbeillesauvegarde, enseignement, inbox, mailinglists, personnel, projets, recherche, sent, spam and trash.

Table 2 illustrates the results identified by the nine models. As can be seen, the MBM model achieves the best \(P_{nmi}\) values on the Syn-mix, Lazega, Bay and Email networks, and the MBNPM model achieves the best \(P_{nmi}\) values on the Syn-mix and Email networks. Specifically, on the Syn-mix network both the MBM and MBNPM models achieve a perfect \(P_{nmi}\) of 1, matching the MNM model and outperforming the other models. On the Lazega network, the \(P_{nmi}\) of the MBM model is 0.5699, higher than the other models by at least 0.0405, and the \(P_{nmi}\) of the MBNPM model is 0.2521, inferior to the UO and MBM models but superior to the UI, UL, PMM, MNM, MPMM and SRCM models. On the Gd99 network, the \(P_{nmi}\) values of the MBM and MBNPM models are 0.1942 and 0.1728, respectively, inferior to the UI, UL and MNM models but superior to the others. On the Bay network, the \(P_{nmi}\) of the MBM model is 0.4726, higher than the other models by at least 0.0283, and the \(P_{nmi}\) of the MBNPM model is 0.3419, inferior to the UL, PMM, MNM and MBM models. On the Email network, the \(P_{nmi}\) of the MBM model is 0.4078, higher than the other models by at least 0.0599, and the \(P_{nmi}\) of the MBNPM model is 0.3905, higher than the other models by at least 0.0426. Overall, the MBM and MBNPM models show performance comparable to the other models on the multidimensional networks with mixture structure.

5.4 Robustness analysis

We also compare the nine models on the synthetic multidimensional networks with one dimension removed, as a robustness analysis. (C1, C2, C3, C4), (T1, T2, T3, T4) and (M1, M2, M3, M4) denote the Syn-com, Syn-tri and Syn-mix networks with the first, second, third and fourth dimension removed, respectively. Table 3 illustrates the results identified by the nine models on the 12 resulting multidimensional networks, where numbers in bold are the best \(P_{nmi}\) values. It can be seen that the MBM model achieves the best \(P_{nmi}\) values on all networks except C1 and M1, and the MBNPM model achieves the best \(P_{nmi}\) values on all networks except C4 and T1. On C1 and M1, the MBM model is inferior to the MBNPM model but superior to the others; on C4 and T1, the MBNPM model is inferior to the MBM model but superior to the others. The reason why the MBM fails to achieve a \(P_{nmi}\) of 1 on C1 and M1 is that the structure of these networks is not clear; for example, the bipartite structure is absent from M1. Overall, the MBM and MBNPM models outperform the other models on all networks.

Table 3 \(P_{nmi}\) values identified by nine models on the three synthetic multidimensional networks with one dimension removed

6 Conclusion

In this paper, we propose a novel feature aggregation method for multidimensional networks, based on a mixture model and Bayesian theory, which does not pre-assume the structure type: the multidimensional Bayesian mixture (MBM) model. We further extend the MBM model using Bayesian nonparametric theory to automatically determine the group number, yielding the multidimensional Bayesian nonparametric mixture (MBNPM) model. Experiments conducted on a number of synthetic and real multidimensional networks show that the MBM and MBNPM models (1) show performance comparable to other state-of-the-art methods on networks with community or mixture structure and (2) outperform other methods on networks with disassortative structure. As future work, we will apply our models to structural regularity exploration on more real networks and seek possible applications.