Keywords

1 Introduction

In the recent years, with the outbreak and epidemic of coronavirus disease 2019 (COVID-19), early detection of infectious diseases with COVID-19 becomes a research hotspot in computational epidemiology and complex network science. The outbreak of COVID-19 has the characteristics of unpredictability, rapid spread, wide infection range, and difficulty to control. It caused a huge loss of life and property in human society. Usually, models of infectious diseases include considering the overall behavior of the population, or interpersonal relationships in the contact network. In our work, we intend to apply the social network analysis on contact networks and combine the contact network with the epidemic to discuss how to build a mode to detect the susceptible, and it is a meaningful and useful way to reduce the damage of COVID-19.

COVID-19 is a pandemic respiratory illness spreading from person-to-person caused by a novel coronavirus and poses a serious public health risk, it is available to be modeled as an SEIR model (which is a type of the disease transmission model) [1]. SEIR model contains susceptible (S), exposed (E), infectious-asymptomatic (IA), and recovered (R). Based on the existing study, the susceptible state transfers to the latent state with the infection probability per unit time β, and the exposed state transfers to the infection state with the infection probability per unit time γ [2]. Therefore, designing effective methods for the detection of susceptible is an effective means of preventing and controlling COVID-19. Figure 1 is an explanation of SEIR in contact network.

Fig. 1
A network of disease transmission. A virus carrier at the center is connected to 4 direct contact or susceptible nodes, which connect to some common nodes and 1 susceptible node.

Simulation of SEIR model in a contact network

In this work, we focus on designing a mode of detecting some communities in the hospital contact network and mining some key nodes in these communities, and we utilize the social network analysis method in this work.

2 Related Work

Social network analysis is an analytical method for the detailed study of human communication, interaction, and other complex behavioral patterns [3]. It is used to study individuals/organizations and their connection structures. Social network data is usually presented in the form of a social graph (shown in Fig. 1), where nodes represent individuals and edges represent relationships between individuals, where relationships may be personal friend relationships, contact records with other individuals, and so on. At present, there is a lot of research work on social networks and diseases. Ref. [4] applied social network analysis to the informative interactions of doctors, nurses, health professionals, and clerks in the social network of the Australian Metropolitan Teaching Hospital. This research suggests that social network analysis can be used to examine complex medication advice-seeking interactions among hospital ward staff, providing useful quantitative baseline data for comparing the impact of interventions on interactions. Ref. [5] applied a modified SEIR compartmental mathematical model for prediction of COVID-19 epidemic dynamics incorporating pathogen in the environment and interventions. The results of the study show that quarantine of contacts and isolation of cases is useful for halting the spread of COVID-19.

3 Model for Detection of the Susceptible in Hospital Contact Network

3.1 Problem Statement

Usually, there are several roles are included in a hospital contact network, such as: doctors, nurses, managers, and patients. The patient will carry the virus of some infectious diseases (COVID-19), so some nurses, doctors, and other staff are possible to become virus carriers (exposed, susceptible) in the process of multiple contacts (or prolonged contact) with the patient, and will carry the virus to more staff who don’t contact with patients directly (other nurses, doctors, or managers). Therefore, the purpose of this paper is to design a model that can detect staff who have high contact frequency with patients, long contact time, and more contact with other staff in their respective office areas. For these kinds of staff, conduct stricter epidemic prevention tests on them, to reduce the possibility of virus transmission.

3.2 Detection of the Susceptible in Hospital Contact Network

To understand our method well, at the beginning of this subsection, we will briefly introduce some basics of social network analysis methods that we will apply in our model.

Cliques Percolation Method (CPM) [6]. The CPM is used to discover overlapping communities, and a clique is a collection of nodes where any two nodes are connected, i.e., a complete subgraph. The nodes in the community are closely connected, the edge density is high, and it is easy to form a clique. Therefore, edges within communities are more likely to form large complete subgraphs, while edges between communities are almost impossible to form large complete subgraphs, so that communities can be discovered by finding cliques in the network. A k-clique represents a complete subgraph with k nodes in the network. If a k-clique overlaps with another k-cliques by k − 1 nodes, we will regard the two k-cliques as connect to each other. The set of all connected k-cliques is a k-clique community.

PageRank [7]. PageRank algorithm, also known as web page ranking algorithm, is a technology that is calculated by search engines according to the mutual hyperlinks between web pages (nodes) to reflect the relevance and importance of web pages (nodes). The main idea is: (1) if a web page is linked to by many other web pages, it means that this web page is more important, that is, its PageRank value will be relatively high. (2) if a page with a high PageRank value links to other pages, the PageRank value of the linked page will increase accordingly.

In a social network, we replace the concept of a “web page” with a “node” and calculate the importance of that node in the network. The formula is as follows:

$${\text{PR}}\left( u \right) = \mathop \sum \limits_{{v \in B_{u} }} \frac{{{\text{PR}}\left( v \right)}}{L\left( v \right)}$$
(1)

Among them, \(u, v\) represent the nodes in the graph, \(B_{u}\) represents the node connected to the node \(u\), and \(L_{v}\) represents the number of neighbor nodes of the node \(v\).

Our Method

In our method, in order to detect the community and the high-influence nodes inside, we use CPM method to mining overlapping community, then we apply the PageRank method on four different roles contact network (nurse, patient, doctor, and manager), respectively, get the top-N node list, then to find the nodes which not only belong to top-N node list but also community.

Figure 2 explains our goal in detail. At first, we will detect the communities in the contact network; then we will rank the nodes in sub-contact networks with different roles (nurse, doctor, patient, and manager); finally, we will search the nodes (from top-N node lists) in the communities which contains patients.

Fig. 2
An illustration of susceptibility detection in a community. Communities are detected in the contact network for virus carriers and then nodes influencing ranking in different networks is used to search top n nodes in the community.

Overview of our model

The procedure of our method is described as the following pseudocode:

Algorithm

  1. 1

    Input total contact graph G = (V, E), nurse contact graph Gn = (V, E), patient contact graph Gp = (V, E), doctor contact graph Gd = (V, E), manger contact graph Gm = (V, E), k, n, RS

  2. 2

    Initial Community Set C, Top-n node lists of contact network AllNodeList, nurse network NurseList, patient network PatientList, doctor network DoctorList and manager network ManagerList

  3. 3

    C ← CPM (G, k)

  4. 4

    NurseList, PatientList, DoctorList, ManagerList, AllNodeList ← PageRank(Gn, n), PageRank(Gp, n), PageRank(Gd, n), PageRank(Gm, n), PageRank(G, n)

  5. 5

    for each c in C:

  6. 6

    \(\quad \text{for each node in c:}\)

  7. 7

    \(\qquad \text{ if node }\in\) NurseList || Patient || DoctorList || ManagerList:

  8. 8

    \(\qquad \quad\text{RS} \leftarrow\text{node}\)

  9. 9

    Return RS

4 Experiment and Results Analysis

4.1 Dataset

This dataset is a contacts network between patients, patients, and healthcare workers (nurse, doctor, and manager) and among healthcare workers in a hospital ward within 4 days in Lyon, France [5]. The study included 46 healthcare workers and 29 patients. In our work, we only focus on the first day data and take every 1 min as a count unit. In our experiment data, there are 18 patients and 36 healthcare workers.

4.2 Results and Analysis

The results of k-cliques community with different k are given as Table 1. It is observed the highest value of EQ will be obtained when k = 6. So in our work, we will select the 6-cliques community for our study.

Table 1 Results of k-cliques community

The node set of 6-cliques community is given as Table 2, and the top-8 node lists in different networks are given in Table 3.

Table 2 Node list of 6-cliques community
Table 3 Top-8 node lists

According to the results in Tables 2 and 3, we can get a group with the highest probability of spreading the disease, which contains 5 patients, 1 nurse, and 1 doctor, and the nurse and doctor are top 1 in their respective role networks. Hence, No. 33 and No. 35 should be paid more attention and strictly controlled. The detail shows in Fig. 3.

Fig. 3
An illustration of communities with disease transmission probabilities. The circles in a group are labeled D 33, P 44, P 49, P 48, P 51, P 46, and N 35.

Group of the highest probability of spreading COVID-19

5 Conclusion

Overall, since COVID-19 can be modeled as a SEIR model, an effective way to stop the spread of the virus is to detect and prevent susceptible individuals as early as possible. Based on this premise, this paper uses CPM community detection algorithm to detect the contact network of hospitals to obtain susceptible communities and analyzes the independent contact network of different roles by PageRank to obtain high-influence communities and rank the influence of nodes. In the experimental part, an experiment is conducted on a real hospital contact data and the most suitable K-cliques community k parameter is found. The results show that the model can effectively solve our problem.