1 Introduction

Networks are a ubiquitous part of the real world. Many systems can be depicted by a network model, such as food webs, biological networks of proteins or hierarchies in an organization. Examples abound because numerous physical and non-physical systems or processes can be portrayed as networks. Networks can be classified into different types with corresponding definitions. Shi et al. [1] define an information network informally as: “An information network represents an abstraction of the real world, focusing on the objects and the interactions among these objects.” In terms of social network analysis, environments are expressed as patterns in relationships among interacting units [2]. Chambers et al. [3] define Social Network Analysis as follows: “It maps and measures formal and informal relationships to understand what facilitates or impedes the knowledge flows that bind interacting units, viz., who knows whom, and who shares what information and knowledge with whom by what communication media.” Over the last decade, it has been observed that real-life networks are not simply random but follow a structure in their evolution. These are termed complex networks and are used to model many distinct real-life associations [2, 4]. Complex networks came into the picture with the ideas of small-world and scale-free networks [5, 6]. The World Wide Web (WWW), a collection of linked web pages, is one such network. Similarly, publications of different authors and their collaborations can be represented by co-authorship networks. Social networks are also complex networks, representing the links between people who might be acquaintances, friends or family. These networks facilitate the study of patterns prevalent in different interactions. The network structure and the interactions among the nodes of a network have led to important research problems. 
For example, navigation data of users on the WWW is analysed to rank pages for improved search or to develop recommendation systems. Co-authorship networks are used to predict likely future collaborations between authors. Social networks help to comprehend the spread of ideas and innovations, which in turn helps in predicting future links. Network Analysis is rapidly emerging as a valuable technique because it can scrutinise diverse and large datasets to solve a multitude of real-life problems. The various kinds of real-world networks used for Network Analysis applications are presented in Fig. 1.

Fig. 1 Various networks used for Network Analysis

1.1 Networks in Healthcare

Among multiple disciplines, healthcare requires careful management of its medical data because of its wide area of impact and complex nature. Medical datasets are often depicted as network visualisations because of their interdependent nature, which makes Network Analysis a suitable technology for their analysis. Network Analysis is currently an element of various medical applications. For example, it is employed for predicting associations among different entities such as drugs, diseases and genes based on factors like age, symptoms or diet. These previously unknown associations aid healthcare solutions like drug repositioning and personalized medicine. It is also utilized for finding disease-associated modules in gene or protein interaction networks. These applications draw on diverse forms of data like health surveys, medical prescriptions, real-time monitoring data, medical literature, medical history of patients and data from social networks. Determining future predictions from the accumulated data is known as predictive analytics. The convergence of accurate computational methods like Network Analysis and efficient tools yields predictive analytic models for healthcare data. The benefits of these models are manifold. Firstly, such analytic models facilitate personalized and preventive inferences, which are the foundation for developing customized healthcare solutions and lead to P4 medicine. P4 medicine refers to medical solutions which are Preventive, Predictive, Participatory and Personalized [7]. Its focus is wellness for each individual, achieved through strategies devised for predicting disease onset or progression. Secondly, apart from the medical benefits, these technology-oriented solutions are cost effective, accurate and time saving. Their use has reduced the cost of lab experiments and setup and has minimized medical errors. 
The solutions are useful for different stakeholders like patients, pharmacists, doctors, medical companies, medical researchers and institutes.

There is a rapid growth in the use of Network Analysis for predictive analytics in healthcare and other domains because of the following reasons:

  • Network visualization provides an intuitive view of large real-world problems.

  • Analysis of network structure and patterns is significant for comprehension of future predictions.

  • It handles diverse, heterogeneous and dynamic real-time datasets effectively.

  • Network Analysis merged with other techniques is vital for developing effective computational models.

1.2 Motivation for Review

Rapid growth of data and the need to interpret such intricate data have led to the evolution of computational techniques and technologies. Several computational techniques like artificial neural networks, support vector machines and metaheuristic optimization algorithms have been devised for understanding the complex nature of data and the influence of data items on one another. The Network Analysis approach is rapidly becoming a significant computational technique for such solutions because it takes the interdependent nature of datasets into account. It utilizes paths, patterns or features obtained from the network structure to draw significant inferences. The combination of essential technologies with Network Analysis is providing a foothold for developing predictive analytics frameworks. Networks have been discussed for diverse and generic data in the literature, but the healthcare perspective has rarely been surveyed. The motivation behind this review is to uncover the possible dimensions of Network Analysis as an emerging computational technique so that it can be employed for developing a predictive framework for different healthcare solutions.

1.3 Organization of the Paper

This survey first discusses Network Analysis and its conjunction with other technologies and techniques for predictive applications so as to develop a base for the readers. Then, it reviews the role of Network Analysis and its techniques in realising healthcare informatics. The review is organized as follows: Sect. 2 provides a brief background of Network Analysis and its role in the healthcare domain along with related works. Section 3 discusses the research methodology used for this review. Section 4 briefly describes the most common techniques of Network Analysis found in the literature, along with their combination with other techniques and technologies which are helpful for developing predictive frameworks. Section 5 discusses how Network Analysis is being used for predictive healthcare by focussing on its current applications in the domain. This section reviews its major research contributions at different layers of networks. It also covers the contribution of Network Analysis to predicting future associations of diseases based on different healthcare factors, along with its role in dynamic networks. The open challenges in current applications of Network Analysis for predictive healthcare analytics, along with possible solutions, are presented in Sect. 6, followed by conclusions in Sect. 7. A graphical abstract of the survey overview is presented in Fig. 2.

Fig. 2 Graphical abstract of review

2 Background

The accumulation of large volumes of data led to the invention of varied technologies for better information retrieval, analysis and storage. Technologies like Text mining, Data mining, Cloud computing and Machine learning have accelerated the development of innovative solutions in multiple domains. Network Analysis is a recent technology for inferring predictions in such solutions. It enables more comprehensive study by providing features like interactive visualization and integrative data analysis of heterogeneous datasets. Owing to these features, Network Analysis has manifold real-world applications. This section provides a brief introduction to Network Analysis by exploring its basics and its applications in the predictive healthcare domain.

2.1 Overview of Network Analysis

Networks or graphs consist of nodes which are connected by edges. The edges represent relationships between the nodes. For example, in journal citation networks, an edge between two journals represents that an article in one of the journals cites an article in the other journal. A network consisting of one particular set of actors is called a one-mode network, for example, friendships among residents of a neighbourhood. A network which contains two sets of actors is referred to as a two-mode network. In this, actors from one set have interactions with those in the other set. The major characteristics of complex networks include the following [4]:

  1. Complex networks follow a power law and are therefore also termed scale-free networks. In a power-law distribution, a very small number of nodes hold a large number of links, while most nodes hold only a few.

  2. The networks are called small-world networks because any two nodes in such a network are separated by a short path, following the six degrees of separation principle.

  3. The networks have a high clustering coefficient or transitivity.

The topological features which are important for the understanding of networks are mainly of two types: Global and Local [4, 8, 9]. Global features focus on the overall pattern of the network while local features focus on the influence of a node's neighbours. The main topological features of networks are described as follows:

  1. Degree Centrality: It is defined as the number of relations of a node with other nodes. It can be represented as dc(i) and defined as:

    $$dc(i) = \sum_{j} e_{ij}$$

    where eij represents a connection between i and j, which is either 1 (link present) or 0 (link not present).

  2. Closeness Centrality: This measure defines how close a node is to other nodes. The normalized closeness centrality is represented as:

    $$c(j) = \frac{n - 1}{\sum_{i} d_{ij}}$$

    where dij is the length of the shortest path between nodes i and j and n is the total number of nodes.

  3. Betweenness Centrality: It measures the extent to which a given node lies on the shortest paths between other nodes. For a node k it is represented as:

    $$b(k) = \sum_{i \ne k \ne j} \frac{s_{i,j}(k)}{s_{i,j}}$$

    where si,j(k) represents the number of shortest paths between i and j passing through k and si,j represents the number of shortest paths between i and j.

  4. Diameter: The diameter of a graph is the length of the longest shortest path between any pair of nodes. It can be represented as:

    $$D = \max_{i,j} d(i,j)$$

    where d(i,j) is the shortest-path distance between nodes i and j in the graph.

  5. Clustering coefficient: It defines the degree to which neighbours of a node are connected. The clustering coefficient of node v can be represented as:

    $$C_{v} = \frac{2e_{v}}{n_{v} \left( n_{v} - 1 \right)}$$

    where ev represents the number of edges between all neighbours of v and nv represents the number of neighbours of v.

  6. Network motifs: These are a local property of networks and can be defined as small subgraphs in a network that are recurrent and statistically significant.

  7. Clique: A clique is a complete subgraph of a graph, which implies that its vertices are a subset of the vertices in the graph and that every pair of those vertices is connected by an edge.
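
As a minimal sketch, the measures defined above can be computed directly from an adjacency structure. The toy graph and all function names here are hypothetical and chosen only for illustration; the sketch assumes a connected, undirected, unweighted graph with at least two nodes, and sums betweenness over unordered pairs of nodes.

```python
from collections import deque
from itertools import combinations

# Hypothetical connected, undirected toy graph (adjacency sets); not from the survey.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}


def bfs(graph, source):
    """Return (distance, shortest-path count) from source to every reachable node."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                count[v] += count[u]
    return dist, count


def degree_centrality(graph, i):
    """dc(i): number of edges incident to node i."""
    return len(graph[i])


def closeness_centrality(graph, j):
    """c(j) = (n - 1) / sum of shortest-path distances from j."""
    dist, _ = bfs(graph, j)
    return (len(graph) - 1) / sum(dist.values())


def betweenness_centrality(graph, k):
    """b(k): sum over unordered pairs (i, j) of s_ij(k) / s_ij."""
    paths = {v: bfs(graph, v) for v in graph}
    dist_k, count_k = paths[k]
    total = 0.0
    for i, j in combinations([v for v in graph if v != k], 2):
        dist_i, count_i = paths[i]
        if dist_i[k] + dist_k[j] == dist_i[j]:  # k lies on a shortest i-j path
            total += count_i[k] * count_k[j] / count_i[j]
    return total


def diameter(graph):
    """Longest shortest-path distance over all pairs of nodes."""
    return max(max(bfs(graph, v)[0].values()) for v in graph)


def clustering_coefficient(graph, v):
    """C_v = 2 e_v / (n_v (n_v - 1)); defined as 0 for fewer than two neighbours."""
    n_v = len(graph[v])
    if n_v < 2:
        return 0.0
    e_v = sum(1 for a, b in combinations(graph[v], 2) if b in graph[a])
    return 2 * e_v / (n_v * (n_v - 1))


print(degree_centrality(GRAPH, "b"), closeness_centrality(GRAPH, "b"))
print(betweenness_centrality(GRAPH, "b"), diameter(GRAPH))
print(clustering_coefficient(GRAPH, "a"))
```

Breadth-first search is sufficient here because the graph is unweighted; a weighted network would need a different shortest-path routine.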

2.2 Network Analysis for Healthcare Predictions

Network Analysis is a recent and emerging trend for predictive healthcare. Networks arise in various forms in biology and healthcare, for example biological networks, genome-scale metabolic networks, drug function networks and protein interaction networks. Such networks are studied to understand the dynamics of diseases, drugs and genes. Thus, Network Analysis is an essential component of computational biology. Apart from using biological data for predictions, networks have also been used to analyze hospital data so as to improve the quality of medical care. Social networks derived from such resources are useful for healthcare predictions. The integration of Network Analysis techniques and the healthcare domain is a constructive combination for envisaging personalised and predictive medical solutions. It has been used extensively for understanding disease progression by predicting disease associations based on various factors.

2.3 Related Surveys

Network Analysis has only recently emerged as a computational technique; therefore, there are very few surveys in this regard. In one of the surveys, Aggarwal et al. [10] discussed Network Analysis in the context of evolutionary networks. They focused on the evolution analysis of dynamic graphs and also summarized evolutionary network analysis in different application domains. In another survey, Shi et al. [1] discussed Network Analysis for heterogeneous data. They presented different datasets and data mining techniques for analysing heterogeneous networks. A scoping review was conducted by Chambers et al. [2] to evaluate the use of Social Network Analysis in healthcare. For this, networks were constructed using hospital data and influential doctors were identified from the survey. This helped in speeding up the ordering of drugs in advance by identifying those that the doctors frequently specify.

This paper is different from the other surveys as it is aligned towards understanding the predictive role of networks for healthcare domain. It illustrates a comprehensive picture of Network Analysis with a healthcare perspective and covers its future prospects.

3 Research Methodology

This paper intends to summarize the various ways in which Network Analysis has contributed towards predictive solutions. It mainly aims to realize the role of such solutions in healthcare domain. The survey methodology used in this paper is discussed in this section.

3.1 Survey Methodology

The methodology for this study, shown in Fig. 3, is defined in different stages. It is similar to the methodology suggested by Kitchenham [11]. The first stage began with the framing of relevant questions based on a defined motivation. In the next phase, search keywords were extracted from the research questions. Different combinations of these words were used to make the final search strings. The search was designed to cover work related to Network Analysis, its applications and its role in the healthcare domain. In the next phase, inclusion and exclusion criteria were defined so that only relevant papers were included in the review. The strings were searched thoroughly in different sources of knowledge, followed by reading of titles and abstracts in order to eliminate papers which were not relevant or were purely related to biology. It was realized from reading the papers that Network Analysis has been used increasingly for finding disease associations based on different factors, so search keywords were added for it at this stage. The steps were repeated until all the sources were exhaustively explored and relevant papers extracted. The extracted papers were reviewed and categorized carefully in the last stage. Thus, a total of 110 papers were included in this survey after scrutiny.

Fig. 3 Survey Methodology

3.2 Research Questions

The research questions are designed to cover different dimensions, including common techniques of Network Analysis and its role in healthcare predictions. In the initial stages of the survey, it was realized that Network Analysis has mostly been used in the literature to extract unknown associations between diseases and other factors, so research questions regarding this area were then added. The research questions used in this review, along with their motivation, are described in Table 1.

Table 1 Research questions

3.3 Sources of Knowledge

The various search platforms used in this review as sources of knowledge include:

  1. Web of Science (www.webofknowledge.com/).

  2. Science Direct (https://www.sciencedirect.com/).

  3. IEEE Xplore (www.ieeexplore.ieee.org).

3.4 Search Keywords

The search keywords included different combinations of words so as to extract all the relevant papers. The purpose was to cover diverse fields including Network Analysis and its role in predictive healthcare solutions. The search words were based on the discussed research questions. The final search strings included an “AND” operation of primary, secondary and additional keywords as shown in Fig. 4. Primary keywords include basic search words related to Network Analysis domain whereas secondary keywords consist of words related to healthcare domain. Additional keywords were added to ensure complete coverage in search strings.

Fig. 4 Various combinations of search keywords
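
The way search strings are formed from the three keyword groups can be sketched as follows. The keyword lists in this sketch are placeholders assumed for illustration only (the actual keywords are those shown in Fig. 4), and `build_search_strings` is a hypothetical helper name.

```python
from itertools import product

# Placeholder keyword groups; the real keywords are the ones listed in Fig. 4.
PRIMARY = ["network analysis", "link prediction"]
SECONDARY = ["healthcare", "disease"]
ADDITIONAL = ["prediction"]


def build_search_strings(primary, secondary, additional):
    """Combine one keyword from each group with an AND operation."""
    return [
        " AND ".join(f'"{term}"' for term in combo)
        for combo in product(primary, secondary, additional)
    ]


for query in build_search_strings(PRIMARY, SECONDARY, ADDITIONAL):
    print(query)
```

Taking one keyword from each group yields every combination exactly once, so the number of search strings is the product of the three group sizes.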

3.5 Study Selection Criteria

A large number of papers were retrieved when the final search strings were used, but only a small proportion was actually relevant for the survey. Using the methodology discussed above, a total of 3659 papers were collected, out of which 110 papers were included in the survey based on the study selection criteria. The remaining papers were excluded for the following reasons:

  (i) The keyword “Network” is used in different contexts in research papers. The results included papers concerned with subjects like computer networks, wireless sensor networks, Bayesian networks and Neural networks, which do not cover Network Analysis.

  (ii) Some of the research works were eliminated because their focus was simply network visualization. Our review aims to cover the analysis of, and inferences drawn from, a network, and not just the visualization of graphs.

  (iii) Research work in the healthcare domain also included papers that used networks for managing tasks in hospital settings. Such papers do not directly contribute to healthcare, and thus were excluded from this review.

  (iv) Duplicate articles were removed.

The study selection criteria followed in this study are depicted in Fig. 5. The papers were searched using the keywords and repeated papers were eliminated. The remaining papers were read by the authors and filtered first on the basis of title, then abstract and finally full text, using the inclusion and exclusion criteria described in Table 2. The number of papers selected at the end of each step is also shown in Fig. 5.

Fig. 5 Study selection criteria

Table 2 Inclusion and exclusion criteria for selection procedure

3.6 Results

Network Analysis, its techniques and its role in healthcare have been identified from the papers using a systematic research methodology. A total of 3659 papers on applications of Network Analysis in the healthcare domain were collected using the methodology. Out of these, a total of 110 papers were finalized for the survey. They cover themes like Network Analysis techniques, predictive healthcare applications, the use of Network Analysis for finding disease associations and dynamic networks in healthcare. The number of papers in different categories is as follows:

  (i) 14 papers were extracted for Network Analysis techniques and related applications for predictive healthcare.

  (ii) Healthcare analysis has been performed for different levels of networks. 12 papers were categorized for applications in disease networks, 18 for social networks and a major portion of 36 papers were extracted for the use of biological networks (including Gene, Drug and Protein interactions) for healthcare predictions.

  (iii) The number of publications for finding disease associations varied for different factors, with 8 papers for miRNA-based associations, 3 for phenotype-based, 4 for microbe-based and 8 for drug-based associations. A total of 23 papers were used for this area.

  (iv) The review also discusses the emerging role of dynamic networks in healthcare, for which 7 papers were extracted.
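
As a consistency check on the counts listed above, the category totals add up to the 110 included papers, and the included papers form roughly 3% of the 3659 papers retrieved:

$$14 + (12 + 18 + 36) + (8 + 3 + 4 + 8) + 7 = 14 + 66 + 23 + 7 = 110, \qquad \frac{110}{3659} \approx 0.030$$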

Network Analysis for healthcare can be categorized in three different levels according to the network in question. The three different types (biological, social and disease) of networks, their subtypes and corresponding publications used in this review are depicted in Fig. 6. Networks have a major role in extracting disease associations using various parameters like drugs, microbes, miRNA and phenotypes or symptoms. Figure 7 represents distribution of papers published for extracting disease associations based on these four parameters.

Fig. 6 Number of publications for different network layers

Fig. 7 Distribution of papers published for extracting disease associations for different parameters

4 Network Analysis Techniques and Collaborations

This section provides a brief description of the major techniques of Network Analysis. It also briefly discusses collaborations comprising Network Analysis and other computational techniques and technologies useful for predictive applications.

4.1 Network Analysis Techniques

Various analysis techniques have been developed to extract relevant information from networks. The major techniques identified from the literature can be grouped into four categories, namely Link prediction, Community detection, Ranking and Subgraph detection, as shown in Fig. 8. These algorithms have been used in diverse networks such as food webs, co-authorship graphs and social media to extract valuable knowledge with the help of their properties and structure.

Fig. 8 Major network analysis techniques

4.1.1 Link Prediction

Link prediction has been used in various networks to predict links which might occur in the future or links which exist but are not yet known. Link prediction dates back to the year 2000, with the Markov chain as its oldest technique. Its applications include predicting links in biological networks, which saves much of the time and cost that would otherwise be incurred in performing lab experiments. Link prediction is also used for creating recommendation systems for e-commerce websites and for predicting the participation of actors in events like email or co-authorship. It estimates the possibility of association between nodes in a graph even when new edges are added over time (such graphs are called dynamic graphs). Hence, it is valuable for predictions in online social networks [12]. In such networks, individuals in a group become vertices and their associations become edges.
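
A minimal sketch of neighbourhood-based link prediction is given here for illustration. The Jaccard similarity score is an assumption made for this example (the text above does not prescribe a particular score), and the friendship graph and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected friendship graph (adjacency sets).
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}


def jaccard(graph, u, v):
    """Shared neighbours of u and v divided by all neighbours of u or v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def rank_candidate_links(graph):
    """Score every currently unlinked pair; the highest score comes first."""
    candidates = [
        (u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]
    ]
    scored = [(jaccard(graph, u, v), u, v) for u, v in candidates]
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


for score, u, v in rank_candidate_links(FRIENDS):
    print(f"{u} - {v}: {score:.3f}")
```

Pairs that share many neighbours relative to their combined neighbourhoods are ranked as the most likely future links.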

4.1.2 Community Detection or Clustering

Graph clustering refers to grouping vertices into clusters such that the number of links within a cluster greatly exceeds the number of links between clusters. Various algorithms and similarity measures have been devised for grouping similar or well-connected vertices together [13]. Community detection techniques have been used in the literature for various purposes like predicting future co-authorships, exploring the structure of criminal organizations or developing recommendation systems.
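
The defining property of a good clustering, more links within clusters than between them, can be checked for a given partition as sketched here. The edge list and partition are hypothetical, and the modularity score is a commonly used quality measure included as an assumption of this example, not as a method named in the text.

```python
# Hypothetical graph: two triangles joined by a single bridging edge.
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
PARTITION = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}


def edge_counts(edges, partition):
    """Return (links inside clusters, links between clusters)."""
    intra = sum(1 for u, v in edges if partition[u] == partition[v])
    return intra, len(edges) - intra


def modularity(edges, partition):
    """Q = sum over clusters of (L_c / m) - (d_c / 2m)^2."""
    m = len(edges)
    intra, degree = {}, {}
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] = degree.get(cu, 0) + 1   # each edge end adds to d_c
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1  # edge lies inside cluster c
    return sum(intra.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


print(edge_counts(EDGES, PARTITION))
print(round(modularity(EDGES, PARTITION), 3))
```

A community detection algorithm can be viewed as a search for the partition that makes such a score as large as possible.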

4.1.3 Ranking

Ranking refers to the ordering of objects in a network based on their similarity, influence, importance or distance [14]. The rank of an object is influenced by other objects in the network according to their proximity. Numerous algorithms and methods have been devised to rank objects, like the PageRank algorithm, the HITS algorithm or ranking based on similarity, centrality and prestige.
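
As an illustration of link-based ranking, a power-iteration sketch of PageRank is shown here. The link structure is hypothetical, and the damping factor of 0.85 and the fixed iteration count are conventional choices assumed for this example.

```python
# Hypothetical directed link structure: page -> pages it links to.
LINKS = {
    "home": ["about", "docs"],
    "about": ["home"],
    "docs": ["home", "about"],
    "blog": ["home"],
}


def pagerank(links, damping=0.85, iterations=100):
    """Power iteration; every link target must also appear as a key of links."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            for target in links[node]:
                new_rank[target] += damping * rank[node] / len(links[node])
        rank = new_rank
    return rank


for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")
```

A page receives a high rank when it is linked from pages that are themselves highly ranked, which matches the idea that rank is influenced by nearby objects.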

4.1.4 Subgraph Detection

Dense components occurring frequently in a graph help in understanding the vital parts of a complex network. Finding these dense components or frequent subgraphs is called subgraph detection. Subgraph discovery is classified into different categories derived from the nature of input, search strategy and completeness of output [15].
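
One simple notion of a dense component is a maximal clique. The sketch here enumerates maximal cliques with the basic Bron-Kerbosch recursion on a hypothetical toy graph; it is an illustrative assumption and does not represent the frequent subgraph discovery methods categorized in [15].

```python
# Hypothetical undirected toy graph (adjacency sets).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}


def maximal_cliques(graph):
    """Enumerate all maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))  # nothing can extend this clique
            return
        for node in list(candidates):
            expand(
                current | {node},
                candidates & graph[node],
                excluded & graph[node],
            )
            candidates = candidates - {node}  # node has been fully explored
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return sorted(cliques)


print(maximal_cliques(GRAPH))
```

The recursion is exponential in the worst case, so this form is suitable only for small graphs.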

4.2 Network Analysis Collaborations

Developing predictive models requires a framework of interacting technologies and techniques. A general framework entails technologies for data collection, storage, transfer and analysis. A framework built on Network Analysis therefore requires other underpinning technologies like Cloud computing, Machine learning, Text mining and IoT. Traditional Network Analysis software cannot handle graphs containing more than 1000 vertices, which results in poor performance and becomes a major hindrance in complex computation and analysis. Cloud Computing is rapidly becoming a preference for processing such networks; for example, in [16] similarity between 4219 diseases was measured and 74,888,051 edges were obtained. There have been very few recent works regarding the combination of IoT and Network Analysis. The union of social networks and IoT is also named the Social Internet of Things (SIoT) [17]. It can be applied in various real-world scenarios so as to benefit people through analysis of their networks formed via IoT devices, as in [18]. The combination of Network Analysis and text mining is used for various healthcare predictions. In [19], a glaucoma database was constructed and text mining was utilised to mine genes and their associations. A network was generated from this and unknown gene interactions were predicted from its analysis. Similarly, in [20], a PPI network was generated from text-mined proteins and its analysis assisted in finding the most important proteins. Data mining techniques have been extensively employed in networks of biomedical data for retrieving useful associations between diseases and other factors. The related literature is surveyed in the next section.

5 Predictive Healthcare using Network Analysis

The complex and heterogeneous nature of medical data demands robust techniques to process it for inferring future predictions. As seen in previous sections, Network Analysis facilitates comprehension of relationships between different datasets and of the role of individual nodes in the overall structure. For these reasons, Network Analysis is being increasingly used as an effective computational technique for healthcare predictions. This section begins with the evolution of computational techniques and technologies for predictive analytics in healthcare, and then discusses the current role of Network Analysis as a computational technique for healthcare predictions. It was realized while searching for Network Analysis that it has made a major contribution in the literature to finding disease associations based on different factors like microbes and symptoms. This section also covers the role of Network Analysis in predicting disease associations based on different factors.

5.1 Healthcare Analytics: Evolution and Emerging Trends

Healthcare predictive analytics has witnessed remarkable improvements owing to the invention of several technologies and computational approaches. Computational intelligence techniques such as Artificial Neural Networks, Fuzzy methods, Support Vector Machines, metaheuristic optimization algorithms, ensemble learning approaches, Bayesian approaches and Markov models have been employed for the diagnosis of prostate cancer, as surveyed in [21]. Various machine learning techniques devised for assessing heart failure, predicting its presence, estimating the subtype and other aspects of heart failure management have been discussed at length in [22]. Different applications of machine learning technologies in the healthcare field have been explored in [23, 24]. Herland et al. [25] provided a multi-dimensional view of healthcare big data analytics by exploring data at multiple scales such as population, patient, tissue and molecular levels. Similarly, Raghupathi et al. [26] outlined big data analytics in healthcare by reviewing general architecture frameworks, methodology, advantages and challenges. Costa [27] reviewed major institutions and breakthroughs responsible for the application of clinical data in personalized medicine. It is evident from these reviews that computational techniques like big data analytics and machine learning have developed steeply from a healthcare perspective in a very short span of time. The techniques have evolved by incorporating and improving underlying tasks according to medical data needs. Tasks like handling noisy data, validation, feature selection and feature extraction have undergone significant improvements. Technological improvements took place in parallel. The idea of body area networks and other wearable sensors has grown gradually to monitor healthcare data and communicate it to healthcare providers. 
Similarly, after realising the potential of mobile technologies for healthcare, researchers have begun utilizing it for varied medical tasks.

5.1.1 Evolution of Computational Techniques and Technologies in Predictive Healthcare

As technology evolves, there is always a need to cope with the data it generates. This data is the basis for predictive models, and thus it requires equivalent computational techniques. Due to this, the evolution of technology and of computational techniques is almost always parallel. The technological innovations for healthcare began around the 1960s with the advent of Electronic Health Records (EHR) [28]. These records are electronic versions of patients' medical records stored on computers. They offer the advantage of remote access, allowing people to access and share their records at any time and from anywhere. The linking of EHR with clinical labs has improved data storage, as the EHR acts as a central repository for medical history, lab tests and prescriptions. This data serves as a reference and eases the interactions between patients and doctors. With the accumulation of diverse medical data like scanned images, text and handwritten prescriptions, there was a growing need for algorithms and techniques to analyze these records so as to provide predictive and quality solutions.

EHRs gradually became the basis for formulation of next stage systems, Clinical Decision Support Systems (CDSS). These facilitate decision making by minimizing medical errors and providing doctors with alerts for specific situations. This became possible by utilizing the knowledge base extracted from EHRs or by applying machine learning techniques to it [29]. Using these methods, associations or if–then rules are generated to provide future solutions for specific medical decisions.

In the early 1970s, extensive research was carried out on the application of artificial intelligence to CDSS so as to design expert systems. Expert systems are designed to imitate human intelligence without human interference. The system learns from the knowledge base using intelligent techniques and updates it as needed. Examples of expert systems include MYCIN [30], which was the first inference engine in the medical domain for the treatment of blood-borne diseases. It was gradually realized that analysis of the accumulated data using computational techniques like machine learning and artificial intelligence can improve the accuracy, and hence the quality, of diagnosis and prognosis decisions [31]. Figure 9 depicts the progression of technological innovations in healthcare from EHR and CDSS to Expert systems, aided by evolving computational techniques like machine learning and artificial intelligence. Such a transition is evident in different stages in [32], which aims to analyse emerging trends in medical informatics.

Fig. 9 Progression in predictive healthcare technologies through computational techniques

To support efficient computation, techniques and technologies such as Cloud Computing, the Internet of Things and Network Analysis entered the picture later. They are currently emerging as powerful tools for the integration, visualization, storage and analysis of real-world datasets, and are being used for healthcare solutions which involve predictions based on complex and heterogeneous datasets.

5.1.2 Emerging Trends for Predictive Healthcare

Predictive analytics requires the interplay of multiple techniques and technologies to develop a framework for modelling. A general framework covers phases such as data extraction, data storage, computation and analysis, each of which draws on numerous technologies and computational techniques. Technologies like Cloud Computing and the Internet of Things (IoT), along with computational techniques like Network Analysis, data mining and text mining, are playing key roles in realising the benefits of progressive healthcare models. Data extraction and storage are managed by Cloud Computing and IoT technologies, while analytics is handled by data mining, Network Analysis or text mining techniques. This section discusses the role of these techniques in predictive modelling, particularly for healthcare applications. The emerging trends are discussed as follows:

  • Cloud Computing for Predictive Healthcare

Cloud Computing is increasingly being used to improve the computational efficiency of predictive medical tasks. It has proved to be a cost-effective solution for various medical problems such as monitoring applications. The treatment of chronic diseases is otherwise quite costly, and if a patient waits a long time between hospital visits, the disease may worsen. Hence, it was realised that monitoring health parameters in real time could facilitate early prediction. Parameters like heart rate, blood pressure and ECG needed to be monitored for a large number of patients. Various healthcare monitoring applications were developed to monitor and store this data, which usually amounted to petabytes per year and required effective storage and manipulation. With the adoption of wireless and body sensors, large amounts of heterogeneous data are accumulated and delivered to the Cloud over the internet. The Cloud provides a pay-as-you-use model for delivering services on demand, making it a good choice for a reliable and cost-effective solution.

  • Data Mining for Predictive Healthcare

Data mining is a computational technique which supports decision making by discovering unknown and useful patterns in massive data [33]. It has been applied in numerous healthcare applications. For example, data on important healthcare parameters like blood pressure, heart rate and cholesterol is collected for patients in a home monitoring system so that it can be mined to predict dangerous clinical events in the near future. Similarly, epidemic data is classified or clustered using machine learning algorithms to predict the risk of recurrence or outbreak of diseases like diabetes, hepatitis or heart disease. Electronic Health Records are also mined in order to explore unknown patterns in population data.

  • Text Mining for Predictive Healthcare

Text mining, a branch of data mining, is also used extensively in healthcare applications. It is applied to clinical data like radiology reports, which contain unstructured data. These are mined to extract useful information from clinical imaging reports, which helps radiologists and clinicians in decision making [34, 35]. It has also been applied to online social data. For example, online reviews by patients in countries like China and the U.S. were extracted from websites like RateMDs.com [36]. Major topics of positive and negative reviews were text mined to identify differences in patient experience across nations and to better understand the healthcare systems in different countries. Patients in China focussed on bedside manners while Americans reviewed doctors. This information helps in developing a patient-centric healthcare environment. In another work [37], clinical data was used for predictions: clinical notes and past records of diabetic patients were text mined to extract parameters like the number of surgeries, and the chances of hospital readmission were predicted from these.

5.2 Current Role of Network Analysis in Healthcare Predictions

The core of a predictive model lies in the computational techniques used for analysis. Network Analysis is increasingly being used for healthcare solutions and is proving to be a constructive fit for predictions.

5.2.1 Applications of Network Analysis for Predictive Healthcare Solutions

Network Analysis has been employed in various healthcare related predictive applications and research works. Network medicine, as discussed in [38], can be viewed in three layers ordered from the molecular to the social level. Figure 10 depicts the different network layers in healthcare and the corresponding databases used in such applications.

Fig. 10 Various network layers in predictive healthcare

  • Biological Networks

A major portion of research has been dedicated to Network Analysis at the molecular level. There are various biological networks which have been used for analysis at the molecular level as discussed in [39,40,41]. Such networks aim to derive inferences from associations between biologically related nodes. Examples include Protein–Protein Interaction (PPI) networks, genetic interactions, co-expression data etc. The different biological themes covered are as follows:

  (i) PPI networks have been extensively studied for exploring disease associations. For example, the interactome has been studied in [42] to understand relationships between diseases. Similarly, the interactome and biological networks are analysed to study the pathways of endocrine diseases and relate them to disease progression [43]. The relationship between diseases has been quantified based on the distance between their proteins and then compared with other datasets. PPI networks have also been investigated at different levels using network properties in [44,45,46]. Clustering of a protein interaction network has been improved in [47], which helps in identifying functional modules.

  (ii) Many research works focus on exploring drug-target interactions so that they can be used for drug repurposing or drug discovery [48]. Drug-target interactions, disease genes and molecular pathways were used to identify the effects of drugs on disease genes in [49]. Similarly, new drug targets were predicted from a drug-target network using a novel inference method in [50]. Relationships between cell lines and drugs were used to predict drug response for samples in [51]. In that work, the combination of molecular and network data was framed as a link prediction problem in the network and evaluated with 86% accuracy. A supervised prediction method has been used on bipartite graphs to predict drug-target links [52]. Drug-target links were predicted using network substructures and scores in [53]. Several network methods for inferring drug targets, such as random walk with restart, Monte Carlo simulated annealing and similarity based inference, were explored in [54]. Drug target predictions were performed through kernel based methods which also used network based similarity in [55]. In another work, a semantic network was generated by integrating biomedical data, and topological features were derived using meta-paths. Link prediction and ranking were then performed using the Random Forest algorithm to predict drug-target interactions [56].

  (iii) Genetic data has proved to be an aid in understanding disease and gene associations, which are inferred using multiple network scenarios. Biological networks combined with gene expression data have been used to explore disease and gene associations in [57]. In [58], an association network was constructed which consisted of genes along with their functional associations. Context sensitive networks generated using relationships between diseases and phenotypes were integrated with a genetic network to infer disease-gene associations using ranking [59]. Similarly, biological pathways have been used to infer disease associations [60]. A network based approach has been utilized for prioritizing genes using differential gene expression data in [61], where kernel based machine learning techniques were used to analyze the network. Similarly, network topology has been explored using NP, Random Walk with Restart (RWR) and SP algorithms to prioritize genes for diseases [62]. A random walk approach has also been used in [63]. Gene phenotype prediction was performed using network weights as features for an SVM classifier in [64]. The network was generated by integrating multiple biological databases, and weights were defined using Bayesian inference. These predictions might help in learning disease mechanisms. Eronen and Toivonen [65] proposed Biomine, a system which integrates biomedical data from various sources and uses different network measures for predictions. Random walk with restart and supervised or unsupervised prediction methods have been used for disease-gene prioritization. Leiserson et al. [66] describe different algorithms and network tools to identify the cause of a particular disease. This is possible by analysing gene or protein interaction networks so as to identify genetic variations which ultimately lead to a disease. Similarly, in [67] heart disease development is studied through gene network analysis.
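Random walk with restart recurs in the prioritization works listed above, so a minimal sketch of the general idea may be useful. It assumes an undirected, unweighted toy network with hypothetical node names (`gene_A` to `gene_E`), a single known disease gene as the seed, and a restart probability of 0.3. None of these choices is taken from [54], [62] or [63], whose exact formulations may differ.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    adjacency: dict mapping node -> set of neighbours (symmetric, no isolated nodes).
    seeds:     set of nodes where the walker restarts (e.g. known disease genes).
    """
    nodes = list(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy interaction network: a simple chain of five genes.
toy_network = {
    "gene_A": {"gene_B"},
    "gene_B": {"gene_A", "gene_C"},
    "gene_C": {"gene_B", "gene_D"},
    "gene_D": {"gene_C", "gene_E"},
    "gene_E": {"gene_D"},
}
scores = random_walk_with_restart(toy_network, seeds={"gene_A"})
# Rank the non-seed genes as candidates, highest score first.
ranked = sorted((g for g in scores if g != "gene_A"), key=scores.get, reverse=True)
print(ranked)
```

In this toy chain the candidates closer to the seed receive higher scores, which is the intuition behind using such scores for prioritization. Real networks would typically be weighted and far larger, and a matrix based implementation would be preferable there.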

  • Disease Networks

The second layer evident in research is the disease network layer. Associations between diseases have been studied in the literature to understand disease progression. For example, insurance claims data has been used in [68] to identify disease–disease associations. Clinical drug trial information has been used in [69] to develop a disease relationship network. Administrative healthcare data was obtained and the trajectory of diseases in patients was modelled as a network in [70]. Two comorbidity networks were constructed for patients with and without Type 2 diabetes, and explored to understand high risk diseases and patterns. Different similarity measures, including standard methods and a proposed graphlet measure, were used to predict associations between diseases based on a biological network in [71]. Such visualizations and predictions are helpful in understanding diseases and aid biological research.

  • Social Network Analysis

Social Network Analysis is the third layer identified in research works. It has been used in the literature for assessing the role of healthcare parameters with the help of human interactions [3], hence the name Social Networks. Beyond purely medical purposes, networks have been combined with healthcare data for other tasks. In [72], a network is generated from medical expenditure data and insights are derived from drug purchases. It successfully identifies drugs which are usually taken together despite containing the same compound. This would help in customising drug prescriptions and reducing drug costs. In [73], multiple aspects of the schools, social networks and neighbourhoods of students were used to construct networks to understand their role in health. Various social network methods have been used to study co-authorship networks for health. Co-authorship graphs were analysed in [74] to identify leading researchers in different domains of health research. Collaboration networks for the influenza virus vaccine area were generated, and Social Network Analysis was used to explore the research done in this field in [75]. A social network was used in [76] to understand interruptions in the roles of people involved in an ICU. The authors explored interdependencies and factors for interventions from the network so that clinical workflow can be improved. Similarly, to improve the provision of care providers for patients suffering from life threatening diseases, a social network was constructed in [77] and optimal providers were selected at minimum cost using different network algorithms. A similar network was constructed in [78]. Outbreak networks in hospitals were also studied in [79]. A network using clinical data was constructed in [80] to identify co-occurring conditions in patients diagnosed with depression. The authors used statistical methods and network metrics to compare the networks and infer associations.
Network parameters were studied in brain networks in [81] to compare subjects who were healthy or suffered from different stages of dementia. SNA was performed over research articles on nursing care in [82] to explore its role for delirium patients. Subgroups were identified and the network structure was studied so as to understand the relations and improve the provision of care. There are many other works which have used SNA for improving medical care [83,84,85]. The discharge of patients was planned by analysing communication patterns as networks in a hospital system [86].

The intricacies of networks are of great use for proposing medical solutions such as predicting associations or finding disease associated modules. Table 3 presents some other applications of Network Analysis techniques for predictions in the healthcare domain which have not been discussed above. It describes different Network Analysis techniques and their use in predictive healthcare solutions.

Table 3 Applications of network analysis in healthcare

5.2.2 Network Analysis for Predicting Disease Based Associations

As discussed in Sect. 3, Network Analysis is increasingly being used for studying disease based associations. This is made possible by applying data mining and machine learning techniques to biological data. Network Analysis has been used extensively in combination with data mining so that associations between different healthcare parameters can be predicted. Various machine learning algorithms have been devised which use network features to predict future links based on different factors [97, 98].
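To make the notion of network features for link prediction concrete, the following sketch computes two common neighbourhood based scores, the number of common neighbours and the Jaccard coefficient, for every pair of nodes that is not yet linked. The toy graph, its node labels and the function name `link_features` are hypothetical. The specific features used in [97, 98] are not described here and may well differ.

```python
from itertools import combinations


def link_features(adjacency):
    """Return {(u, v): (common_neighbours, jaccard)} for every unlinked node pair."""
    features = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, so nothing to predict
        common = adjacency[u] & adjacency[v]
        union = adjacency[u] | adjacency[v]
        jaccard = len(common) / len(union) if union else 0.0
        features[(u, v)] = (len(common), jaccard)
    return features


# Hypothetical toy association network (undirected).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
features = link_features(toy_graph)
# The pair with the highest Jaccard score is the most plausible missing link.
best_pair = max(features, key=lambda pair: features[pair][1])
print(best_pair, features[best_pair])
```

Such scores can either be ranked directly, as done here, or passed as a feature vector to a supervised classifier trained on known links.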

Numerous works regarding this are discussed as follows:

  • Drug based associations The development of new drugs for any disease is a costly and time-consuming task. Therefore, various computational techniques have been devised which use known disease-drug associations to predict possible unknown relations between drugs and diseases. This helps in repurposing approved drugs, thereby assisting the process of drug discovery. Understanding drug and disease relationships is also important for gaining knowledge about disease progression. The various works on drug based associations are summarized in Table 4, which describes the types of similarity measures, networks and databases used in different works. The Area Under Curve (AUC) value reported in the table indicates model performance, with a value closer to 1 indicating better performance.

    Table 4 Drug based disease associations
  • Microbe based associations Microorganisms such as bacteria, viruses and fungi which reside in the human body are called microbes. They form a healthy relationship with host organs and are in fact significant for their physiology. For example, gut microbiota ferment food components to help digestion by the host. The functions of microbes include developing the immune system, maintaining drug metabolism and protecting against pathogens. Hence, the study of microbes and their relation to diseases plays a significant role in gaining a better understanding of disease mechanisms and therapies. Traditional approaches, which involved cultivation of microbes, were time consuming and laborious, so computational approaches are being employed to minimize cost and time. The various works on microbe based associations are summarized in Table 5.

    Table 5 Microbe based disease associations
  • Phenotype associations Phenotypes include symptoms or side effects observed in a patient. Knowledge of how symptoms or side effects relate to molecular processes can aid in understanding personalized treatment. Also, similar drugs can be used for diseases having similar symptoms or side effects. Symptom based disease associations can help in identifying targets of infectious diseases and in prioritizing genes, because genes located in the close neighbourhood of targets in the PPI network are known to also exhibit high symptom similarity with viral infections. Symptom networks have been used to identify symptom dynamics in psychopathology [108]. In that work, a symptom network based on diseases was used to extract useful information such as the central symptom in a person. The various works on symptom or side-effect based associations are summarized in Table 6.

    Table 6 Phenotype based disease associations
  • MicroRNA based associations MicroRNAs (miRNAs) are a group of small non-protein-coding RNA molecules. They are an essential part of many biological processes including cell growth and tissue development. Hence, discovering associations of miRNAs with diseases will benefit the understanding of disease mechanisms and progression. Various computational approaches have been proposed to prioritize miRNA candidates. The various works on miRNA based associations are summarized in Table 7.

    Table 7 MiRNA based disease associations
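Since AUC is the measure used above to compare association prediction models, a small sketch of one way to compute it may help. It uses the pairwise ranking interpretation: AUC is the fraction of (true association, non-association) pairs in which the true association receives the higher score, with ties counted as one half. The labels and scores below are hypothetical and do not come from any of the surveyed works.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a known association, 0 for a non-association.
    scores: predicted association scores, higher meaning more likely.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for six candidate drug-disease pairs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(labels, scores)
print(round(example_auc, 3))
```

A value of 1 means every true association is ranked above every non-association, while 0.5 corresponds to a random ranking. The quadratic loop is adequate for illustration only; a sort based computation would be preferable for large candidate sets.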

Discussion The different methods for finding associations were compared, and it was found that the majority of associations have been discovered using random walk and Markov chain methods. The next most popular technique has been the use of traditional similarity measures. Fewer research works have focused on the machine learning techniques of clustering, classification and semi-supervised learning. Other methods include depth-first search, ranking and probabilistic methods. The distribution of techniques is depicted in Fig. 11. Extracting associations of diseases with different factors requires different datasets. Table 8 describes the existing datasets which have been used in the literature for extracting associations.

Fig. 11 Distribution of methods used for extracting association

Table 8 Existing Datasets

5.2.3 Dynamic Networks

Exploring network dynamics is an interesting and emerging field. It is used to comprehend flow in a network as well as changes in topology [124], and is increasingly being applied to biological networks to study disease progression. In one work, due to lack of data, a static network was studied dynamically to understand disease progression: the correlation of disease diagnoses in the first two visits was compared to that in the next two visits in order to understand the dynamics of the network [125]. Comorbidity scores were calculated to quantify the distance between diseases in the network, and further used to compare patients of different ethnicities and genders. Temporal networks have been studied in [126] to detect the risk areas for the spread of mosquito borne diseases. The network was created in Gephi at different time intervals and used to explore temporal properties. Six metrics, including temporal correlation coefficient, path hops and betweenness centrality, were calculated using network properties to predict the transmission of disease. Another work [127] discusses the importance of SNA for studying hospital care networks which originate from real time and continuously evolving data. Similarly, transition probabilities were evaluated to explore transition networks of service deliveries so as to improve medical care [128]. Instead of SNA, Dynamic Network Analysis (DNA) has been used in [129] to analyse communication networks in patient care units which change over time. Various network metrics were obtained using the Organization Risk Analyzer (ORA) tool to understand the relationship between communications and patient safety. Table 9 describes various works on healthcare analytics using dynamic networks.

Table 9 Healthcare analytics using dynamic networks

Discussion Most of the medical applications capture real time data, which is ever evolving. Thus, it requires a dynamic approach to understand the progression in networks. Dynamic Network Analysis is a promising technique to study transitions in real time. Temporal properties and transition probabilities can be used to devise efficient algorithms for the study. Such approaches are still in their infancy but if designed properly, these will prove to be beneficial for developing predictive healthcare solutions.
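The transition probabilities mentioned above can be estimated in a simple way by counting how often one state follows another in observed sequences. The sketch below does this for hypothetical patient pathways. The state names and the function `transition_probabilities` are assumptions made for illustration and do not reproduce the method of [128].

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every consecutive (current, following) pair in the sequence.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical care pathways, one list of service states per patient.
pathways = [
    ["admission", "ward", "discharge"],
    ["admission", "icu", "ward", "discharge"],
    ["admission", "ward", "icu", "ward", "discharge"],
]
probs = transition_probabilities(pathways)
print(probs["ward"])
```

The resulting dictionary can be read as a weighted, directed transition network whose edge weights are the estimated probabilities. Recomputing it over successive time windows is one straightforward way to observe how such a network changes.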

6 Challenges

Although Network Analysis and its techniques have evolved since the 20th century, many challenges remain, as is evident from the literature. This section presents the challenges in applying Network Analysis to healthcare predictions. Some of the challenges identified in this survey are:

  • One of the challenges is the lack of focus on dynamic networks, although most complex healthcare networks are dynamic in nature. For example, protein–protein or other biological networks of individuals can facilitate personalized medical solutions through an understanding of the patterns in their evolution over time, which requires expertise in dynamic networks. Several novel techniques are being devised to handle this transient nature of networks.

    • Link prediction combined with machine learning techniques for temporal data could be a suitable solution in such scenarios.

  • Some techniques do not perform well with large or sparse medical graphs. Moreover, with the graphs growing gradually with time, there is a need to store the data effectively.

    • One solution is to provide a Cloud based framework. A Cloud platform provides resources on the fly as required, and these can be scaled up or down. This is a cost effective and time saving solution for large, heterogeneous and dynamic healthcare datasets. Network Analysis is already being used in conjunction with Cloud and IoT in healthcare monitoring applications for efficient computation in real time.

  • It has been observed that most Network Analysis techniques are based on a particular case study or medical dataset. This is another drawback, because for a technique to function effectively it needs to be validated on several case studies or on a comprehensive dataset.

    • In the case of heterogeneous datasets, a suitable Network Analysis technique must be chosen, because networks are well suited to interpreting integrative data. A combination of a Network Analysis technique with big data analysis would serve the purpose.

  • In the domain of Network Analysis for extracting unknown associations, the major challenges concern the similarity measures and mining algorithms essential for analysis. Although there are numerous traditional similarity measures, there is a strong need for dedicated similarity measures so that useful networks can be constructed.

    • Applying suitable machine learning algorithms to these networks will help automate the task of exploring unknown associations. The use of appropriate measures and algorithms will enrich network based frameworks, leading to better healthcare decisions by doctors.

    • Global and local topological parameters of networks, such as betweenness and other centrality measures, can be used as features for machine learning algorithms so as to extract unknown associations or perform clustering.

  • Although there have been advances in the other techniques and technologies used in conjunction with Network Analysis, their application is limited because they are still evolving. Moreover, the use of these techniques specifically for healthcare applications is in its infancy.

    • There is a need to comprehend suitable combinations of techniques and technologies with Network Analysis in order to be able to perform valuable analysis and computations.

  • The data in medical applications is sensitive and demands confidentiality. Privacy is a major concern for associated stakeholders, but Network Analysis applications have not focused on this area.

    • For this purpose, efficient security checks must be incorporated in healthcare applications.

  • Real world complex networks not only consist of multiple kinds of nodes, but also multiple types of links. Bipartite networks are commonly observed in drug-target or recommender applications where two types of nodes are present.

    • A new framework has been devised in [130] for predicting connections in such networks by developing a local community based topological model for bipartite graphs.
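The suggestion in the list above to use topological parameters as machine learning features can be sketched as follows. For brevity the example computes degree and closeness centrality rather than betweenness, and it assumes a connected, undirected toy graph with hypothetical node names. It is meant only to show how per-node feature vectors might be assembled, not to reproduce any surveyed method.

```python
from collections import deque


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """Inverse of the mean shortest-path distance to all reachable nodes."""
    result = {}
    for source in adjacency:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical toy network: one hub linked to three peripheral nodes.
toy_graph = {
    "hub": {"n1", "n2", "n3"},
    "n1": {"hub"},
    "n2": {"hub"},
    "n3": {"hub"},
}
degree = degree_centrality(toy_graph)
closeness = closeness_centrality(toy_graph)
# One feature vector per node, ready to be passed to a clustering or classification step.
feature_vectors = {node: [degree[node], closeness[node]] for node in toy_graph}
print(feature_vectors)
```

In practice a graph library would supply these and further measures such as betweenness, and the resulting vectors would be combined with domain specific attributes before learning.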

7 Conclusions and Future Directions

With many emerging Network Analysis techniques and their useful combinations with other technologies, a large number of applications have already used such frameworks to propose prediction models. This review examines Network Analysis along multiple dimensions by initially extracting 3659 papers from different areas using search terms and then narrowing these down to a final set of 110 papers. It surveys Network Analysis and its significant techniques for predictive healthcare applications. Networks are evident in different layers for healthcare predictions, namely biological, disease and social networks, and this survey covers significant works in all three layers. Network Analysis is also increasingly being used for understanding associations of diseases with drugs or symptoms, as well as disease progression. It emerges from this review that envisaging Network Analysis and other technologies as an effective computational framework for predictive healthcare models is still in its infancy. The review summarizes a few issues in network based healthcare solutions, such as handling large, heterogeneous and dynamic medical data securely. It proposes possible solutions, such as the development of Cloud based network frameworks for handling large datasets, the combination of Network Analysis techniques like link prediction with machine learning for temporal datasets, and the use of topological parameters as features for analysis algorithms. Dedicated work in this domain will facilitate healthcare tasks like drug repurposing and personalized medicine. Carefully chosen Network Analysis techniques, aided by other underpinning technologies, might prove to be a giant leap towards the much awaited P4 medicine.