
1 Introduction

As healthcare systems across the world grapple with a changed landscape of disease burden in the post-COVID era, it is becoming imperative to develop fruganomic healthcare technologies capable of gathering evidence from the community to augment hospital-based registries. Reasonably accurate estimates of the disease burden would give health policy administrators an unequivocal yet nuanced narrative on which to build niche-specific disease surveillance and forecasting systems. They would also help healthcare administrators develop and deploy novel, affordable and, more importantly, accessible instruments, such as policies and programmes, aimed at evaluating the health status of the community at large, particularly in the resource-limited healthcare systems prevalent in low- and lower-middle-income countries (LLMICs) such as those of the Indian sub-continent. India, endowed with unique geological relief structures and a divergent genetic base, presents a unique landscape of disease burden. This calls for tools that can be ported to mobile platforms (iOS/HTC/Android), since mobile phones have good penetration in the rural milieus of the Indian sub-continent. The situation is compounded by the fact that Non-communicable Diseases (NCDs) disproportionately affect people living in LLMICs [1–3], accounting for three quarters of the mortalities within LLMICs [4]. The relationship among NCDs, poverty, and social and economic development [5] is likely to pose a major challenge to development as well as to the attainment of the Sustainable Development Goals (SDGs) by 2030 [6, 7]. The marginalized sections of society in LLMICs are vulnerable to NCDs for many reasons, including socio-economic constraints, psychosocial stress, higher levels of risk behavior, unhealthy living conditions, and limited access to high-quality health care, along with reduced opportunity to prevent complications.
The prevalence of unhealthy risk behaviors, such as consumption of tobacco and alcohol products along with a sedentary lifestyle, makes these populations vulnerable to the ravages of the “NCD Epidemic”, which remained underacknowledged and unaddressed until the advent of the COVID pandemic. India is projected to experience more deaths from NCDs than any other country over the next decade, primarily due to the size of its population and a worsening risk factor profile, and this will significantly impact economic growth. Deeply entrenched social and economic disparities, together with the lack of affordable and accessible healthcare, set the stage for the emergence of epidemics and pandemics such as COVID-19, with devastating consequences.

Translating the fundamental concepts of precision medicine to the community level, in order to understand the patterns and processes that shape the landscape of disease burden in LLMICs with fractious and fractionated healthcare ecosystems such as the Indian sub-continent, requires novel, cutting-edge, and disruptive fruganomic community-empowering solutions aimed at alleviating healthcare disparities. In the last two decades India has witnessed an epidemiological transition from communicable diseases to NCDs, with cardiovascular diseases and cancer accounting for a significant proportion of morbidities and mortalities, while the unfinished agenda of communicable diseases has led to the emergence of recent pandemics such as COVID-19 [8–10]. The World Health Organization (WHO) recognized the outbreak of COVID-19 in January 2020 and declared it a pandemic in March 2020; its impact on the economies of all countries, including India, has clearly affected the health outcomes of people belonging to the weaker socio-economic strata.

COVID-19 associated co-morbidity was observed in patients who had underlying risk factors of hypertension, diabetes, and chronic respiratory problems. Chronic respiratory problems account for 8% of the mortalities, and India has 18% of the global population with an ever-increasing burden of chronic respiratory diseases including chronic obstructive pulmonary disease (COPD), asthma, pneumoconiosis, interstitial lung diseases, and pulmonary sarcoidosis [11–13]. Environmental factors such as air, water, and soil pollution are known to contribute significantly to premature mortality and disease burden globally, with the highest impact in low-income and middle-income countries, such as those of the Indian subcontinent, that have resource-limited healthcare systems [14, 15].

Recent evidence suggests that the use of multi-modal multi-sensor fusion technologies along with big-data-enabled platforms would significantly contribute towards strengthening the resource-deprived healthcare systems prevalent in the Indian sub-continent. The transition from precision medicine to precision public health, enabled by artificial intelligence (AI), must be integrated into the existing framework of healthcare systems with a view to providing affordable and accessible healthcare solutions suited to the niche-specific needs of LLMICs. Integrating these disruptive and cutting-edge healthcare solutions within the framework of the existing healthcare systems will significantly improve the health outcomes of the community at large.

A major drawback of existing interpretation algorithms based on Artificial Neural Networks (ANNs) is their black-box nature, which, coupled with high computational complexity, leads to a larger carbon footprint and thereby contributes to global warming [16, 17], although solutions are emerging [18]. Apart from this, it is difficult to understand why and how such a network arrived at a given answer [19–21]. Further, the nonlinear dynamical behavior of deep neural networks is prone to chaos and fundamental underlying unpredictability [22–24]. On the other hand, static and predictable algorithms such as Earth Mover’s Distance (EMD) and visibility graphs match images by computing perceptual similarity and provide more meaningful and interpretable solutions to matching problems. Taken together, ANNs have a long and well-researched history of inherent instability, and their automated decisions cannot be entrusted with choices critical to the survival of a patient afflicted with a severe case of COVID-19 lung ailment [25–28].

Chest radiographs are still the most common modality for diagnosing lung disease conditions, and the development of tools and applications that can seamlessly evaluate lung health in LMICs such as India will significantly improve healthcare outcomes. Additionally, the lack of well-structured databases for referencing and analysis hinders research on aiding and optimizing processes and clinical decision making with the help of Artificial Intelligence (AI) [29, 30].

India, endowed with a diverse genetic base and socio-cultural norms, presents a unique landscape of disease burden, which calls for niche-specific databases to enhance the accuracy of AI-enabled tools. AI-based automation of chest radiograph analysis in LMICs like India will significantly improve the clinical outcomes of patients afflicted with lung diseases, and its socioeconomic benefits outweigh the challenges of integrating it into the existing framework of the healthcare system [31, 23]. AI-enabled platforms would thus provide a valuable precision public health tool for better management of the lung disease epidemic, improving clinical outcomes and thereby alleviating a significant burden on the national health spend.

The fundamental impact of integrating smart clinical devices, IoT, and Industry 4.0 with clinical software and closed-loop resource allocation is the ability to rapidly deploy medical infrastructure in challenging places during natural calamities such as floods, earthquakes, droughts, and tsunamis, while drastically reducing costs and reacting to changes in patients’ preferences, the pharma industry, the supply chain, and technology.

Broadly, this work makes the following contributions in moving towards clinical disease tagging as a webservice:

  • This work describes a natural approach for automated tagging of Covid-positive chest radiographs. It uses computational ideas such as perceptual similarity and Earth Mover’s Distance (EMD), converts the chest radiographs into a network/graph using the horizontal visibility graph procedure, and then computes similarity scores with the HIM network distance metric. To our knowledge it is the first work of its kind, and its time has arrived due to the clinical, and hence socio-economic, emergency caused by Covid.

  • This disease tagging is presented as an app on any mobile platform of choice, where all that a user, patient, doctor, or medical professional must do is upload a Covid chest radiograph, and the system will generate a disease tag. This is possible due to the advent of affordable smartphone technology and its accessibility across the socio-economic spectrum.

  • The same algorithmic ideas, backed by an intensive computational engine, are exposed as a web service where disease tagging can be done in large batches, since one of the major issues in Covid waves is that large numbers of people are infected in a very short time span. There is a clear need for a system that can cope with this kind of Covid-infection load in an agile fashion.

The rest of this chapter is organized as follows: Sect. 2 discusses related work in the context of HVG. Section 3 describes the clinical and public health motivations behind deploying this technology. Section 4 presents the basic mathematical definition of EMD. Section 5 describes HVG, HIM and related results. Section 6 illustrates the different levels of processing in HVG for Covid-positive chest radiographs. Section 7 reports computational results from the HVG–HIM implementation on chest radiographs for disease tagging. Section 8 demonstrates our basic implementation as a webservice for remote accessibility in LMICs. Finally, Sect. 9 collects our insights and experiences in a conclusion and suggests future directions for development.

2 Related Work

The motivation behind visibility graph generation was to develop simple and fast computational methods that transform a time series into a network or graph. The resulting visibility graph inherits multiple features of the original time series in its spatial organization. For example, periodic time series transform into regular graphs, and random time series manifest themselves as random graphs [32, 33]. Along these lines, the horizontal visibility algorithm, a geometrically more intuitive and analytically tractable version of the visibility graph algorithm for transforming time series into graphs [34], has been proposed. It turns out that exact results on the topological properties of these horizontal visibility graphs, such as the degree distribution, the clustering coefficient, and the mean path length, can be obtained. The horizontal visibility algorithm can also be used as an intuitive method to discern between any two different time series. It is precisely this capability that we leverage here to automatically categorize normal and Covid-positive chest radiographs. HVG, along with features like mean node degree and degree distribution, has been used to categorize sleep stages based on graph domain properties from a single-channel electroencephalogram (EEG) signal [17, 35].

Visibility graph methods have been found effective in describing the fractal properties of geophysical time series [36]. That work explores various graph-theoretical metrics pertaining to visibility graphs, their interdependent nature, and their sensitivity to missing values and randomness. Visibility graph algorithms have been applied to fMRI time series to simultaneously compute and process relevant dimensions of both local and global dynamics in a natural fashion, and to explore a transformation between time series and network theory in the context of network neuroscience [37]. It has been illustrated that the network architecture of image visibility graphs represents important information on the organization of the image from which they are derived, and that they can potentially make good image filters [38]. Using HVG, a general class of predictors has been defined and validated that can augment existing properties used in heart rate variability (HRV) analysis and that shows high predictive power for multiple cardiovascular diseases [39]. The normalized weight vertical visibility algorithm (NWVVA) has been proposed to extract EMG-based features for myopathy and ALS detection [40]. In this algorithm, sampling points or nodes are derived based on sampling theory, and features are computed from the interrelations among the vertical visibility nodes, with their amplitude differences as weights. The similarity graph algorithm has been used to analyze time series of motor activity extracted from actigraph registrations over 12 days in depressed and schizophrenic patients. These were mapped into a graph, and techniques from graph theory were then applied to describe these time series, searching for variations in complexity [41].

Visibility graph methods were deployed to analyze ECoG signals in rats [42]. Subsequently, typical metrics in network science (graph properties) were applied to compute network properties of topological structure of these graphs derived from ECoG signals. A family of Feigenbaum graphs, which are horizontal visibility graphs (HVGs) generated from the trajectories of one-parameter unimodal maps undergoing a period-doubling route to chaos (Feigenbaum scenario), have been analyzed [43]. It has been found that while the maximum eigenvalue of HVG can easily discern chaos from a white noise process, it is not a good metric to quantify the chaoticity of the process, and that the eigenvalue density is perhaps a better indicator for the same.

3 Motivation for Building This Tool and Methodology

This work is motivated by the following two objectives.

  1. Development and validation of an Intelligent Decision Support System for segregating chest radiographs to detect COVID-19 associated lung diseases in both tertiary care settings and the extended community, along with tracking of patients through low-end mobile health applications [44–46].

  2. Integration and validation of a multi-modal tool in clinical practice involving automated processing of anonymized chest radiographs along with conventional molecular biomarkers [47, 48] of tissue hypoxia in both the angiogenic and fibrotic phases of lung disease progression, forming the rationale for effective triage methods that prioritize the most urgent conditions over wait-listed ones.

The race- and sex-specific variations in the levels of conventional biomarkers, such as those of angiogenesis and fibrosis, necessitate validation and confirmation by a non-invasive AI-enabled modality that can seamlessly crunch a large amount of data in an affordable and accessible manner. Our fruganomic, data-intensive AI-enabled tool will facilitate this by incorporating the clinical-epidemiological features of subjects evaluated at tertiary care centers and in the extended community. Upon integration with the digital signals from surrogate molecular markers, it will also result in a multi-modal multi-sensor fusion technology [27, 29, 30]. This technology aims not only to resolve the problem of missed diagnosis and misdiagnosis of lung diseases such as tuberculosis or pneumonia at tertiary care centers and in the extended community, but also to individualize the risk assessment of patients with suspected myocardial infarction or to categorize patients into low- or high-risk groups.

In recent years, various computer-based tools have been developed that can be reliably used for computational disease tagging. With the help of such tools, healthcare professionals can tag different disease conditions accurately and within a short time, with a view to significantly improving the health outcomes of the community at large [49–57].

In the past, deep learning models have been explored, with limited efficiency, for diagnosing lung diseases from X-ray images and for predicting the onset of diseases such as Covid-19 in patients [31]. In this paper, we explore the possibility of predicting lung ailments by applying the Earth Mover’s Distance algorithm [58, 59], as part of our ongoing work, along with the visibility graph to the X-ray images of patients. EMD mimics the human perception of texture similarity, whilst the horizontal visibility graph (HVG) and Hamming-Ipsen-Mikhailov (HIM) distance-based similarity approach forms a cornerstone for automatically distinguishing clinical multimedia. This stable and programmatic algorithmic capability can be leveraged to provide automated disease tagging where the services of highly trained medical professionals are either too scarce or unaffordable. Taken together, these observations form the rationale for scalable automated clinical disease tagging for community-oriented health intervention.

3.1 Earthmover’s Distance (EMD)

Earthmover’s Distance (EMD) is a method to calculate the disparity between two multi-dimensional distributions in some space where a distance measure between individual elements (the ground distance) is given. Given two distributions, one can be considered as a mass of earth spread over an area and the other as a collection of holes in that same area. The EMD then measures the least amount of work required to fill the holes with earth, where a unit of work corresponds to transporting a unit of earth by a unit of ground distance. Equivalently, it is the minimum cost that must be paid to convert one histogram into the other. Computing the EMD is based on a solution of the transportation problem [16], which we formalize as the following linear programming problem:

Let X be the first signature with n clusters, where \({x}_{i}\) is the cluster representative and \({w}_{xi}\) is the weight of the cluster.

Let Y be the second signature with m clusters, where \({y}_{j}\) is the cluster representative and \({w}_{yj}\) is the weight of the cluster.

Let D be the ground distance matrix, where \({d}_{ij}\) is the ground distance between clusters \({x}_{i}\) and \({y}_{j}\).

Let F be the flow matrix, where \({f}_{ij}\) is the flow between \({x}_{i}\) and \({y}_{j}\).

Then,

$$X=\{\left({x}_{1},{w}_{x1}\right),\left({x}_{2},{w}_{x2}\right),\left({x}_{3},{w}_{x3}\right),...\left({x}_{n},{w}_{xn}\right)\}$$
$$Y=\{\left({y}_{1},{w}_{y1}\right),\left({y}_{2},{w}_{y2}\right),\left({y}_{3},{w}_{y3}\right),...\left({y}_{m},{w}_{ym}\right)\}$$
$$D=\left[{d}_{ij}\right]$$
$$F=\left[{f}_{ij}\right]$$

The work is WORK(X, Y, F) = \({\sum }_{i=1}^{n}{\sum }_{j=1}^{m}{f}_{ij}{d}_{ij}\), which is minimized over F subject to the constraints: (i) \({f}_{ij}\ge 0\), where \(1\le i\le n\), \(1\le j\le m\); (ii) \({\sum }_{j=1}^{m}{f}_{ij}\le {w}_{xi}\), where \(1\le i\le n\); (iii) \({\sum }_{i=1}^{n}{f}_{ij}\le {w}_{yj}\), where \(1\le j\le m\); (iv) \({\sum }_{i=1}^{n}{\sum }_{j=1}^{m}{f}_{ij}=\mathrm{min}\left({\sum }_{i=1}^{n}{w}_{xi},{\sum }_{j=1}^{m}{w}_{yj}\right)\). Constraint (i) allows mass to move only from X to Y. Constraints (ii) and (iii) limit the amount of mass that the clusters in X can send to their weights, and ensure that the clusters in Y receive no more mass than their weights. Constraint (iv) forces the maximum possible amount of mass to be moved; this amount is also known as the total flow. Solving the transportation problem yields the optimal flow F. The Earth Mover’s Distance is then defined as the work normalised by the total flow:

$$ EMD\left( {X,Y} \right) = \mathop \sum \limits_{i = 1}^n \mathop \sum \limits_{j = 1}^m f_{ij} d_{ij} \div \mathop \sum \limits_{i = 1}^n \mathop \sum \limits_{j = 1}^m f_{ij} $$
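As a hedged illustration of the definition above, the following Python sketch covers only a special case: two one-dimensional histograms over the same bins, with equal total mass and ground distance \({d}_{ij}=|i-j|\). In that case the transportation problem has a well-known closed form via cumulative sums, so no linear programming solver is needed. The function name `emd_1d` is hypothetical, and the chapter itself reports using R, not Python; the general multi-dimensional case still requires solving the linear program.

```python
from itertools import accumulate


def emd_1d(x_weights, y_weights):
    """EMD between two 1-D histograms defined on the same bins.

    Assumptions (not from the chapter): equal total mass and
    ground distance d_ij = |i - j| between bin i and bin j.
    """
    if len(x_weights) != len(y_weights):
        raise ValueError("histograms must share the same bins")
    total_x, total_y = sum(x_weights), sum(y_weights)
    if total_x <= 0 or abs(total_x - total_y) > 1e-9:
        raise ValueError("this shortcut needs equal, positive total mass")
    # The running surplus is the mass that must cross each bin boundary;
    # carrying it one bin further costs |surplus| units of work.
    surplus = accumulate(x - y for x, y in zip(x_weights, y_weights))
    work = sum(abs(s) for s in surplus)
    # Normalise the work by the total flow, as in the EMD definition.
    return work / total_x
```

For example, moving one unit of mass from the first to the third bin costs two units of work, so `emd_1d([1, 0, 0], [0, 0, 1])` evaluates to `2.0`.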

3.2 Horizontal Visibility Graph (HVG) and Its Application for X-ray Chest Radiograph Processing in R

The notion of visibility says that two data points in a time series are visible to each other, and hence connected in the visibility graph, if the line of sight between them is not obstructed by any other data point. This transformation maps a time series into a network according to the geometric condition outlined below. Any two data points (t1, i1) and (t2, i2) of the time series obtained from a Covid or normal X-ray image matrix are said to be visible, and hence connected in the ensuing graph, if every other data point (t3, i3) with t1 < t3 < t2 satisfies

$${i}_{3}<{i}_{1}+\left({i}_{2}-{i}_{1}\right)\frac{{t}_{3}-{t}_{1}}{{t}_{2}-{t}_{1}}$$

What this essentially means is that all values i3 with t1 < t3 < t2 must stay below the line drawn between (t1, i1) and (t2, i2). Restricting this notion of visibility to the horizontal direction gives horizontal visibility: two data points ij and il are horizontally visible if one can draw a horizontal line of sight between them while all values between them stay below this line, that is, ij, il > ik for all k such that j < k < l [33]. As in the visibility case, the horizontal visibility algorithm maps a sequence of data points (a time series) to a horizontal visibility graph (HVG). Once the HVG representation is obtained, the analytic capabilities of network science and graph theory can be deployed to analyze the original sequence of data points combinatorially, resulting in hitherto unknown criteria for characterizing data sequences. While there are a large number of visibility graph applications across disciplines, this work leverages the method for distinguishing patients with a certain pathology from healthy controls, using the network attributes of HVGs as feature vectors for automatic disease tagging. In particular, an analysis of automated classification of healthy and corona-positive patients is presented with the digital lung X-ray modality [34].
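The two visibility criteria can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R implementation: the function names are hypothetical, time stamps are assumed to be the unit-spaced indices 0, 1, 2, ..., and ties are treated as blocking visibility because both criteria use strict inequalities.

```python
def natural_visibility_edges(series):
    """Edges (a, b), a < b, whose line of sight clears every point in between."""
    edges = []
    n = len(series)
    for a in range(n - 1):
        for b in range(a + 1, n):
            # Each intermediate value must lie strictly below the straight
            # line joining (a, series[a]) and (b, series[b]).
            visible = all(
                series[c] < series[a] + (series[b] - series[a]) * (c - a) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.append((a, b))
    return edges


def horizontal_visibility_edges(series):
    """Edges (j, l), j < l, with series[j], series[l] > every value in between."""
    edges = []
    n = len(series)
    for j in range(n - 1):
        highest_between = float("-inf")
        for l in range(j + 1, n):
            if min(series[j], series[l]) > highest_between:
                edges.append((j, l))
            highest_between = max(highest_between, series[l])
            if highest_between >= series[j]:
                break  # no point further right is horizontally visible from j
    return edges
```

On the toy series `[3, 1, 2, 4]` the horizontal criterion connects every pair except (1, 3), because the value 2 at index 2 is not below the value 1 at index 1, whereas the natural criterion connects all six pairs. Every horizontally visible pair is also naturally visible.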

3.2.1 Hamming-Ipsen-Mikhailov (HIM) and Network Similarity Metric

Hamming distance is a simple metric that counts the number of positions at which two strings of equal length differ [60]. Alternatively, it counts the number of substitutions required to transform one representation into the other. Generally speaking, it is an edit distance between two strings and can be deployed as a local metric to compute the similarity of two networks. The Ipsen-Mikhailov distance was pioneered by Ipsen [61] for graph reconstruction problems. Jurman et al. [62] expanded its usage to “graph-comparison” methods.

The Ipsen-Mikhailov (IM) distance is a spectral measure which models a topology of N molecules connected by flexible springs. These network topologies are organized by the underlying adjacency matrix. The global (spectral) metric IM is the Ipsen-Mikhailov distance pertaining to the square-root of the squared difference of the Laplacian spectrum for each graph. The Ipsen-Mikhailov distance outlines the difference between two graphs by comparing their respective spectral densities and not by the raw eigenvalues themselves.

To take advantage of the local nature of Hamming and the global nature of IM, the Hamming-Ipsen-Mikhailov distance has been proposed. It is a weighted combination of the Ipsen-Mikhailov (IM) distance and the normalized Hamming (H) distance. The Hamming-Ipsen-Mikhailov (HIM) distance is a Euclidean metric on the space created by the Cartesian product of the metric spaces associated with H and IM. The contributions of global and local information are governed by a combination factor ξ used in the formula. When ξ is one, local and global information are in balance; as ξ tends to 0, the distance becomes the (local) Hamming distance; and as ξ goes to ∞, it approaches the (global) Ipsen-Mikhailov distance.

$${d}_{HIM}=\frac{1}{\sqrt{1+\xi }}\sqrt{\xi I{M}^{2}+{H}^{2}}$$

As mentioned earlier, this distance benefits from the strengths of both the Hamming and the Ipsen-Mikhailov distances by leveraging local and global information. Further, since it combines two distances with a non-negative weight, it defines a proper network distance between graphs. The parameter \(\xi \) lets the user favor one type of information over the other. However, it is well observed empirically that this distance is computationally expensive, and thus costly to apply to massive graphs and large datasets. For our purposes, the HIM distance is used to compare two horizontal visibility graphs (HVGs) generated from either normal or Covid-positive X-ray radiographs. The ensuing network similarity helps us decide the appropriate disease tag, as will be demonstrated in the computational results.
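The combination step in the formula above can be sketched in Python as follows. Two points are assumptions rather than statements from this chapter: the Hamming distance is normalized by N(N − 1), the number of ordered node pairs, and the Ipsen-Mikhailov value is passed in as a precomputed number, since computing it requires the Laplacian spectrum of each graph. The function names are hypothetical.

```python
import math


def normalized_hamming(adj_a, adj_b):
    """Share of ordered node pairs on which two adjacency matrices disagree.

    Assumption: both graphs have the same N nodes and the normalisation
    constant is N * (N - 1).
    """
    n = len(adj_a)
    if n < 2 or len(adj_b) != n:
        raise ValueError("need two graphs on the same node set (N >= 2)")
    differing = sum(
        abs(adj_a[i][j] - adj_b[i][j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return differing / (n * (n - 1))


def him_distance(hamming, ipsen_mikhailov, xi=1.0):
    """Combine a local (H) and a global (IM) distance with weight xi >= 0."""
    if xi < 0:
        raise ValueError("xi must be non-negative")
    return math.sqrt(xi * ipsen_mikhailov ** 2 + hamming ** 2) / math.sqrt(1.0 + xi)
```

With `xi=0` the function returns the Hamming term unchanged, which matches the limiting behavior described above; with `xi=1` the two terms contribute equally.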

4 Dataset

The normal and Covid-positive chest radiographs were sourced from Rajiv Gandhi Cancer Institute and Research Centre [63]. In the disease tagging with visibility graph and EMD based analysis of training data, representative normal and unhealthy ECGs were compared with the given 100 test ECGs. A similar process has been followed for Covid-positive disease tagging using EMD, with a 30% success rate. Following similar reasoning, if the VG-HIM distance of a test chest radiograph to the representative normal radiograph is the smaller one, it is tagged as a normal chest radiograph; if its VG-HIM distance to the representative Covid-positive chest radiograph is the smaller one, it is tagged as a Covid-positive chest radiograph, as shown in Table 3. The full process is shown as a flowchart in Fig. 13. The success rate of HVG–HIM based disease tagging is 60 out of 100, or 60%, which calls for multimodality: biomarkers [47, 48] offer a natural basis for Covid-positive classification that can further enhance the automated tagging of Covid-positive chest radiographs with greater confidence.
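The tagging rule described above amounts to a nearest-representative decision. A minimal Python sketch follows, in which the function names, the label strings, and the tie-breaking choice (a tie is tagged as normal) are assumptions made for illustration and are not taken from the chapter.

```python
def tag_radiograph(dist_to_normal_rep, dist_to_covid_rep):
    """Tag a test radiograph by its closer representative.

    Assumption: an exact tie is tagged "normal"; the chapter does not
    say how ties are resolved.
    """
    if dist_to_normal_rep <= dist_to_covid_rep:
        return "normal"
    return "covid-positive"


def success_rate(predicted_tags, true_tags):
    """Fraction of test radiographs whose predicted tag matches the true tag."""
    if not true_tags or len(predicted_tags) != len(true_tags):
        raise ValueError("need two non-empty tag lists of equal length")
    hits = sum(p == t for p, t in zip(predicted_tags, true_tags))
    return hits / len(true_tags)
```

As a usage example with made-up distances, `tag_radiograph(0.12, 0.30)` yields `"normal"`, and a run that tags 60 of 100 test radiographs correctly gives a `success_rate` of 0.6.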

4.1 Transformation of a Chest Radiograph to a Horizontal Visibility Graph: Different Stages of Processing

The complete process, from a chest radiograph to a horizontal visibility graph that is ready for HIM distance computation, is accomplished in multiple computational processing stages. We illustrate it using the COVID-positive training image COVID_Train7.jpeg, shown in Fig. 1. Compared with a normal chest radiograph, it has more white cloud-like structures, which might be due to the Covid-positive nature of the radiograph. To process it, the chest radiograph is converted to a down-sampled numerical matrix in the R computational environment. In this illustration we have downsized it to 8 × 8 to present it clearly and to show the relevance of the different processing algorithms and visualizations. The transformed Covid-positive chest radiograph is displayed as an 8 × 8 matrix in Fig. 2, where different color intensities represent different grey levels in the original Covid-positive chest radiograph.

Fig. 1 Chest X-Ray

Fig. 2 Matrix Representation

In the next stage of transformation, this 8 × 8 matrix is stacked as a vector of size 64 and plotted as a time series in Fig. 3. The stage is now set for transforming this time series into a horizontal visibility graph. Once the horizontal visibility graph algorithm has processed the time series, a network is generated whose adjacency plot is shown in Fig. 4. It is a largely sparse graph with only a few connections, displayed as yellow cells. The actual network shape and connectivity patterns of the horizontal visibility graph are shown in Fig. 5. The graph can be visualized in multiple ways in the R environment, exposing a larger number of features and properties of the horizontal visibility graphs resulting from chest radiographs.
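The stacking step and the adjacency plot can be sketched in Python as below. The column-by-column stacking order is an assumption, chosen because it is how R's `as.vector()` flattens a matrix; the chapter does not state the order. The helper names are hypothetical, and the edge list is assumed to come from a horizontal visibility routine applied to the stacked vector.

```python
def stack_columns(matrix):
    """Flatten a matrix column by column (assumed order, as in R's as.vector)."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


def density(adj):
    """Fraction of possible undirected edges present; low values mean sparse."""
    n = len(adj)
    present = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return present / (n * (n - 1) / 2)
```

An 8 × 8 matrix yields a vector of length 64, so the adjacency matrix of its visibility graph is 64 × 64; a low `density` value corresponds to the mostly empty adjacency plot described above.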

Fig. 3 Matrix to vector conversion of COVID positive training image, COVID_Train7 and corresponding plot of values

Fig. 4 Adjacency matrix plot of horizontal visibility graph generated from COVID positive training image, COVID_Train7.jpeg. Only yellow cells are one, indicating connectivity; the rest are disconnected

Fig. 5 Network plot of horizontal visibility graph generated from COVID positive training image, COVID_Train7

Before we move to Fig. 6, we recall the definition of a heatmap. A heatmap is a two-dimensional, grid-like visual representation of data in which values are encoded as colors. Heatmaps can aid the viewer in making sense of a complex spatial distribution of information. Figure 6 shows the connectivity of the graph on a two-dimensional grid in a user-friendly fashion. Figure 7 provides another view of the same horizontal visibility graph obtained from the Covid-positive image, and Fig. 8 shows the same diagram with node size proportional to the 5th power of the node degree, so that highly connected nodes (hubs) are depicted with larger circles than sparsely connected nodes. Figures 9 and 10 show the degree distribution and the cumulative degree distribution.
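The degree histogram and cumulative degree distribution mentioned above can be computed from an edge list as in the following Python sketch. The function names are hypothetical, and the cumulative distribution is taken here as the fraction of nodes with degree at most k, which is an assumption; the chapter does not say which convention its plots use.

```python
from collections import Counter


def degree_sequence(n_nodes, edges):
    """Degree of every node in an undirected graph given as an edge list."""
    degree = [0] * n_nodes
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def degree_histogram(degrees):
    """Map each observed degree k to the number of nodes with that degree."""
    return dict(sorted(Counter(degrees).items()))


def cumulative_distribution(degrees):
    """Map each observed degree k to the fraction of nodes with degree <= k."""
    running, result = 0, {}
    for k, count in degree_histogram(degrees).items():
        running += count
        result[k] = running / len(degrees)
    return result
```

For a four-node graph with edges (0, 1), (0, 2), (0, 3), (1, 2), (2, 3), the degrees are 3, 2, 3, 2, so half of the nodes have degree at most 2 and all have degree at most 3.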

Fig. 6 Heatmap of horizontal visibility graph generated from COVID positive training image, COVID_Train7

Fig. 7 Another network view of horizontal visibility graph generated from COVID positive training image, COVID_Train7

Fig. 8 Network view with node size 5th power of degree for horizontal visibility graph generated from COVID positive training image, COVID_Train7.jpeg

Fig. 9 Histogram of degree for horizontal visibility graph generated from COVID positive training chest radiograph, COVID_Train7.jpeg. Moderately connected

Fig. 10 Cumulative degree distribution for horizontal visibility graph generated from COVID positive training chest radiograph, COVID_Train7.jpeg. Moderately connected

A histogram-type horizontal visibility graph obtained from the Covid-positive chest radiograph data is shown in Fig. 11, and its curved form is shown in Fig. 12. Both display interesting connectivity patterns. At this stage the HVG of the Covid-positive chest radiograph is ready to be used with the HIM distance metric to compute the similarity among different HVGs generated from normal and Covid-positive chest radiographs.

Fig. 11 Covid image matrix data

Fig. 12 Visibility graph generated from COVID positive training chest radiograph, COVID_Train7.jpeg

Fig. 13 Flow chart for Covid computational disease tagging algorithm using Visibility graph and Network Distance HIM

4.2 Computational Infrastructure Deployed

Matlab was used for the geometrical part of the work. The EMD part of this work was performed in R (RStudio Version 1.3.1093, © 2009–2020 RStudio, PBC, “Apricot Nasturtium” (aee44535, 2020-09-17) for Ubuntu Bionic; Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36) on an HP ProBook laptop.

The laptop’s operating system and other basic information, as reported by the command uname -a, are given below:

Linux Krishna 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

The hardware attributes of the laptop are as follows:

-memory
    description: System memory
    physical id: 0
    size: 8320MiB
-cpu
    product: Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz
    vendor: Intel Corp.
    physical id: 1
    bus info: cpu@0
    size: 3304 MHz
    capacity: 3400 MHz

Finally, Fig. 13 depicts the flow chart of the Covid computational disease tagging algorithm using the visibility graph and the HIM network distance. It summarizes, at a high level and in sequence, the computational steps used in the different stages of processing.

5 Experimental Results

5.1 Computational Experiment

This section describes the results of automated disease tagging using the horizontal visibility graph and HIM-based network similarity (distance) computation. The chest radiographs used here are sourced from Rajiv Gandhi Cancer Institute and Research Centre [63, 64].

Data Preprocessing To keep image computation and processing commensurate with the capabilities of the hardware platform, all acquired radiographs are converted to JPG format. For fast processing and near-real-time reporting of results, radiographs are downsampled to 32 × 32 pixels irrespective of their original size.
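The downsampling step can be illustrated with a minimal sketch. The text does not state which resampling method was used, so nearest-neighbour sampling is an assumption here, and the function name `downsample_nearest` and the synthetic input are purely illustrative.

```python
def downsample_nearest(image, size=32):
    """Resize a 2-D grayscale image (list of rows) to size x size.

    Nearest-neighbour sampling is an assumption; the paper only states
    the 32 x 32 target size, not the resampling method.
    """
    rows, cols = len(image), len(image[0])
    # Map each output index to the source index at the centre of its cell.
    row_idx = [min(rows - 1, int((r + 0.5) * rows / size)) for r in range(size)]
    col_idx = [min(cols - 1, int((c + 0.5) * cols / size)) for c in range(size)]
    return [[image[r][c] for c in col_idx] for r in row_idx]


# Hypothetical 64 x 96 gradient standing in for a decoded radiograph.
demo = [[(r + c) % 256 for c in range(96)] for r in range(64)]
small = downsample_nearest(demo)
print(len(small), len(small[0]))
```

Whatever the original size, the output always has 32 rows of 32 pixels, which keeps the size of the downstream visibility graph fixed.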

The radiograph data are divided into two groups, training and testing. The training group has 10 normal and 10 Covid positive radiographs. Normal radiographs are compared with each other using the HVG–HIM algorithm, and a representative normal radiograph is computed in the same way as in our previous work using EMD, as shown in Table 1. The same process is followed for the Covid positive radiographs to obtain a Covid positive representative radiograph. Of the multiple network distances available, the Hamming-Ipsen-Mikhailov (HIM) network distance is used for comparing the visibility graphs because it balances the global and local aspects of a network distance (similarity) metric.
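For reference, the HIM distance as it is usually defined in the network-comparison literature combines a local (Hamming) term with a global (Ipsen-Mikhailov, spectral) term. The notation below is ours, and the weight ξ is an assumption, since the setting used in these experiments is not stated in this section (ξ = 1 is a common choice). Here A^(1) and A^(2) are the adjacency matrices of two networks on the same N nodes, and IM is the Ipsen-Mikhailov distance computed from the Laplacian spectra of the two networks.

```latex
\[
H(N_1, N_2) = \frac{1}{N(N-1)} \sum_{i \neq j} \left| A^{(1)}_{ij} - A^{(2)}_{ij} \right|,
\qquad
\mathrm{HIM}_{\xi}(N_1, N_2) = \frac{1}{\sqrt{1+\xi}}
\sqrt{ H(N_1, N_2)^{2} + \xi \, \mathrm{IM}(N_1, N_2)^{2} }
\]
```

With both components normalized to [0, 1], the combined score also lies in [0, 1], with 0 indicating identical networks.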

Table 1 HIM score table for computing HIM distance among normal chest radiographs and locating Normal-Rep

Training Using Normal Chest Radiographs As shown in the flowchart in Fig. 13, we begin by determining the normal representative chest radiograph (Normal-Rep) among the ensemble of normal training chest radiographs. Each test chest radiograph is later compared against this Normal-Rep to decide whether it should be tagged normal or Covid positive. Normal-Rep is computed by converting all the normal training chest radiographs into visibility graphs and measuring their pairwise similarity with the HIM metric. Table 1 gives the scores used for deciding Normal-Rep: the third normal training chest radiograph is designated Normal-Rep for this ensemble of ten normal training chest radiographs because it has the highest similarity (hence the lowest column-sum score) with all other normal training chest radiographs. Its score is shown in bold in the third column of the sum row.

Training Using Covid Positive Chest Radiographs Following the flowchart in Fig. 1, we begin by determining the Covid positive representative chest radiograph (Covid Positive-Rep) among the ensemble of Covid positive training chest radiographs. Each test chest radiograph is later compared against this Covid Positive-Rep to decide whether it should be tagged normal or Covid positive. Covid Positive-Rep is computed by converting all the Covid positive training chest radiographs into visibility graphs and measuring their pairwise similarity with the HIM metric. Table 2 gives the scores used for deciding Covid Positive-Rep: the fourth Covid positive training chest radiograph is designated Covid Positive-Rep for this ensemble of ten Covid positive training chest radiographs because it has the highest similarity (hence the lowest column-sum score) with all other Covid positive training chest radiographs. Its score is shown in bold in the fourth column of the sum row.
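The selection of a representative radiograph from a table of pairwise HIM scores amounts to picking the column with the lowest sum. A minimal sketch follows; the function name `pick_representative` and the 4 × 4 score matrix are illustrative assumptions, not values taken from Table 1 or Table 2.

```python
def pick_representative(dist):
    """Return (index, column_sum) of the item closest to all others.

    dist is a square matrix of pairwise distances; the representative is
    the column with the smallest sum, i.e. the highest overall similarity.
    """
    n = len(dist)
    col_sums = [sum(dist[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=col_sums.__getitem__)
    return best, col_sums[best]


# Hypothetical symmetric HIM scores for four training radiographs.
him = [
    [0.00, 0.20, 0.10, 0.40],
    [0.20, 0.00, 0.15, 0.30],
    [0.10, 0.15, 0.00, 0.25],
    [0.40, 0.30, 0.25, 0.00],
]
rep, score = pick_representative(him)
print(rep, round(score, 2))
```

In this toy matrix the third radiograph (index 2) has the lowest column sum and would be designated the representative. The same routine applies unchanged to the normal and the Covid positive ensembles.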

Table 2 HIM score table for computing HIM distance among Covid positive chest radiographs and locating Covid Positive-Rep
Table 3 HIM score based disease tagging table for test covid positive and normal chest radiographs using HVG–HIM

Final Testing and Automated Disease Tagging for Test Chest Radiographs Using HVG–HIM In the follow-up testing phase, both the healthy and the Covid positive representative radiographs are compared, using HIM distance, with a pre-tagged test dataset of 20 radiographs. This test dataset contains both healthy and Covid positive radiographs. The result of automated disease tagging is presented below. We define U as the HIM distance from the Covid positive representative and H as the HIM distance from the healthy representative. The algorithm tags the chest radiographs with an overall accuracy of 60%: healthy radiographs are tagged with 60% accuracy, and Covid positive radiographs are also tagged with 60% accuracy. A natural future direction is to apply other network distance metrics to larger datasets (Fig. 12, Table 3).
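The tagging step can be sketched as a nearest-representative rule on U and H. The decision rule (tag Covid positive when U < H, with ties going to normal) is our reading of the procedure and is an assumption, as are the function names and the toy scores, which are not values from Table 3.

```python
def tag_radiograph(u, h):
    """Tag a test radiograph by its nearest representative.

    u: HIM distance to the Covid positive representative (U in the text).
    h: HIM distance to the healthy representative (H in the text).
    Ties are tagged normal; this tie-break is an assumption.
    """
    return "covid" if u < h else "normal"


def accuracy(samples):
    """samples: iterable of (u, h, true_label); returns the fraction tagged correctly."""
    samples = list(samples)
    hits = sum(tag_radiograph(u, h) == label for u, h, label in samples)
    return hits / len(samples)


# Hypothetical (U, H, true label) triples for five test radiographs.
test_set = [
    (0.10, 0.30, "covid"),
    (0.25, 0.20, "covid"),   # closer to the healthy representative: mis-tagged
    (0.40, 0.15, "normal"),
    (0.12, 0.18, "normal"),  # closer to the Covid representative: mis-tagged
    (0.05, 0.22, "covid"),
]
print(accuracy(test_set))
```

Per-class accuracy can be obtained by calling `accuracy` on the healthy and Covid positive subsets separately.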

6 Final Testing and Automated Disease Tagging for Test Chest Radiographs with EMD

To draw a fair comparison between EMD and HVG–HIM, we run the same computational experiment with EMD, which evaluates and tags disease from the direct perceptual similarity between chest radiographs. To keep image computation and processing commensurate with the capabilities of the hardware platform, all acquired radiographs are converted to JPG format. For fast processing and near-real-time reporting of results, radiographs are downsampled to 32 × 32 pixels irrespective of their original size. This matches the setup of the computational experiment carried out with HVG–HIM. The results of EMD-based disease tagging are shown in Table 4, where a meagre 30% accuracy is reported and the rows with correct disease tags are highlighted in bold. This is in sharp contrast with the accuracy of 60% achieved with HVG–HIM, albeit at a higher computational investment.
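To make the notion of Earth Mover's Distance concrete, the sketch below computes it in the simplest setting: two normalized grey-level histograms on the same bins, where the distance reduces to the accumulated difference of the cumulative sums. This is an illustration only. The EMD experiments in this work were run in R, and the exact image signature and ground distance used there are not specified in this section, so the histogram signature, the bin count and the function names are assumptions.

```python
def histogram(pixels, bins=16, max_value=256):
    """Normalized grey-level histogram of a flat list of integer pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(bins - 1, p * bins // max_value)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def emd_1d(p, q):
    """EMD between two normalized histograms on the same equally spaced bins.

    The ground distance between adjacent bins is 1, so the result is the
    total mass that has to be carried across each bin boundary.
    """
    carried, work = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b      # surplus mass that must move to later bins
        work += abs(carried)  # cost of carrying it one bin further
    return work


# Hypothetical extreme case: an all-black patch versus an all-white patch.
dark = histogram([0, 0, 0, 0])
bright = histogram([255, 255, 255, 255])
print(emd_1d(dark, bright))
```

All of the mass has to travel from the first bin to the last, so the two patches are 15 bin-widths apart, while identical histograms are at distance 0.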

Table 4 EMD score based disease tagging table for test covid positive and normal chest radiographs

7 Reflections on HVG–HIM and EMD Similarity Metrics

This experiment with HVG–HIM and EMD has given us certain insights into the implementation of these algorithms. The Earth Mover's Distance (EMD) algorithm computes the discrepancy between chest radiographs pixel by pixel and gives the overall average difference between them as a similarity metric. In the case of the horizontal visibility graph (HVG), each pixel is compared with all other pixels of the same chest radiograph, which yields a graph representation. The graph representation of one chest radiograph is then compared with that of another chest radiograph using a network distance metric. Various network distance metrics can be used to calculate the difference between these graphs. Here, we have used the Hamming-Ipsen-Mikhailov (HIM) distance.
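A minimal sketch of the two ingredients described above is given here: building a horizontal visibility graph and comparing two such graphs. Two points are assumptions. First, the image is flattened row by row into a one-dimensional series before applying the standard HVG criterion (two samples are linked when every sample between them is strictly lower than both); the mapping actually used in this work may differ. Second, only the Hamming component of HIM is shown; the spectral Ipsen-Mikhailov component is omitted. All function names are illustrative.

```python
def horizontal_visibility_edges(series):
    """Edges (i, j), i < j, such that every value strictly between positions
    i and j is lower than both series[i] and series[j]."""
    edges = set()
    n = len(series)
    for i in range(n):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # the horizontal line of sight from i is now blocked
    return edges


def hamming_distance(edges_a, edges_b, n):
    """Fraction of node pairs on which two n-node undirected graphs disagree."""
    return len(edges_a ^ edges_b) / (n * (n - 1) / 2)


def flatten(image):
    """Row-major flattening of a 2-D image (an assumption, see lead-in)."""
    return [pixel for row in image for pixel in row]


# Two hypothetical 2 x 2 "radiographs".
g1 = horizontal_visibility_edges(flatten([[3, 1], [2, 4]]))
g2 = horizontal_visibility_edges(flatten([[1, 3], [2, 4]]))
print(sorted(g1))
print(hamming_distance(g1, g2, 4))
```

For a 32 × 32 radiograph the flattened series has 1024 nodes, so each comparison involves adjacency structures over roughly half a million node pairs. This is consistent with HVG–HIM being considerably more expensive than EMD.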

From a computational standpoint, EMD performs far fewer calculations than the HVG–HIM metric. EMD computes the results within a few seconds for a given set of ten chest radiographs of similar size, whereas HVG–HIM takes several minutes for the same task. The lesson is that details matter: HVG–HIM achieves 60% accuracy, twice the 30% achieved by EMD on the same task, and it does so at the cost of a large computational investment. This leads to an interesting deployment choice. Where Covid positive prevalence is extremely low and high accuracy is not needed, EMD-based procedures can be deployed; in regions where prevalence is higher and accuracy is of paramount importance, HVG–HIM with serious computational infrastructure will be needed.

8 Towards a Web-Service Based Implementation

Covid has emerged as an unprecedented global pandemic with a serious impact on every individual. Immediate and adequate health infrastructure for Covid patients, whether they visit a health service facility or use tele-consultancy based on pathological examination of chest radiographs, is the need of the hour. Despite centuries of advancement in health practices such as Allopathy, Ayurveda, Homeopathy and other forms of treatment, human efforts against Covid have been dwarfed. The whole medical community is fighting with all available resources to tackle the situation and treat patients. Even so, it is hard to deny that the number of skilled and trained health workers such as doctors, nurses and other health service providers is insufficient relative to the number of Covid patients. Owing to this provisioning gap, high mortality and prolonged morbidity have been recorded, especially in LMICs (low- and middle-income countries).

Today, we live in a digital world where an enormous number of technology-enabled health services are practiced across the world, especially in the form of digital health, which encompasses tele-consultancy, telemedicine, telehealth, big data, etc. They help in gathering meaningful information, processing it and producing reports in near real time, so that policy makers can formulate evidence-based health strategies and patients can be followed up more effectively.

Keeping this urgent need in mind, we have developed our own web portal, which can collaborate with hospitals, individual medical practitioners and patients through a centralized server. The server is designed so that any individual or hospital can access it after proper validation and store relevant patient information. The server strictly maintains data security and confidentiality, and edit access on the portal is limited to information owners. The purpose of this web service is to store information, process it and produce results in the form of scientific evidence that policy makers can use for better decision making. The portal also gives access to relevant and accurate information about Covid in the form of text, presentations and multimedia (audio/video/images). It will also provide individual-specific services such as tele-consultancy: registering unanswered queries, forwarding them directly to specialist doctors, and returning the answers to the person who raised the query (possibly a patient or someone curious about a medical condition) (Figs. 14 and 15).

Fig. 14
figure 14

Transition from research to service provision: A pilot webservice for automated COVID disease tagging

Fig. 15
figure 15

Resource for Automated disease tagging with chest radiographs in LMICs and resource challenged African Countries

8.1 Webservice Methodology

The web-based healthcare management system for Covid patients is built with PHP 7.3.28 as the front-end and web-page development language, and its core architecture is MVC (Model View Controller). Different modules in the website use APIs (Application Programming Interfaces) to interact with dedicated servers that process HVG and HIM distance for image analysis using R (RStudio 3.6.3) and other statistical packages, and that generate reports with attractive and effective graphical representations of the available dataset. A MySQL server is used as the backend RDBMS (Relational Database Management System) for data input, processing, and output to APIs and individuals for downstream usage. We have integrated these technologies because they are open-source and compatible with one another for design, development, and deployment. Further, they are customizable to the medical data-keeping requirements of this project. Security and confidentiality are maintained at all levels of data flow, from information gathering to report generation. A brief architecture of the webservice technology is shown in Fig. 16.

Fig. 16
figure 16

Web Service Architecture for Covid Positive Chest Radiograph based Disease Tagging

9 Conclusions and Future Directions

Poor lung health is known to play a significant role in increasing susceptibility to COVID-19. Lifestyle choices, including smoking and a sedentary lifestyle leading to obesity, are not the only factors that influence lung health; environmental factors such as air pollution also exert a considerable effect. Researchers believe that consumption of tobacco products (both smoking and smokeless), along with occupational hazards such as exposure to indoor and outdoor pollution, makes people more susceptible to the infection that causes COVID-19 and to its complications, because these factors also significantly damage the body's natural defenses against some bacteria and viruses. In a large number of LLMICs, populations have poor lung function and consequently poor lung health, which is reflected in poor health outcomes.

The use of extant deep learning technologies does not necessarily solve the problem of integrating community-level evidence in resource-limited healthcare systems, as these technologies are computationally intensive and energy-hungry, leading to an increased carbon footprint and hence contributing to global warming.

To this end, static algorithms such as Earth Mover's Distance (EMD) and Horizontal Visibility Graph (HVG) add value: they require a significantly smaller investment of computational resources and replace the black-box nature of deep learning algorithms with a glass-box approach that offers more transparency in big data computation and analytics [65]. We propose to extend our studies to EMD and HVG based time series analysis, in which dynamic time series and clinical multimedia segments are mapped to visibility graphs that describe the corresponding states, and successively occurring states are linked. This procedure, which converts a dynamic time series into a temporal network and at the same time into a network of networks, could provide rich information for short-term and long-term predictions about the lung health of an individual or of the community at large. It would thereby give policy administrators at local, regional and global levels nuanced data for developing comprehensive, niche-specific solutions aimed at alleviating lung-health disparities.

The use of multi-modal, multi-sensor fusion technologies combined with big data enabled platforms will go a long way in strengthening resource-deprived healthcare systems. Our proposed AI-enabled point-of-care solution aims to gather evidence at the community level to support the creation of affordable and accessible healthcare technologies. It will focus on applying innovative concepts to improve health outcomes in an affordable and equitable manner, not only to overcome healthcare disparities but also to build capacity by providing individuals and organizations with a platform to validate their proofs of concept, scale them up, and ultimately turn them into commercially viable, sustainable solutions.

10 Device Utility

  1. Has potential application as an adjunct clinical aid for pulmonologists and other medical professionals.

  2. Automatic classification of chest X-ray radiographs, facilitating large-scale screening of subjects in remote health camps.

  3. Easy, fast and robust technology that can be implemented in web-based, desktop-based and smartphone-based applications when coupled with an X-ray device over the internet.

  4. Has the potential to turn Covid disease management into a self-care exercise: control moves from expensive hospitals to cheap and affordable self-care devices.