Identification of Clinically Relevant Groups of Patients Through the Application of Cluster Analysis to a Complex Traumatic Brain Injury Data Set

McLennan, Flora; Hawthorne, Christopher; Shaw, Martin; Piper, Ian

doi:10.1007/978-3-319-22533-3_10

Flora McLennan³,
Christopher Hawthorne³,
Martin Shaw⁴ &
…
Ian Piper⁴

Part of the book series: Acta Neurochirurgica Supplement ((NEUROCHIRURGICA,volume 122))

2690 Accesses

Abstract

In neurological intensive care units (NICUs) we are collecting an ever increasing quantity of data. These range from patient demographics and physiological monitoring to treatment strategies and outcomes. The BrainIT database is an example of this type of rich data source. It contains validated data on 264 patients who suffered traumatic brain injury (TBI) admitted to 22 NICUs in 11 European countries between March 2003 and July 2005 [1, 6].

Access provided by Autonomous University of Puebla. Download chapter PDF

A systematic literature review of clustering techniques for patients with traumatic brain injury

Article 24 June 2023

Big Data Collection and Traumatic Brain Injury

Hierarchical Cluster Analysis Identifies Distinct Physiological States After Acute Brain Injury

Article 18 October 2021

Introduction

In neurological intensive care units (NICUs) we are collecting an ever increasing quantity of data. These range from patient demographics and physiological monitoring to treatment strategies and outcomes. The BrainIT database is an example of this type of rich data source. It contains validated data on 264 patients who suffered traumatic brain injury (TBI) admitted to 22 NICUs in 11 European countries between March 2003 and July 2005 [1, 6].

The application of data mining techniques may allow us to identify patterns or relationships in clinical databases and lead to new insights that improve patient care [9]. Cluster analysis is a form of data mining that allows researchers to analyse a data set in its entirety and provides the opportunity to identify patterns within the data [4]. The aim of cluster analysis, unlike simply categorising data, is to identify linked groups based on multivariate factors. Reasons for the formation of these groups can then be hypothesised and tested in a clinically appropriate scientific manner.

Cluster analysis has previously been applied to high-frequency physiological data collected from adult patients following TBI or polytrauma and from paediatric patients following TBI [2, 8, 10]. It has been possible to identify complex patient states that may be used to predict patient outcome [10]. However, these previous studies have been restricted to small numbers of patients and focussed only on continuous variables. The BrainIT database contains data on a much larger number of patients and the data are continuous, ordinal and categorical. We therefore describe a pilot study using cluster analysis to identify distinct groups of patients within the BrainIT database.

Materials and Methods

Data Processing

The BrainIT database is organised into nine large data tables: demographic and one-off clinical data; daily management data; laboratory data; event data; surgical procedures; monitoring data summary; neurological event summary; targeted therapies; vital monitoring data. For the purposes of this pilot study we decided to focus on the demographic and one-off clinical data. As mentioned above, this would mean dealing with a mixture of continuous, ordinal and categorical data. Also, the quantity of data involved compared with the vital monitoring was far less computationally demanding. The table contains 109 variables including age, gender, hospital admission and transfer times, diagnosis, admission physiological variables and Glasgow Coma Scale (GCS) score, admission laboratory results and Glasgow Outcome Scale (GOS) score.

The data table was processed to optimise its clinical relevance and suitability for cluster analysis. For example, dates and times were converted to “time to” values, with time of trauma being time zero. Any negative or unrealistically large values identified were examined and manually corrected if an obvious date/time input error was present, or else made blank. Additionally, multiple units used for hydrogen ion, haemoglobin and glucose concentrations were converted to a standard. An appendix documenting all of the alterations made to the data table and their rationale will be made available with the BrainIT database.

We further processed the data table to create an increased data density for cluster analysis. To achieve this, we limited analysis to variables with at least 70 % data entered and patients with at least 65 % of these variables measured. Ultimately, we created two final tables, with one containing a combination of categorical, ordinal and continuous data and another containing continuous data only.

Cluster Analysis

Gower’s dissimilarity metric was calculated to determine the difference between patients based on each of the measured variables [3]. It is regarded as the most appropriate measure when performing analysis on mixed data types. The resulting metric was then used to perform agglomerative hierarchical cluster analysis. This is illustrated in Fig. 1.

All processing of data tables and cluster analysis was performed in r studio version 0.95.258 with r statistics version 2.13.1. The Gower dissimilarity metric was calculated using the gower.dist function and agglomerative hierarchical cluster analysis using the function hclust.

Results

Data Processing

Processing of the demographic and one-off clinical data table resulted in the creation of one data table containing 42 variables for 251 patients and another containing 10 continuous variables for 203 patients (Fig. 2).

Cluster Analysis

Agglomerative hierarchical clustering of the combined data revealed five clusters selected based on visual inspection of the resulting dendrogram (Fig. 3). Cluster B contained most patients (160), clusters A, C and D were of similar size (33, 22 and 33) and cluster E was an outlier with only three patients (Table 1). The features of clusters A and D were the most clinically interesting. Patients in cluster A tended to be older and were most likely to have: consumed alcohol; fallen or been assaulted; previous dysfunction; mass lesion on CT (computed tomography). Despite having reasonable GCS motor scores on admission they had the highest mortality. Patients in cluster D tended to be younger and were most likely to have: been in a road traffic accident; associated multi-trauma; hypoxia and hypotension on admission; lowest GCS motor score; dilated and non-reactive pupils; lowest GOS.

Table 1 Selected group characteristics from the cluster analysis of all available data

Full size table

Hierarchical cluster analysis of the continuous data table revealed three clusters based on visual inspection of the dendrogram. Cluster A contained most of the patients (175), whereas clusters B and C were of equal size (14; Table 2). Cluster C seems to have the most unique features with a tendency to lower SaO₂ and PaO₂ values, associated with increased transfer time from the pre-neurosurgical hospital (PNSH) to the neurosurgical hospital (NSH).

Table 2 Group characteristics from the cluster analysis of continuous data

Full size table

Discussion

This is a pilot project of cluster analysis of the BrainIT database and is the first analysis of its type in this rich data source. However, results produced through cluster analysis must be interpreted with caution. The clusters formed are dependent upon the dissimilarity metric and the method of cluster analysis chosen [4]. In this study, we used the Gower dissimilarity metric owing to the inclusion of categorical, ordinal and continuous data. Similar to previous studies in this domain, we elected to perform agglomerative hierarchical cluster analysis [2, 8, 10].

With the above caution in mind, we believe that our analyses have identified clusters of patients that are physiologically and clinically sensible. In analysing the table containing combined data types, we have identified two clinically relevant clusters of patients. The first is an older group, who have a higher rate of falls and are more likely to have mass lesions on CT. The second is a younger group, who have suffered multi-trauma and have markers of severe injuries on admission with more hypoxia, hypotension, pupil abnormalities and lower GCS scores. As would be expected from existing predictive models [5, 7], these patients tended to have poorer outcomes as assessed by the GOS.

In addition, analysis of the table containing only continuous data revealed a relevant cluster of patients who tended to have lower SaO₂ and PaO₂ values. It can be reasoned that this physiological derangement led to them also having longer transfer times from the PNSH to the NSH.

This project has successfully demonstrated the feasibility of using cluster analysis in the exploration of a large, mixed database of TBI patients. The unsupervised identification of clinically recognisable groups of patients supports the validity of the results presented. We now plan to apply similar analyses to other tables in the BrainIT database that contain high-frequency physiological data including blood pressure and intracranial pressure. The ultimate aim is to identify demographically and physiologically distinct groups of patients who will be amenable to specific treatment strategies.

References

BrainIT Research Group. Brain monitoring with information technology. http://www.brain-it.eu. Accessed 15 December 2012
Cohen MJ, Grossman AD, Morabito D, Knudson MM, Butte AJ, Manley GT (2010) Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis. Crit Care 14:R10
Article PubMed PubMed Central Google Scholar
Gower JC (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857–871
Article Google Scholar
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31:264–323
Article Google Scholar
McHugh GS, Engel DC, Butcher I et al (2007) Prognostic value of secondary insults in traumatic brain injury: results from the IMPACT study. J Neurotrauma 24:287–293
Article PubMed Google Scholar
Piper I, Citerio G, Chambers I et al (2003) The BrainIT group: concept and core dataset definition. Acta Neurochir (Wien) 145:615–628; discussion 628–629
Article CAS Google Scholar
Steyerberg EW, Mushkudiani N, Perel P et al (2008) Predicting outcome after traumatic brain injury: development and international validation of prognostic scores based on admission characteristics. PLoS Med 5:165
Article Google Scholar
Sorani MD, Hemphill JC, Morabito D, Rosenthal G, Manley GT (2007) New approaches to physiological informatics in neurocritical care. Neurocrit Care 7(1):45–52
Article PubMed Google Scholar
Sullivan R (2012) Introduction to data mining for the life sciences. Springer Science + Business Media, New York
Book Google Scholar
Wainwright MS, Lewandowski R (2012) Bioinformatics analysis of mortality associated with elevated intracranial pressure in children. Acta Neurochir Suppl 114:67–73
Article PubMed Google Scholar

Download references

Acknowledgements

The authors would like to acknowledge the work of the BrainIT group of investigators and participating centres in the BrainIT data set without whom this work could not have been conducted: Barcelona, Spain: Prof Sahuquillo; Cambridge, UK: Prof Pickard; Edinburgh, UK: Prof Whittle; Glasgow, UK: Mr Dunn; Gothenburg, Sweden: Dr Rydenhag; Heidelberg, Germany: Dr Kiening; Iasi, Romania: Dr Iencean; Kaunas, Lithuania: Prof Pavalkis; Leipzig, Germany: Prof Meixensberger; Leuven, Belgium: Prof Goffin; Mannheim, Germany: Prof Vajkoczy; Milan, Italy: Prof Stocchetti; Monza, Italy: Dr Citerio; Newcastle upon Tyne, UK: Dr Chambers; Novara, Italy: Prof Della Corte; Southampton, UK: Dr Hell; Uppsala, Sweden: Prof Enblad; Torino, Italy: Dr Mascia; Vilnius, Lithuania: Prof Jarzemaskas; Zurich, Switzerland: Prof Stocker.

Conflict of Interest Statement

We declare that we have no conflict of interest.

Author information

Authors and Affiliations

Academic Unit of Anaesthesia, Pain and Critical Care Medicine, University of Glasgow, Level 4, Walton Building, Glasgow Royal Infirmary, 84 Castle Street, Glasgow, G4 0SF, UK
Flora McLennan & Christopher Hawthorne
Department of Clinical Physics, NHS Greater Glasgow and Clyde, Glasgow, UK
Martin Shaw & Ian Piper

Authors

Flora McLennan
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Hawthorne
View author publications
You can also search for this author in PubMed Google Scholar
Martin Shaw
View author publications
You can also search for this author in PubMed Google Scholar
Ian Piper
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Christopher Hawthorne .

Editor information

Editors and Affiliations

Department of Neurosurgery, National Neuroscience Institute, Singapore, Singapore
Beng-Ti Ang

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

McLennan, F., Hawthorne, C., Shaw, M., Piper, I. (2016). Identification of Clinically Relevant Groups of Patients Through the Application of Cluster Analysis to a Complex Traumatic Brain Injury Data Set. In: Ang, BT. (eds) Intracranial Pressure and Brain Monitoring XV. Acta Neurochirurgica Supplement, vol 122. Springer, Cham. https://doi.org/10.1007/978-3-319-22533-3_10

Download citation

DOI: https://doi.org/10.1007/978-3-319-22533-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22532-6
Online ISBN: 978-3-319-22533-3
eBook Packages: MedicineMedicine (R0)

Publish with us

Policies and ethics

Identification of Clinically Relevant Groups of Patients Through the Application of Cluster Analysis to a Complex Traumatic Brain Injury Data Set

Abstract

Similar content being viewed by others

A systematic literature review of clustering techniques for patients with traumatic brain injury

Big Data Collection and Traumatic Brain Injury

Hierarchical Cluster Analysis Identifies Distinct Physiological States After Acute Brain Injury

Introduction

Materials and Methods

Data Processing

Cluster Analysis

Results

Data Processing

Cluster Analysis

Discussion

References

Acknowledgements

Conflict of Interest Statement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Navigation

Identification of Clinically Relevant Groups of Patients Through the Application of Cluster Analysis to a Complex Traumatic Brain Injury Data Set

Abstract

Similar content being viewed by others

A systematic literature review of clustering techniques for patients with traumatic brain injury

Big Data Collection and Traumatic Brain Injury

Hierarchical Cluster Analysis Identifies Distinct Physiological States After Acute Brain Injury

Introduction

Materials and Methods

Data Processing

Cluster Analysis

Results

Data Processing

Cluster Analysis

Discussion

References

Acknowledgements

Conflict of Interest Statement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation