1 Introduction

Modern fetal monitoring dates to the early nineteenth century, when de Kergaradec suggested that listening to the baby’s heartbeat might be clinically useful [1]. He proposed that it could be used to diagnose fetal life and multiple pregnancies, and wondered whether it would be possible to assess fetal compromise from variations in the fetal heart rate. Today, monitoring the fetal heart during labour, by one method or another, has become a routine part of care, although access to such care varies across the world.

There are two basic types of fetal monitoring in routine use: intermittent auscultation (IA) and cardiotocography (CTG). With IA, the fetal heart is auscultated using a fetal stethoscope (a Pinard) at set intervals: typically every 15 min at the start of labour and more frequently (every 5 min) during delivery, though these figures are still debated [2, 3]. IA was the predominant form of fetal monitoring prior to the development of the CTG approach in the latter part of the twentieth century. The CTG provides a continuous written recording of the fetal heart rate (FHR) and uterine activity (uterine contractions, UC). The CTG can be recorded externally, using a Doppler-type device attached to a belt placed around the mother’s abdomen (external CTG). This tends to restrict the mother’s movement and prevents the use of a calming bath prior to delivery. During the actual delivery stage, after the amniotic sac has ruptured (either naturally or by induction), a CTG clip can be placed on a presenting part (typically the head) and the FHR recorded (typically no UC data are acquired at this time). This form of CTG is termed an internal CTG.

Given this methodological approach to fetal monitoring, what information acquired from electronic fetal monitoring (EFM) should be used to reach a decision about the health status of the fetus? Essentially, four features have been defined within the guideline for clinical practice, and these are described in Table 1 [3]. The features are collected using standard medical equipment (the CTG described above) and are converted into a global feature which categorises the health status of the fetus (the first column in Table 1). This global feature is then used to assign a discrete category of fetal status at each time point during the delivery monitoring period. There are three mutually exclusive categories, which are listed in Table 2.

Table 1 Summary of the fetal heart rate (FHR) features utilised to characterize the current health status of the fetus (see [3] for details)
Table 2 Nomenclature and definitions of the fetal health status (N, S, & P) indicated by the data tabulated in Table 1 (see [3] for details)

The categories defined in Table 2 form a set of guidelines used by medical practitioners to assess the health status of the fetus on a general level during delivery. The question is whether these guidelines are sufficiently robust to provide an accurate assessment of the fetal status. One approach to addressing this problem is to evaluate large datasets that contain feature information collected across a large sample of subjects. This chapter first examines typical feature sets that have been made publicly available, and then presents a few randomly selected case studies in which these publicly available datasets have been used for computational analysis. The chapter ends with a brief discussion of the impact computational studies have had in terms of informing the current set of guidelines.

2 Publicly Available CTG Datasets

The UCI data repository is a reputable portal for freely available datasets collected from a wide range of domains [4]. In particular, there is a CTG dataset at the UCI data archive, which consists of the CTG records of 2,126 patients acquired up to and including actual delivery (acquired on the day of delivery). It should be noted that this dataset has been reported most frequently as the data source for publications available over the internet [5–9]. For each patient record, a classification was assigned based on pre- and post-partum evaluation by a qualified and licensed medical practitioner (obstetrician). The feature set consists of nominal and real-valued features which focus on the four principal features of CTG (baseline, accelerations, decelerations, and variability). In addition, the FHR and derivative features such as histograms were computed off-line, yielding a total of 21 features in the dataset. There were no missing values; all data were acquired under identical conditions by the same trained medical staff, minimising inter-subject variation. There were three decision classes (assigned by medical professionals): Normal, Suspect, and Pathological, with an object distribution of 78, 14, and 8 % respectively (1,655, 295, and 176 records respectively).
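The class shares quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the total record count and the Normal and Suspect counts from the text, and derives the Pathological count as the remainder so that the three classes sum to the stated total. The function name `class_distribution` is an assumption introduced here, not part of the dataset or of any library.

```python
def class_distribution(counts):
    """Return each class's share of the total as a percentage."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Figures taken from the text: 2,126 records, 1,655 Normal, 295 Suspect.
TOTAL_RECORDS = 2126
counts = {"Normal": 1655, "Suspect": 295}
# Assumption: the Pathological count is whatever remains of the stated total.
counts["Pathological"] = TOTAL_RECORDS - sum(counts.values())

shares = class_distribution(counts)
for label, pct in shares.items():
    print(f"{label:<13}{counts[label]:>6}{pct:>7.1f} %")
```

Rounded to whole percentages, the shares agree with the 78, 14, and 8 % distribution reported for the three decision classes.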

From a computational perspective, the dataset is typical of the biomedical domain. There is a fairly large number of features (21), and though there are no missing values, the three categories are not equally represented in terms of the cardinality of each decision class. For instance, the ‘Normal’ class contains 78 % of the data, significantly biasing the data in that direction. Despite this unequal distribution of decision class exemplars, most reported studies indicate a very high classification rate, along with relatively high values for the positive and negative predictive values (PPV and NPV) [10–12]. These results, obtained from a variety of different approaches, probably reflect the direct mapping between the features collected and the basis for rendering the decision. That is, the physician will base their decision on the definitions in Tables 1 and 2, and the features extracted are exactly those features. Judging by the consistently high classification accuracy on this dataset, obtained with approaches as diverse as neural networks, decision trees, support vector machines, and rough sets, the relationship between the features and the decision class is supported by the data. This is a very encouraging result which could be exploited for the deployment of a real-time EFM system suitable for a clinical setting.
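Because the classes are unbalanced, accuracy alone says little, which is why the predictive values are reported alongside it. The sketch below shows how accuracy, PPV, and NPV follow from a two-class confusion matrix (for example, Pathological versus the rest). The function name `binary_metrics` and all counts are hypothetical, chosen only to make the arithmetic visible; they are not results from any cited study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, PPV and NPV from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # PPV: fraction of positive predictions that are truly positive.
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        # NPV: fraction of negative predictions that are truly negative.
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


# Hypothetical example: 1,000 records, 80 of them in the positive class (8 %).
# (a) A degenerate classifier that labels every record as negative.
always_negative = binary_metrics(tp=0, fp=0, tn=920, fn=80)
# (b) A classifier that finds 60 of the 80 positives with 30 false alarms.
some_classifier = binary_metrics(tp=60, fp=30, tn=890, fn=20)

print(always_negative)
print(some_classifier)
```

In case (a) the accuracy is high even though no positive case is ever detected, so the PPV is undefined; in case (b) the accuracy is higher still while a third of the positive predictions are wrong. Reporting PPV and NPV next to accuracy exposes both situations.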

One approach to developing such a real-time monitoring system (and given the low sampling frequency, the ‘real-time’ adjective is not terribly critical) is to deploy a rule-based approach. There are several interesting papers which provide a relatively flat rule base that can be readily understood and used efficiently in an expert-system-based approach. Our lab has applied a rough sets based approach to classifying the UCI CTG dataset. This study yielded 100 % classification accuracy, with a PPV of 1.0 and an NPV of 1.0 as well [13]. The feature set of 21 values was reduced to 13, which were roughly evenly spread across all four major features (see Fig. 1). Other machine learning based systems have yielded similar classification accuracies, over 99 % in many cases, but typically yield much lower values for PPV and NPV (see [10–12] for examples). The advantage of a rule-based approach, as opposed to a less transparent approach such as neural networks, is that the rules reveal the underlying features and their values in the classifier itself. Note that rough sets (the approach used in [13]) generate a rule-based system which typically yields very high values for classification accuracy, PPV, and NPV (see [14] for a tutorial-based discussion of rough sets). There is no need to explore the weight space or perform parameter sweeping in order to determine which features are important in the resulting classification scheme. As an example of the output of a rule-based system, a decision tree based classification result is depicted in Fig. 1.
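To make the transparency argument concrete, the following is a minimal sketch of a flat, ordered rule base. It is not the rule set of [13]. The field names, the variability cut-off, and the mapping of each rule to a class are hypothetical placeholders with no clinical validity; only the 100 bpm limit echoes the bradycardia threshold mentioned in this chapter.

```python
# Flat, ordered rule base: the first rule whose predicate holds decides the class.
# Field names and the variability cut-off are hypothetical placeholders,
# NOT clinical values.
RULES = [
    ("baseline below 100 bpm",
     lambda r: r["baseline_bpm"] < 100, "P"),
    ("no accelerations and low short-term variability",
     lambda r: r["accelerations"] == 0 and r["stv"] < 3.0, "S"),
]
DEFAULT_CLASS = "N"


def classify(record):
    """Return (class label, description of the rule that fired)."""
    for description, predicate, label in RULES:
        if predicate(record):
            return label, description
    return DEFAULT_CLASS, "no rule fired"


example = {"baseline_bpm": 95, "accelerations": 2, "stv": 6.0}
label, reason = classify(example)
print(label, "because", reason)
```

Each decision comes with the rule that produced it, so the features and values driving the classification can be read directly from the rule base rather than inferred from learned weights.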

Fig. 1 The resulting decision tree generated from a rough sets approach to CTG data classification (see [13])

3 Conclusions

That a variety of machine learning tools can extract useful and clinically relevant information from the large pool of available EFM data is without question. This type of data is extremely useful both to the machine learning community and to the relevant domain (medical) community. Rarely is there such a direct mapping of feature space onto decision classes, yielding such accurate and clear-cut decisions. With the ability to map feature values to the major categories (Normal, Suspect, or Pathological), the ultimate question to be addressed is the actual medical significance of the CTG in the context of patient care.

First and foremost, the CTG measures the fetal heart rate, which is a direct measure of fetal circulation. The FHR is categorised roughly into three disjoint cases: Normal, Suspect, or Pathological (N, S, or P respectively). These cases are based on measurements of a four-valued feature set composed of baseline value, short-term variability, accelerations, and decelerations. Further, another potential class of features is based on the response of the fetus to external stimuli such as contractions and fetal movements. The baseline feature is a record of the beats per minute, which tends to decrease during the last trimester up to delivery. The short-term variability (STV) reflects the ongoing balance between accelerations and decelerations of the heart rate (via centrally controlled sympathetic and parasympathetic activity). Accelerations and decelerations occur naturally when the fetus moves, the uterus contracts, or the head moves down into the pelvic region. In addition, the administration of drugs to the mother may induce changes in accelerations and decelerations. Based on these features, which are recorded during CTG, subjects are classified as Normal, Suspect, or Pathological.

The fundamental physiological variable that underlies these features is the level of oxygen delivery to the fetus. If the umbilical cord is temporarily occluded, oxygen levels to the fetus are decreased (hypoxia) and the fetus will undergo hypercapnia (increased CO2 levels), which in turn causes respiratory acidosis. Furthermore, reduced oxygenation will induce metabolic acidosis. Both conditions induce a redistribution of blood flow to the fetus to compensate for these altered physiological states. These changes are normal; even fetal movement may cause fluctuations in the hemodynamics. What is important to monitor is the magnitude and duration of these measurable responses. Prolonged bradycardia (a heart rate reduced below 100 bpm) is a very serious condition which may require immediate medical intervention. The magnitude, frequency, and duration of these physiological responses are used to categorise the status of the fetus.

That the CTG can be used in a real-time environment in an automated fashion is supported by the encouraging classification accuracy obtained by a variety of systems (with an emphasis here on rule-based systems). Training a rule-based system on existing literature results is the first step in this process. These results must then be verified on unseen data in order to assess the generalisability of the resulting rule set. The results from a variety of such studies indicate high accuracy, PPV, and NPV values. Together, these results support the hypothesis that automated CTG-based EFM systems are clinically useful. With a trained system in place, applying the rule set (or another suitably trained machine learning algorithm) is typically extremely fast with respect to the phenomenon under investigation (especially given the sampling frequency used in EFM). So, from a computational perspective, EFM measured via CTG is a solved problem, in the sense of being able to classify the fetus into one of the categories listed in Table 2 (N, S, & P).

The question remains whether this technology can be deployed at a lower level and/or integrated into a more comprehensive framework, such as deployment throughout the third trimester. On the first point, the question is whether the system can detect the underlying causes of the changes which place the fetus into one category or another. Changes in blood oxygenation, CO2 levels, blood pH, etc. could be integrated into the monitoring system, informed by the CTG results. The result could be a more biologically comprehensive model of the fetus, instead of the reporting of fiducial marks at specific frequencies. As for the second issue, computerised antenatal care, a system deployed to assess secondary outcomes such as gestational age at birth and neonatal seizures would be extremely helpful [Cochrane Study paper]. We are undertaking this work in our lab, but completing it requires that additional data be placed in a publicly available repository. For this to happen, computational biologists require the cooperation and efforts of the medical community.
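The requirement that a trained rule set be verified on unseen data can be sketched as a stratified hold-out split, which keeps the minority classes represented in both partitions. This is an illustrative sketch only: the function name `stratified_split`, the 30 % test fraction, the seed, and the toy label list (100 labels mirroring the 78/14/8 % class shares) are assumptions introduced here.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split record indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels mirroring the reported class shares (not real records).
labels = ["N"] * 78 + ["S"] * 14 + ["P"] * 8
train_idx, test_idx = stratified_split(labels)

print("train:", Counter(labels[i] for i in train_idx))
print("test: ", Counter(labels[i] for i in test_idx))
```

Rules would be induced from the training indices only and then scored on the test indices, so that accuracy, PPV, and NPV are measured on records the rule set has never seen.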