Introduction

One of the most urgent challenges in healthcare today is preterm labor, defined as labor prior to 37 completed weeks of gestation. Preterm birth contributes to handicaps in general, and is also the cause of about 85% of all perinatal deaths.27 Preterm neonates, with birth-weights less than 2500 g, represent about 10% of the total number of babies born each year. The complications of preterm birth include significant neurological, mental, behavioral and pulmonary problems in later life. Among the preterm survivors, the rate of neurological impairment varies from 10% to 20%, and growth restriction occurs in approximately 20% of the surviving infants. The development of effective methods to prevent or reduce the incidence of preterm birth depends largely upon finding reliable indicators of preterm labor.

Knowing that true labor (which will ultimately lead to delivery) has begun, as well as predicting when it will start, is important for both normal and abnormal pregnancies. Diagnosis of labor in normal pregnancies is important for minimizing unnecessary hospitalizations, interventions and expenses. On the other hand, accurate identification of spontaneous preterm labor would also allow clinicians to start necessary treatments early in women with true labor and avert unwarranted treatment and hospitalization for women who are simply having preterm contractions, but who are not in true labor. Various clinical techniques for determining this, such as the use of fetal fibronectin, salivary estriol, cervical ultrasound, and tocodynamometer (TOCO), to name a few, all have a limited range of usefulness in this regard.6, 13 Even noticeable dynamic cervical change, long thought to be indicative, may not always be an accurate identifier of true labor, since a large percentage of women with established cervical dilation do not actually deliver preterm, even when they are not treated with labor-inhibiting, or tocolytic, agents.19 No present method yields both a positive and a negative predictive value above 65%. What is called for, then, is an alternative diagnostic paradigm.

Uterine electromyography (EMG) has shown great promise for monitoring patients during pregnancy. Early studies established that the electrical activity of the myometrium, or uterine muscle, is responsible for myometrial contractions.16, 24 Many experiments have been performed in the last several decades in an attempt to monitor uterine contractility using the electrical activity measured from electrodes placed directly on the uterus.2, 5, 28 The latest work indicates that uterine EMG activity can be monitored non-invasively from the abdominal surface.1, 8, 9 Once uterine EMG data are obtained, however, they must be assimilated in some fashion to get a diagnostic result.

Artificial neural networks (ANNs) are mathematical algorithms that are ideal for the classification of objects (e.g., patients) based upon one or more input variables (e.g., uterine EMG variables).4, 11, 12 “Artificial intelligence” is the field of computer science that attempts to give computers humanlike thought. One of the primary means by which computers are endowed with such capability is through the use of an ANN.

An ANN is composed of fundamental components, usually a circuit or computer program, which are designed to be the machine equivalent of neurons in the brain of living creatures. ANNs are made up of inputs, “hidden layers” (a system of artificial neurons to process the data), and outputs. Modern applications of ANNs include the following: risk management, data validation, industrial process control, sales forecasting, speech recognition, diagnosis of hepatitis, interpretation of multi-meaning Chinese words, undersea mine detection, texture analysis, three-dimensional object recognition, hand-written word recognition, and facial recognition.4, 11, 12 Most recently, work has progressed on using various biological and clinical markers to evaluate the risk of preterm labor with ANNs.20

For the present study, we wanted to determine how well ANNs can classify pregnant patients. Values calculated for a number of uterine EMG parameters were used as inputs for a particular type of ANN, and the outputs, specifically patient classifications, were compared to clinical assessments. The joining of uterine EMG with ANNs in this way may produce a powerful, objective tool for assessing labor.

Materials and methods

Patients

The University of Texas Medical Branch (UTMB) Institutional Review Board gave approval for this study, and all patients were required to sign written consent forms. We included 134 term and 51 preterm women (all of whom ultimately delivered spontaneously). An additional 28 patients were excluded, as they exhibited complications sufficient to justify delivery via cesarean section. Only those patients for whom a clear clinical determination of labor or non-labor could be made were included. Patients were recruited over a period of 4 years. Maternal ages were within the 17- to 36-year-old range. Gestational ages ranged from 24 to 41 weeks. Measurement-to-delivery (MTD) times varied from less than 1 day in the case of term labor patients, to over 100 days in the case of preterm non-labor patients. All patients included in the study were admitted to the labor and delivery area of The University of Texas Medical Branch, in Galveston, Texas.

Signals

Contractions were monitored for at least 30 min using bipolar electrodes (Quik-Prep models, Quinton, Bothell, Washington) placed non-invasively on the abdominal surface. A standard 4-electrode arrangement was used for all patients (symmetric about the navel, with vertical and horizontal axes parallel to the patient vertical and horizontal axes, respectively, and with center-to-center distances between adjacent electrodes set at 6.35 cm).7, 22, 23 Analog uterine electrical signals were differentially band-pass filtered from 0.05 to 4.00 Hz, and sampled at 100 Hz using a custom-built uterine EMG patient-monitoring system (Reproductive Health Technologies, www.reproductivehealthtech.com). For each patient record, the optimal differential channel combination was utilized for calculations (i.e., the two-channel differential signals that exhibited the greatest effective signal/noise ratio, as calculated for the mean rectified voltage during “bursting,” or elevated electrical activity, periods, divided by the mean rectified voltage during quiescent, or “non-bursting,” periods). The mean S/N ratio for all optimal channels was 6.57 ± 4.90, while it was 2.47 ± 1.87 for the non-optimal channels.
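As an illustration only, the effective signal/noise ratio described above (mean rectified voltage during bursting divided by mean rectified voltage during quiescence) might be computed as in the following Python sketch. The function names, the channel labels, and the toy voltages are hypothetical and are not taken from the monitoring system used in the study; burst boundaries are assumed to be already known as sample-index pairs.

```python
from statistics import fmean


def mean_rectified(samples):
    """Mean of the absolute values, i.e. the mean rectified voltage."""
    return fmean(abs(v) for v in samples)


def effective_snr(signal, burst_spans):
    """Mean rectified voltage inside bursts divided by that outside bursts.

    signal      -- voltage samples for one differential channel
    burst_spans -- (start, stop) sample indices of bursts, stop exclusive
    """
    in_burst = [False] * len(signal)
    for start, stop in burst_spans:
        for k in range(start, stop):
            in_burst[k] = True
    burst = [v for v, flag in zip(signal, in_burst) if flag]
    quiet = [v for v, flag in zip(signal, in_burst) if not flag]
    return mean_rectified(burst) / mean_rectified(quiet)


def pick_optimal_channel(channels, burst_spans):
    """Name of the differential channel with the largest effective S/N."""
    return max(channels, key=lambda name: effective_snr(channels[name], burst_spans))


# Toy data (hypothetical): two differential channels, one burst at samples 4..7
channels = {
    "ch1-ch2": [0.1, -0.1, 0.1, -0.1, 0.6, -0.6, 0.6, -0.6, 0.1, -0.1],
    "ch3-ch4": [0.2, -0.2, 0.2, -0.2, 0.4, -0.4, 0.4, -0.4, 0.2, -0.2],
}
best = pick_optimal_channel(channels, [(4, 8)])
```

In this toy case the first channel has the larger ratio (about 6 versus about 2), so it would be the one retained for further calculations.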

From every recording, only the bursts of uterine electrical activity occurring during verifiable contractions were analyzed, while periods of inactivity between contractions were not quantified (for inclusion purposes, uterine electrical bursts were defined as a succession of increased-amplitude voltage spikes whose mean amplitude remained at >2× the mean baseline activity, and whose duration was >10 s; an additional inclusion criterion was that the burst also had to correspond, at least approximately, temporally to uterine pressure events as measured by TOCO). These burst periods were ascertained manually by the investigators offline (although current efforts are underway to develop software to automatically identify and quantify bursts in real-time). In order to exclude from analysis most of the artifact components due to motion, respiration, and cardiac electrical signals, only frequencies in the 0.34–1.00 Hz “uterine-specific” range were evaluated. This was accomplished by subsequently filtering the signals digitally. Standard TOCO external mechanical force readings were also recorded simultaneously by a transducer belt around the patients’ abdomens, but only to assist with additional confirmation of the presence of uterine contraction events. Since previous work with the crude TOCO instrument has indicated limited, if any, predictive utility,25 no numerical data were utilized from that instrument for this study.
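The amplitude and duration criteria for burst inclusion can be expressed compactly. The Python sketch below is only an illustration: bursts in this study were identified manually, the function name is hypothetical, and the additional requirement of temporal correspondence with TOCO events is not modeled.

```python
from statistics import fmean


def meets_burst_criteria(signal, start, stop, baseline_mean, fs_hz=100.0):
    """Check the two numerical inclusion criteria for a candidate burst.

    signal        -- voltage samples
    start, stop   -- candidate burst sample indices, stop exclusive
    baseline_mean -- mean rectified baseline activity (same units as signal)
    fs_hz         -- sampling rate in Hz (100 Hz in the recordings described)
    """
    duration_s = (stop - start) / fs_hz
    mean_amplitude = fmean(abs(v) for v in signal[start:stop])
    # Mean amplitude > 2x mean baseline activity, and duration > 10 s
    return duration_s > 10.0 and mean_amplitude > 2.0 * baseline_mean


# Toy candidate (hypothetical): 12 s of +/-0.5 activity over a 0.2 baseline
candidate = [0.5, -0.5] * 600
accepted = meets_burst_criteria(candidate, 0, 1200, baseline_mean=0.2)
```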

Calculation of ANN Input Parameters

Spectral analysis was used to help quantify the electrical signals. Chart 5 software (ADInstruments, Castle Hill, Australia) was utilized for the spectral analysis. The magnitude squared of the Fast Fourier Transform (i.e., the FFT-generated periodogram) was used as an estimator of the spectrum. A whole number of 8192-point FFT windows, each overlapping the next by 50% (with a cosine-bell window used to reduce edge effects), was positioned so as to include the greatest number of data points contained within each identified burst (Fig. 1). In this way, the 8192-point FFT windows step through the burst data in approximately 4.1-second intervals. The average periodogram for a given burst was then calculated from the several periodograms obtained for that burst, and this average was used to assess the dominant spectral peaks of the overall burst.
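The averaging of overlapped, cosine-bell-windowed periodograms can be sketched in a few lines of Python. This is not the Chart 5 implementation: a direct (slow) discrete Fourier transform stands in for the FFT so that the example needs only the standard library, the window length and sampling rate are left as parameters, and the test signal is synthetic. All function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Cosine-bell (Hann) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def periodogram(segment, window):
    """Magnitude squared of the DFT of one windowed segment (bins 0..n//2)."""
    n = len(segment)
    x = [s * w for s, w in zip(segment, window)]
    power = []
    for b in range(n // 2 + 1):
        acc = sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
        power.append(abs(acc) ** 2)
    return power


def averaged_periodogram(burst, n_fft):
    """Average the periodograms of 50%-overlapped windows across one burst."""
    if len(burst) < n_fft:
        raise ValueError("burst is shorter than one window")
    step = n_fft // 2  # 50% overlap between successive windows
    window = hann(n_fft)
    starts = range(0, len(burst) - n_fft + 1, step)
    spectra = [periodogram(burst[s:s + n_fft], window) for s in starts]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def peak_frequency(power, fs_hz, n_fft, f_lo=0.34, f_hi=1.00):
    """Frequency (Hz) of the largest spectral peak within [f_lo, f_hi]."""
    in_band = [b for b in range(len(power)) if f_lo <= b * fs_hz / n_fft <= f_hi]
    best = max(in_band, key=lambda b: power[b])
    return best * fs_hz / n_fft


# Synthetic check (hypothetical values): a 0.5 Hz sine sampled at 8 Hz
fs_hz, n_fft = 8.0, 128
burst = [math.sin(2 * math.pi * 0.5 * k / fs_hz) for k in range(512)]
avg = averaged_periodogram(burst, n_fft)
peak_hz = peak_frequency(avg, fs_hz, n_fft)
```

Restricting the peak search to the 0.34–1.00 Hz band mirrors the "uterine-specific" range used in this study; in practice an FFT routine would replace the direct transform for 8192-point windows.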

Figure 1.

An example of a uterine electrical burst, and the use of overlapping, successive FFT windows to estimate the overall burst power. The beginning and end-points of the bursts are first determined. Then a multiple of 8192-sized FFT windows is applied to the burst-trace in such a way as to cover it maximally, temporally speaking. In this particular case, the burst is approximately 37 s long, requiring 8 FFT windows (overlapped 50%, using cosine-bell windowing to reduce edge effects). The total temporal “coverage” of the FFT windows is approximately 9 × 4.1 = 36.9 s. The 8 individual FFTs are used to find an average FFT, which is then used for determining spectral peaks.

The mean duration of the bursts in this study was 50.5 s. Although the uterine EMG traces, as a whole, are non-stationary, each of the short, individual FFT time-periods in the recordings that was assessed within the bursts was approximately stationary, i.e., these parts of the trace are locally stationary,14, 21 especially when considering the very-low-frequency uterine band-width being investigated.22 Furthermore, we were not concerned with (at least in the present study) mapping, temporally, any of the minute changes in the signal which may have occurred within the duration of the bursts themselves. Rather, we were only interested in an approximate overall power measure for each burst as a whole.

The largest-magnitude power spectrum (PS) peak within the 0.34–1.00 Hz range was found for each burst, and its frequency was noted. The average PS peak frequency for a given patient was then found by averaging the frequencies of the largest PS peaks from the bursts for the particular patient. A similar procedure was used to find the average burst duration: the duration of any particular uterine electrical burst was determined by noting the onset and termination time-points for the burst. Total activity for a patient was calculated by multiplying the number of bursts in a 30-minute period for the patient by the mean burst duration for that patient.

So, in all, the bursts of uterine electrical activity, corresponding to contractions, were quantified by finding the means and standard deviations (where applicable) of the PS peak frequency (i.e., frequency at which the greatest power occurs, using an 8192-point-size FFT for sufficient frequency resolution), burst durations, number of bursts per unit time, and total activity. Initially, seven input parameters were calculated from the uterine EMG for use in the ANN. An eighth input parameter was also formed by taking the ratio of the PS peak frequency divided by the standard deviation of the burst duration. These particular variables were chosen as a result of our previous studies that showed that they may either have some predictive worth, or some physiological or clinical significance.1, 8, 22
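To make the parameter definitions concrete, the following Python sketch derives several of the per-patient quantities from a list of burst peak frequencies and burst durations. It is a hedged illustration: the dictionary keys and the example values are hypothetical, the sample (n − 1) standard deviation is an assumption, and only a subset of the input parameters is shown.

```python
from statistics import fmean, stdev


def patient_features(peak_freqs_hz, durations_s, record_minutes=30.0):
    """Summarize one patient's bursts.

    peak_freqs_hz  -- largest PS peak frequency of each burst (Hz)
    durations_s    -- duration of each burst (s)
    record_minutes -- length of the recording the bursts came from
    """
    mean_freq = fmean(peak_freqs_hz)
    mean_duration = fmean(durations_s)
    sd_duration = stdev(durations_s)  # assumption: sample standard deviation
    bursts_per_30_min = len(durations_s) * 30.0 / record_minutes
    return {
        "ps_peak_freq_mean": mean_freq,
        "ps_peak_freq_sd": stdev(peak_freqs_hz),
        "burst_duration_mean": mean_duration,
        "burst_duration_sd": sd_duration,
        "bursts_per_30_min": bursts_per_30_min,
        # Total activity: number of bursts in 30 min times mean burst duration
        "total_activity": bursts_per_30_min * mean_duration,
        # Ratio of PS peak frequency to the SD of burst duration (Hz/s)
        "freq_over_duration_sd": mean_freq / sd_duration,
    }


# Hypothetical patient with three bursts in a 30-minute recording
features = patient_features([0.40, 0.45, 0.44], [40.0, 55.0, 60.0])
```

At least two bursts are needed for the standard deviations to be defined; every patient in this study had three or more qualifying bursts.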

It must be mentioned that for the non-laboring patients, especially those who were preterm, the uterine electrical bursting activity was often not very well correlated with TOCO activity, and the electrical bursts themselves were, at many times, short-lived. However, for all the patients studied, at least three uterine electrical events were identified within the recording that satisfied the inclusion criteria described above.

Although the techniques for the clinical determination of pre-term labor are unreliable in many patient cases, there still exist a number of preterm patients who do exhibit such obvious symptoms of labor (or non-labor) that clinical determination can be made with a high degree of certainty for them. Since these are the types of patients we included in this particular study, term and preterm patients were each clinically divided into two subgroups, which resulted in the following categories: [term laboring (TL): n = 75; preterm laboring (PTL): n = 13] and [term non-laboring (TN): n = 59; preterm non-laboring (PTN): n = 38], with labor assessed by a physician using clinical determinations that incorporated contraction frequency, cervical dilation, station (which is, roughly speaking, the relative position of the fetus to the maternal pelvis), and effacement (cervical “thin-ness”), among other factors. For clinically assessing labor, often the “typical” contraction frequency is >3 contractions/10 min, with cervical dilation >3 cm, but this is not necessarily an industry standard, and can vary from patient to patient, and must therefore be interpreted in the context of other clinical parameters.

ANN Data Processing

Once all uterine EMG variables were calculated, the full data set was conditioned according to common practice for ANN data preparation,18 specifically by rescaling all data values to a common decimal representation, replacing any missing data values with the data set mean for that parameter, and removing outliers (according to Grubbs' test). Training of the ANN was accomplished by running training data (38 TL patients, 30 TN patients, 6 PTL patients, and 19 PTN patients) through 10,000 iterations, using an initial learning rate of 0.6. Training was performed separately for the term analysis and then again for the preterm analysis. The initial learning rate (see below) was chosen so as to be high enough to increase the learning capability of the network while not being so high as to cause the network to “oscillate.” Based on previous publications,3, 10 we tried increasing the initial learning rate from 0.1 to 0.5, and then finally up to 0.6, with no difference in the classification results. The ANN (NeuroXL Classifier, www.analyzerXL.com) was then used on the calculated EMG input data in an attempt to algorithmically classify patients into labor and non-labor groups. The particular ANN used for classification in this study utilizes the “Kohonen” method. This was chosen because in some studies, the Kohonen method gave superior results over the more traditional back-propagation neural network method.29 Ease of use (unsupervised training; see below) is another reason the Kohonen method was employed.

A Kohonen ANN differs considerably from a feed-forward back-propagation neural network, for example. The Kohonen ANN differs both in how it is trained and how it recognizes a pattern (Fig. 2). The following steps apply generally to Kohonen networks:

  (a) Input the values α_i, where i = 1, ..., n.

  (b) Output nodes compute the output values β_j by

    $$ \beta_{j} = \sum\limits_{i = 1}^{n} (\alpha_{i} - \sigma_{ij})^{2}, $$

    where j = 1, ..., m, and where σ_ij is the weight associated with the connection from the i-th input node to the j-th output node.

  (c) The “winning” node is defined as the node with minimum output, β_min = min{β_j}.

  (d) Update the weights of the connections to the winning node and its neighbors.

  (e) Repeat all steps for the specified number of iterations.
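Steps (a) through (e) can be sketched in Python as follows. This is a minimal illustration of a generic Kohonen-style competitive network and not the NeuroXL Classifier implementation: the function names are hypothetical, the weights are assumed to start at the first few samples, the learning rate is assumed to decay linearly from its initial value, and, with only two output nodes, the neighborhood in step (d) is assumed to shrink to the winning node alone.

```python
def squared_distances(alpha, weights):
    """Step (b): beta_j = sum_i (alpha_i - sigma_ij)^2 for each output node j."""
    return [sum((a - w) ** 2 for a, w in zip(alpha, row)) for row in weights]


def train_kohonen(samples, n_outputs=2, iterations=10000, rate0=0.6):
    """Unsupervised training; returns one weight vector per output node."""
    # Assumption: initialize each node's weights from one of the first samples
    weights = [list(samples[j]) for j in range(n_outputs)]
    for t in range(iterations):
        rate = rate0 * (1.0 - t / iterations)      # assumed linear decay
        alpha = samples[t % len(samples)]          # step (a): present inputs
        betas = squared_distances(alpha, weights)  # step (b): node outputs
        winner = betas.index(min(betas))           # step (c): minimum output
        # Step (d): move the winner's weights toward the input (winner only)
        weights[winner] = [w + rate * (a - w)
                           for a, w in zip(alpha, weights[winner])]
    return weights                                 # step (e): loop completed


def classify(alpha, weights):
    """Index of the winning output node for one input vector."""
    betas = squared_distances(alpha, weights)
    return betas.index(min(betas))


# Toy data (hypothetical): two well-separated clusters of rescaled inputs
samples = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.15), (0.85, 0.85)]
weights = train_kohonen(samples)
labels = [classify(s, weights) for s in samples]
```

Note that the node indices carry no meaning by themselves; which cluster corresponds to "labor" has to be decided afterwards by comparison with an external reference such as the clinical assessment.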

Figure 2.

The layers of an ANN include the inputs (for the present study, uterine EMG calculated values), the hidden layer(s), and the outputs (in the present study, classification of patients as either labor or non-labor). The hidden layer in our ANN, a Kohonen layer, is where the actual decision-making process essentially takes place.

When the learning rate is set to decrease as the number of iterations increases,15 as in our own algorithm, convergence is guaranteed in a Kohonen neural network. Convergence was assumed to have been achieved if there was no subsequent change in the final classification state of the neural network after an increase of at least two orders of magnitude in the number of iterations.29 In fact, beyond 10,000 iterations, no change was seen in the final classification state when the number of iterations was increased by two orders of magnitude (up to 1,000,000).

The Kohonen ANN does not use an activation function, its output does not consist of the outputs of several neurons, and it is trained in an unsupervised fashion. This means that the Kohonen network is presented with data, but that the correct output corresponding to the data is not specified a priori. When data are presented to a Kohonen network, one of the output neurons is selected as a “winner.” The winning neuron becomes the output from the Kohonen network. These winning neurons represent groups, or clusters, in the data set. Using the Kohonen network, then, the output data are already essentially classified. For this study, these ANN-sorted groups were designated as term labor (TLS), term non-labor (TNS), preterm labor (PTLS), and preterm non-labor (PTNS).

Statistics

After ANN sorting of patients was performed, the effectiveness of the resulting patient classification was estimated by the following:

  (1) The percentage of correctly categorized patients was found by comparing the ANN-sorted patients to the clinically-determined patients separately for both the term and preterm groups, with the following rule:

    If a = number of patients in the clinically-sorted group, and b = number of those same patients in the associated ANN-sorted group, then % correctly identified for the patient group is b/a × 100%.

    A correctly-identified patient percentage of ≥65% was considered acceptable, since no current clinical method can accurately distinguish a greater percentage of preterm labor and non-labor patients.6, 13 Any combinations of ANN input parameters which yielded lower classification results than this were invalidated.

  (2) Comparisons of gestational age and MTD of the clinically-classified groups were made (using Student’s t-test, Sigma-Stat software, SPSS Inc., Chicago, Illinois).

  (3) EMG values for TLS and TNS groups, as classified by ANN, were compared (by Student’s t-test, Sigma-Stat software) using means and standard deviations of their uterine EMG input variables. EMG values for PTLS and PTNS groups, as classified by ANN, were also compared in this way.

A high percentage of correctly classified patients (above the 65% threshold) would be evidence that the ANN was successful in properly discerning between the labor and non-labor patients. Moreover, a significant difference in any of the uterine EMG parameters of the ANN-sorted groups would suggest that those particular EMG parameters may have some physiological importance for the diagnosis of labor.

Results

The mean gestational age of TL (39.5 ± 1.3 weeks) was significantly higher (p < 0.05) than that of TN (38.3 ± 1.0 weeks), while the mean gestational age of PTL (33.8 ± 3.31 weeks) was significantly higher (p < 0.05) than that of PTN (31.0 ± 4.0 weeks). The mean MTD interval of TL (0.3 ± 0.2 days) was significantly shorter (p < 0.01) than that of TN (9.9 ± 6.9 days), and the mean MTD interval of PTL (6.4 ± 6.8 days) was significantly shorter (p < 0.01) than that of PTN (53.7 ± 30.4 days). These results suggest that the clinical determination of labor in the patient sets was well-performed, and thus was a good standard for this study.
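As a rough consistency check, the equal-variance (Student's) t statistic can be recomputed from the summary values alone. The Python sketch below is not the Sigma-Stat analysis; it uses only the rounded means and standard deviations quoted above together with the group sizes given in Materials and methods (TL: n = 75; TN: n = 59), so the result is approximate. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal-variance form) and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Gestational age in weeks, TL versus TN, using the rounded values reported
t_stat, df = pooled_t(39.5, 1.3, 75, 38.3, 1.0, 59)
```

For these inputs the statistic is roughly 5.9 on 132 degrees of freedom, well above the two-tailed 5% critical value of about 1.98, which agrees with the significant difference reported for the term groups.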

Uterine EMG bursts generally corresponded temporally to contractions as measured by TOCO (Fig. 3). Again, this was nearly always true for laboring and term patients, but less so for non-labor and preterm patients.

Figure 3.

Correspondence of uterine contractions, as measured mechanically by TOCO belt-transducer (bottom trace), with uterine electrical bursts, as recorded non-invasively with trans-abdominal surface EMG (top trace). Since the electrical activity is actually responsible for the muscular-contraction events, monitoring the state and function of the uterus using EMG is a viable (and perhaps superior) alternative to bulky, unreliable TOCO or invasive intrauterine pressure catheters.

When the average PS peak frequency, the standard deviation of the burst duration, and the ratio of these two were used as input parameters for the ANN, the training set was classified correctly for 30/38 (79%) of the TL patients, 5/6 (83%) of the PTL patients, 25/30 (83%) of the TN patients, and 14/19 (74%) of the PTN patients, while the test set was classified correctly for 29/37 (78%) of the TL patients, 7/7 (100%) of the PTL patients, 26/29 (90%) of the TN patients, and 13/19 (68%) of the PTN patients. Overall, when both the training and test classification results were combined, 59/75 (79%) of all TL patients, 12/13 (92%) of all PTL patients, 51/59 (86%) of all TN patients, and 27/38 (71%) of all PTN patients were correctly classified (Fig. 4). Consequently the failure rate for classification of these groups overall was 21%, 8%, 14%, and 29%, respectively.
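The combined percentages follow directly from the b/a × 100% rule applied to the pooled training and test counts. The short Python sketch below simply reproduces that arithmetic from the counts reported above; the variable and function names are hypothetical.

```python
# (correct, total) counts for the training set and the test set, as reported
reported = {
    "TL": [(30, 38), (29, 37)],
    "PTL": [(5, 6), (7, 7)],
    "TN": [(25, 30), (26, 29)],
    "PTN": [(14, 19), (13, 19)],
}


def combine(parts):
    """Pool counts across sets and return (correct, total, percent correct)."""
    correct = sum(c for c, _ in parts)
    total = sum(n for _, n in parts)
    return correct, total, 100.0 * correct / total


combined = {group: combine(parts) for group, parts in reported.items()}

# Every group should clear the 65% acceptability threshold used in this study
all_acceptable = all(pct >= 65.0 for _, _, pct in combined.values())
```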

Figure 4.

The percentage of correctly identified patients was high for all groups. The total number of patients, and the fraction of correctly classified patients, are also shown on each bar. Note the especially high percentage of correctly classified PTL patients. This demonstrates the capability of ANNs to identify labor. Even better results may be possible if a greater number of patients can be included, and if a larger number of physiologically pertinent input parameters are investigated, giving the ANN more information with which to work.

Using the combined training and test data, the average PS peak frequency was significantly higher (p < 0.05) for TLS (0.4371 ± 0.0449 Hz) compared with TNS (0.3916 ± 0.0223 Hz) and for PTLS (0.4708 ± 0.0459 Hz) compared with PTNS (0.3982 ± 0.0231 Hz) (Fig. 5, Table 1). The average standard deviation of the burst duration was significantly lower (p < 0.05) for TLS ((0.1114 ± 0.0551) × 10² s) compared with TNS ((0.2197 ± 0.1002) × 10² s) and for PTLS ((0.1869 ± 0.0741) × 10² s) compared with PTNS ((0.3163 ± 0.1340) × 10² s) (Fig. 6, Table 1). The ratio [(average PS peak frequency)/(average standard deviation of burst duration)] was significantly higher (p < 0.05) for TLS ((0.5047 ± 0.2754) × 10⁻¹ Hz/s) compared with TNS ((0.2259 ± 0.1079) × 10⁻¹ Hz/s) and for PTLS ((0.3285 ± 0.1947) × 10⁻¹ Hz/s) compared with PTNS ((0.1445 ± 0.0624) × 10⁻¹ Hz/s) (Fig. 7, Table 1).

Figure 5.

(a) Average PS peak frequency was significantly higher for TLS (0.4371 ± 0.0449 Hz) compared with TNS (0.3916 ± 0.0223 Hz) and (b) for PTLS (0.4708 ± 0.0459 Hz) compared with PTNS (0.3982 ± 0.0231 Hz). Mean ± SD is shown. p < 0.05 was used for significance.

TABLE 1. Summary of the experimental results. The values for the three useful uterine EMG variables (i.e., those that resulted in >65% successful ANN patient categorization) are shown, along with their p-values (when comparing the labor and non-labor term and preterm groups).

Figure 6.

(a) The average standard deviation of the uterine electrical burst duration was significantly lower for TLS ((0.1114 ± 0.0551) × 10² s) compared with TNS ((0.2197 ± 0.1002) × 10² s) and (b) for PTLS ((0.1869 ± 0.0741) × 10² s) compared with PTNS ((0.3163 ± 0.1340) × 10² s). Mean ± SD is shown. Note that this value is the mean and standard deviation (as calculated over the group of patients) of the electrical burst duration standard deviation values (as calculated for individual patients using multiple bursts). p < 0.05 was used for significance. It is worth emphasizing that the average mean of the burst duration did not show a significant difference between the groups.

Figure 7.

(a) The ratio of the (PS peak frequency)/(standard deviation of burst duration) was significantly higher for TLS ((0.5047 ± 0.2754) × 10⁻¹ Hz/s) compared with TNS ((0.2259 ± 0.1079) × 10⁻¹ Hz/s) and (b) for PTLS ((0.3285 ± 0.1947) × 10⁻¹ Hz/s) compared with PTNS ((0.1445 ± 0.0624) × 10⁻¹ Hz/s). Mean ± SD is shown. p < 0.05 was used for significance.

No other significant differences between groups were seen in any of the other uterine EMG parameters investigated (i.e., standard deviation of PS frequency, mean of burst duration, number of bursts in a given time, as well as mean and standard deviation of total activity). In fact, inclusion of these other variables in the ANN classification algorithm actually reduced the capability of the ANN to properly identify labor (below the established 65% threshold), likely because those parameters had little, if any, physiological relevance or predictive usefulness.

Conclusions

As in previous studies, we found that non-invasive trans-abdominal uterine EMG measurements can be used to effectively monitor pregnant patients. Artificial neural networks, in conjunction with uterine EMG data, seem to be an effective method for classifying both term and preterm pregnant patients into those who are in labor versus those who are not. The high percentage of correctly classified patients, and the significant differences in the values of the electrical parameters for ANN-sorted groups, are evidence that the method is effective.

The best uterine EMG classification input parameters for the ANN in this study were the PS peak frequency, the standard deviation of burst duration, and the ratio of these two. These resulted in a sufficiently high classification rate. The PS peak frequency has previously been shown to be a useful parameter in prediction of labor and delivery.22 Because this parameter has previously been linked to contraction strength,28 it was not unexpected that this parameter might also have utility as an input for the ANN. Similarly, it has been thought for some time that the duration of individual contractile events, as represented by uterine EMG bursts, varies from pre-term to term and from non-labor to labor.28 The variation in contractile duration appears to be more extreme for non-labor patients than for labor patients. In many term and preterm non-labor recordings, both short and long contractile events were seen, whereas the contractile events in both term and preterm labor patients generally were more consistent in their durations. This indicates that the uterus cannot make consistent and useful contractions in the non-labor state.

The remaining parameters investigated actually reduced the predictive capability of the ANN when processed by the classification algorithm. We suppose that this is because they have little, if any, physiological significance, or at least they seem to have little diagnostic relevance. However, other sophisticated and perhaps “less-traditional” calculated uterine EMG parameters not considered in this study (e.g., propagation velocity,17 fractal dimension,23 wavelet energy, or Lyapunov exponents) should also be investigated for patient classification capabilities, using them as input variables for the ANN in future work. Recent studies, including our own, indicate that such non-linear variables could be a useful tool for successfully discerning between labor and non-labor patients.23, 26 The inclusion of other non-EMG demographic and clinical parameters may also be useful if included in any forthcoming ANN analysis for labor.20

Since, in this study, we were interested only in the labor vs. non-labor state of patients, the ANN had only two output nodes, and was used separately on the term and subsequently on the preterm patients to define two further subgroups for each, namely labor and non-labor. In future studies, we may try to use neural networks to classify non-labor patients or labor patients into more than 2 sub-classes in order to identify abnormal conditions. The present study did not include enough patients for that type of investigation.

The present study concentrated on patients that could be clearly identified clinically as being in labor or not in labor. This is reflected in the fact that the average time from measurement to delivery for the labor patients (as determined clinically) was statistically lower than that for the non-labor patients in both the term and preterm groups. The MTD cutoff for our labor patients was set at 24 h, based on results of previous work.22 With this rather broad criterion, patients were classified using a Kohonen neural network simply as either “in labor” or “not in labor.” Prediction of the exact time of labor onset for each patient, however, was not investigated here. We anticipate that in order to be able to discern between, for example, those patients who will enter labor 8 h after assessment, as opposed to those patients who will begin labor 18 h after assessment, a much greater number of individuals will have to be included in the analysis (and a very great number of ANN inputs will undoubtedly be required, too). It must be mentioned that other techniques being utilized for diagnosing preterm labor, although currently not sufficiently effective at predicting preterm labor, have been scrutinized in a number of subgroups from the overall patient population. By including a larger patient aggregate and by using a greater number of physiologically meaningful input parameters (if they can be determined), we hope the approach that we have developed here will prove to be as reliable in the general patient population as it was for this study.

This would give practitioners the capability to better manage patients and to provide better health care for them and their unborn children. In turn, it could reduce resulting pregnancy complications and improve delivery outcomes associated with both term and preterm labor.