Introduction

Ultrasound is increasingly used in the diagnostic evaluation of patients with symptoms and signs of pelvic floor dysfunction [1, 2]. It is the most appropriate form of imaging in urogynecology for reasons of cost, access and performance, and because it provides information in real time. Translabial (or “perineal”, or “transperineal”) 3D/4D imaging is particularly useful in assessing women with pelvic organ prolapse, since 3D ultrasound gives easy access to the axial plane, which previously could be visualized only with magnetic resonance imaging (MRI).

To date, a number of studies have demonstrated moderate to almost perfect repeatability for measurements of organ prolapse and pelvic floor anatomy using translabial 3D ultrasound [3–9]. However, these studies commonly focus on the repeatability of the offline analysis of stored volume data sets and of measurements obtained on the same day. Hence, there are few data on the agreement between two independent measurements obtained at longer intervals. This is relevant because of potential confounders such as bowel filling and stool consistency, neuromuscular activation state, and the varying efficacy of patient instruction and investigator performance. It is recognised that the repeatability of measures is influenced not only by the quality of post-processing, but also by the quality of volume acquisition at the time of the examination.

Owing to the nature of clinical practice at our unit, patients undergoing surgical management for pelvic floor disorders are assessed twice, once when they are first seen in our clinic, and a second time when they undergo urodynamic testing prior to surgical consent, or occasionally in reverse order. In this study, we assessed the short- to medium-term repeatability of translabial ultrasound measures of pelvic floor anatomy by comparing archived volume ultrasound datasets obtained at two separate appointments at a maximum interval of 6 months.

Materials and methods

This retrospective study was performed by analysing archived ultrasound volume datasets obtained in the context of routine clinical practice at a tertiary urogynecological unit. Patients undergoing surgical management for urinary incontinence or female pelvic organ prolapse are assessed twice: once at the time of the initial visit to a public hospital, and again after urodynamic testing in a private setting. In some instances the order was reversed. One hundred and fifty-six patients with two separate assessments and two available ultrasound volume datasets prior to surgical management between January 2008 and November 2012 were identified from the institutional databases. At both assessments participants had undergone a physician-administered, standardised interview regarding demographic data. All complained of symptoms of pelvic floor dysfunction. A clinical assessment was performed using the International Continence Society Prolapse Quantification (ICS POP-Q) system [10].

Four-dimensional (4D) translabial ultrasound was performed after voiding and in the supine position, using a Voluson system with a RAB 4- to 8-MHz transducer, as previously described [11]. The probe was placed on the perineum in a sagittal orientation. After an initial 2D assessment of bladder volume, detrusor, urethra and organ descent, ultrasound volumes were obtained at rest, on pelvic floor muscle contraction (PFMC) and on maximal Valsalva. Volumes were acquired by a total of approximately 20 individual examiners (medical doctors in gynaecology or urogynaecology training, with widely varying degrees of ultrasound expertise) under the supervision of the senior author or of five other staff members trained by him for 3 months or more. Each manoeuvre was recorded three times. To achieve maximum pushing effort, patients were instructed not to be inhibited by urinary, faecal or flatus incontinence, and each Valsalva manoeuvre was required to last at least 6 s [12]. Care was taken to include the dorsal aspect of the symphysis pubis, as it is an essential landmark for all the parameters assessed in this study. During Valsalva, we also tried to control for levator co-activation [13]. To achieve an optimal PFMC, we provided visual biofeedback to the patient.

Ultrasound data analysis of stored 4D volume data sets was performed offline by LT, who was blinded to all other data, on a desktop PC using the software 4D View v 10.0 (Kretz Medizintechnik, Zipf, Austria). The volumes from the second appointment were analysed blinded to the results from the first appointment. For both Valsalva and contraction, the most effective volume, defined as the one showing the greatest organ displacement, was used for analysis. Bladder neck and pelvic organ descent were determined relative to a horizontal reference line placed at the level of the inferoposterior margin of the symphysis pubis [14]. Bladder neck descent (Fig. 1a, b) is a dynamic measurement obtained by comparing bladder neck position at rest with its position on maximal Valsalva, whereas maximum bladder, uterine and rectal descent (Fig. 1c) describe organ position on maximal Valsalva without reference to organ position at rest. Hiatal area on Valsalva was measured in the axial plane at the level of the minimal anteroposterior diameter of the hiatus, as previously described (Fig. 2) [5]. A true rectocele was defined as a sharp discontinuity in the anterior contour of the anorectal muscularis layer with a resulting herniation into the vagina of ≥10 mm in depth [15]. Rectocele depth was measured perpendicular to a line projected along the expected contour of the anterior anorectal muscularis [15]. Levator avulsion was diagnosed on maximal PFMC by tomographic ultrasound imaging (TUI), as previously described [16], if all three central slices, i.e. the plane of minimal dimensions plus the slices 2.5 mm and 5 mm cranial to this plane, showed an abnormal insertion of the puborectalis muscle on the inferior pubic ramus (Fig. 2). In doubtful cases, the levator–urethra gap (LUG) was measured, and the insertion was regarded as abnormal if the LUG was over 2.5 cm [17].
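The measurement and classification rules above are essentially decision rules, so they can be written down compactly. The following is a minimal sketch, not the software used in this study: all function and class names are hypothetical, positions are assumed to be in millimetres relative to the symphyseal reference line with cranial (above the line) taken as positive, and modelling the LUG as a per-slice fallback for doubtful insertions is an assumption about how the rule would be applied to one side.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical helpers. Assumed convention: organ positions in mm relative to
# the horizontal line through the inferoposterior symphyseal margin,
# positive = cranial (above the line), negative = caudal (below it).


def bladder_neck_descent(rest_mm: float, valsalva_mm: float) -> float:
    """Dynamic measure: bladder neck position at rest minus position on Valsalva."""
    return rest_mm - valsalva_mm


def most_effective_index(rest_mm: float, positions_mm: Sequence[float]) -> int:
    """Index of the recorded volume with the greatest displacement from rest."""
    return max(range(len(positions_mm)), key=lambda i: abs(positions_mm[i] - rest_mm))


def is_true_rectocele(has_discontinuity: bool, depth_mm: float) -> bool:
    """Sharp muscularis discontinuity plus herniation of at least 10 mm."""
    return has_discontinuity and depth_mm >= 10.0


@dataclass
class SliceInsertion:
    """Puborectalis insertion on one TUI slice (one side, hypothetical model)."""
    abnormal: Optional[bool]        # None = doubtful on visual assessment
    lug_cm: Optional[float] = None  # levator-urethra gap, measured if doubtful


def slice_is_abnormal(s: SliceInsertion) -> bool:
    if s.abnormal is not None:
        return s.abnormal
    if s.lug_cm is None:
        raise ValueError("a doubtful slice needs an LUG measurement")
    return s.lug_cm > 2.5  # LUG over 2.5 cm counts as abnormal insertion


def is_avulsion(central_slices: Sequence[SliceInsertion]) -> bool:
    """Avulsion only if all three central slices show abnormal insertion."""
    if len(central_slices) != 3:
        raise ValueError("expected exactly the three central TUI slices")
    return all(slice_is_abnormal(s) for s in central_slices)
```

For example, under the assumed sign convention a bladder neck 25 mm above the reference line at rest and 5 mm below it on Valsalva gives a descent of 30 mm.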

Fig. 1

Assessment of a, b bladder neck descent and c prolapse by translabial 4D ultrasound. S symphysis pubis, Ur urethra, B bladder, C cystocele, U uterine cervix, R rectal ampulla

Fig. 2

Assessment of a levator hiatal area on Valsalva and b levator avulsion on translabial ultrasound. SP symphysis pubis, LH levator hiatus, L levator ani. In b the asterisk marks the location of a full avulsion on the patient’s right (the left aspect of slices 3–8)

Prior to this study, after an initial training session of 10–20 cases assessed under direct supervision of one of the authors, an offline interobserver test–retest series of 20 ultrasound volume datasets was performed by the first author (who had no prior experience in pelvic floor ultrasound) and RGR, a senior trainee with over 1 year’s experience in translabial ultrasound.

Statistical analysis was undertaken using SPSS V 16 (IBM, Armonk, NY, USA). Cohen’s kappa was used to assess agreement for categorical diagnoses, such as levator avulsion and true rectocele. Intraclass correlation coefficients (ICC; absolute agreement definition) were obtained to test the repeatability of continuous measurements. We did not perform any power calculations because of the absence of pilot data and the retrospective nature of this research. ICC values under 0.20 were considered poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good and 0.81–1.00 very good or excellent [18]. To describe repeatability more comprehensively, we calculated not only intraclass correlations but also systematic bias and the mean absolute difference between measurements.
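The agreement statistics named above can be sketched in a few lines of plain Python. This is an illustrative re-implementation under stated assumptions, not the SPSS procedure used in the study: the ICC shown is the two-way, single-measures, absolute-agreement form (ICC(A,1) in McGraw and Wong's notation), on the assumption that single rather than average measures were reported; bias is taken as the mean of second-minus-first differences (the direction is an assumption); confidence intervals are not computed; and the function names are hypothetical.

```python
from collections import Counter
from statistics import fmean


def icc_absolute_agreement(first, second):
    """Two-way, single-measures ICC for absolute agreement, ICC(A,1)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 subjects")
    rows = list(zip(first, second))
    n, k = len(rows), 2  # subjects, occasions (first and second appointment)
    grand = fmean([x for row in rows for x in row])
    row_means = [fmean(row) for row in rows]
    col_means = [fmean(first), fmean(second)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum((x - row_means[i] - col_means[j] + grand) ** 2
                 for i, row in enumerate(rows) for j, x in enumerate(row))
    ms_err = ss_err / ((n - 1) * (k - 1))
    # The occasion term penalises systematic shifts between appointments.
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def cohen_kappa(first, second):
    """Chance-corrected agreement between two sets of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both use a single category")
    return (observed - expected) / (1 - expected)


def bias_and_mad(first, second):
    """Systematic bias (mean of second - first) and mean absolute difference."""
    diffs = [b - a for a, b in zip(first, second)]
    return fmean(diffs), fmean(abs(d) for d in diffs)


def agreement_band(value):
    """Map an ICC to the descriptive bands used in this study."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")):
        if value <= upper:
            return label
    return "very good"
```

Because the absolute-agreement ICC includes the between-occasion variance, a constant offset lowers it: paired values [1, 2, 3, 4] and [2, 3, 4, 5] give ICC(A,1) = 10/13 (about 0.77), whereas a consistency-type ICC would be 1. How boundary values such as 0.20 or 0.205 are banded is an assumption, since the published categories leave small gaps between them.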

This study was approved by our local institutional review board (NBMLHD HREC, reference 13–07). Owing to the retrospective nature of the research and the fact that all data collection occurred as part of routine clinical care, the committee waived the requirement for individual informed consent.

Results

Of 156 patients, 40 with intervals greater than 6 months and 10 with missing volume datasets were excluded, leaving 106 for analysis. All reported data pertain to those 106 patients. The interval between assessments was a mean/median of 73 days (range, 1–178). Demographic data and information on symptoms are given in Table 1. On examination, 96 women (90.6 %) were diagnosed with significant prolapse (ICS POP-Q stage 2 or higher), 73 (68.9 %) in the anterior compartment, 64 (60.4 %) in the posterior compartment, and 13 (12.3 %) in the central compartment.
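The cohort flow and prevalence figures above can be checked with simple arithmetic. The sketch below uses only counts stated in the text; it also makes explicit that the compartment counts overlap (73 + 64 + 13 = 150, more than the 96 women with significant prolapse), since many women had prolapse in more than one compartment.

```python
# Counts taken from the text; percentages use the 106 analysed patients.
identified, over_6_months, missing_volumes = 156, 40, 10
analysed = identified - over_6_months - missing_volumes

prolapse_counts = {"any compartment": 96, "anterior": 73,
                   "posterior": 64, "central": 13}
percentages = {site: round(100 * count / analysed, 1)
               for site, count in prolapse_counts.items()}

# Compartments are not mutually exclusive, so their sum exceeds 96.
compartment_total = sum(v for k, v in prolapse_counts.items() if k != "any compartment")

print(analysed)     # 106
print(percentages)  # {'any compartment': 90.6, 'anterior': 68.9, 'posterior': 60.4, 'central': 12.3}
```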

Table 1 Demographic data and prevalence of symptoms at first assessment (n = 106)

In the test–retest series undertaken prior to this study, we documented good to excellent interobserver repeatability for the postprocessing analysis of bladder neck descent (ICC 0.82, 95 % CI 0.60–0.93), cystocele (ICC 0.82, 95 % CI 0.51–0.93), uterine descent (ICC 0.81, 95 % CI 0.53–0.93), hiatal area on Valsalva (ICC 0.92, 95 % CI 0.80–0.97), rectal ampulla position (ICC 0.73, 95 % CI 0.39–0.89), rectocele depth (ICC 0.75, 95 % CI 0.48–0.89), true rectocele (kappa 0.69, 95 % CI 0.29–1.09) and levator avulsion on tomographic ultrasound (kappa 0.77, 95 % CI 0.47–1.0).

Table 2 compares ultrasound measurements at the first and second appointments, giving means, ranges, ICCs with 95 % confidence intervals, systematic bias and the mean absolute difference between measurements. All parameters demonstrated good to very good repeatability between the two assessments (ICC, 0.73–0.93), with the exception of rectal ampulla descent, which showed moderate repeatability (ICC, 0.44). Hiatal area on Valsalva showed the best repeatability (ICC, 0.93). For the qualitative findings, agreement was good to very good: levator avulsion agreed in 101/106 cases (kappa 0.91, CI 0.77–0.94) and true rectocele in 92/106 cases (kappa 0.73, CI 0.56–0.84). There was no appreciable systematic bias, with mean differences of 0.9 mm for bladder neck descent and cystocele descent and 1.2 mm for uterine descent.

Table 2 Repeatability of sonographic measures of pelvic floor functional anatomy (n = 106)

Discussion

The present study demonstrated good to excellent repeatability of measurements of bladder neck descent, cystocele, uterine descent, rectocele depth and hiatal area on Valsalva (ICC, 0.73–0.93) in two assessments at an average interval of 73 days (range, 1–178 days). The qualitative diagnoses of levator avulsion and true rectocele showed good to very good agreement. Our findings confirm and reinforce previous test–retest, intra- and interobserver studies using translabial ultrasound [5, 6, 8, 9, 19], which suggest that 3D/4D translabial ultrasound is a highly reliable method for the evaluation of pelvic organ prolapse and pelvic floor anatomy.

With regard to the hiatal area on Valsalva, our results suggest that it is likely to be amongst the most highly repeatable measures of pelvic organ descent. Recent studies demonstrated that even an inexperienced observer could adequately evaluate the hiatal area after a limited amount of teaching, both for offline measurement [8, 9, 19] and for volume acquisition [8]. As in those studies, volumes in our study were obtained by a large number of trainees under the supervision of the senior author or other staff trained by him for 3 months or more, and postprocessing analysis was performed by a novice after less than 2 weeks of training. These observations suggest that hiatal area on Valsalva is a robust parameter that can be learned with modest effort. It is also a measure of high clinical validity, as it is strongly associated with prolapse and prolapse symptoms [6].

Regarding the displacement of the rectal ampulla, our observation of moderate repeatability contrasts with the findings of Dietz and Steensma [15], who reported high interobserver reproducibility in a test–retest series, with an ICC of 0.75. This might be explained by the fact that the earlier study used two experienced observers to evaluate volumes obtained on the same day, with similar rectal filling, whereas we used volumes obtained at an average interval of 73 days. Regardless, offline assessment of the posterior compartment appears to be more challenging for trainees and to require more extensive training [20]. Variability may also arise from bowel filling and stool consistency, muscle resting tone and neuromuscular activation status, all of which are likely to differ more between examinations performed at longer intervals.

Volume acquisition in the present study was performed by multiple observers, which is bound to add variability, since the effectiveness of patient instruction and investigator performance will differ between operators. The quality of a Valsalva manoeuvre depends strongly on the instructions given to the patient and on the patient’s cooperation. The quality of instruction or coaching, in turn, depends to a large degree on the operator’s awareness of confounders such as levator co-activation [13] and bladder and bowel filling. Real-time imaging has the advantage of demonstrating these confounders, allowing correction of suboptimal effort and recognition of organ filling and levator activation [1]. To achieve maximum pushing effort, we instructed patients to continue the Valsalva manoeuvre for at least 6 s [12] and not to be restrained by urinary or anal incontinence, which may be regarded as a form of standardization. To achieve an optimal PFM contraction, we provided visual biofeedback to the patient, as previously described [21]. The high ICC values for most measurements suggest that these confounders were adequately controlled.

There are a number of potential weaknesses of this study that have to be acknowledged. The study design was retrospective, and about 20 individual ultrasound operators with varying experience were involved in volume acquisition. However, this may rather be seen as a strength, as it implies that our data might be more widely applicable. We used archived 4D ultrasound data sets rather than 2D cine loops or stills for reasons of convenience, since volume data are required for assessment of the hiatus and for tomographic imaging of the levator. It is recognised, however, that sonographic assessment of organ descent does not require 3D/4D imaging, as it can just as well be performed on 2D images. Another potential weakness is that hiatal area was determined in a simple axial plane of minimal anteroposterior hiatal diameter. In spite of the high reproducibility [2, 21] and comparability to magnetic resonance imaging [22] of this method, it has recently been shown that determination of hiatal area in a rendered volume may be more appropriate, although this latter method is not available on all 3D ultrasound systems [22]. Additionally, in some instances intercurrent treatment such as pelvic floor muscle exercise teaching or temporary pessary placement may have occurred, confounding the results. The absence of systematic bias between the two assessments argues against any significant effect of intercurrent treatment. Ageing is unlikely to be a significant confounder owing to the short average interval of 73 days.

Finally, while we analysed the qualitative diagnosis of levator avulsion, other measures, such as the numeric value of the levator–urethra gap, and other quantitative parameters, such as detrusor wall thickness, were not assessed, leaving room for future work.

The main strength of the study is that, unlike previous similar work, we assessed the medium-term repeatability of parameters obtained on sonographic assessment of functional pelvic floor anatomy, thereby including the variability of operator performance, patient instruction and ultrasound volume acquisition. Our results are reassuring, especially given that multiple individuals participated in volume acquisition and that offline analysis was performed by a novice trainee. Our study suggests that despite the high potential operator dependence of sonographic assessments, 3D systems can reduce this operator dependence and help facilitate both the performance and the analysis of a translabial ultrasound examination.

In conclusion, the repeatability of translabial ultrasound measures of functional pelvic floor anatomy obtained at a mean interval of 73 days was moderate to excellent. Based on the results of the present study, we feel that 3D/4D translabial ultrasound is easy to perform and interpret. The technology appears to be suitable for introduction into clinical practice.