Introduction

Female urinary incontinence (UI) is a common, age-dependent pelvic floor disorder affecting 25 to 45% of women. It is currently a major public health problem and is associated with a negative psychologic and social impact [1]. Two pathophysiologic mechanisms are described to explain female stress urinary incontinence (SUI): urethral hypermobility and intrinsic sphincter deficiency [2, 3]. Urethral hypermobility is due to a weakening of the supporting structures of the proximal urethra [4].

Currently, different techniques are used to measure urethral mobility in women with SUI. The Q-tip test is a simple and inexpensive clinical tool which was developed in the 1970s [5]. The Aa point, which corresponds to the position of the bladder neck in the International Classification of Pelvic Organ Prolapse (POP-Q), can be used to evaluate urethral mobility in a patient with SUI. However, the reliability of the method is a matter of controversy: some studies report a poor correlation with the diagnosis of SUI [6] or with the Q-tip test to quantify urethral mobility [7], while others report that the POP-Q system is highly predictive of the straining urethral angle [8]. Among the paraclinical tools developed in this setting, perineal ultrasound seems the least invasive and is easy to perform in a urogynecologic department [9]. The most frequently used measures are the posterior urethrovesical angle (UVP angle) and the mobility of the bladder neck along two axes (the axis of the symphysis and its perpendicular). Dietz et al. found good reliability using a simple transperineal ultrasound approach by measuring the bladder neck descent (BND) during stress on the axis of the perineal probe [10]. Nevertheless, reproducibility studies to validate this measure are lacking.

Therefore, the primary objective of the present study was to assess the reproducibility of the ultrasound measurements of urethral mobility during stress, with the intra-class correlation coefficients as the primary outcome. The secondary outcome was the Spearman coefficient for the correlation between the ultrasound and POP-Q measurements. The hypothesis of the present study was that reproducibility would be good in two different periods (during pregnancy and after delivery).

Materials and methods

Population

The present study is a secondary analysis of the 3PN study (“Prenatal Pelvic floor Prevention”), which compared the effect of antenatal pelvic floor muscle training versus written information alone on the severity of UI at 12 months postpartum. The 3PN study was a French multicenter, randomized controlled trial involving primiparous women with singleton pregnancies at low obstetric risk [11]. In this trial, inclusion criteria were: women > 18 years who could read French and were between the 20th and 24th weeks of pregnancy. Exclusion criteria were: multiple pregnancy, high obstetric risk factors, and pelvic floor muscle training within the 12 months prior to pregnancy.

Perineal ultrasound measurements were performed either at the inclusion visit or 2 months after delivery. It was an optional test in the 3PN study for those centers where the measurement could be performed without an additional visit (the Intercommunal Hospital Center of Poissy-Saint-Germain-en-Laye) during the morphologic ultrasound of the 2nd trimester of pregnancy or during the postpartum consultation. The sample size was thus limited to the available data collected within the trial.

Each measurement was performed twice by two operators (AF and JF) to assess intra- and inter-observer reliability. The measurements were made by transperineal ultrasound: after the patient had been asked to empty her bladder, a 3.5- to 7-MHz protected probe was placed on the perineum with the patient in a supine position, as described by Dietz (Fig. 1). The distance between the bladder neck and the horizontal line passing through the lower edge of the symphysis (bladder neck-symphyseal distance, BSD) was measured at rest and on straining. Women were asked to push against a closed glottis for 6 s, three times. The largest measurement was retained for analysis. The difference between the measurement at rest and during stress gives the BND value (mm) [12]. Measurement of urethral mobility is routinely performed at our institution during urogynecologic evaluation for SUI, and both operators were experienced in pelvic floor ultrasound. All the patients gave their informed consent to the procedure evaluated in the study.
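The BND arithmetic described above can be illustrated with a minimal Python sketch. The function name, the sign convention (BSD positive above the symphyseal line, negative below it) and the reading of "largest measurement" as the largest descent over the three straining attempts are assumptions made for illustration, and the values in the usage lines are hypothetical, not study data.

```python
def bladder_neck_descent(bsd_rest_mm, bsd_strain_mm_attempts):
    """Return BND (mm) as BSD at rest minus BSD on straining.

    Assumptions (not stated in the study protocol):
    - BSD is positive above the symphyseal reference line, negative below it.
    - When several straining attempts are recorded, the largest descent is kept.
    """
    if not bsd_strain_mm_attempts:
        raise ValueError("at least one straining measurement is required")
    # One descent value per straining attempt.
    descents = [bsd_rest_mm - strain for strain in bsd_strain_mm_attempts]
    return max(descents)


# Hypothetical example: BSD of 25.0 mm at rest and three straining attempts.
bnd = bladder_neck_descent(25.0, [16.0, 12.5, 14.0])
print(f"BND = {bnd} mm")
```

Keeping the per-attempt descents in a list makes it easy to switch to another retention rule (for example the mean of the three attempts) if the protocol is read differently.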

Fig. 1

Ultrasound measurements of urethral mobility: bladder neck descent. (N.B.: the figures are shown with a bladder filled to average maximal bladder capacity)

Clinical urethral mobility was assessed by one practitioner (AF) by measuring point Aa of the POP-Q classification during maximum strain with an empty bladder [13]. This point is located 3 cm proximal to or above the hymenal ring anteriorly at rest. The measurement of point Aa is made during Valsalva pushing. Each measurement is made in centimeters above or proximal to the hymen (negative number), or below or distal to the hymen (positive number), with the plane of the hymen being defined as zero (0). Point Aa therefore varies between −3 and +3 cm.

Statistical analyses

Statistical analyses of inter- and intra-observer reliability were performed using the Spearman correlation coefficient (rho) and intra-class correlation coefficients (ICC) [14]. Mean differences between two measurements with 95% confidence interval (CI) according to Bland and Altman plots are presented [15]. Correlations between ultrasound and clinical measurements were assessed using the Spearman correlation coefficient.
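As an illustration of how an ICC can be obtained from paired measurements, the following Python sketch computes a two-way random-effects, absolute-agreement, single-measure ICC from mean squares. The choice of this ICC form is an assumption: the study reports ICCs computed with SPSS without specifying the model, so this sketch is not a reproduction of the authors' analysis. The function name and the data are hypothetical.

```python
def icc_two_way_absolute(ratings):
    """ICC for absolute agreement, two-way random effects, single measure.

    `ratings` is a list of rows, one row per woman, one column per
    measurement (for example one column per operator). The ICC form is
    an assumption; the study does not state which model was used.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two subjects are required")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of measurements")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical BND values (mm): operator 2 reads 2 mm higher than operator 1.
example = [[10.0, 12.0], [12.0, 14.0], [15.0, 17.0]]
print(round(icc_two_way_absolute(example), 2))
```

Because this form measures absolute agreement, a systematic offset between operators lowers the coefficient even when the two operators rank the women identically.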

Correlation coefficients were interpreted as follows: 0.9–1.0, very high correlation; 0.7–0.9, high correlation; 0.5–0.7, moderate correlation; 0.3–0.5, low correlation; 0.0–0.3, negligible correlation [16].
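The interpretation scale above can be written as a small lookup, shown here as a hedged Python sketch. Because the published bands share their endpoints (0.7 belongs to both "moderate" and "high" as written), the sketch assigns each boundary value to the higher band, and it applies the scale to the absolute value of the coefficient; both choices are assumptions, as is the function name.

```python
def interpret_correlation(coefficient):
    """Map a correlation coefficient to the verbal scale used in the text.

    Assumptions: boundary values go to the higher band, and the sign of
    the coefficient is ignored.
    """
    size = abs(coefficient)
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if size >= 0.9:
        return "very high"
    if size >= 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "low"
    return "negligible"


# Hypothetical coefficients, one per band.
for value in (0.95, 0.75, 0.50, 0.34, 0.15):
    print(value, interpret_correlation(value))
```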

In the definition of repeatability using the Bland and Altman method, it is expected that at least 95% of the differences between two observers lie within 2 standard deviations of the mean difference.
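A minimal Python sketch of this Bland and Altman check is given below. It computes the mean difference (bias), the limits at the mean difference plus or minus 2 standard deviations, and the share of differences falling inside those limits. The function name, the returned keys and the example values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def bland_altman(first, second, multiplier=2.0):
    """Summarize agreement between two paired series of measurements.

    Returns the mean difference, its sample standard deviation, the lower
    and upper limits of agreement, and the fraction of differences that
    fall within those limits.
    """
    if len(first) != len(second):
        raise ValueError("the two series must have the same length")
    if len(first) < 2:
        raise ValueError("at least two paired measurements are required")

    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    lower = bias - multiplier * sd
    upper = bias + multiplier * sd
    inside = sum(1 for d in diffs if lower <= d <= upper)
    return {
        "bias": bias,
        "sd": sd,
        "lower": lower,
        "upper": upper,
        "fraction_inside": inside / len(diffs),
    }


# Hypothetical BND values (mm) from two operators for four women.
summary = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 13.0, 13.0, 17.0])
print(summary)
```

With real data, a `fraction_inside` of at least 0.95 would correspond to the repeatability criterion stated above.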

Informed consent was obtained from all of the women participating in the study. The “Comité de Protection des Personnes” (Ethical Review Committee) of Bordeaux examined and approved the research. The study was registered at the AFSSAPS (Agence Française de Sécurité Sanitaire des Produits de Santé) with the number 2007-A00641–52 and at ClinicalTrials.gov with the number NCT00551551.

The results were analyzed using SPSS (IBM SPSS Statistics for Macintosh, Version 22.0. Armonk, NY: IBM Corp.).

Results

Fifty women were included (31 during the pregnancy and 19 after delivery). All of the 200 measurements performed were analyzed. No complications related to the procedure were recorded, and the measurements were well tolerated by all the patients.

The mean age of the population was 29.6 (± 4.4) years, and the mean body mass index (BMI) was 22.3 (± 3.3) kg/m2. Demographic data were not significantly different between the women evaluated in the ante- and postpartum periods (Table 1).

Table 1 Patients’ characteristics

The mean ultrasound antepartum urethral mobility was lower than in the postpartum period (11.0 ± 5.8 mm vs. 13.7 ± 4.2 mm). The mean clinical ante- and postpartum urethral mobility was −2.0 (± 0.5) cm and −1.0 (± 0.5) cm, respectively. The correlation between antepartum ultrasound and clinical urethral mobility was moderate (rho = 0.50, p < 0.05). The correlation for the postpartum measurements was low without reaching significance (rho = 0.34, p = 0.09).

Table 2 shows the mean inter- and intra-observer differences and the Spearman correlation coefficients. Intra-observer agreement was high: ICC = 0.75 (0.59–0.85) for operator 1 and 0.73 (0.55–0.84) for operator 2. Inter-observer agreement was low or moderate: ICC = 0.35 (0.01–0.54) for the first measurement and 0.52 (0.27–0.71) for the second. The ICCs were significant in each case (p < 0.05).

Table 2 Intra- and inter-observer reliability in the overall population

In the antepartum group, intra-observer agreements were high with ICCs of 0.80 (0.62–0.91) for operator 1 and 0.84 (0.68–0.92) for operator 2. The mean differences were 0.8 (± 3.9) mm for operator 1 and 0.4 (± 3.9) mm for operator 2. Inter-observer agreements were considered moderate: ICC = 0.58 (0.26–0.78) between the first measurement of each operator and 0.68 (0.42–0.84) for the second. The mean differences between the two operators were 0.7 (± 5.9) mm for the first measurement and 1.1 (± 5.4) mm for the second.

In the postpartum group, intra-observer agreements were moderate: ICC = 0.61 (0.21–0.83) for operator 1 and 0.57 (0.17–0.81) for operator 2. However, inter-observer agreements were low: ICC = 0.15 (0.10–0.41) for measurement 1 and 0.21 (0.10–0.58) for measurement 2.

Figure 2 represents the Bland and Altman plots of the ultrasound measurements for the two operators. Each dot represents one woman. The magnitude of discrepancies was about 1 mm for intra-observer reliability and +1.5 and −1.5 mm for inter-observer reliability (Table 2).

Fig. 2

Individual differences between the measurements of the two operators using the Bland and Altman method. Each point represents one woman (N = 50)

Discussion

The results of the present study showed moderate to good intra-observer reliability for the BSD measure in both the ante- and postpartum periods and a moderate inter-observer reliability in the antepartum period. The inter-observer agreements were very poor in the postpartum period. The correlation between antepartum ultrasound and clinical urethral mobility was moderate.

Numerous ultrasound parameters have been described: a reliable technique is one that results in little variation in measurements between operators regardless of the situation (during or after pregnancy; in continent and incontinent patients, for example). In the present study we had the opportunity to study this measure in the ante- and postpartum periods.

In the 1990s, Creighton et al. showed good reliability of the urethro-vesical junction movement in women with vaginal prolapse. However, the sample population was small, and the group was not homogeneous [17]. In 1995, Schaer et al. studied the reliability of the measurement of the posterior urethro-vesical angle during the Valsalva maneuver: they demonstrated a significant difference between their measurements. The effects of bladder filling and catheterization were not evaluated [18]. Given this lack of reliability, angle measurements do not appear to be usable in current practice.

The inferior-posterior edge of the symphysis pubis is easy to distinguish in all women and is used as a fixed anatomical landmark for dynamic measurements. In our practice, we use it as a benchmark to check the absence of involuntary mobilization of the probe during dynamic movements. Using an x,y coordinate system based on the pubic bone, Salvatore et al. found a moderate correlation [19] while Peschers et al. found a good intra-observer agreement [ICC = 0.99 (0.97–0.99)]. In the latter study, inter-observer reliability was not evaluated [20]. DeLancey’s team used a vector-based assessment to determine the magnitude and direction of bladder neck movements in ten nulliparous continent women, ten primiparous continent women, and ten primiparous stress-incontinent women. They concluded that the measurements were possible in all 30 subjects, and test-retest reliability correlations were more than an r value of 0.7 in all measures [21]. This could thus constitute an interesting method.

We did not think it necessary to describe bladder neck mobility using two axes or a vector since Dietz published his reliability results for BND measurement [ICC of 0.98 (0.94–0.99)] [22]. This measure is easier to use in clinical practice because it assesses BND along a single axis passing through the lower edge of the pubic symphysis.

Another interesting measure could be the one described by Wlazlak et al. They reported the results of 92 women using 2D introital ultrasonography. The location of the urethral internal orifice was defined with coordinates of two points: point CI marking the urethral anterior edge visualized on ultrasound as closer to the pubic symphysis and point CII marking the posterior edge visualized more peripherally from the pubic symphysis. Reliability measurements of point CI location and mobility were good and very good (0.6710–0.9961) and medium, good, and very good for point CII (0.5738–0.9944). Point CI was clearly visible in all cases while it was not possible to accurately mark point CII in 4.3–17.4% of cases [23].

Although the work we present here did not focus on clinical symptoms of SUI or postpartum urethral mobility, we found a moderate correlation between antepartum ultrasound and clinical urethral mobility. In a previous study, we also found that prenatal urethral hypermobility (assessed clinically or by ultrasound) was significantly associated with UI at 1 year after the first delivery [24].

Currently, different techniques are used to measure urethral mobility in women with SUI. The Aa point of the POP-Q classification to evaluate urethral mobility is not well correlated with the diagnosis of SUI, and its reliability is a matter of controversy. To date, the superiority of ultrasound measurement over clinical measurement has never really been demonstrated. A number of factors may explain these results: population choice, continence status, association with anterior compartment prolapse, and the urogynecologic symptoms specified. The Q-tip test is a validated objective measure but can cause discomfort for women. Moreover, it measures the rotation while straightening the urethra, whereas the BND measured by ultrasound is neither a rotation nor a vector.

Vesical volume and bladder catheterization can affect urethral mobility results as a greater bladder volume reduces urethral mobility [25]. To avoid the influence of bladder filling, we decided to perform the measurement according to the technique described by Dietz [12]. On the other hand, it is also more difficult to perform measurements with an empty bladder. Catheterization does not seem to affect urethral mobility. Finally, the major challenge of measuring urethral mobility is the difficulty in standardizing the Valsalva maneuver to obtain maximum strain. Some studies report that levator ani muscle co-activation during the Valsalva maneuver reduces urethral mobility [26]. Thus, clear instructions for achieving the Valsalva maneuver appear to be important for good evaluation.

The strength of our study is that transperineal ultrasound mobility was assessed in two different periods. To our knowledge, the only other study which performed inter-observer reliability measurements with transperineal ultrasound on women during pregnancy and the postpartum period was that of Dietz et al. They studied the reliability of three-dimensional ultrasound in an attempt to define the extent and nature of traumatic damage to pelvic floor structures during delivery. The authors found a high degree of concordance between observers but the reliability of urethral mobility measure was not assessed [27].

Despite its interest, this study has some limitations. Differences emerged retrospectively on the interpretation of the BSD measurement. This could be because the choice of the symphysis axis was not always the same. It is important to develop operator experience and materials (2D, 3D, 4D ultrasound) before using transperineal ultrasound routinely. The measurement has since been standardized by the Special Interest Group Imaging of IUGA (International Urogynecologic Association). The ‘IUGA cookbook’ and their online interactive training in the technique are available through the Pelvic Floor Ultrasound course of the SIG (https://www.iuga.org/tools/pfic/pfic-overview).

Our sample population was small, leading to a loss of power, particularly for the correlation analyses with clinical examination and the evaluation of sensitivity to change. Furthermore, we were not able to compare the same women in the ante- and postpartum periods. The women we included were relatively young with a low BMI, which facilitated the measurements but may mean that our results are of limited generalizability to the general population. Moreover, we did not find clinically significant urethral hypermobility in our subjects (generally > 2 cm of mobility is considered clinically relevant [28]), which implies that we did not test the measurement over a wide range of urethral mobility.

Conclusions

Although BND measurement is a promising measure of urethral mobility, we failed to demonstrate good inter-observer reliability in either the ante- or postpartum periods. To date, due to poor measurement standardization, ultrasound cannot be used to assess urethral mobility. Other studies should be conducted with standardized measurements and in other populations with and without UI symptoms.