Introduction

Pelvic organ prolapse (POP) affects more than 30% of women aged 50 years and older [1, 2]. If patients perceive the prolapse or its associated symptoms as severe, surgical treatment may be indicated. The first step towards optimal treatment selection is to obtain an unambiguous diagnosis that validly and precisely describes the nature and severity of the prolapse and its related pelvic floor symptoms.

The primary diagnostic work-up consists of history taking and physical examination [3]. The pelvic examination includes POP-Quantification (POP-Q) scoring, which facilitates a uniform and accurate measurement of the prolapse [4, 5]. To improve understanding of the reported pelvic floor symptoms and to optimize (surgical) treatment planning, the International Continence Society advocates diagnostic testing prior to POP surgery [4]. Various imaging and function tests may be considered, such as magnetic resonance imaging (MRI), defecography (DG), urodynamic evaluation (UDE), and anorectal function testing (AFT), including anal endosonography.

Unfortunately, due to a lack of evidence on their clinical value, no guidelines exist concerning the optimal use of these tests in a clinical practice setting. Their application is essentially opinion-based. Therefore, the National Institutes of Health has recommended research to clarify the role of additional testing in the diagnostic work-up of pelvic organ prolapse [6, 7]. This prompted us to study the influence of diagnostic tests on treatment selection in patients with symptomatic POP. The following questions were investigated. Firstly, what is the effect of additional diagnostic testing on the treatment selection in patients with primary POP compared to the outcome of a consensus panel meeting, with full information available? Secondly, what is the diagnostic value that individual gynecologists assign to the above-mentioned additional diagnostic tests?

Materials and methods

Between January 2000 and January 2002, women presenting with genital prolapse at the gynecology outpatient clinic of the Onze Lieve Vrouwe Hospital were invited to participate in the study. Included were women who experienced a sagging sensation and/or micturition and defecation problems at least once a week and in whom at least one of the compartments showed a stage II or higher prolapse according to the POP-Q system. Exclusion criteria were less than 6 months postpartum, gynecological pathology additional to the prolapse, previous prolapse surgery and/or hysterectomy, poor general condition, and insufficient knowledge of the Dutch language. The study was approved by the hospital's Medical Ethical Board. Informed consent was obtained from all participating patients.

Participating patients were initially examined (by the first author) at the gynecology outpatient clinic of the Onze Lieve Vrouwe Hospital. Patient characteristics were obtained according to a standardized history: age, main symptom, assessment of bladder, bowel and sexual functioning, and obstetric history. In addition, the patient completed a comprehensive questionnaire consisting of the validated generic health-related quality of life questionnaire (MOS SF36) and two validated disease-specific quality of life questionnaires: the Urogenital Distress Inventory (UDI) and the Defecation Distress Inventory (DDI) [8, 9]. For this study, we used the Rome II criteria to identify patients with constipation [10]. Constipation was considered to be present when at least two of the following statements from the DDI were answered positively: fewer than three bowel movements a week, straining to achieve a bowel movement more than 25% of the time, a sensation of incomplete evacuation, manual assistance at defecation, or a feeling of anal blockage. Patients were considered to have fecal incontinence if they experienced one of the following complaints: incontinence for liquid stool, incontinence for formed stool, incontinence with urgency, or unnoticed loss of feces.

Patients were considered to have urinary incontinence if they positively answered one of the following three questions from the UDI survey: Do you experience urine leakage related to physical activity, coughing, or sneezing? Do you experience urine leakage related to a feeling of urgency? Do you experience unnoticed urinary loss without physical activity?
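The symptom definitions above amount to simple counting rules, which can be sketched as follows. This is an illustrative sketch only, not the scoring code used in the study: the item keys are hypothetical names introduced here, the actual DDI and UDI item coding is not given in the text, and "one of the following" is read as "at least one".

```python
from typing import Iterable, Mapping

# Hypothetical item keys (assumption): the questionnaires' real item
# identifiers and response coding are not specified in the text.
CONSTIPATION_ITEMS = (
    "fewer_than_three_bowel_movements_per_week",
    "straining_more_than_25_percent_of_time",
    "sensation_of_incomplete_evacuation",
    "manual_assistance_at_defecation",
    "feeling_of_anal_blockage",
)
FECAL_INCONTINENCE_ITEMS = (
    "incontinence_for_liquid_stool",
    "incontinence_for_formed_stool",
    "incontinence_with_urgency",
    "unnoticed_loss_of_feces",
)
URINARY_INCONTINENCE_ITEMS = (
    "leakage_with_activity_coughing_or_sneezing",
    "leakage_with_urgency",
    "unnoticed_loss_without_physical_activity",
)


def count_positive(answers: Mapping[str, bool], items: Iterable[str]) -> int:
    """Count how many of the listed items were answered positively."""
    return sum(1 for item in items if answers.get(item, False))


def classify_symptoms(answers: Mapping[str, bool]) -> dict:
    """Apply the thresholds described in the text to one patient's answers."""
    return {
        # At least two of the five constipation statements.
        "constipation": count_positive(answers, CONSTIPATION_ITEMS) >= 2,
        # At least one complaint (assumed reading of "one of the following").
        "fecal_incontinence": count_positive(answers, FECAL_INCONTINENCE_ITEMS) >= 1,
        "urinary_incontinence": count_positive(answers, URINARY_INCONTINENCE_ITEMS) >= 1,
    }
```

Under this sketch, a patient who reports only a feeling of anal blockage is not classified as constipated, because a single positive statement does not reach the two-item threshold.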

At physical examination, the patient's height and body weight were measured. The degree of prolapse was measured using the POP-Q system [4]. Next, all patients underwent MRI [11], DG [12], UDE [13], and AFT [14]. Both fast dynamic MRI and DG were performed at rest and during Valsalva maneuver. Defecography was performed with the patient sitting on an artificial toilet, while for MRI, the patients were in supine position. For MRI, three-dimensional imaging was performed first for the assessment of the anatomy of the pelvic floor and the pelvic organs and exclusion of pelvic pathology. This was followed by sagittal imaging for the measurement of the pelvic organ descent in relation to the pubococcygeal line, running from the lower edge of the symphysis pubis to the sacro-coccygeal articulation.

To determine the effects of the four additional diagnostic tests compared with consensus outcome on the treatment selection, we structured the diagnostic process into three steps (Fig. 1). After each step, a treatment advice was defined, based on the cumulative diagnostic evidence so far.

Fig. 1 The three-step diagnostic process. After each step, a treatment advice was defined, based on the cumulative diagnostic evidence so far

Step I

In the first step, the initially intended treatment (T1) was based on history taking and physical examination, including POP-Q scores. This step was performed by the first author (gynecologist A).

Step II

In the second step, gynecologists A, B, and C, all considered to be experts in uro-gynecology, independently selected the optimal treatment (T2), taking into account the combined information of history taking, pelvic examination, and the four additional diagnostic investigations. Furthermore, they individually assigned a score to express the added value of each diagnostic test in the process of clinical decision-making. The “assigned diagnostic value” (ADV) was a self-reported response expressing how useful each gynecologist regarded each test. It is intentionally a subjective score. ADV was calculated as follows. First, each gynecologist rated the value of each additional diagnostic test for selecting the proper treatment as “useful” (i.e., the information influences treatment strategy), “questionable” (i.e., the information may affect the treatment strategy), or “unnecessary” (i.e., no contribution to treatment strategy). Secondly, the ADV was obtained by assigning 1 point to useful, 0.5 points to questionable, and 0 points to unnecessary. For each diagnostic test, the overall ADV score was calculated by adding the points assigned by the three panelists for all evaluated patients. The ADV per test is expressed as a percentage of the maximum possible ADV and calculated as: total points × 100/the number of performed observations. In addition to the total ADV scores for the entire study population, we also calculated the ADV scores stratified by the patient's findings at physical examination (POP-Q scores) and by the presence of bladder or bowel symptoms (urinary and fecal incontinence or constipation).
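The ADV calculation described above can be illustrated with a short sketch. The function name and the handling of missing ratings (skipping them, so that they do not count as observations) are assumptions made for illustration; the text does not specify how missing values were treated at this step.

```python
from typing import Iterable, Optional

# Points per rating category, as described in the text.
POINTS = {"useful": 1.0, "questionable": 0.5, "unnecessary": 0.0}


def assigned_diagnostic_value(ratings: Iterable[Optional[str]]) -> float:
    """Return the ADV as a percentage of the maximum possible score.

    `ratings` holds one label per observation (panelist x patient).
    A rating of None is treated as missing and skipped (assumption).
    """
    scored = [POINTS[r] for r in ratings if r is not None]
    if not scored:
        raise ValueError("ADV is undefined without any observations")
    # Maximum possible score is 1 point per observation, so the
    # percentage is total points x 100 / number of observations.
    return 100.0 * sum(scored) / len(scored)


# Four hypothetical observations for one test: 1 + 0.5 + 0 + 1 = 2.5 points.
example = assigned_diagnostic_value(
    ["useful", "questionable", "unnecessary", "useful"]
)
print(example)  # 62.5
```

With three panelists and 53 patients, each test would contribute up to 159 observations to this calculation.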

Step III

In the third step, gynecologists A, B, and C collectively provided a consensus treatment advice (T3) during a panel meeting held once a month. The time between evaluation of the patient at the first visit and the treatment decision by consensus debate varied between 4 and 8 weeks. In this paper, we considered this consensus treatment advice as the reference standard.

Treatment advice comprised either conservative management or surgical management. In case of surgical management, one specific vaginal or abdominal procedure was selected depending on the compartments involved. We use the term simple prolapse if only one compartment was affected, and the term combined prolapse if two or more compartments were involved. The following vaginal or abdominal surgical procedures could be selected: for a simple prolapse of the anterior vaginal wall, an anterior repair or urethra-suspension; for a prolapse of the middle compartment, vaginal hysterectomy or abdominal sacro-colpopexy; in case of enterocele, vaginal or abdominal repair; and for a combined prolapse, a vaginal hysterectomy with vaginal wall repair or abdominal sacro-colpopexy.

Analysis

The aim of the analysis was to establish the effect of the additional diagnostic test information (MRI, DG, UDE, AFT) on treatment selection in patients who were candidates for POP surgery. We adopted the consensus decision of three panelists, all experts in uro-gynecology, using the full diagnostic information, as reference standard (best available standard). A second aim was to evaluate the diagnostic value of the individual tests as scored by the three panelists individually. First, simple descriptive statistics were used for the presentation of the patient population. Next, we computed the agreement between the intended treatment proposal after history taking and physical examination (T1) and the respective treatment decisions of the three panelists after full diagnostic information (T2A, T2B, and T2C), separately. The agreement between initial and second treatment plan (T1 vs. T2) and between second and consensus treatment plan (T2 vs. T3) was quantified as the proportion of complete agreement (percent) and by Cohen's unweighted kappa statistic. We used these two alternative descriptive measures since kappas may be sensitive to small variations in classification. Furthermore, we display the number of changes made in terms of “true” and “false” following the consensus outcome for each gynecologist separately (Fig. 2). This procedure discriminates between changes that move toward vs. changes that move away from the reference. Regarding the second aim of the study, to rate the tests' relative importance, the ADV for each gynecologist was described by simple descriptive statistics. For history taking, physical examination, and the four diagnostic tests, we averaged the individual ADV over the three gynecologists, adjusting for missing values. SPSS 15.0 was used for data management and statistical analysis. A two-sided p value <0.05 was considered to be statistically significant.
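As an illustration of the two agreement measures named above, the following sketch computes the proportion of complete agreement and Cohen's unweighted kappa for two lists of treatment plans. It is a plain-Python illustration, not the SPSS procedure used in the study; the function names and the example treatment labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of patients for whom both plans are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's unweighted kappa: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of agreement.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from the marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    categories = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical plans for four patients (labels are illustrative only).
t1 = ["surgery", "surgery", "conservative", "conservative"]
t2 = ["surgery", "conservative", "conservative", "conservative"]
print(percent_agreement(t1, t2))  # 75.0
print(cohen_kappa(t1, t2))        # 0.5
```

In this toy example, 75% raw agreement corresponds to a kappa of only 0.5 because half of the agreement would be expected by chance. This is why reporting both measures side by side is informative.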

Fig. 2 The number of treatment plan revisions made by the gynecologists A, B, and C with the aid of the additional diagnostic test information
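The tally of revisions against the consensus outcome could be sketched as follows. This is one possible reading, stated as an assumption: a revision counts as “true” when the revised plan (T2) equals the consensus advice (T3) and as “false” otherwise, so a change between two plans that both differ from the consensus is also counted as “false”. The function names and treatment labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_revision(t1: str, t2: str, t3: str) -> str:
    """Classify one patient's plan for one gynecologist.

    t1: initial plan, t2: plan after full test information,
    t3: consensus advice (reference standard).
    """
    if t1 == t2:
        return "unchanged"
    # Assumed rule: a revision is "true" only if it lands on the consensus.
    return "true" if t2 == t3 else "false"


def tally_revisions(cases: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count unchanged plans and true/false revisions over all patients."""
    return Counter(classify_revision(t1, t2, t3) for t1, t2, t3 in cases)


# Hypothetical cases: (T1, T2, T3) per patient.
cases = [
    ("anterior repair", "anterior repair", "anterior repair"),       # unchanged
    ("anterior repair", "sacro-colpopexy", "sacro-colpopexy"),       # toward consensus
    ("anterior repair", "sacro-colpopexy", "anterior repair"),       # away from consensus
    ("anterior repair", "sacro-colpopexy", "vaginal hysterectomy"),  # neither plan matches
]
print(tally_revisions(cases))
```

Separating revisions this way makes visible that a high revision rate does not by itself imply better agreement with the reference.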

Results

During the study period, 68 patients met the inclusion criteria, of whom 53 were included in the analysis. Excluded were 15 patients in whom the outcome of at least one of the diagnostic tests was missing prior to the meeting: four patients withdrew participation because the tests were bothersome; four patients canceled at least one (but not all) of the diagnostic tests; and in seven patients, protocol violations occurred for logistic or physician-related reasons.

Table 1 shows the characteristics of the 53 participating patients. The majority of patients had an overall prolapse of POP-Q stage III with the anterior or middle compartment as the leading edge. According to our criteria, 34 patients had urinary incontinence symptoms, 25 patients suffered from constipation, and 20 patients complained of fecal incontinence. Thirteen of the 20 patients had both fecal incontinence and constipation complaints.

Table 1 Characteristics and POP-Q stages in 53 patients

Table 2 shows the degree of agreement between the treatment decision of gynecologist A before and after information about the results of the additional diagnostic tests (T1 vs. T2) and between the treatment decision before and after the consensus meeting (T2 vs. T3). With the aid of the additional diagnostic test results, agreement rose from 66% (T1 vs. T3) to 72–83% (T2 vs. T3, depending on the panelist).

Table 2 Agreement between initial (T1) and second (T2) treatment plan and agreement between second (T2) and consensus (T3) treatment plan for the three panelists (A, B, and C)

Figure 2 shows the number of revisions made by the three panelists with the aid of the additional diagnostic test information; it also shows the comparison with the joint consensus treatment advice. On average, the individual panelists modified 38% of all initial treatment plans (T1) after disclosure of the additional diagnostic test information. Eventually, 24% of all second management plans (T2) were revised at the consensus meeting because they did not match the consensus treatment advice (T3).

Table 3 shows the ADV for history taking, pelvic examination, MRI, DG, UDE, and AFT. History taking and pelvic examination were judged as the most useful tests in the guidance of treatment planning. The three gynecologists markedly varied in the assigned diagnostic values for DG, UDE, and AFT, but they agreed that the diagnostic value of MRI was low. The ADV for DG and AFT considerably increased when POP-Q stage >2 or fecal incontinence was present.

Table 3 Assigned diagnostic value of four diagnostic tests (MRI, DG, UDE, AFT) by three gynecologists A, B, C (53 patients, 159 judgments) for the entire study population and stratified for symptoms or prolapse stage

Discussion

This study is among the first to establish the diagnostic value of a series of additional tests (MRI, DG, UDE, AFT), which are regularly used in the evaluation of POP. Intended treatment plans were frequently adapted after the disclosure of additional diagnostic test information, but of all changes, almost as many moved away from the consensus treatment advice as moved toward it. Eventually, one fourth of all second treatment plans, either changed or unchanged by more diagnostic information, were not in agreement with the consensus outcome. The extra diagnostic information was often considered important, but the importance varied across gynecologists and tests. None of the tests showed overall superior subjective utility, but in this practice, MRI proved of little value. The assigned value of DG and AFT increased significantly in case of fecal incontinence and large posterior wall prolapse.

Some limitations of the study need to be discussed. First, we used the outcome of the consensus meeting as a gold standard. The gold standard is not necessarily correct, but scientific evidence in this field has not reached the level at which the optimal treatment has been defined for each combination of anatomical abnormalities. In other words, we used the best available reference standard. Second, evaluation of agreement between step I and II may have limited value as the initial treatment decision based on history taking and physical examination was only performed by one gynecologist. There were two reasons to do so: history taking (validated questionnaires) and physical examination (POP-Q system) are highly standardized (high inter-physician agreement), and we felt that physical examination by all three gynecologists separately was too bothersome for our patients. As a consequence, the comparison between initial vs. second treatment plan may in reality be subject to slightly smaller intra-individual variation than observed in this study, but the number of total revisions may change in either direction.

Another possible drawback might be the unblinding of gynecologist A. Gynecologists B and C were blinded to the initial treatment plans, but it was impossible to blind gynecologist A, who examined the patients at their first visit. We cannot exclude the possibility that unblinding may have prejudiced gynecologist A's judgment of the test results (reflected in the ADV scores and fewer adaptations of the treatment plans), but bias of the consensus judgment is less likely. If information bias occurred, the effect on consensus outcome is probably small. The information delivered per case was rather abundant, and the cases were evaluated in several sessions.

Finally, the results may be affected by the composition of the study group as high-stage posterior wall prolapse was underrepresented in our study population. In patients with primary pelvic organ prolapse, severe posterior wall prolapse is generally less prevalent than in patients with recurrent prolapse [15]. However, our study group comprises an average population comparable with other studies [10, 16]. We speculate that overrepresentation of patients with advanced posterior wall prolapse would probably yield outcomes that are too optimistic regarding added diagnostic value. We could argue that the outcomes of the ADV for defecography and anorectal function testing have been influenced by the low number of severe posterior wall prolapses. As shown by the stratified ADV for defecography, defecography is regarded as more valuable in case of severe posterior vaginal wall prolapse. We believe that defecography and anorectal function testing would have been judged too optimistically had the study population consisted only of patients with (severe) posterior wall prolapse of stage III or higher.

Since similar studies are unavailable, only a partial comparison with the literature is possible. Several studies have examined the influence of defecography or MRI on treatment selection. The study of Harvey et al. reported that diagnostic confidence rose significantly after evacuation proctography and that it altered intended diagnosis and therapy in 18% and 28% of the patients, respectively [17]. Hetzer et al. also concluded that defecography MRI findings led to changes in the surgical approach in 67% of the patients in whom some form of surgery was performed to treat fecal incontinence [18]. In our study, the assigned diagnostic value of defecography increased when posterior compartment disorders were present. Apparently, the panelists considered defecography a helpful diagnostic tool in the work-up of these patients, as physical examination often misses occult defects like enteroceles, rectal prolapses, and intussusception [12]. However, the presence of an enterocele and/or rectal intussusception can also be successfully predicted with the results of history taking and physical examination [19]. This may explain why the assigned diagnostic value of history taking and pelvic examination exceeded that of DG. Kaufman et al. showed that MRI is valuable for the identification of pelvic floor defects like levator ani hernias, which were often missed by clinical examination. In their study, dynamic magnetic resonance led to altered operative plans in 41% of the patients [20]. In our study, the lowest diagnostic value was assigned to dynamic MRI regardless of the patient's findings and symptoms. The literature in fact supports the panelists' opinions in this respect: quantitative measurements of POP by MRI do not provide better information than physical examination [21, 22], except for the detection of levator ani defects and enterocele, which can also be detected with defecography.

The value of UDE in POP patients scheduled for surgical repair is still under debate. UDE is carried out to predict post-operative urinary incontinence and to decide whether, in case of stress incontinence, incontinence surgery should additionally be performed. A literature review on the clinical relevance of UDE shows that the diagnostic and therapeutic benefits of UDE are as yet unproven [23]. None of the gynecologists contributing to this study were advocates of combining prolapse surgery with stress-incontinence surgery because of the increased risk of de novo detrusor instability as compared to prolapse surgery only [24–26]. The relatively high assigned diagnostic value of 46% for UDE is probably due to the fact that, until recently, all patients undergoing POP surgery routinely underwent UDE. The panelists nevertheless valued UDE less in patients without complaints of urinary incontinence.

The use of anal manometry and endoanal ultrasonography is reserved for patients with POP who have bothersome defecatory symptoms, especially when a defect of the anal sphincter is suspected [14, 27]. The overall assigned diagnostic value of AFT, including endosonography, was scored rather low. However, all three expert gynecologists assigned higher values to AFT when defecatory disorders were present. In case of fecal incontinence, the ADV almost doubled. The assigned value of additional diagnostic tests varied not only across tests but also among gynecologists. Part of the inter- and intra-physician variation is due to the lack of a single reference standard. It is known that individual treatment decisions may be greatly influenced by personal preferences, formed by education, experience, and routine of the department or country [28, 29]. Treatment selection in POP still seems to be guided to a great extent by experience rather than by evidence-based medicine.

An important strength of our study is that it provides a better understanding of the impact of additional diagnostic tests and consensus procedure on the treatment decision. This study is, as far as we know, the first to evaluate the impact of a series of diagnostic tests and consensus meeting on ultimate management plans in POP. Our findings can serve to set priorities and to guide future research in this field.

In conclusion, “battery” testing as a routine strategy does not appear to be an option. Even with the availability of extensive diagnostic test information, a certain degree of disagreement concerning the optimal therapy among gynecologists continues to exist.

Our study cannot prove which additional diagnostic tests are indicated in which patients and when. Overall, history taking and physical examination are always considered to be useful, while MRI often is not. Considerable diagnostic value has been assigned to defecography, especially in patients with bowel symptoms and large posterior vaginal wall prolapse. The same applies to AFT in patients with fecal incontinence and, to a lesser extent, in patients with constipation. This study shows that clinicians often feel supported in their decision-making by additional diagnostic testing, but even with (almost) complete diagnostic information, full agreement on the final treatment advice was not reached. The true additional (therapeutic) value of the individual diagnostic tests needs to be further established in diagnostic-therapeutic clinical trials. As long as evidence establishing the diagnostic value of MRI, DG, UDE, and AFT is lacking, we advocate basing the treatment decision in patients undergoing primary POP surgery on history taking and findings at pelvic examination. Exceptions may be made for patients with specific conditions like fecal incontinence, large posterior wall prolapse, and, to a lesser extent, constipation. In these cases, AFT (including anal endosonography) and defecography are useful diagnostic tools in directing therapy.