Introduction

The sonographic examination for vesicoureteric reflux (VUR) with intravesical administration of US contrast agent (UCA)—voiding urosonography (VUS)—has undergone gradual development over the last decade. A wide-ranging review of various procedural aspects of this examination has already been presented [1]. We report here a critical analysis of studies comparing VUS with voiding cystourethrography (VCUG) and radionuclide cystography (RNC) and present detailed tables demonstrating the diagnostic value of these procedures and their reflux gradings. The advantages and limitations of VUS are discussed and the criteria for selection of the reflux imaging modality elaborated. The aim was to present a comprehensive review of all currently available comparative literature on VUS and prepare the ground for an objective evaluation and decision-making.

The first publications on Levovist-enhanced VUS appeared in 1998 [2, 3]. Since then some 40 studies have become available that have compared this method with radiological reflux examination modalities, direct radionuclide cystography (DRNC) and VCUG [2–40]. In Tables 1, 2, 3 and 4 these studies are presented along with parameters for comparison of the diagnostic values. Studies in which the VUS was performed using a first-generation UCA (Levovist, Tables 2 and 3) or a second-generation UCA (SonoVue, Table 4) were distinguished. To be included in these tables for comparison a study had to fulfil the following criteria: (a) the patients were only children or adolescents without selection of a specific subgroup, (b) the main comparative parameter was reflux detection rate, (c) the reference method was clearly denoted as DRNC or VCUG, (d) both VUS and DRNC or VCUG were carried out during the same examination session, (e) the comparison did not exclude grade I reflux in VUS, (f) the data were analyzed in terms of pelviureteric units (PUUs) and not only in terms of patients, and (g) sufficient data were available to construct a 2 × 2 table of VUS and the reference method. Comparative studies that did not fulfil all the above criteria were excluded from the listings so that the aggregated data were homogeneous. For each study the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), and the diagnostic accuracy of VUS with respect to the reference method are given. In addition, the numbers of PUUs with reflux detected only by VUS or DRNC/VCUG are presented including the percentage in relation to the total number of refluxing units.
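
The diagnostic values listed for each study follow from the 2 × 2 table of VUS against the reference method. As a minimal sketch, assuming the reference method is treated as the standard and all denominators are non-zero, they could be computed as follows. The class name, field names and counts are hypothetical and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of PUUs in a 2 x 2 table of VUS vs. the reference method."""
    both_positive: int   # reflux on VUS and on the reference method
    vus_only: int        # reflux on VUS only
    reference_only: int  # reflux on the reference method only
    both_negative: int   # no reflux on either method


def diagnostic_values(t: TwoByTwo) -> dict:
    """Return the comparison parameters as fractions between 0 and 1.

    Assumes every denominator is non-zero (no empty row or column).
    """
    total = t.both_positive + t.vus_only + t.reference_only + t.both_negative
    refluxing = t.both_positive + t.vus_only + t.reference_only
    return {
        "sensitivity": t.both_positive / (t.both_positive + t.reference_only),
        "specificity": t.both_negative / (t.both_negative + t.vus_only),
        "ppv": t.both_positive / (t.both_positive + t.vus_only),
        "npv": t.both_negative / (t.both_negative + t.reference_only),
        "accuracy": (t.both_positive + t.both_negative) / total,
        # Discordant units as a share of all refluxing units
        "vus_only_share": t.vus_only / refluxing,
        "reference_only_share": t.reference_only / refluxing,
    }


# Hypothetical counts for illustration only
example = TwoByTwo(both_positive=40, vus_only=10, reference_only=5, both_negative=145)
values = diagnostic_values(example)
for name, value in values.items():
    print(f"{name}: {value:.1%}")
```

Note that in this sketch the two "share" values use the number of refluxing units (reflux on at least one method) as the denominator, matching the way the percentages of discordant PUUs are described above.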

Table 1 Detection of reflux: VUS vs. DRNC
Table 2 Detection of reflux: VUS with first-generation UCA (Levovist) vs. VCUG
Table 3 Grades of reflux of VUR detected only on VUS or VCUG (only those PUUs with appropriate and comparable reflux grades on both VUS and VCUG are included)
Table 4 Reflux detection: VUS (low MI harmonic imaging) with second-generation UCA (SonoVue) vs. DRNC and VCUG

Limitations of comparative reflux studies

When carrying out comparative studies for VUR in children or analyzing the results one has to be aware of the many procedural factors that may influence the outcome. The temperature of the fluid filling the bladder seems to affect the reflux detection rate. Papadopoulou et al. [45] in a study comprising almost 1,800 VCUGs found reflux in 18% of those who received prewarmed radiographic contrast agent but in only 11% of those who received contrast agent at room temperature. The bladder volume is a further issue, as the pressure in the bladder is proportional to the bladder filling. It has also been hypothesized that repeated bladder filling may unmask occult VUR via two mechanisms: bladder mucosal oedema due to repeated exposure to contrast material or transient bladder instability due to repeated filling [46]. Thus in the comparative studies the order in which VUS and the radiological method are performed has been a matter of discussion. In a recent study [47] including 308 children, SonoVue-enhanced VUS and VCUG were carried out during the same examination session. The patients were divided into two matched groups. In the first group VCUG was followed by VUS and in the second group VUS was followed by VCUG. VUR was detected in 34% and 38% of PUUs, respectively, but the difference in the reflux detection rate between the two groups was not significant. To overcome or minimize some of the technical problems, in a few centres VUS and DRNC or VCUG have been performed simultaneously [21, 28, 29, 36]. This requires two persons doing the examinations at the same time, and it is not possible to perform US and fluoroscopy of the bladder concurrently [36].

In the comparison it is also essential to have comparable patient groups. Documenting grade I reflux by VCUG, but neglecting to evaluate the ureters by VUS does not allow appropriate comparison [25]. It is also important to exclude those patients who voided in one but not in the other examination [12]. Selecting and performing the comparison only in patients with a specific finding, e.g. hydroureteronephrosis or transplanted kidneys, may have its own merit but does not reflect the overall diagnostic accuracy of the method [32]. There is definitely a learning curve in VUS that can affect the outcome of the comparative results. Kenda et al. [21] and Kenda [48] analyzed separately the first and the second halves of the assembled data of their comparative studies of VUS and DRNC. They found that in the second half of their study the sensitivity had risen from 74% to 86% and the specificity from 89% to 94%. This underlines the need to perform a number of VUS examinations in order to gain sufficient experience. With increasingly improved US methods for detection of microbubbles there is a marked decrease in the time required to gain adequate experience.

Comparisons of VUS

DRNC

The data comparing VUS with DRNC are relatively sparse [2, 5, 21, 22, 35]. From the procedural point of view, DRNC is actually a more suitable comparator for VUS than VCUG. It is technically easier to fill the bladder with both the radionuclide and UCA and perform the two examinations simultaneously [21, 22]. DRNC is primarily used to diagnose or exclude VUR at the cost of reduced anatomic resolution and without imaging the urethra. DRNC is more sensitive than VCUG in detecting VUR and the severity grading of VUR in DRNC is simpler, i.e. there are only three grades [5, 21]. A possible explanation for the infrequent use of DRNC as the reference method in comparative studies with VUS may be that in most parts of Europe, and in particular in those countries from which most of the published reports come, DRNC is not as widely available or implemented as VCUG.

There are only three comparative studies of Levovist-enhanced VUS in children with DRNC as the reference method, presented with appropriate statistical analyses (Table 1) [5, 21, 35]. Only two of these included comparison of reflux grades [5, 21]. Kenda et al. [21] conducted VUS and DRNC simultaneously, filling the bladder with both UCA and radionuclide (technetium-99m DTPA, 20 MBq) and scanning at the same time, including during voiding. No adverse events have been reported due to this combined examination. In the other two studies VUS was performed first followed by DRNC [5, 35]. Fundamental US modality was used in two studies and colour Doppler in one. These three studies included a total of 203 patients with an age range of 1 month to 13 years. A total of 406 PUUs were compared. Concordant results were found in 88%, with VUR being detected only on VUS in 11% and only on DRNC in 20% of the refluxing PUUs. Although the results are based on a relatively small number of patients they do indicate that using only fundamental or colour Doppler imaging DRNC is probably more sensitive than VUS in detecting VUR. However, one has to be aware of the difficulties that can arise in interpreting low-grade reflux by DRNC, particularly in small children. Thus there might have been some false diagnoses of low-grade reflux with the reference method DRNC, as pointed out by Kenda et al. [21]. Bosio [2] claims that RNC is less sensitive than VUS because out of 54 PUUs, reflux was detected in 20 and 32, respectively. Unfortunately, in this study not only were both indirect and direct RNC utilized, but the VUS and RNC examinations were also sometimes months to years apart. There is one more comparative study, but the study included only 23 adult renal transplant recipients (age range 23–60 years) [22]. In this group of patients both VUS and DRNC were carried out simultaneously. In 74% the results were concordant, with VUR being diagnosed with VUS alone in two and solely with DRNC in four PUUs. In one other study [42] SonoVue-enhanced VUS with harmonic imaging was compared with DRNC in 20 patients with 41 PUUs. In 20 PUUs reflux was detected, in 9 (48%) of which it was only diagnosed by VUS. This might indicate that with the implementation of newer generation UCA and/or use of a dedicated US imaging modality VUS still has the potential to be ranked equal to or higher than DRNC with regard to reflux detection rate.

From the above discussion one can conclude that, currently, comparative studies with DRNC are not only too few, but also do not incorporate in most studies the latest and more sensitive US methods. The presently available comparative results do not allow a conclusive statement regarding the sensitivity of VUS in comparison with DRNC with regard to reflux detection. Further comparative evaluations using DRNC as the reference method are essential and should be given a higher priority than studies using VCUG whenever possible.

VCUG

Reflux detection

The bulk of comparative studies has been carried out using VCUG as the reference method (Tables 2, 3 and 4). VUS was performed mostly using Levovist (Tables 2 and 3) and only recently three reports with SonoVue-enhanced VUS have been presented (Table 4). The comparative results with these two UCAs are discussed below separately.

In the Levovist-enhanced VUS group 18 studies fulfilled the criteria presented above and were included in the aggregate data analysis and listed according to the US modality utilized for VUS and whether the VUS and VCUG were performed simultaneously or successively (Table 2). This group comprised 1,338 patients with 2,893 PUUs. The numbers of patients (PUUs) in the studies discussed here ranged from 24 to 216 (47 to 440). Two studies are listed twice as two different US modalities were compared with VCUG [23, 37]. The 103 patients in this group were counted only once but their PUUs twice in accordance with the two US modalities. With VCUG as the reference method the sensitivity of VUS ranged from 57% to 100% and the specificity from 85% to 100%. The PPV and NPV ranged from 58% to 100% and 87% to 100%, respectively. The most important comparative parameter, i.e. the diagnostic accuracy, ranged from 78% to 96%. With the exception of two studies, the diagnostic accuracy was 90% and above. In terms of PUUs the overall agreement between the two methods was 91% (2,622 out of 2,893; Fig. 1).

Fig. 1

VUS with dedicated high-MI modality and VCUG carried out in the same patient successively during one examination session illustrating the high concordance between VUS and VCUG. a On the right there is a duplex kidney with a dilated upper moiety pelvicalyceal system. b–d Reflux is detected in both moieties by both VUS and VCUG (b VUS image with “grey-scale + contrast” option, c VUS image with “contrast-only” option). Not only are the results regarding reflux detection concordant in VUS and VCUG, but the reflux severity is also comparable. In both examinations there is grade II reflux in the lower moiety (arrowhead). In the upper moiety one might be inclined to make the diagnosis of grade V reflux (arrow). In the absence of a dilated ureter, the combination of reflux with pelviureteric junction obstruction is to be considered. In such a case the grading systems are misleading

The discordant findings were due to the fact that reflux was detected in a given PUU by only one modality. In 170 (19%) of 886 refluxing units, the diagnosis was made only by VUS, and the diagnosis was made only by VCUG in 90 units (10%). Thus, overall, 9% more reflux episodes were detected using VUS. Except in three [41], in all other studies listed in Table 2 the grades of reflux in the discordant findings were available for comparison (Table 3) [12, 32]. It is interesting to note that 70% of reflux episodes missed on VCUG and detected solely on VUS were grades II–V, whereas in the reverse case, 68% of the episodes detected solely on VCUG were grade I. The following factors may partly explain these discordant findings: (a) reflux as such is intermittent in nature as demonstrated in a number of studies involving cyclical bladder filling [49, 50]; (b) US enables continuous scanning of ureters and kidneys starting at the time of bladder filling and continuing until after micturition, whereas in VCUG only brief glimpses are obtained with intermittent fluoroscopy, i.e. the scan time after contrast agent administration in VUS is many times greater than in VCUG [7, 12]; (c) in a massively dilated ureter or pelvicalyceal system or in a combination of reflux and vesicoureteric or pelviureteric junction obstruction it becomes much easier to pick out a single or small number of echogenic microbubbles due to the echo-free surroundings, in contrast to the case in VCUG in which a small amount of radiographic contrast agent refluxing into a dilated ureter or pelvicalyceal system may be difficult or impossible to detect due to a dilution effect; (d) unlike VCUG in which the bladder, ureters, and both kidneys can be visualized simultaneously, it is only possible to examine one organ at a time during VUS; (e) the distal ureters may be partly obscured if in VUS the Levovist is injected too fast, creating a dorsal acoustic shadow; and (f) in VUS nondilated refluxing ureters may be difficult to delineate from the surroundings, particularly in the presence of bladder abnormalities or in uncooperative children [51].
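
The arithmetic behind the aggregate discordance figures quoted above can be checked with a short sketch. The counts are those given in the text for the Levovist-enhanced studies; the variable names are illustrative only, and the reading of "refluxing units" as PUUs with reflux on at least one modality is an assumption.

```python
# Aggregate counts quoted in the text for VUS (Levovist) vs. VCUG
refluxing_units = 886  # refluxing PUUs (assumed: reflux on at least one modality)
vus_only = 170         # reflux detected only on VUS
vcug_only = 90         # reflux detected only on VCUG

vus_only_pct = 100 * vus_only / refluxing_units
vcug_only_pct = 100 * vcug_only / refluxing_units
difference_pct = vus_only_pct - vcug_only_pct

print(f"VUS only:   {vus_only_pct:.0f}%")
print(f"VCUG only:  {vcug_only_pct:.0f}%")
print(f"Difference: {difference_pct:.0f} percentage points")
```

The difference is expressed here in percentage points of the refluxing units, which is how the 9% figure in the text appears to be derived.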

Is the reflux detected only on VUS of any significance? Anthopoulou et al. [52] in 146 children with 292 PUUs evaluated dimercaptosuccinic acid (DMSA) scintigraphy in addition to both VUS (SonoVue) and VCUG. No significant difference in renal damage was found between children with reflux on VCUG and those with reflux only on VUS and also in children without reflux on VCUG and those without reflux on VUS. Consequently, reflux missed by VCUG and shown only by VUS is associated with the same incidence of renal damage as reflux shown by VCUG.

The rate of reflux detected only on VUS increased from fundamental modality to colour Doppler US to harmonic imaging, i.e. 18%, 21% and 28%, respectively (Table 2). With ongoing improvement in US contrast imaging, further increases in sensitivity of VUS can be expected. Newer generation and more stable UCAs might add to the benefit of the technological advances in US. The only three studies in which SonoVue-enhanced VUS was compared with VCUG provided results comparable to those of Levovist-enhanced VUS with harmonic imaging (Table 4). In a total of 190 patients with 381 PUUs the diagnostic accuracy was 91%, with VUR being detected only on VUS in 26% of the refluxing PUUs.

In summary, the comparative aggregated data between VUS and VCUG indicate the following: (a) reflux exclusion and diagnosis between the two methods is highly concordant, (b) the discordant findings are primarily due to more reflux episodes being detected with VUS than with VCUG, i.e. VUS is a more sensitive method for reflux detection [51], (c) the reflux episodes detected solely by VUS are of higher grade and thus clinically more relevant than the predominantly low-grade reflux found only on VCUG, and (d) the consistently high NPV of VUS in all studies has practical consequences as it demonstrates that VUS is suitable for screening those without reflux. The latter is important because more than half of children presenting routinely for investigation of possible reflux do not have it.

Reflux grading

Most of the comparative studies apply similar reflux grading in VUS and in VCUG [13]. To be included in the grading comparison the study had to also fulfil the criteria listed for reflux comparison, but with the following modifications: (a) the reference method is only VCUG, (b) a five-grade system is applied in both modalities without combining reflux grades, and (c) a detailed comparison of the reflux grades is presented making it possible to construct a 5×5 table of reflux grades.

Six studies fulfilled these criteria (Table 5) [7, 13, 14, 23, 37, 39]. In two of these studies VCUG was compared with two different US modalities, fundamental and harmonic imaging, and the reflux grade comparisons were presented separately. Each of these is included as a separate comparison [23, 37]. Thus the aggregated data include eight reflux grade comparisons, with fundamental imaging being used in five studies and harmonic imaging in three. The total number of patients in these studies was 539. The reflux grade comparisons included only 326 PUUs with reflux detected on both VUS and VCUG. Those PUUs with reflux detected by only one of the modalities were excluded from grading comparison.

Table 5 Reflux grading. VUS with first-generation UCA (Levovist) vs. VCUG

In 240/326 PUUs (73.6%) the reflux grades were concordant in VUS and VCUG (Fig. 1). Considering only lower grade reflux (grades I/II) the concordance was lower—63.9% (99/155 PUUs). The high-grade reflux episodes (grade III–V) were significantly more concordant—82.5% (141/171 PUUs). Overall, the reflux was graded lower on VUS than on VCUG in 22 (6.8%) PUUs. The reflux grade was found to be higher on VUS than on VCUG in 64 PUUs (19.6%). In this group most of the refluxes on VCUG were grade I, i.e. 42 (12.9%). A major discrepancy between VUS and VCUG with respect to grading was that in 71.2% of PUUs with grade I reflux on VCUG, microbubbles were detected in the respective renal pelvis on VUS, i.e. were grade II and higher [13]. In comparative studies between VCUG and DRNC, this discrepancy has been found to be even greater reaching 100% [53]. The rate of renal scarring in grades I and II being the same also points to the fact that VUR considered to be grade I in VCUG is actually grade II and higher. As the grade of reflux affects the therapeutic choices, and in particular as some regard grade I reflux in VCUG to be of no clinical relevance, the finding on VUS further stresses that the division of grades I and II in VCUG is more or less artificial.
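
Concordance figures of this kind are summaries of a 5 × 5 table of reflux grades. As a minimal sketch of how such a table could be reduced to concordant, higher-on-VUS and lower-on-VUS counts, consider the following. The function name and the example table are hypothetical; the counts are for illustration only and do not come from Table 5.

```python
def grade_agreement(table):
    """Summarize a 5 x 5 table of reflux grades.

    table[i][j] is the number of PUUs with grade i+1 on VCUG (rows)
    and grade j+1 on VUS (columns).
    """
    concordant = vus_higher = vus_lower = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if i == j:
                concordant += count   # same grade on both modalities
            elif j > i:
                vus_higher += count   # graded higher on VUS than on VCUG
            else:
                vus_lower += count    # graded lower on VUS than on VCUG
    total = concordant + vus_higher + vus_lower
    return {
        "total": total,
        "concordant": concordant,
        "vus_higher": vus_higher,
        "vus_lower": vus_lower,
        "concordance": concordant / total,
    }


# Hypothetical counts for illustration only
# (rows: VCUG grade I-V, columns: VUS grade I-V)
example_table = [
    [2, 5, 1, 0, 0],
    [1, 10, 2, 0, 0],
    [0, 1, 8, 1, 0],
    [0, 0, 1, 6, 1],
    [0, 0, 0, 0, 3],
]
summary = grade_agreement(example_table)
print(summary)
```

In the aggregate data reported in the text, the three sums correspond to 240 concordant PUUs, 64 graded higher on VUS and 22 graded lower on VUS, out of 326 PUUs.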

These findings indicate that grading of reflux on VUS is not only possible, but also applicable in routine work-up. In summary, the comparative aggregated data for reflux grading between VUS and VCUG indicate the following: (a) reflux grades between the two methods are concordant in about 75% of PUUs, (b) the discordant findings are primarily due to a significant number of grade I reflux episodes on VCUG being grade II or higher on VUS.

Selection criteria for VUS

With the introduction of VUS as a routine diagnostic imaging option besides VCUG and RNC the question arises as to when to use which examination. The procedural similarities and comparability of the diagnostic results between DRNC and VUS indicate that VUS has the potential to fully replace DRNC [48, 54]. This is not the case for VCUG. The selection criteria between VUS and VCUG have been largely dominated by the question of urethral imaging [48, 54, 55]. Although in the last few years there have been more attempts to include transperineal US of the urethra as part of VUS, it is still not widely implemented [8–11, 24, 40]. Thus until now the primary application of VUS has been in the diagnosis or exclusion of reflux and not urethral imaging (Table 6) [7, 12, 19, 20, 23, 27, 33, 37, 48, 56].

Table 6 Selection criteria. Primary diagnostic imaging modality for VUR: VUS or VCUG

The patients primarily selected for VUS rather than VCUG are those presenting for follow-up studies for monitoring the outcome of conservative or surgical therapy (Table 6). At this time the main question is the presence or absence of VUR, without the need to demonstrate the urethra. In particular, this group of patients benefits most from VUS as the repeated radiation exposure from radiological reflux examinations can be avoided. A second large group of patients selected primarily for VUS are girls presenting for the first time for reflux examination. Urethral pathology in the presence or absence of VUR, particularly a significant one that would require some kind of intervention, is extremely rare in girls [55]. A third group of patients, again in whom imaging of the urethra is not of primary interest, are those high-risk patients coming for screening of reflux, e.g. patients with a transplanted kidney [57]. In this regard, VUS would be a suitable modality for screening siblings of patients with reflux [55, 58]. No systematic study evaluating the utility of VUS solely in sibling screening is yet available. VCUG has been recommended as the primary imaging modality in those patients in whom urethral imaging is of importance (Table 6). This group includes boys presenting for their first reflux examination and all those patients referred specifically for diagnosis of a urethral anomaly [55]. Another group of patients are those with potential voiding problems, in whom evaluation of bladder morphology and function is of primary importance [54, 56]. A clear indication to forego VUS and perform radiological reflux examination directly is when the bladder or one of the kidneys cannot be visualized on US, e.g. malposition of kidneys due to severe scoliosis [32, 55].

When VUS is used as the primary modality and reflux is excluded there is usually no discussion of further imaging. This may be the case too if low-grade VUR is demonstrated on VUS. In the presence of high-grade VUR and impending surgical management there are only very few centres with long-standing experience with VUS that refrain from additionally performing a VCUG. In routine practice a VCUG is still usually carried out in these cases following a positive VUS [55]. There are further imaging details that are considered when making the selection for the primary type of reflux imaging modality or performing a VCUG when reflux is detected on VUS [54, 55]. Reflux in duplicated collecting systems may be such an indication, as VCUG offers a panoramic view of ureteric morphology, and most would opt for surgical management in these cases with the assumption that spontaneous resolution is unlikely. However, it is to be noted that there are reports suggesting that spontaneous resolution of mild-to-moderate VUR in completely duplicated collecting systems is similar to that in single systems with an identical degree of VUR during a comparable period of observation [59]. The depiction of paraureteric diverticula (Hutch diverticula) may also be easier with VCUG. In the presence of VUR, paraureteric diverticula have been an indication for surgical management with a similar reasoning as for duplex systems. However, a recent study has shown that the spontaneous resolution rate of VUR in the presence of a Hutch diverticulum is comparable to that without [60].

It is not only the diagnostic ability of a procedure that determines its acceptability in routine practice. Other factors also play important roles in decision making, such as accessibility of equipment, availability of UCA, and cost of the procedure including the possibility and type of reimbursement. An interesting aspect in this regard is the introduction of VUS in a developing nation due to the availability of US machines, but limited access to fluoroscopy or scintigraphy [40]. Furthermore, introduction of new methods depends largely on the availability of personnel skilled in the methods that they feel confident to rely on and perform [48]. It is a fact that there is relatively marked heterogeneity among urologists and nephrologists in the diagnostic approach and management of children with VUR [55]. Accordingly, the selection criteria have to accommodate local practice regarding VUR. When both VUS and VCUG are available for imaging a noticeable reduction in the number of VCUG studies would be expected. In a study using similar criteria as discussed above, the VCUG rate was reduced by 53% [55]. Thus the number of children exposed to ionizing radiation was cut by more than half.

Pros and cons of VUS

It is almost a decade since the first reports on Levovist-enhanced VUS appeared [2, 3]. VUS has emerged as an alternative option in the routine diagnostic imaging of VUR. Thousands of contrast-enhanced VUS examinations have been carried out in children and over 60 published reports have appeared. VUS has turned out to be a topic of intense and partly controversial discussion in paediatric uroradiology. VUS is now mentioned or described in standard paediatric radiology and urology textbooks and incorporated into guidelines.

There are still a number of limitations to VUS. One of the major factors restricting its widespread use is the higher cost of UCAs in general compared to radiographic contrast agents. This varies from one country to another and also depends on the type of reimbursement for the examination [54]. Although urethrosonography is being carried out in a few centres, it is not yet widely accepted. Urethral imaging still remains the domain of VCUG [55]. Another point cited against VUS is the longer examination time [7, 12]. The comparatively inadequate evaluation of bladder morphology and function and the lack of a panoramic view of the urinary tract are to some extent disadvantageous [54, 56]. The operator-dependence of US examinations is not to be underestimated [21, 48]. There is also a lack of sufficient standardization. However, advances in both US technology and UCAs are having positive effects on the examination time, on the dose of UCA and thus on the cost of the examination, and on the learning curve [14]. An emerging problem is that Levovist is no longer being actively marketed and in some countries is not as easily available as in the past. The new UCA, SonoVue, has also not yet been approved for VUS.

The major advantage of VUS is the possibility of avoiding the exposure of children to radiation. The availability of pulsed fluoroscopy with a clear reduction in radiation dose should not be a deterrent to using a modality free of radiation. The comparable comfort to the child of US examination is not to be taken lightly. Furthermore, US is not only a widespread modality, but also one, depending on locality, that is performed not only by paediatric radiologists, but also by paediatricians and various paediatric subspecialists, including sonographers. In some places the potential also exists to shift reflux examinations by VUS from the radiologist to the sonographer, consequently reducing the load of fluoroscopic examinations for the doctor. VUS is not only more sensitive, particularly compared to VCUG, but also detects higher grades of reflux. Moreover, VUS provides anatomic detail clearly superior to RNC. The following quotation by Sara O’Hara in an editorial in Radiology sums up the current standing of VUS: “No radiation, no bladder catheterisation, no sedation, low cost, high sensitivity, and excellent anatomic detail—now that would be the perfect screening cystographic examination. With all these factors considered, cystosonography (=VUS) is fairly close to the mark” [61].