Introduction

Evidence-based medicine (EBM) is becoming increasingly important and popular in the medical field. EBM entails the application of high-quality results to clinical practice using an integrated scientific process based on clinical evidence [1]. With the growing importance of EBM in modern medicine, randomized controlled trials (RCTs) have become more significant, and the number of RCTs is increasing globally [2]. This is because the RCT is the highest-rated study design, as it minimizes study design bias [3]. Bias can arise at any stage, from the design to the implementation of a study, if the basic elements of an RCT such as randomization, blinding, and allocation concealment are not properly performed [4]. To minimize mistakes and improve the quality of articles, it is necessary to objectively assess the methodological quality of articles [5]. Currently, the CONSORT statement is recommended by the International Committee of Medical Journal Editors as a guideline for improving the quality of RCTs [6,7,8]. The CONSORT statement helps medical doctors conduct RCTs with minimal bias [9]. A limitation, however, is that the CONSORT statement is merely a guideline for RCTs and not a tool for assessing the quality of articles. Individual markers, checklists, and scales are used for RCT methodological quality assessment. Of these, scales have the advantage of allowing easy quantitative comparison of clinical trial quality between studies [7, 8].

Representative quality assessment tools using scoring systems include the Jadad scale, the van Tulder scale, and the Cochrane Collaboration Risk of Bias tool (CCRBT). The Jadad scale is a tool for assessing randomization, double blinding, and drop-out items related to bias reduction [10]. Although the Jadad scale has the advantage of being simple and easy to apply, it does not include an item for allocation concealment, making it difficult to assess selection bias in patient allocation for treatment. The van Tulder scale [11] and CCRBT [12] contain assessment items for allocation concealment and therefore have the advantage of allowing selection bias assessment.

Erectile dysfunction (ED) refers to a state characterized by the inability to develop or maintain a proper erection of the penis for sexual intercourse [13]. The incidence and severity of ED increase with age. It occurs in 52% of men aged 40–70 with various degrees of severity (mild, moderate or severe), with an incidence of 8% in men in their 40s to 15% in men in their 70s, according to the Massachusetts Male Aging Study (MMAS) [14]. ED treatment varies from the administration of oral phosphodiesterase type 5 (PDE-5) inhibitors to implants, but the choice of treatment should consider factors such as patient age and general patient health. Many RCTs on the efficacy of ED treatments have been published. However, no studies have so far analyzed the quality of RCTs on ED.

Thus, the purpose of this study is to assess the quality of RCTs on ED published in the last 10 years using 3 representative RCT quality assessment tools, namely the Jadad scale, the van Tulder scale, and CCRBT, and to propose directions for further studies.

Materials and methods

The subjects of analysis

The subjects of the analysis were RCT studies that were identified from searches in the PubMed, Embase, and Cochrane Library databases using the keyword “Erectile Dysfunction.” RCT original articles on ED published from 2007 to 2018 and identified from the searches were selected.

The quality assessment was conducted after dividing the period under study into six 2-year periods, namely 2007–2008, 2009–2010, 2011–2012, 2013–2014, 2015–2016, and 2017–2018.

Selection of RCTs

Two reviewers independently searched for RCTs using PubMed. Then, to find RCTs that may have been missed, the reviewers modified their search by using the keywords “randomized,” “randomization,” “randomly,” and “erectile dysfunction.” Differences between the sets of articles extracted by each reviewer were reconciled by a third reviewer (Fig. 1).

Fig. 1 Flowchart of selected data

Assessment method using quality assessment tools

The 2 reviewers analyzed the RCTs using the Jadad scale, the van Tulder scale, and CCRBT. If the assessments of the 2 reviewers differed, the third reviewer resolved the differences.

Jadad scale

The Jadad scale, also known as the Oxford quality scoring system, is a tool for assessing the quality of RCTs and consists of 3 graded questions. A maximum of 2 points can be awarded for the randomization question, 2 points for the blinding question, and 1 point for the drop-out question. For the randomization question, a point is awarded when the RCT article describes the study as randomized. An additional point is assigned if the article includes an appropriate description of the randomization method, but 1 point is deducted if the described method is inappropriate. Thus, the randomization question yields 0–2 points. For the blinding question, a point is assigned when double blinding is mentioned in the article. An additional point is assigned when an appropriate blinding method is described, but 1 point is deducted for an inappropriate method. For the drop-out question, a point is awarded when drop-outs are described in the article. Out of a possible maximum of 5 points, an RCT article is classified as low quality when it scores 0–2 points and as high quality when it scores 3–5 points [15].
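The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scoring software used in this study; the function and parameter names (`jadad_item`, `jadad_score`, `jadad_quality`) are hypothetical, and the encoding of "appropriate", "inappropriate", and "not described" as `True`, `False`, and `None` is an assumption made for the example.

```python
from typing import Optional


def jadad_item(mentioned: bool, appropriate: Optional[bool]) -> int:
    """Score a two-point item (randomization or double blinding).

    appropriate: True  = method described and appropriate (+1),
                 False = method described but inappropriate (-1),
                 None  = method not described (no change).
    """
    if not mentioned:
        return 0
    score = 1
    if appropriate is True:
        score += 1
    elif appropriate is False:
        score -= 1
    return score


def jadad_score(randomized: bool, randomization_ok: Optional[bool],
                double_blind: bool, blinding_ok: Optional[bool],
                dropouts_described: bool) -> int:
    """Total Jadad score in the range 0-5."""
    return (jadad_item(randomized, randomization_ok)
            + jadad_item(double_blind, blinding_ok)
            + int(dropouts_described))


def jadad_quality(score: int) -> str:
    """0-2 points: low quality; 3-5 points: high quality."""
    return "high" if score >= 3 else "low"
```

For instance, a trial that is described as randomized with an appropriate method, mentions double blinding without describing the method, and reports drop-outs would receive 2 + 1 + 1 = 4 points and be classified as high quality.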

van Tulder scale

The van Tulder scale is one of the most appropriate tools for RCT quality assessment. It includes 11 items, namely: randomization, allocation concealment, baseline characteristics, patient blinding, care provider blinding, observer blinding, co-intervention, compliance, drop-out rate, end-point assessment time point, and intention-to-treat analysis. Each item is assessed by a response of “yes,” “no” or “I do not know,” with 1 point counted for each “yes” response, and the RCT article is regarded as high quality if the score is ≥5 points [11].
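A minimal sketch of this checklist in Python is given below, assuming, as is usual for this scale, that each “yes” counts as one point and that “no” and “I do not know” count as zero. The item keys and function names are hypothetical labels chosen for the example.

```python
# The 11 items of the van Tulder scale, in the order listed in the text.
VAN_TULDER_ITEMS = (
    "randomization",
    "allocation_concealment",
    "baseline_characteristics",
    "patient_blinding",
    "care_provider_blinding",
    "observer_blinding",
    "co_intervention",
    "compliance",
    "drop_out_rate",
    "end_point_assessment_timing",
    "intention_to_treat_analysis",
)

VALID_ANSWERS = {"yes", "no", "unknown"}  # "unknown" stands for "I do not know"


def van_tulder_score(answers: dict) -> int:
    """Count the items answered "yes" (assumed to be worth 1 point each)."""
    for item in VAN_TULDER_ITEMS:
        if answers.get(item) not in VALID_ANSWERS:
            raise ValueError(f"missing or invalid answer for item: {item}")
    return sum(1 for item in VAN_TULDER_ITEMS if answers[item] == "yes")


def van_tulder_quality(score: int) -> str:
    """A score of 5 or more out of 11 is regarded as high quality."""
    return "high" if score >= 5 else "low"
```

Validating every item before counting keeps an incomplete assessment from silently being scored as low quality.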

Cochrane collaboration risk of bias tool (CCRBT)

The Cochrane assessment is divided into 6 domains: sequence generation, allocation concealment, blinding, incomplete outcome data, selective outcome reporting, and other potential threats to validity. Each item is assessed by a response of “yes,” “no” or “unclear.” The reviewers judge and assess according to the detailed criteria for “yes,” “no” or “unclear” responses for each of the 6 domains, which indicate low, high or uncertain risks of bias, respectively. If the responses for the first 3 domains are all “yes” and no important concerns are identified in relation to the last 3 domains, the RCT article is considered to have a low risk of bias. If the response in ≤2 domains is “unclear” or “no,” the study is classified as having a moderate risk of bias. If the response in ≥3 domains is “unclear” or “no,” the RCT is considered to have a high risk of bias [12].
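The classification rule above can be illustrated with the following Python sketch. It makes one simplifying assumption that is not stated in the text: "no important concerns" in the last 3 domains is treated as a “yes” response in each of them, so an article is rated low risk only when all 6 domains are “yes.” The domain keys and the function name `ccrbt_risk` are hypothetical.

```python
# The 6 CCRBT domains, in the order listed in the text.
CCRBT_DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "blinding",
    "incomplete_outcome_data",
    "selective_outcome_reporting",
    "other_threats_to_validity",
)

VALID_RESPONSES = {"yes", "no", "unclear"}


def ccrbt_risk(responses: dict) -> str:
    """Classify overall risk of bias as "low", "moderate", or "high".

    Assumption: a domain counts as a concern whenever its response is
    "no" or "unclear", including the last 3 domains.
    """
    for domain in CCRBT_DOMAINS:
        if responses.get(domain) not in VALID_RESPONSES:
            raise ValueError(f"missing or invalid response for domain: {domain}")
    concerns = sum(1 for d in CCRBT_DOMAINS if responses[d] != "yes")
    if concerns == 0:
        return "low"
    if concerns <= 2:
        return "moderate"
    return "high"
```

A reviewer who judges that an “unclear” response in one of the last 3 domains is not an important concern could still rate the article as low risk; the sketch does not model that judgment.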

Analysis of RCT quality according to other factors

In the RCT quality assessment in this study, the presence of intervention, funding, and institutional review board (IRB) approval were also taken into account.

Statistical analysis methods

The scores of each assessment tool were compared using one-way analysis of variance, while the chi-squared test was used to compare the proportions of high-quality articles and the quality assessment outcomes from CCRBT. Student’s t-test was used for comparisons based on the presence or absence of IRB approval, funding, and intervention. All statistical analyses were performed using SPSS v.22.0, and a p-value < 0.05 was considered statistically significant.
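To illustrate the one-way analysis of variance, the F statistic can be recomputed from group summary statistics alone. The Python sketch below is not the SPSS procedure used in this study; it is an approximation that uses the rounded van Tulder means and standard deviations reported in the Results, with the group sizes (58, 67, 37, 37, 48, 30) inferred from the reported counts and percentages. The function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F statistic from group sizes, means, and sample SDs.

    Returns (F, df_between, df_within).
    """
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares, recovered from the sample variances.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = total_n - len(ns)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# van Tulder scores by 2-year period, 2007-2008 through 2017-2018.
ns = [58, 67, 37, 37, 48, 30]
means = [6.05, 5.72, 6.35, 6.14, 5.83, 6.33]
sds = [1.48, 1.47, 1.57, 1.57, 1.66, 1.61]

f_stat, df_b, df_w = anova_from_summary(ns, means, sds)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Because the inputs are rounded, the result only approximates an analysis of the raw scores, but an F statistic of this size on 5 and 271 degrees of freedom is consistent with a non-significant difference between periods.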

Results

Quantitative changes of RCTs over time

From 2007 to 2018, there were a total of 277 RCT original articles related to ED. The number of articles increased from 58 in 2007–2008 to 67 in 2009–2010, and subsequently decreased from 48 in 2015–2016 to 30 in 2017–2018 (Table 1).

Table 1 Characteristics of RCTs by publication year with quality assessment of RCTs

Qualitative changes of RCTs over time

  1. Jadad assessment scale: From 2007 to 2018, the mean Jadad scale score for the RCTs was 2.80 ± 1.24. The mean score increased from 2.60 ± 1.30 in 2007–2008 to 3.27 ± 1.31 in 2011–2012, but decreased to 2.73 ± 1.22 in 2015–2016 (p = 0.09). The number of high-quality articles was 30 (51.7%) in 2007–2008, 28 (75.7%) in 2011–2012, and 27 (56.3%) in 2015–2016 (p = 0.06). In 2017–2018, both the score and the number of high-quality articles increased compared with 2015–2016. The total number of high-quality articles was 174 (62.8%) (Table 1).

  2. van Tulder assessment scale: The mean scores of the van Tulder scale at 2-year intervals for the periods 2007–2008, 2009–2010, 2011–2012, 2013–2014, 2015–2016, and 2017–2018 were 6.05 ± 1.48, 5.72 ± 1.47, 6.35 ± 1.57, 6.14 ± 1.57, 5.83 ± 1.66, and 6.33 ± 1.61, respectively (p = 0.27). The numbers of high-quality articles were 49 (84.5%), 54 (80.6%), 31 (83.8%), 30 (81.1%), 37 (77.1%), and 24 (80.0%), respectively (p = 0.95), for a total of 225 high-quality articles (81.2%).

  3. CCRBT: Based on the CCRBT, there were 5 (8.6%), 3 (8.1%), 2 (4.2%), and 5 (16.7%) articles with a low risk of bias for the periods 2007–2008, 2011–2012, 2015–2016, and 2017–2018, respectively (p = 0.08; Table 1).
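As a rough cross-check of the van Tulder proportions listed above, the chi-squared test of independence can be reproduced by hand. The Python sketch below is illustrative only and is not the SPSS analysis used in this study. The low-quality counts are derived by subtracting the reported high-quality counts from period totals inferred from the reported percentages, and `chi2_sf_df5` is a hypothetical helper that is valid only for exactly 5 degrees of freedom.

```python
import math


def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf_df5(x):
    """Upper-tail probability of a chi-squared variable with 5 df (closed form)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


# Rows: high quality, low quality; columns: the six 2-year periods.
high = [49, 54, 31, 30, 37, 24]
totals = [58, 67, 37, 37, 48, 30]  # inferred from the reported percentages
low = [t - h for t, h in zip(totals, high)]

stat, df = chi_squared_independence([high, low])
p_value = chi2_sf_df5(stat)  # valid here because df == 5
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")
```

Under these assumptions the statistic is small relative to its 5 degrees of freedom, which agrees with the reported p = 0.95 for the van Tulder scale.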

Analysis of factors related to the quality of the articles

In the assessments using the Jadad and van Tulder scales, the number of high-quality articles was significantly higher for studies with funding, IRB approval, or interventions, and for single-center studies. The comparison is as follows: funding (yes: 66.7%, p < 0.01; no: 36.4%, p = 0.02), IRB (yes: 83.3%, p < 0.01; no: 18.7%, p = 0.01), intervention (yes: 98.9%, p < 0.01; no: 1.3%, p < 0.01), and single-center studies (yes: 55.2%, p < 0.01; multicenter: 43.6%, p = 0.01). For CCRBT, the number of studies with a low risk of bias was significantly higher when there was IRB approval, funding, a single-center design, or an intervention (Table 2).

Table 2 Factors associated with the quality of RCTs

Analysis of subject related to the quality of the articles

Many RCT studies on PDE-5 inhibitors such as sildenafil, tadalafil, vardenafil, and udenafil have been published (Table 3). In the analysis of the quality of RCTs according to the ED treatment drug, the Jadad scale scores were 2.73 ± 1.10, 2.88 ± 1.27, 3.63 ± 1.15, and 2.69 ± 1.23 for sildenafil, tadalafil, vardenafil, and others, respectively. The numbers of high-quality articles obtained using the Jadad scale for sildenafil, tadalafil, vardenafil, and others were 34 (69.4%), 27 (65.9%), 14 (87.5%), and 52 (55.9%), respectively (p < 0.01). The van Tulder scores for sildenafil, tadalafil, vardenafil, and others were 5.96 ± 1.35, 6.27 ± 1.47, 6.81 ± 1.47, and 5.74 ± 1.52, respectively. There were also 43 (87.8%), 36 (87.8%), 16 (100.0%), and 70 (75.3%) high-quality articles for sildenafil, tadalafil, vardenafil, and others, respectively, as assessed using the van Tulder scale (p = 0.04). The numbers of RCTs assessed by CCRBT to have a low risk of bias for sildenafil, tadalafil, vardenafil, and others were 3 (6.1%), 3 (7.3%), 1 (6.3%), and 9 (9.7%), respectively (p = 0.10; Table 3).

Table 3 Assessment of RCT quality according to subject of erectile dysfunction research

Distribution of RCTs on ED with other diseases and journals

The frequency of ED as a topic was examined according to whether it appeared as a single subject or in association with diabetes, benign prostatic hyperplasia, prostate cancer, or other diseases (Fig. 2). The journals publishing RCTs related to ED included The Journal of Sexual Medicine, International Journal of Impotence Research, British Journal of Urology International, and Asian Journal of Andrology (Fig. 3).

Fig. 2 Distribution of randomized controlled trials on erectile dysfunction with other diseases

Fig. 3 Distribution of journals publishing randomized controlled trials on erectile dysfunction. BJUI British Journal of Urology International, JSM Journal of Sexual Medicine, IJIR International Journal of Impotence Research, J Urology Journal of Urology, J Andrology Journal of Andrology, AJA Asian Journal of Andrology, Int. J Andrology International Journal of Andrology, IJCP International Journal of Clinical Practice

Discussion

In the quality assessment of RCTs conducted on ED as a subject, it was observed that the quality steadily improved over time from 2007 to 2012 and from 2017 to 2018, except for the assessment using CCRBT. However, the number of high-quality RCTs declined from 2012 to 2016. The quality of the RCTs was higher when there were IRB approval, funding, interventions, and single-center studies. Most of the RCT studies on ED were associated with PDE-5 inhibitors and were published in the Journal of Sexual Medicine. This study found that the number of RCTs on ED has been constantly increasing, as evidenced by recent publications in many international journals. Scales et al. reported that the number and percentage of RCTs have increased over time, based on a comparison of RCTs published in journals such as The Journal of Urology, Urology, European Urology, and BJU International from 1996 to 2004 [16]. Lee et al. reported an increase in the number of RCTs published by the Korean Journal of Urology over the past 20 years [17]. This increase in RCTs is considered to result from the growing importance of EBM.

As the value of EBM to clinical medicine increases, a systematic and scientific approach to RCTs is emphasized, with RCT quality also taking on greater importance [18]. Studies on EBM and their findings have been consistently collected in databases by the Cochrane Collaboration with the help of clinical epidemiologists working around the world. Data compiled by collecting and meta-analyzing RCTs in health and medicine constitute the Cochrane Library, which is published through various media forms and widely used as a guideline by medical doctors around the world [19]. In recent years, journals have required authors to ensure that RCT studies are conducted in accordance with the CONSORT statement by completing a checklist before submission for publication. However, it is hard to quantitatively assess RCT quality using the CONSORT statement because it assigns no weights to its items and does not specify criteria for high quality. RCT quality assessment is important for evaluating the bias occurring in the study process, the validity of study conclusions, and the necessity of further studies. Thus, RCT quality assessment should be conducted [20, 21].

There are many tools for quantitatively assessing RCT quality such as Campell, Moher, Chalmers, Jadad, van Tulder, Newell’s, and Cochrane [5]. In this study, we used 3 tools, namely the Jadad scale, the van Tulder scale, and CCRBT, to comprehensively analyze the various elements of the CONSORT statement for RCT quality assessment. RCT quality assessment using these tools has recently been conducted in several studies. Kim Lee et al. analyzed RCTs published in several urology journals (International Journal of Impotence Research, Journal of Endourology, Neurourology and Urodynamics) [22,23,24]. According to these studies, an increase was observed in the mean Jadad scale scores of the RCTs and in the number of high-quality articles over time. Lee et al. analyzed the RCTs published in the journal Neurourology and Urodynamics from 1993 to 2012 using the Jadad scale, the van Tulder scale, and CCRBT and concluded that the number of RCTs increased but the quality did not improve [24]. In our study, the overall quality improved over time, but no qualitative improvement was observed during certain periods. This qualitative decline was attributed to a decrease in blinding. Therefore, future ED studies need to address this point.

In this study, the quality differences based on IRB approval were also investigated. Bridoux et al. concluded that high-quality RCTs were approved by IRB [25]. IRB review is a step that verifies the validity of a study’s design and implementation during the research planning phase, and is considered an international standard. In recent RCTs, a valid study plan prepared to obtain IRB approval plays a key role in improving the rate of high-quality articles. Schulz et al. reported that if the concealment of allocation was not performed properly, randomization in clinical research may be compromised, and even with initial randomization, the effect of intervention could be distorted by more than 40% [26]. Hewitt et al. assessed the quality of RCTs published in the New England Journal of Medicine, BMJ, JAMA, and Lancet, and reported that 46% of the studies were conducted with uncertain concealment of allocation [27]. In this study, the percentage of RCT studies on ED that correctly conducted concealment of allocation was low. Concealment of allocation is a major component of improving the quality of a study. Therefore, it is important to ensure that concealment of allocation is properly implemented in further studies. Clifford et al. analyzed 100 RCTs published in five different peer-reviewed, general medical journals with high impact factors and concluded that there was no correlation between funding sources and quality of articles [28]. However, Lee et al. reported that financially supported RCTs had many high-quality articles because the funding made well-designed and large-scale studies possible [17]. In this study, the quality assessment scores of financially supported articles evaluated using the Jadad and van Tulder scales were higher than those of articles that were not financially supported.
In the quality assessment based on the implementation of interventions, the number of high-quality studies, evaluated with the Jadad and van Tulder scales, was greater when interventions were conducted. Assessment using CCRBT also revealed a higher risk ratio in RCTs with interventions. Computer-generated randomization, allocation concealment, and double blinding are performed to reduce the risk of bias in a large number of RCTs with interventions [29]. For this reason, RCT studies with interventions are considered to be of high quality due to the increased use of objective methods. In the RCTs on ED, the topics most studied were PDE-5 inhibitor-related, and their quality assessment scores were relatively high. The PDE-5 inhibitor sildenafil was developed first, and tadalafil, vardenafil, and others were developed later [30]. During this process, a large number of RCT studies were conducted in connection with various diseases such as benign prostatic hyperplasia and pulmonary hypertension [30]. The quality of PDE-5 inhibitor-related RCTs was the highest among the RCTs on treatment.

One limitation of this study is that the subjective opinion of the reviewers may play a part in the assessment process due to the manual nature of the research and evaluation. To minimize this limitation, 2 reviewers participated independently in RCT sampling and quality assessment. When there were differences in the results, a third reviewer was consulted for correction to ensure the objectivity and reliability of the study. Another limitation of this study is that there are no formal representative tools among the currently available quality assessment tools in RCT quality analysis for assessing all the items represented in the CONSORT statement. However, we tackled this limitation by using 3 of the most widely used representative tools for RCT quality assessment. This study is of great significance because it shows the quantitative and qualitative changes of RCTs on ED over time, and suggests ways to improve the quality of further studies by analyzing factors affecting the quality of RCTs.

Conclusion

The number of published RCT original articles on ED increased from 2007 to 2018. However, it started to decrease from 2013, and no significant increase in the quality of RCTs was observed from 2013. The number of high-quality articles was higher when IRB approval was granted, when funding was provided, or when RCTs involved interventions or were single-center studies. Researchers should focus their efforts on conducting high-quality studies.