Key Points for Decision Makers

Although the average scores of the included most-cited studies in Chinese were higher than those of the sampled studies in Chinese, a large gap persisted between Chinese and English publications.

The reporting of the methods, results, and discussion sections in studies published in Chinese was of low quality.

Editors of Chinese journals need to consider adopting improved quality standards, evaluate submitted articles according to the more stringent requirements of the CHEERS criteria, and urge authors to provide higher-quality work for publication.

1 Introduction

In China, resource allocation and cost issues must be considered in policy decisions about health-related population interventions [1]. Using limited health resources to best effect is one of the goals of China's medical and health industry reform and development. The problem of "difficult and expensive medical treatment" has not been effectively solved in China, and the policy of "targeted poverty alleviation" places greater demands on the allocation of health resources. Health economic evaluation research could be important at this crucial time because it provides insights into managing healthcare costs and ensuring optimal use of scarce resources. Health economic evaluations, and assessments of the quality of this literature, are now routinely cited in framing policy statements, consensus guidelines, and technical reviews by professional societies [2].

At present, well-recognized guidelines for evaluating the quality of health economic evaluations fall into two categories. The first comprises guidelines that focus mainly on how studies are conducted, such as the Quality of Health Economic Studies (QHES) instrument and the Consensus on Health Economic Criteria (CHEC). The second comprises guidelines that focus mainly on reporting, such as the Consolidated Health Economic Evaluation Reporting Standards (CHEERS). The main purpose of this research was to evaluate the quality of reporting of health economic evaluations in mainland China and to help Chinese researchers understand the problems in their reporting, report health economic evaluations in a more standardized way, and better interpret the process and results of health economic evaluations. For these reasons, we chose CHEERS as the assessment tool. CHEERS [3], published in 2013, is a 24-item checklist mainly used to guide the reporting of health economic evaluations and to evaluate the quality of publications during peer review. It applies to economic evaluations in various fields, such as pharmacoeconomics, health intervention measures, vaccines, and medical equipment. CHEERS is widely recognized by health economists as a satisfactory reflection of the reporting quality of health economic evaluations.

In addition, we hypothesized that the quality of reporting differs between studies published in Chinese and those published in English. We therefore conducted a comparative analysis to help researchers with no experience of publishing in English to write higher-quality reports and to provide a standard that editors of Chinese journals can use to improve the quality of their publications.

2 Methods

2.1 Literature Search and Screening

Figure 1 provides an overview of the identification of the included studies as a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram. A targeted literature review was designed using relevant search terms for health interventions in China and economic analyses. The search strategy was developed using medical subject headings related to health interventions and types of cost-analysis studies in China.

Fig. 1 Flow diagram of the literature search and included studies

A systematic search of the literature was conducted using the PubMed database (in English) and the CBM, CNKI, VIP, and Wanfang databases (in Chinese) to identify health economic evaluation studies pertaining to China published in Chinese or English from 2006 to 2015. Search terms included the keywords health economic, cost, cost-effectiveness analysis, cost-minimization analysis, cost-utility analysis, cost-benefit analysis, economics evaluation, and China, used alone or combined with Boolean operators. The PubMed search strategy is presented in Supplementary Material, Box 1.

The inclusion criteria were as follows: (1) original studies reporting the findings of health economic evaluations in China; (2) full economic evaluations comparing both the costs and the consequences of alternatives [1]; (3) studies conducted in mainland China; (4) a first author from mainland China (i.e., not from Hong Kong, Macau, or Taiwan); and (5) publication in English or Chinese. Studies comparing multiple countries were excluded. Articles were also excluded if they only introduced health economic evaluation in public health (with no data) or if costs were not a main topic of the study; however, the reference lists of these articles were screened to identify additional relevant articles. Full journal publication was required for inclusion; thus, meeting abstracts, letters to the editor, treatment guidelines or recommendations, expert opinions, and narrative reviews were excluded.

All studies in English that met the inclusion criteria were included. For the studies in Chinese, two sets were included. First, we used the random seed method to select 200 studies from the pool of 6345 studies that met the inclusion criteria; these 200 sampled studies represented, to a certain extent, the general quality of Chinese studies. Second, from the same pool, we selected the 57 most-cited studies published in journals indexed in the Chinese Social Sciences Citation Index or the Core Journal of China. We considered these 57 most-cited studies to represent the quality of high-level Chinese studies.
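The random seed method is not described in detail. A minimal sketch of one common reading is shown below, assuming it means simple random sampling without replacement driven by a seeded pseudo-random number generator. The study identifiers 1 to 6345 and the seed value are hypothetical, because neither the ordering of the sampling frame nor the seed is reported. Fixing the seed is what makes the draw reproducible.

```python
import random

POOL_SIZE = 6345    # eligible Chinese-language studies, as reported
SAMPLE_SIZE = 200   # sampled studies, as reported
SEED = 2016         # hypothetical: the seed actually used is not reported


def draw_sample(pool_size: int, sample_size: int, seed: int) -> list[int]:
    """Simple random sample without replacement from study IDs 1..pool_size.

    The IDs are hypothetical stand-ins for an ordered list of eligible studies.
    A dedicated Random instance keeps the draw independent of global state,
    so the same seed always yields the same sample.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, pool_size + 1), sample_size))


sample = draw_sample(POOL_SIZE, SAMPLE_SIZE, SEED)
print(len(sample), sample[:5])  # 200 IDs; which IDs appear depends on the seed
```

Re-running `draw_sample` with the same seed returns the identical list, so another reviewer could regenerate the same 200 studies from the same ordered pool.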

All literature was screened by two independent reviewers (YZ and MT). Disagreements were resolved through discussion among four reviewers (SCZ, JHC, YZ, and MT). The primary reason for exclusion was recorded during screening and review.

2.2 Data Extraction and Evaluation of Studies

A bespoke data extraction table was used to collate extracted data, which mainly included general study characteristics (i.e., year of publication, type of health economic evaluation, type of intervention, affiliation of the first author, and funding source) and study details (i.e., costs, outcomes, valuation methods, discounting rate, and sensitivity analysis).

The CHEERS checklist comprises 24 items to assess the quality of reporting. Our reviewers used an English-language checklist to rate each report. Each of the 24 items was rated "yes" (reported in full), "part" (partially reported), "no" (not reported), or "NA" (not applicable). To estimate a reporting score, we assigned 1 point for complete reporting, 0.5 for partial reporting, and 0 for no reporting; items rated NA were excluded from the score. All items were weighted equally. The score was calculated as follows:

$$\text{Score} = \frac{N_{\text{Yes}} \times 1 + N_{\text{Part}} \times 0.5 + N_{\text{No}} \times 0}{N_{\text{Item}}} \times 100,$$

where $N_{\text{Yes}}$, $N_{\text{Part}}$, and $N_{\text{No}}$ are the numbers of items rated "yes," "part," and "no," respectively, and $N_{\text{Item}}$ is the total number of items excluding those rated "NA." The maximum possible score, for an article that completely reported all information, was 100.
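As a minimal sketch of the scoring rule above, the following function converts one study's item ratings into a score. The rating labels and the example study are hypothetical; only the point values and the exclusion of NA items come from the text.

```python
# Points per rating as described in the Methods; "NA" items are dropped entirely.
POINTS = {"yes": 1.0, "part": 0.5, "no": 0.0}


def cheers_score(ratings: list[str]) -> float:
    """Reporting score (0-100) for one study from its CHEERS item ratings."""
    applicable = [r for r in ratings if r != "NA"]  # N_Item excludes "NA"
    if not applicable:
        raise ValueError("all items rated NA; score is undefined")
    points = sum(POINTS[r] for r in applicable)     # N_Yes*1 + N_Part*0.5 + N_No*0
    return 100 * points / len(applicable)


# Hypothetical study: 10 "yes", 6 "part", 4 "no" and 4 "NA" across the 24 items.
example = ["yes"] * 10 + ["part"] * 6 + ["no"] * 4 + ["NA"] * 4
print(cheers_score(example))  # (10*1 + 6*0.5 + 4*0) / 20 * 100 = 65.0
```

Because NA items leave the denominator, a study is not penalized for items that do not apply to it, and every applicable item carries the same maximum weight of one point. An unrecognized label raises `KeyError`, which helps catch data-entry typos.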

2.3 Data Analysis

Descriptive statistics (frequencies and percentages) were used to describe the characteristics of the included studies. Differences in characteristics between the Chinese and English studies were compared using the chi-squared test.

In addition to language, scores were compared by type of intervention, first-author affiliation, and funding. Different types of intervention belong to different research fields, with different requirements for professional expertise and writing logic. Different research institutions have different research policies, organizational supervision, and research resources, and authors from universities, hospitals, and Centers for Disease Control (CDCs) also work in different research fields. Although many articles are written jointly by authors from different institutions, the first author's contribution is usually the largest. The availability of research resources affects the quality of research reporting, which can be examined by comparing studies with and without funding. Differences in scores between categories were compared using the rank-sum test, and Spearman's rank correlation coefficient was used to test whether the factors were related. All statistical analyses were conducted using SPSS version 25.
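The analyses were run in SPSS; the plain-Python sketch below only illustrates what the mean ranks reported in the Results and Spearman's coefficient measure. It assumes tied values receive the average of the ranks they span (midranks), the usual convention, it does not compute p-values, and the example scores are hypothetical.

```python
def midranks(values):
    """Rank values 1..n in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mean_ranks(groups):
    """Rank all scores jointly, then average the ranks within each group."""
    pooled = [score for scores in groups.values() for score in scores]
    ranks = midranks(pooled)
    result, start = {}, 0
    for name, scores in groups.items():
        result[name] = sum(ranks[start:start + len(scores)]) / len(scores)
        start += len(scores)
    return result


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical CHEERS scores for two small groups (not data from this study).
scores = {"funded": [70.0, 82.5, 65.0], "not funded": [50.0, 55.0, 65.0, 40.0]}
print(mean_ranks(scores))  # funded ≈ 5.83, not funded = 2.625 (the tied 65.0s share rank 4.5)
```

A higher mean rank means that a group's scores tend to sit higher in the pooled ordering; this is how a comparison such as "mean rank 207.91 vs. 126.67" for funded versus unfunded studies should be read.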

3 Results

After screening, 310 studies were included in the review: 200 sampled studies in Chinese [4–203], 57 most-cited studies in Chinese [6, 16, 33, 133, 264–316], and 57 studies in English [204–260]. Four studies appeared in both the sampled and the most-cited Chinese sets, so the 314 study records (200 + 57 + 57) correspond to 310 unique studies. Figure 2 shows the number of included studies published in each year. The numbers of included most-cited studies in Chinese and of studies in English increased markedly after 2012.

Fig. 2 Number of included studies published in each year

Figure 3 compares the numbers and overall score trends of the included health economic evaluation studies in Chinese and English published between 2006 and 2015. The scores of both the included studies in English and the included most-cited studies in Chinese showed steady growth, whereas the scores of the included sampled studies in Chinese did not change markedly over time. The overall quality score of the included studies in English was consistently higher than that of the studies in Chinese.

Fig. 3 Scores and time trends of included studies

Table 1 compares the characteristics of the included studies. The most common type of analysis was cost-effectiveness analysis (CEA; 82.26%). Most of the studies in Chinese used clinical endpoints to measure health outcomes (sampled studies 96.50%, most-cited studies 70.18%). Nearly half of the included studies in English used quality-adjusted life-years (QALYs; 42.11%) as the health outcome measure, followed by clinical endpoints (38.60%). The time horizon of the included studies in Chinese was most commonly ≤ 1 year (sampled studies 85.50%, most-cited studies 61.40%), as was that of nearly half of the included studies in English (43.86%). The chi-squared test showed that the differences in time horizons among the three groups were statistically significant (χ² = 92.82 [df = 6, N = 314], p < 0.001), as were all pairwise comparisons between groups.

Table 1 General characteristics of the included studies

Overall, the included studies most often evaluated pharmaceutical interventions (48.64%). However, whereas most of the included sampled studies in Chinese focused on pharmaceuticals (60.50%), most studies in English focused on clinical treatments (64.91%), such as surgery and the choice of clinical treatment plan. The rank-sum test showed that studies of clinical treatments scored higher than studies of pharmaceuticals (mean rank 170.56 vs. 136.44, p = 0.010).

In terms of funding, we assumed that an article that made no mention of funding was not funded. Most included studies in Chinese were not funded (sampled studies 93.50%, most-cited studies 52.63%), whereas only 19.30% of studies in English were not funded (χ² = 136.79 [df = 2, N = 314], p < 0.001); pairwise comparisons between the three study groups also showed statistically significant differences. The rank-sum test showed that studies with funding scored higher than those without funding (mean rank 207.91 vs. 126.67, p < 0.001). Overall, most first authors were affiliated with hospitals. The first authors of the Chinese studies were mostly hospital-affiliated (sampled studies 92.50%; most-cited studies 68.42%), whereas only one-third of the first authors of English studies were (33.33%; χ² = 100.56 [df = 4, N = 314], p < 0.001). The rank-sum test showed that studies whose first author was from a university scored significantly higher than those whose first author was from a hospital (mean rank 218.47 vs. 138.03, p < 0.001). First-author affiliation was correlated with both funding (p < 0.001) and intervention type (p < 0.001).
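The funding comparison can be checked by hand. The sketch below back-calculates the counts of unfunded studies from the reported percentages (93.50% of 200 = 187, 52.63% of 57 = 30, and 19.30% of 57 = 11) and computes Pearson's chi-squared statistic; these counts are derived for illustration rather than copied from Table 1. With them, the statistic reproduces the reported χ² = 136.79 with df = 2 and N = 314.

```python
import math

# Counts back-calculated from the reported percentages (an assumption, not Table 1).
# Rows: not funded, funded. Columns: sampled Chinese, most-cited Chinese, English.
observed = [
    [187, 30, 11],
    [13, 27, 46],
]


def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = pearson_chi_squared(observed)
# For exactly 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
p_value = math.exp(-stat / 2) if df == 2 else float("nan")
print(f"chi2 = {stat:.2f}, df = {df}, N = {sum(map(sum, observed))}, p = {p_value:.1e}")
```

The closed-form p-value only holds for df = 2; for the time-horizon (df = 6) or affiliation (df = 4) tables, a chi-squared distribution function from a statistics library would be needed.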

Table 2 presents an assessment of the quality of the included studies. The average quality score of the included studies was 56.59 ± 16.90. Scores varied greatly across CHEERS items. Item scores were high for title (95.32), choice of health outcomes (90.32), setting and location (83.50), and model-based estimation of resources and costs (83.33). However, the included studies scored poorly on some items, especially choice of model (18.02), assumptions (18.77), and discount rate (17.90).

Table 2 Quality of reporting of health economic evaluation in mainland China per item of the CHEERS checklist

There was a distinct gap between the average quality scores of the included sampled studies in Chinese (49.78 ± 9.31) and the included studies in English (82.48 ± 17.69). A similar, slightly narrower gap existed between the included most-cited studies in Chinese (54.08 ± 10.27) and the studies in English. For most items, the average quality score of the included studies in Chinese was lower than that of the studies in English. The included sampled studies in Chinese scored higher than those in English only on the following items: title (98.75 vs. 91.23), synthesis-based measurement of effectiveness (83.33 vs. 66.67), and estimating resources and costs (single study-based 61.86 vs. 41.46; model-based 90.00 vs. 84.78). The included most-cited studies in Chinese did not have synthesis-based estimates, and they scored higher than the studies in English only on estimating resources and costs (single study-based 81.52 vs. 41.46).

Figures 4, 5 and 6 compare the proportions of the included studies in Chinese and English in which each CHEERS item was reported completely, partially, or not at all. The item most frequently reported only partially or not at all was "conflicts of interest." Most of the included sampled studies and most-cited studies in Chinese did not describe any potential conflicts of interest of study contributors (item scores 0.75 and 1.75, respectively), whereas most of the included studies in English did (87.72). The discount rate was stated in 5% of the included sampled studies in Chinese, 19.30% of the included most-cited studies in Chinese, and 66.67% of the studies in English. Although discount rates are commonly not applied in economic evaluations with short time horizons (e.g., < 1 year), researchers are encouraged to report this rate as 0% for clarity [261]. Additionally, fewer of the included sampled studies (5%) and most-cited studies (22.81%) in Chinese used model-based estimation than did studies in English (80.70%). For characterizing heterogeneity in model-based studies, average quality scores were lower for the included sampled studies (55.00) and most-cited studies (69.32) in Chinese than for those in English (73.08). In addition, scores for describing the decision-analytic model were much lower for the included sampled and most-cited studies in Chinese than for those in English.
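To illustrate why the discount rate matters mainly for longer time horizons, the sketch below discounts a stream of annual costs to its present value using the standard formula, the sum over years of cost divided by (1 + r) raised to the year index. It assumes first-year costs are undiscounted (t = 0); the cost streams and the 5% rate are hypothetical and are not taken from any included study.

```python
def present_value(costs_by_year, rate):
    """Discount annual costs to year 0: sum of C_t / (1 + rate) ** t, with t = 0 first."""
    if rate < 0:
        raise ValueError("discount rate must be non-negative")
    return sum(c / (1 + rate) ** t for t, c in enumerate(costs_by_year))


# Hypothetical cost streams in arbitrary currency units, for illustration only.
short_horizon = [1200.0]               # all costs fall within the first year (t = 0)
long_horizon = [1200.0, 800.0, 800.0]  # costs spread over three years

print(present_value(short_horizon, 0.05))            # 1200.0: nothing to discount in year 0
print(present_value(long_horizon, 0.0))              # 2800.0: a stated 0% rate leaves costs unchanged
print(round(present_value(long_horizon, 0.05), 2))   # later-year costs shrink under a 5% rate
```

Under this convention, a study with a horizon of one year or less gets the same result at any rate, which is why stating "0%" costs nothing and removes ambiguity for readers.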

Fig. 4 Quality of reporting of Chinese-language health economic evaluation sampled studies in mainland China per item of the CHEERS checklist. CHEERS Consolidated Health Economic Evaluation Reporting Standards, NA not applicable, No not reported, Part partially reported, Yes reported

Fig. 5 Quality of reporting of Chinese-language health economic evaluation most-cited studies in mainland China per item of the CHEERS checklist. CHEERS Consolidated Health Economic Evaluation Reporting Standards, NA not applicable, No not reported, Part partially reported, Yes reported

Fig. 6 Quality of reporting of English-language health economic evaluation studies in mainland China per item of the CHEERS checklist. CHEERS Consolidated Health Economic Evaluation Reporting Standards, NA not applicable, No not reported, Part partially reported, Yes reported

4 Discussion

The quality of reporting of health economic evaluations in mainland China has developed slowly. Although the average scores of the included most-cited studies in Chinese were higher than those of the sampled studies in Chinese, both remained consistently and substantially lower than those of the included studies in English.

The overall quality score of the included studies in Chinese was low. Many of the included studies in Chinese provided only a simple description of the cost and outcome indicators and did not fully describe the design features of the single effectiveness study or explain why that single study was a sufficient source of clinical-effectiveness data. The time horizon, clinical events, healthcare interventions, and their consequences are important aspects of health economic evaluations [262], and all of them affect the estimated costs and results [263]. Choosing a time horizon and adjusting estimated unit costs can be challenging for many analysts and policy makers, many of whom may be unfamiliar with economic concepts. Researchers differ in background and clinical experience and therefore in their understanding of, and ability to apply, health economic and modeling methods. In addition, most of the included studies in Chinese only summarized their key findings in the discussion and did not describe the limitations of the study. As a result, most of the included studies in Chinese scored poorly for the quality of reporting of their methods, results, and discussions.

The quality of reporting of the included studies was associated with the first author's affiliation and the intervention type, and the first author's affiliation was itself related to intervention type. This may be because researchers from hospitals have good knowledge of and experience in clinical medicine and pharmacy, whereas researchers from universities have relatively solid theoretical knowledge. Those working in CDCs and other institutions have theoretical knowledge and practical experience in epidemiology and public health. Health economic evaluations draw on multiple disciplines in the field of health and are closely related to medicine, hygiene, demography, and sociology. Greater cooperation and communication among professionals from different fields would help improve the quality of health economic evaluation research.

Whether a funding source was reported was significantly related to the quality of reporting. Health economic research is resource-intensive and requires human, financial, and material resources, and secure funding can enable research to proceed smoothly. A lower proportion of Chinese studies reported funding, which may partly explain their lower quality scores.

The differences in scores between the included studies in Chinese and English also suggest that editors of Chinese journals need to consider adopting improved quality standards. The CHEERS guidelines are easy to apply and freely available to editors and peer reviewers. The quality of studies in Chinese would improve if editors of and peer reviewers for Chinese journals were familiar with the CHEERS checklist items, evaluated submissions against the more stringent requirements of the CHEERS guidelines, and urged authors to meet a higher standard.

4.1 Study Limitations

This review has some limitations. CHEERS is a qualitative instrument for assessing the quality of reporting rather than a quantitative one. We applied a quantitative scoring approach based on CHEERS, which may introduce bias; for example, all items were weighted equally regardless of their relative importance.

The search period was 2006 to 2015. We acknowledge that the quality of health economic evaluations in mainland China may have improved between 2016 and 2021. The year 2015 was the final year of China's Twelfth Five-Year Plan and 2016 the first year of the Thirteenth Five-Year Plan. In 2016, reforms of the medical and health industry also began to deepen: China carried out inspections and reforms of medical insurance payments, drug prices, and medical service provision. According to the reform plan, China would establish a basic medical and health system covering urban and rural residents by 2020. These events would certainly have had an impact on the quality of reporting of health economic evaluations in mainland China. However, we believe that the gap in reporting quality between Chinese and English studies, as well as some other common problems, would persist after 2015. Our project is therefore divided into two stages. The first stage assessed the quality of studies published from 2006 to 2015, reflecting the past situation, and focuses mainly on cross-sectional (horizontal) comparison. The second stage will assess the quality of studies published from 2016 to 2020, reflecting the current situation, and will focus mainly on longitudinal time-series analysis.

The number of studies in Chinese that met the inclusion criteria was too large to review each one individually; we therefore borrowed the matching principle of epidemiological case–control studies, in which the matching ratio is usually 1:1 or 1:2 and should not exceed 1:4. Because 57 studies in English met the criteria, we chose to sample 200 studies in Chinese. With the random seed method, each study had an equal probability of being selected; we therefore assumed that the 200 selected studies satisfactorily represented the quality of all 6345 eligible studies. We acknowledge that the results might have differed with a larger sample.
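Written out with only the figures stated above, the sampling choice works as follows: the matching ratio stays within the 1:4 limit, and under equal-probability sampling each eligible study had roughly a 3.2% chance of selection.

$$\frac{200}{57} \approx 3.51 \le 4, \qquad P(\text{study selected}) = \frac{200}{6345} \approx 0.0315.$$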

5 Conclusions

The number of health economic evaluation studies in China increased steadily over the 10 years from 2006 to 2015. However, many problems with research design, presentation of results, and interpretation persisted. Most of the research content was incompletely reported, which decreases the reliability of the results. Standardizing and improving health economic research is therefore of great significance for China.