Introduction

Rheumatoid arthritis (RA) is a chronic, heterogeneous autoimmune disease characterised by painful joint inflammation, which may cause destructive bone erosions [1]. It is reported to affect 0.1–2.0% of the population worldwide. The aetiology of RA remains poorly understood, and despite recent therapeutic advances, there is no known cure [2,3,4,5]. RA is considered a multifactorial disease in which various genetic and environmental factors [6, 7] influence its prevalence across and within countries [8]. At the population level, Australia has reported the highest RA prevalence (2%) worldwide [1], based on self-reported data from the 2014–2015 National Health Survey (NHS) [9]. At the community level, American Indigenous populations have the highest reported RA prevalence, at 5.3% among the Pima [10] and 6.8% among the Chippewa Indians [11]. In contrast, a low prevalence or even absence of RA has been reported in rural populations in South Africa (0.0026%) [12] and Nigeria (0%) [13].

Variation in currently available prevalence estimates can be attributed to differences in methodology (such as case-ascertainment criteria), geographic area of residence, socioeconomic position, and exposure to genetic and environmental factors [6, 8, 14]. An accurate estimate of RA prevalence will help determine the disease and economic burden of RA care, inform health policy aimed at reducing this burden [15, 16], and guide health care resource allocation [17, 18]. A systematic review and meta-analysis of RA prevalence data can partly support current and future planning [19, 20].

To date, only one study has attempted to estimate the global prevalence of RA, through a systematic review [21]; no meta-analysis has been published on this topic. The study by Cross et al. [21] has several limitations, including an outdated literature review (1980–2009), no assessment of publication bias, and the use of mathematical modelling to fill in missing data for certain regions. These limitations raise questions about the validity of its global estimates of RA prevalence. We therefore conducted a meta-analysis to (a) determine the global prevalence of RA from published population-based studies, comparing point- and period-prevalence estimates; (b) explore the most suitable methodology for investigating RA prevalence; and (c) generate prediction intervals for RA prevalence to inform future studies.

Methods

Study selection

We conducted a meta-analysis of all published peer-reviewed studies on the prevalence of RA from 1 January 1980 through 26 June 2019. This timeframe was selected to capture changes in reported prevalence resulting from significant revisions of the RA classification criteria, which may have affected reported incidence and prevalence [22, 23]. We followed the Joanna Briggs Institute (JBI) guidelines for systematic reviews of prevalence data [19] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 recommendations [24]. The protocol was registered in the JBI systematic review and meta-analysis register [25]. A list of key terminology with corresponding definitions, and the PRISMA checklist, can be found in Online Resources 1 & 2.

All authors were involved in developing the search strategy, the eligibility criteria, and the data extraction sheet; the first author ran the searches in multiple databases. The search strategy was tested and refined several times for completeness, with support from the senior supervising author (C.I) and a senior librarian (SB), to identify additional search terms and synonyms found in relevant studies, in line with the PRISMA 2009 recommendations.

All titles and abstracts were examined and assessed for relevance to the main questions of the meta-analysis, and the full texts of potentially relevant articles were obtained. The risk of bias of each prevalence study was assessed by one independent reviewer (KA) and verified by the senior supervising author (C.I) before inclusion. We extracted the following variables from the included articles: author, year of publication, continent, country, prevalence proportion, prevalence methodology, data source, RA classification criteria, geographic population setting, and country income setting.

Search strategy

We searched the electronic databases ProQuest Central, MEDLINE (Ovid), Web of Science, and EMBASE (Ovid) using relevant subject heading terms and keywords. Our search terms and the search strategy for each database are given in Table 1 and Tables S2–S5 in Online Resource 3.

Table 1 Keywords used to identify relevant studies

Inclusion and exclusion criteria

Studies published between 1980 and 2019 were included if they provided adequate information to calculate the point- and/or period-prevalence of RA and met the following criteria: (a) the participants were representative of the adult population, based on country reference populations from the World Health Organisation [26] and the United Nations data repository [27]; (b) the participants had clinically verified RA or met one of the published RA classification criteria sets [22, 23]; (c) the participants were residents of a defined country; or (d) they lived in defined geographic population settings.

We excluded studies that (a) included participants aged under 16 years; (b) presented prevalence estimates only for subsets of a population or community defined by age range, sex, or ethnicity; (c) had fewer than 300 participants; (d) involved volunteer participants or participants with self-reported RA without clinical confirmation; (e) estimated RA prevalence in outpatient clinics, residential homes, or hospitals; (f) were published in a language other than English; (g) were non-research papers, including letters, editorials, narrative, systematic and seminar reviews, case studies, case series, or abstracts; or (h) were capture–recapture or disease-modelling studies.

Risk of bias assessment

The methodological quality of the studies was assessed using the Hoy et al. tool for risk of bias in prevalence studies [28]. Details of the risk-of-bias assessment method are presented in Online Resource 4.

Data analysis

Pooled estimates of the prevalence of RA in population-based studies were calculated using a random-effects meta-analysis model [29], because of the anticipated heterogeneity arising from differences in methodological approach, geographical location, diagnostic criteria, data sources, and geographic settings [30, 31]. Statistical analyses were performed in R (v3.6.1) using the 'meta' package [32]. The Freeman–Tukey double-arcsine transformation was used to stabilise the variance of the proportions before pooling with the random-effects model [33]. The metaprop function was used to generate forest plots of pooled prevalence with 95% confidence intervals (CI) and 95% prediction intervals for future RA prevalence estimates [34]. Heterogeneity was assessed using I2, with thresholds of ≥ 25%, ≥ 50%, and ≥ 75% indicating low, moderate, and high heterogeneity, respectively [35]. To assess the robustness of our results, we performed sensitivity analyses of the pooled RA prevalence based on influence analyses, including leave-one-out analyses, risk-of-bias assessment, influential outliers, and residual analyses. Studies with an absolute z score > 2 were considered potential outliers and verified using a Baujat plot [36]. To assess whether the pooled prevalence of RA was increasing over time, we stratified the cohort studies into two periods: (a) January 1983 to December 2000 and (b) January 2001 to December 2018. Subgroup analyses were defined by prevalence method, sampling methodology, geographical location, RA classification criteria, data source, participation rate ≥ 75%, Community Oriented Program for Control of Rheumatic Diseases (COPCORD) studies, geographic population setting, country income level, and risk-of-bias level. Spearman's rank correlation was used to investigate the association between study variables and prevalence estimates.
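The transformation-and-pooling steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the 'meta' package's implementation. The between-study variance is estimated with the DerSimonian–Laird method, because the text does not name its estimator. Pooled values are back-transformed with Miller's inverse of the double arcsine, evaluated at the harmonic mean of the study sizes, which is a common choice. All function names are hypothetical.

```python
import math
from statistics import NormalDist, harmonic_mean


def ft_transform(events, n):
    """Freeman-Tukey double-arcsine transform of events/n, with its variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)


def ft_back_transform(t, n_harm):
    """Miller (1978) inverse of the double arcsine at sample size n_harm."""
    if t <= ft_transform(0, n_harm)[0]:
        return 0.0  # at or below the transformed value for 0 events
    if t >= ft_transform(n_harm, n_harm)[0]:
        return 1.0  # at or above the transformed value for n events
    s = math.sin(t)
    inner = s + (s - 1.0 / s) / n_harm
    radicand = max(0.0, 1.0 - inner * inner)  # guard against rounding error
    return 0.5 * (1.0 - math.copysign(1.0, math.cos(t)) * math.sqrt(radicand))


def pool_random_effects(events, sizes, level=0.95):
    """Random-effects pooling of proportions on the Freeman-Tukey scale.

    Assumption: DerSimonian-Laird estimate of the between-study variance tau2.
    """
    pairs = [ft_transform(x, n) for x, n in zip(events, sizes)]
    ys = [y for y, _ in pairs]
    vs = [v for _, v in pairs]
    k = len(ys)
    w = [1.0 / v for v in vs]                                   # inverse-variance weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]                       # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    i2 = 0.0 if q <= df else 100.0 * (q - df) / q               # I2 as a percentage
    n_h = harmonic_mean(sizes)                                  # used for back-transformation
    return {
        "prevalence": ft_back_transform(mu, n_h),
        "ci_low": ft_back_transform(mu - z * se, n_h),
        "ci_high": ft_back_transform(mu + z * se, n_h),
        "tau2": tau2, "i2": i2, "mu_ft": mu, "se_ft": se,
    }


def heterogeneity_label(i2):
    """I2 thresholds as stated in the text: >= 25 low, >= 50 moderate, >= 75 high."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "below 25%"
```

The confidence limits are computed on the transformed scale and then back-transformed. As a result, they are generally not symmetric around the pooled proportion.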
Univariate and multivariable meta-regressions were conducted using random-effects models, with adjusted R2 used to assess the impact of study characteristics on pooled prevalence estimates. Funnel plots were produced to explore possible publication bias arising from the preferential publication of prevalence reports with positive findings and of small studies reporting high prevalence estimates [37]. The Begg–Mazumdar, Egger, and Harbord tests were performed, with P < 0.05 taken to indicate publication bias [38].
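Egger's test is one of the three publication-bias tests above. It regresses each study's standardised effect (effect divided by its standard error) on its precision (1/SE), and an intercept far from zero indicates funnel-plot asymmetry. Below is a minimal sketch that assumes the effects are Freeman–Tukey-transformed prevalences; the text does not say which scale was used. The function name is hypothetical. The sketch stops at the t statistic, because the two-sided P value needs a t distribution with k − 2 degrees of freedom from a statistics library.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses effect/SE on 1/SE by ordinary least squares and returns
    (intercept, SE of the intercept, t statistic with k - 2 degrees of freedom).
    """
    k = len(effects)
    if k < 3 or len(std_errors) != k:
        raise ValueError("need at least three studies with matching standard errors")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (k - 2)     # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / k + mx * mx / sxx))
    if se_intercept == 0:
        return intercept, 0.0, math.copysign(math.inf, intercept)
    return intercept, se_intercept, intercept / se_intercept
```

In a symmetric funnel, small and large studies scatter evenly around the pooled effect, and the fitted line passes close to the origin.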

Results

Study selection

The preliminary search identified 1821 relevant articles (Fig. 1): 650 from ProQuest, 588 from MEDLINE, 468 from Web of Science, and 115 from EMBASE. After removing duplicates and screening titles and abstracts, 143 papers were examined in detail. After applying the inclusion and exclusion criteria, 57 papers were deemed relevant to the systematic review and meta-analysis. A further 20 references were identified from the reference lists of the included papers, of which only three met the inclusion criteria. In total, we identified 60 population-based studies. Six of these had multiple cohorts, each of which was analysed separately (Table S6 in Online Resource 5), giving a total of 67 cohort studies in the analysis. The excluded studies, with the reasons for excluding each paper, are presented in Online Resource 6.

Fig. 1 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram of search and study selection process for prevalence studies of RA [24]

Study characteristics

The 67 cohort studies originated from 41 countries and comprised 742,246 RA patients and 211,592,925 healthy controls (Fig. 2 and Table S6 in Online Resource 5). No study of RA prevalence from Oceania met the inclusion criteria. Among the 32 studies using a point-prevalence method, the point-prevalence of RA ranged from 0.0% to 2.7% over 1986–2014, with a mean of 0.56% (SD = 0.51). The 35 studies using a period-prevalence method had a mean period-prevalence of 0.51% (SD = 0.35) over 1955–2015. Mean prevalence estimates did not differ between the World Health Organisation report (1980–2000) and our study (1980–2019) (Table S7 in Online Resource 5).

Risk-of-bias assessment

The risk of bias was low in 59 studies (88%) and moderate in eight studies (12%) [13, 39,40,41,42,43,44]; no study was at high risk of bias (Online Resource 7). In six of the eight studies at moderate risk of bias, there was no preliminary validation study of the diagnostic codes used to estimate RA prevalence. The two remaining studies were population-based surveys from Nigeria [13] and Thailand [39], which reported zero and low RA prevalence, respectively. All studies at moderate risk of bias were included in our analysis, as they may reflect the real clinical picture and the nature of RA.

Publication bias assessment

We assessed publication bias with a funnel plot (Online Resource 8), which showed some asymmetry. The Begg–Mazumdar (P = 0.022) and Egger (P = 0.039) tests indicated significant publication bias, whereas the Harbord test did not (P = 0.4175). When studies were grouped by prevalence method (Table 2), there was no significant publication bias for either point- or period-prevalence studies.

Table 2 Prevalence of RA according to different categories

Data synthesis

Sensitivity analyses

We checked the robustness of the global pooled prevalence of RA with three sensitivity analyses (Online Resource 9). In the leave-one-out analysis, there was no significant difference in either the global pooled prevalence or the heterogeneity. The pooled prevalence of RA ranged from 0.44% (P = 0.976) to 0.47% (P = 0.992), while heterogeneity remained similar (I2 = 99.9%; P value of the Q statistic < 0.001). Including or excluding the studies at moderate risk of bias [13, 39,40,41,42,43,44] made no significant difference to the global pooled prevalence (P = 0.984). Finally, we compared results with and without the cohort studies identified as potential outliers. The Baujat plot identified five studies with z score > 2 [13, 43, 45,46,47], but the global pooled prevalence did not differ significantly before and after excluding these extreme outliers and residuals (P = 0.976).
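The leave-one-out and outlier checks can be sketched as follows, working on the transformed (for example, Freeman–Tukey) scale. This is a minimal illustration under two assumptions that the text does not confirm. The first is DerSimonian–Laird pooling. The second is that the z scores are externally studentized (deleted) residuals, which is one common definition. The function names are hypothetical.

```python
import math


def dl_pool(ys, vs):
    """DerSimonian-Laird random-effects pooling of transformed estimates.

    Returns (pooled mean, variance of the pooled mean, between-study variance tau2).
    """
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return mu, 1.0 / sum(w_re), tau2


def leave_one_out(ys, vs, z_cut=2.0):
    """Re-pool without each study in turn and compute its deleted residual z score."""
    ys, vs = list(ys), list(vs)
    if len(ys) < 2:
        raise ValueError("leave-one-out needs at least two studies")
    results = []
    for i in range(len(ys)):
        mu_i, var_mu_i, tau2_i = dl_pool(ys[:i] + ys[i + 1:], vs[:i] + vs[i + 1:])
        # Externally studentized residual: study i versus the model fitted without it.
        z = (ys[i] - mu_i) / math.sqrt(vs[i] + tau2_i + var_mu_i)
        results.append({"omitted": i, "pooled": mu_i, "tau2": tau2_i,
                        "z": z, "outlier": abs(z) > z_cut})
    return results
```

A large shift in `pooled` when a study is omitted flags that study as influential. The `outlier` flag applies the |z| > 2 cut-off described in the Methods. A Baujat plot would also need each study's contribution to Cochran's Q, which this sketch does not compute.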

Global pooled prevalence analyses

The global prevalence of RA estimated by the random-effects model (Fig. 3 in Online Resource 5) was 0.46% (95% CI 0.39–0.54; I2 = 99.9%), with a 95% prediction interval of 0.06–1.27%. Similarly, the overall pooled point-prevalence was 0.45% (range 0.38–0.53%) between 1986 and 2014, while the pooled period-prevalence was 0.46% (range 0.36–0.57%) from 1955 to 2015. Based on the I2 index and p val.Q, heterogeneity was higher in period-prevalence studies (I2 = 100%, p val.Q < 0.001) than in point-prevalence studies (I2 = 86.2%, p val.Q < 0.001).
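The 95% prediction interval for the global estimate (0.06–1.27%) describes where the prevalence of a new, comparable study would be expected to fall. We assume the Higgins–Thompson–Spiegelhalter formulation, because the text does not state which was used. Under that formulation, the interval is computed on the transformed scale from the pooled mean, its standard error, and the between-study variance of the k pooled studies, and then back-transformed:

```latex
\hat{\mu} \pm t_{k-2,\;0.975}\,\sqrt{\hat{\tau}^{2} + \widehat{\mathrm{SE}}(\hat{\mu})^{2}}
```

The t quantile has k − 2 degrees of freedom, so the interval needs at least three studies. When heterogeneity is as high as reported here, the τ² term dominates, which is why the prediction interval is much wider than the confidence interval.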

Pooled prevalence over time

One period-prevalence cohort study, from Finland, was completed before 1982 and reported a prevalence of 1.9% [45]. The global pooled point-prevalence of RA increased from 0.44% (95% CI 0.29–0.63; I2 = 88.3%, p val.Q < 0.001) in 1983–2000 to 0.48% (95% CI 0.39–0.58; I2 = 82.6%, p val.Q < 0.001) in 2001–2018. Over the same periods, the global pooled period-prevalence decreased from 1.9% (95% CI 1.59–2.23) before 1982 [45] to 0.41% (95% CI 0.24–0.62; I2 = 99.6%, p val.Q < 0.001) in 1983–2000. It then increased to 0.45% (95% CI 0.33–0.59; I2 = 100%, p val.Q < 0.001) in 2001–2018. The I2 index for heterogeneity across point-prevalence studies thus fell from 88.3% in 1983–2000 to 82.6% in 2001–2018. The 95% prediction interval for the global pooled point-prevalence of RA (0.15–0.91%) was narrower than that for the global pooled period-prevalence (0.04–1.31%).

Sub-group analyses

Sub-group analyses were defined by prevalence method, stratified study period, sampling methodology, geographical location, RA classification criteria, data source, geographic population setting, country income level, risk of bias, participation rate in population-based survey cohort studies, and COPCORD studies. The details are summarised in Table 2 and Online Resource 9.

High heterogeneity (I2 ≥ 75%, p val.Q < 0.001) remained after sub-group analysis in all defined sub-groups except the modified ARA criteria (I2 = 60%, p val.Q = 0.019) and registry data (I2 = 60%, p val.Q = 0.080), which showed moderate heterogeneity. Pooled RA prevalence did not differ significantly between sub-groups, with four exceptions. Two were geographical location and risk-of-bias level. The other two, across the stratified periods, were the period-prevalence method and the urban population setting (Table 2). The 95% prediction intervals for future RA prevalence estimates in each sub-group are presented in Table 3.
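Whether pooled prevalence differs between sub-groups is commonly assessed with a between-groups heterogeneity statistic. The text does not name the test used, so the following is an assumption. It is the standard Q-between statistic, computed from the G sub-group estimates on the transformed scale, each with its standard error:

```latex
Q_{\mathrm{between}} = \sum_{g=1}^{G} w_g\,\bigl(\hat{\mu}_g - \bar{\mu}\bigr)^{2},
\qquad
w_g = \frac{1}{\widehat{\mathrm{SE}}(\hat{\mu}_g)^{2}},
\qquad
\bar{\mu} = \frac{\sum_{g=1}^{G} w_g\,\hat{\mu}_g}{\sum_{g=1}^{G} w_g}
```

Under the null hypothesis of no difference between sub-groups, Q_between approximately follows a χ² distribution with G − 1 degrees of freedom.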

Table 3 The 95% prediction intervals for future RA prevalence estimates by sub-group category

Pooled prevalence by sampling methodology

Sub-group analysis by sampling methodology showed pooled prevalence of 0.48% (95% CI 0.40–0.57%) and 0.42% (95% CI 0.30–0.57%) for sampled population studies and population database studies, respectively.

Pooled prevalence by continents and across countries

The pooled prevalence of RA was highest in North America at 0.70% (95% CI 0.57–0.86), followed by Europe at 0.54% (95% CI 0.50–0.59) and Africa at 0.52% (95% CI 0.00–1.74). The lowest pooled prevalence was in Asia at 0.30% (95% CI 0.23–0.37) and in South America at 0.30% (95% CI 0.09–0.62). At the country level, the highest RA prevalence was reported in Cuba at 2.67% [46], followed by Finland at 1.90% [45] and Lesotho at 1.8% [47]. The lowest RA prevalence was reported in Nigeria (0.0%), followed by Taiwan (0.05–0.12%) and Thailand (0.12%). The lowest heterogeneity was observed in South America (I2 = 86.4%). This region comprised two studies, from Brazil [48] and Argentina [49], with RA prevalences of 0.46% and 0.20%, respectively.

Pooled prevalence by classification criteria

The highest pooled RA prevalence, 0.58% (95% CI 0.03–1.68), was obtained with the 1956 ARA criteria [50] in three cohort studies: one from Lesotho and two from Indonesia [47, 51]. The lowest pooled RA prevalence, 0.30% (95% CI 0.13–0.53), was obtained with the 1961 Rome criteria [52] in one study from Japan [53]. The most commonly used RA classification criteria were the revised 1987 ARA criteria [54], applied in 37 cohort studies, with a pooled RA prevalence of 0.40% (95% CI 0.35–0.46). The pooled prevalence of cohort studies based on the modified 1987 ARA criteria was 0.41% (95% CI 0.32–0.50), in a combined analysis of seven cohort studies from five countries: Turkey [55, 56], Italy [57, 58], Sweden [59], Spain [60], and Denmark [61].

However, cohort studies using a clinical diagnosis verified by a doctor (n = 19) reported a higher pooled RA prevalence, 0.55% (95% CI 0.45–0.67), than cohort studies using the revised or modified 1987 ARA criteria. The cohort studies using the modified 1987 ARA criteria had the lowest heterogeneity (I2 = 60%) in the sub-group analyses. None of the included studies used the ACR/EULAR 2010 criteria to estimate RA prevalence.

Pooled prevalence by data sources

The highest pooled RA prevalence, 0.69% (95% CI 0.47–0.95), was observed in four cohort studies using linked data from three countries: Canada [62, 63], Italy [64], and Sweden [65]. The lowest pooled RA prevalence, 0.37% (95% CI 0.25–0.51), was estimated from 14 cohort studies using administrative data (Table S6 in Online Resource 5).

Pooled prevalence by geographic population settings

The highest pooled prevalence was observed in urban populations, at 0.48% (95% CI 0.42–0.57), reported by 25 cohort studies. The lowest was in rural populations, at 0.36% (95% CI 0.21–0.53), estimated from 12 cohort studies. Mixed populations (30 cohort studies) had a pooled prevalence close to that of urban populations, at 0.48% (95% CI 0.38–0.58). The lowest heterogeneity was observed in rural populations (I2 = 86.7%). RA prevalence among urban populations differed significantly between 1986–2000 and 2001–2018, with pooled prevalences of 0.39% (95% CI 0.28–0.50) and 0.70% (95% CI 0.63–0.77), respectively.

Pooled prevalence by income country classification

The pooled prevalence of RA was higher among cohort studies from high-income countries (0.49% [95% CI 0.39–0.59]) than among studies from upper-middle-income countries (0.47% [95% CI 0.34–0.62]) and lower-middle-income countries (0.35% [95% CI 0.20–0.53]).

Pooled prevalence by risk bias assessment level

Sub-group analyses were performed for studies at low and moderate risk of bias to explore the impact of risk-of-bias level on the pooled prevalence. The pooled prevalence was 0.51% (95% CI 0.46–0.58) for low-risk cohort studies (n = 59) and 0.17% (95% CI 0.07–0.31) for moderate-risk cohort studies (n = 8).

Correlation analyses

Spearman's rank correlation was used to assess the association between prevalence estimates and study variables (Table S10 in Online Resource 9). Correlations were significant only for geographical location (r = 0.42, P < 0.001) and risk of bias (r = 0.40, P < 0.001).
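Spearman's rank correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. Below is a minimal pure-Python sketch that omits the P value. It assumes each study variable has been given a numeric or ordinal coding; the text does not describe how categorical variables such as continent were coded. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the first one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no rank correlation")
    return sxy / math.sqrt(sxx * syy)
```

For example, `spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])` returns 0.8: with no ties, this matches 1 − 6Σd²/(n(n² − 1)) with Σd² = 4 and n = 5.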

Meta-regression

Univariate and multivariable meta-regression analyses were performed to explore potential sources of between-study heterogeneity (Table S11 in Online Resource 10). In the univariate analyses, geographical location was a potential source of between-study heterogeneity (65.34%, P < 0.001), and risk-of-bias level accounted for 29.57% of the overall variance (P < 0.001).

The multivariable meta-regression model explained 92.44% of between-study heterogeneity (Table 4). Three variables were associated with heterogeneity: publication year (P = 0.001), sample size (P < 0.001), and risk of bias (P < 0.001).

Table 4 Multivariable model of meta-regression for the prevalence of RA

Based on these results, we built a second multivariable model including geographical location, sample size, and risk of bias (Online Resource 10). This model explained 93.48% of between-study heterogeneity (QE = 12,545.8, df = 60, P < 0.001).
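The percentage of between-study heterogeneity "explained" by a meta-regression model is usually a pseudo-R² that compares the between-study variance with and without moderators. The R package metafor, for example, reports this quantity. Assuming this is the R² referred to in the Methods:

```latex
R^{2} = \max\!\left(0,\; \frac{\hat{\tau}^{2}_{0} - \hat{\tau}^{2}_{\mathrm{res}}}{\hat{\tau}^{2}_{0}}\right) \times 100\%
```

Here τ̂²₀ is the between-study variance of the intercept-only random-effects model, and τ̂²_res is the residual between-study variance after the moderators are added. Read this way, a model explaining 93.48% of heterogeneity leaves a residual between-study variance of about 6.5% of the intercept-only value.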

Discussion

Using meta-analysis, we found a global RA prevalence of 0.46% for the period 1980 to 2018, nearly twice the prevalence of 0.24% estimated by the Global Burden of Disease study [21, 66]. Our estimate differs from those of Cross et al. [21] and Safiri et al. [66], in whose studies RA prevalence was underestimated because of the large number of self-reported studies (52 of 56) included in their systematic review. In the absence of a detailed risk-of-bias assessment, it is unclear whether the studies included by Cross et al. were truly representative of the target population, as about half of them (n = 30, 52%) were rated as at high risk of bias. In our meta-analysis, we included only population-based survey studies (n = 46) to obtain an accurate estimate of global RA prevalence. With our stringent inclusion criteria, no included study was at high risk of bias. Cross et al.'s study had three other methodological issues: no assessment of publication bias, no consideration of other types of prevalence data sources (e.g. linked data), and the use of mathematical modelling to replace missing data from certain regions, including Oceania. Safiri et al. [66] estimated an RA prevalence of 0.13% in Oceania, while Cross et al. estimated 0.09% in males and 0.25% in females in this region. Both estimates are considerably lower than those reported from Australia [9, 67].

In Australia, the estimated prevalence of RA was 2%, based on self-reported data from the NHS (2014–2015) [9], which gave Australia the highest reported RA prevalence in the world [1]. However, these data lacked specificity and may have overestimated RA prevalence, as cases were not verified by clinical examination or against any RA classification criteria [68]. More recently, the Australian national primary health care database (Medicine Insight) reported an RA prevalence of 0.8% (95% CI 0.8–1) for 2000–2016 [67], which is higher than our global estimate. However, the RA prevalence estimate in that study was severely biased by the unexplained inclusion of patients with polymyalgia rheumatica.

We found no significant difference between the pooled point- and period-prevalence estimates for RA. However, between-study heterogeneity was considerably lower for point-prevalence studies (I2 was 13.8 percentage points lower). This suggests that between-method differences can produce spurious evidence of publication bias in the funnel plot (Online Resource 8). Over time, pooled RA point-prevalence increased by 9% (from 0.44% to 0.48%) and period-prevalence by 9.75% (from 0.41% to 0.45%), consistent with previously published data [66, 69].

Safiri et al. [66] reported that the global point-prevalence of RA increased slightly (7.4%), from 229.6 per 100,000 population in 1990 to 246.6 per 100,000 population in 2017. This increasing trend reflects the inclusion in that study of different populations (e.g. the USA and Taiwan) and different data sources (administrative datasets) compared with the previous study [21]. Similarly, Minichiello et al. [69] found an increase in RA period-prevalence, from 0.62% in 1995 to 0.72% in 2005, in two studies from western populations, although that study included only a limited number of prevalence studies (n = 3) [69]. By contrast, we explored global pooled prevalence trends by prevalence method in 67 prevalence studies spanning the last four decades.

In urban populations, the global pooled RA prevalence increased from 0.39% in 1983–2000 to 0.70% in 2001–2018. This may be attributable to changes in RA diagnostic practice over the last two decades. It may also reflect better access to, and cooperation between, primary and specialist health care in urban compared with rural areas [70, 71].

In contrast, the global pooled prevalence in rural populations did not change between the stratified periods, although this finding may be confounded by poor health care access, RA misdiagnosis, low income, less education, and low life expectancy [51, 72,73,74,75,76].

In the present study, the pooled RA prevalence from sampled population studies was higher than that from database studies. This might arise from weaknesses of sampled population studies, such as sampling time frame, sample size, and participation rate [77,78,79]. Conversely, population database studies tend to underestimate RA prevalence, depending on the length of the observation period and the accuracy of the data source [80]. In our study, however, there was no significant difference in pooled RA prevalence between these two types of study. This may be attributable to our stringent inclusion criteria, clear definitions, and robust assessment of bias and heterogeneity.

Geographical location had a significant impact on pooled RA prevalence, with the highest pooled prevalence in North America; Cuba (2.7%) reported the highest point-prevalence in the world. However, that estimate lies outside our predicted global prevalence interval (0.05–1.26%), which may be due to the small Cuban sample (300 participants). The same was observed for the study from Lesotho, with a small sample of 1070 participants. Meta-regression analysis found that sample size acted as a suppressor variable, similar to the study from Finland, where RA prevalence was 1.9% with a sufficient sample size (n = 7124) [45].

As there is no diagnostic test for RA, classification criteria provide the best clinical guidance. The 1956 ARA criteria produced the highest pooled RA prevalence (0.58%), but these criteria lacked specificity for RA [22], whereas the revised 1987 ARA criteria are much more specific [81]. In the present study, the modified 1987 ARA criteria showed the lowest heterogeneity among RA classification criteria, even though the prevalence data came from different data sources, sample sizes, geographic population settings, and countries. These results illustrate the consistency of the modified 1987 ARA criteria compared with the revised 1987 ARA criteria, as well as their enhanced sensitivity. These findings are congruent with previous studies [57, 59] in which the modified 1987 ARA criteria were used in parallel with traditional criteria. The ACR/EULAR 2010 criteria were introduced a decade ago to improve the sensitivity of RA classification, but none of the included studies used them to estimate RA prevalence.

The prediction interval for studies using clinical diagnosis by a doctor was twice as wide as those for the revised and modified 1987 ARA criteria. This may be attributable to differences in doctors' experience, preferences, and training [82]. However, univariate meta-regression showed no significant difference in pooled prevalence among RA classification criteria.

The highest pooled RA prevalence estimate, 0.69%, was derived from linked data in high-income countries. This illustrates the advantage of linked data in improving case ascertainment across multiple health care settings, including rheumatology clinics, emergency departments, and inpatient facilities. In all included linked data studies, the RA diagnosis was confirmed by a rheumatologist. This adds credibility to these data and confirms that diagnostic accuracy is essential to minimise measurement error and misclassification.

Notably, RA prevalence estimates from registry data showed lower heterogeneity than those from linked data, suggesting greater precision and reliability. However, compared with linked data, registry data have several limitations: selection bias, the lack of a control group and of randomisation, and bias by indication [83]. Registry data may therefore underestimate the true prevalence of RA, as observed with administrative data, for example when general practitioners treat RA patients or the disease remains undetected [65].

Implications

The findings of this meta-analysis suggest that point- and period-prevalence yield similar estimates of RA prevalence, but period-prevalence also allows fluctuations over time to be estimated. Cross-linking different data sources with a long follow-up period yields the highest RA prevalence estimates, as it better captures RA patients' life course. Linked databases appear to provide the most accurate estimates of RA prevalence because of improved case ascertainment across data sources, especially when the RA diagnosis is verified by a rheumatologist. Diagnosis by a doctor, or use of newer validated classification criteria, is essential for accurate diagnosis and, in turn, for accurate RA prevalence estimates.

The difference in RA prevalence by geographical location suggests that genetic and/or environmental factors may be important in disease development [1, 8]. RA prevalence increased in urban but not rural environments, so factors associated with urbanisation (e.g. pollution) may be important for RA development and warrant identification and modification.

Strengths

A strength of this meta-analysis is that several moderators were applied to reduce bias and sources of heterogeneity were identified. Possible causes of the high heterogeneity were explored using sensitivity, publication bias, sub-group, and meta-regression analyses. Strict inclusion criteria, precise definitions, and robust assessments of bias were applied to improve the generalizability of the findings. Across the sensitivity analyses, the global pooled prevalence was not influenced by any single study, by studies at moderate risk of bias, or by potential outliers and residuals. Heterogeneity among the included studies was identified and described. Prediction intervals for prevalence estimates under different approaches were also reported to inform the design of future studies.

Limitations

This study has several limitations. We did not use Scopus to retrieve non-English publications, and we did not search abstracts of major international rheumatology conferences to identify unpublished studies [84]. In addition, Oceania was not represented, and some studies had small sample sizes, which may affect the global estimate. The limited number of studies restricted statistical power in sub-group analyses. The continental data may also be biased by the dominance of studies from Europe and Asia and the paucity of studies from Africa and South America. The estimates may therefore not reflect the true RA prevalence in these regions, which limits generalizability. Few studies came from low-income countries or used old RA classification criteria (e.g. the 1956 ARA and 1961 Rome criteria), registry data, or linked data. Heterogeneity was higher in these sub-groups, and tests for publication bias have been shown to lack power when so few studies are available. For the same reason, 95% prediction intervals were not reported for these sub-groups, because they were unavailable or could have been incorrect. In six of the eight studies at moderate risk of bias, there was no preliminary validation study of the diagnostic codes in the administrative data used to estimate RA prevalence, which raises questions about the validity of the data coding. Nevertheless, our use of sensitivity analysis, sub-group analysis, and meta-regression to investigate sources of heterogeneity provided useful information for health policymakers and practical insights for researchers planning future prevalence studies of RA.

Conclusions

The global prevalence of RA was 460 per 100,000 population from 1980 to 2018, with a 95% prediction interval of 0.06–1.27%. There was no significant difference between global pooled point- and period-prevalence estimates of RA. RA prevalence estimates were influenced by geographical location and the risk of bias of studies and, over time, by the period-prevalence method and urban population setting. Linked data are the preferred data source for estimating RA prevalence because they allow more complete case ascertainment. A preliminary validation study of linked data is warranted before RA prevalence is estimated and interpreted.