Introduction

China has experienced rapid economic growth and dramatic changes in demand for health care services [1]. Consumers, including those living in rural areas, are demanding a wider scope and improved quality of health services. There is increasing consensus about the importance of including subjective accounts of health in monitoring medical care outcomes in China [2–17].

Researchers have developed many patient-reported measures for assessing quality of health [2, 18, 19]. The majority of such instruments originated in Western countries and reflect urbanized living contexts [2–17]. Despite rapid urbanization and massive rural-to-urban migration, a large number of rural residents in China, especially the elderly and frail, still live traditional rural lifestyles. These rural Chinese, who comprise 55% of the total population, often live in poorer socio-economic conditions and have poorer literacy skills than their urban counterparts. In spite of the economic gap, previous studies have found that rural residents had a better health-related quality of life than urban residents, both physically and mentally [11, 20, 21]. However, research examining differences between rural and urban residents in their understanding and conceptualization of perceived quality of health is lacking. Therefore, the objective of this study was to explore the application of a health questionnaire based upon urban-living contexts to a Chinese population living a traditional rural lifestyle.

Methods

This study was conducted in a rural village with a population of more than 5,000 near Chengdu in Sichuan province. Villagers 18 years or older were invited to participate in the survey. The SF-36 was chosen as the measure of quality of life because it is one of the few measures applicable to the general population and a Chinese version is available. The Chinese version has been validated in urban populations, but not yet in rural populations [21].

Each respondent was interviewed by one trained interviewer in a private environment. Because of the participants’ low literacy, the interviewer read the informed consent form aloud and obtained oral consent. The SF-36 was administered by reading the questions aloud, asking the respondents to choose an answer, and asking them to explain the reasons for their choice. A total of 1,603 residents completed the survey.

Cronbach’s α reliability coefficients were computed, with values greater than 0.7 considered acceptable [22]. To assess test–retest reliability, the SF-36 was re-administered to 81 randomly selected respondents 2 weeks after the first round of the survey. An intraclass correlation coefficient (ICC) greater than 0.4 was considered acceptable [21].
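The two reliability criteria above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. The function names and the example scores are hypothetical, and because the form of ICC is not stated in the text, the one-way random-effects ICC(1,1) is assumed here purely for illustration.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)       # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for an (n_subjects, k_occasions) matrix.

    Assumption: the study does not state which ICC form was used.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((x - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Made-up scores for illustration only (not study data).
two_item_scale = [[1, 2], [2, 2], [3, 4], [4, 5], [5, 5]]
test_retest = [[60, 62], [75, 70], [40, 45], [90, 88], [55, 50]]

alpha = cronbach_alpha(two_item_scale)
icc = icc_oneway(test_retest)
print(f"alpha = {alpha:.2f}, acceptable (> 0.7): {alpha > 0.7}")
print(f"ICC   = {icc:.2f}, acceptable (> 0.4): {icc > 0.4}")
```

In practice each subscale would be passed separately: its item scores to `cronbach_alpha`, and its first-round and second-round subscale scores to `icc_oneway`.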

The construct validity of the SF-36 was examined using exploratory factor analysis (principal components extraction with promax rotation). Based on the original structure of the SF-36, eight factors were expected to be extracted: limitations in physical activities because of health problems (PF: 10 items); limitations in usual role activities because of physical health problems (RP: 4 items); bodily pain (BP: 2 items); general health perception (GH: 5 items); vitality (VT: energy and fatigue, 4 items); limitations in social functioning due to health problems (SF: 2 items); limitations in usual role activities because of emotional problems (RE: 3 items); and mental health (MH: 5 items) [23]. The extracted factors were expected to explain at least 40% of the total variance, and each item was expected to have its highest loading (>0.4) on its a priori designated subscale [22].
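The two acceptance criteria for the factor analysis (variance explained and highest loading on the a priori subscale) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's procedure: it performs principal components extraction on the correlation matrix but omits the promax rotation, and the function names, the simulated data, and the small hand-written loading matrix are all hypothetical.

```python
import numpy as np


def principal_component_loadings(data, n_factors):
    """Unrotated principal component loadings from the item correlation matrix.

    Returns the loadings and the proportion of total variance explained.
    Note: the promax rotation described in the text is NOT applied here.
    """
    x = np.asarray(data, dtype=float)
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]    # indices of the largest
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order].sum() / corr.shape[0]
    return loadings, explained


def misloaded_items(loadings, item_names, expected_factor, threshold=0.4):
    """Items whose highest absolute loading is not above the threshold
    on their a priori factor (given as a column index per item)."""
    abs_loadings = np.abs(np.asarray(loadings, dtype=float))
    flagged = []
    for row, name, expected in zip(abs_loadings, item_names, expected_factor):
        best = int(np.argmax(row))
        if best != expected or row[best] <= threshold:
            flagged.append(name)
    return flagged


# Simulated data: six items driven by two independent factors (illustration only).
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
data = np.repeat(factors, 3, axis=1) + 0.3 * rng.normal(size=(500, 6))
_, explained = principal_component_loadings(data, n_factors=2)
print(f"variance explained: {explained:.2f}, meets 40% criterion: {explained >= 0.4}")

# Hand-written "rotated" loadings for three hypothetical items A, B, C.
toy_loadings = [[0.8, 0.1], [0.3, 0.7], [0.5, 0.45]]
flagged = misloaded_items(toy_loadings, ["A", "B", "C"], expected_factor=[0, 0, 1])
print("items not loading highest on their a priori factor:", flagged)
```

Mapping each extracted factor to a subscale is a judgment made by the analyst after rotation; the `expected_factor` indices above simply encode that judgment.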

Changes in the semantic meaning of the SF-36 items were identified by comparing the rank-order of item-cluster mean scores, item variances, and item-subscale correlations with the original assumptions. It was hypothesized that a semantically equivalent Chinese version of the SF-36 would preserve the rank-order of item-cluster mean scores, and that items in the same subscale would have approximately equal variances and approximately equal correlation coefficients with their underlying subscale [24].
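The rank-order check can be sketched as follows. This is a minimal illustration, assuming the hypothesized order is supplied as a list of item labels from the lowest to the highest expected mean; the function name, the item labels, and the means are made up and are not study data.

```python
def order_violations(item_means, hypothesized_order):
    """Adjacent item pairs whose observed means contradict the hypothesized order.

    hypothesized_order lists items from lowest to highest expected mean score.
    """
    violations = []
    for lower, higher in zip(hypothesized_order, hypothesized_order[1:]):
        if item_means[lower] > item_means[higher]:
            violations.append((lower, higher))
    return violations


# Hypothetical item labels and mean scores for illustration only.
means = {"A1": 40.0, "A2": 70.0, "A3": 65.0, "A4": 80.0}
order = ["A1", "A2", "A3", "A4"]
pairs = order_violations(means, order)
print("pairs violating the hypothesized order:", pairs)
```

An empty result would indicate that the observed item means follow the hypothesized order; flagged pairs point to items whose wording may need closer qualitative review.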

The respondents’ explanations of their choices of answers were categorized and summarized. Particular attention was paid to the items with changes of semantic meaning identified in the quantitative analysis. Possible connections between the respondents’ explanations and the identified validity problems were established through a group discussion involving key interviewers.

Results

Demographics

Of the 1,603 respondents, 82% were full-time farmers, 31.8% had a maximum of 5 years of education, and 15.3% were illiterate. Compared with the national average, the study population was older and had poorer literacy skills [1].

Reliability of the SF-36

The Cronbach’s α reliability coefficients were acceptable, with only one subscale (SF) falling below 0.7. All of the subscales had an α coefficient greater than the subscale-subscale correlation coefficients (Table 1).

Table 1 Reliability of SF-36 in a rural Chinese population

The test–retest reliabilities were relatively low, particularly for the subscales measuring mental health. Half of the eight subscales (BP, SF, RE, MH) had an ICC below 0.6 (Table 1).

Validity of the SF-36

The exploratory factor analysis extracted eight factors, which explained more than 70% of the total variance. However, the composition of these eight factors was not in full accordance with the a priori assignment of items to scales. All but six items had their highest loading (>0.4) on their a priori subscales (Table 2). The exceptions were PF1, VT3, VT4, MH3, MH5, and SF1. PF1, an item measuring vigorous activities, had only a secondary loading (>0.4) on its a priori subscale; its highest loading was on the eighth factor, on which four other PF items also had loadings exceeding 0.4. The items intended to measure three domains (VT, MH, and SF) fell into only two factors. VT3, an item measuring feeling worn out, and VT4, an item measuring feeling tired, had their highest loadings on Factor 6, along with the items measuring SF and MH. Meanwhile, MH3, an item measuring feeling calm and peaceful, and MH5, an item measuring feeling happy, had their highest loadings on Factor 5, along with the items measuring VT (Table 2).

Table 2 Item loadings on factors extracted from exploratory factor analysis (n = 1603)

Semantic meaning equivalence

The profile of the SF-36 subscales of this rural population was consistent with that of the urban Sichuan population [21]. However, the gaps between the urban Sichuan population and the rural Chengdu population were much larger in the subscales measuring mental health than in those measuring physical health (PF, RP, BP, and GH) (Fig. 1).

Fig. 1 SF-36 mean subscale scores: rural Chengdu, urban Sichuan population

Items within subscales showed similar item-subscale correlation coefficients and approximately equal variances (Table 3). With regard to the rank-order of item-cluster mean scores, a few items violated the hypothesized order. According to the explanations offered by the respondents, the changes in the order of these items reflected changes in semantic meaning.

Table 3 Order of item means in hypothesized subscales

PF: The respondents expressed difficulties in understanding the concept of walking distance. Although “mile” had been replaced by “kilometer” in the Chinese version of the SF-36, the rural residents were likely to interpret it as the Chinese measure “Li”, which equals half a kilometer. In addition, the concept of a “block” did not exist in the minds of the rural residents, most likely because houses are not clustered into blocks in rural villages. Furthermore, participants in this study scored relatively higher on PF7, PF8, and PF9 than the original US validation sample, which is consistent with reports by rural residents that walking was considered one of the easiest activities in their daily lives. When these three items were removed, the remaining items fit the hypothesized order perfectly.

RP and RE: The items measuring “accomplishment” (RP2 and RE2) had relatively higher mean scores and violated the item-cluster order. The respondents explained that their job as farmers was too volatile to establish a target. During quiet periods, they often were unoccupied, whereas during busy seasons, they felt they had to “accomplish” whatever was necessary regardless of their level of motivation.

GH: The items GH3 and GH5 had relatively higher mean scores compared to other items in this domain and violated the item-cluster order. The respondents explained that they would not say that they were “as healthy as anybody else” (GH3) when they felt either “better” or “worse” than others. Such an interpretation ignores the positive meaning of “healthy”. In Chinese, the term “healthy” is sometimes interpreted as a neutral term, similar to “health status”.

VT: The participants expressed difficulty in understanding VT1 “feel full of pep” and VT2 “have a lot of energy”. The terms “pep” and “energy” were unfamiliar and difficult for the rural residents to understand.

Discussion

Questions based on urban-living arrangements have led to confusion and misunderstanding among those living in rural areas. The mental components pose a particular challenge to the rural Chinese population [15, 25]. Although it is difficult to determine exactly how participant characteristics such as education level and living circumstances contributed to the poor psychometric properties demonstrated in this study, it is clear that when a questionnaire is adapted for a new country or culture, it must be pilot tested with a representative sample of the general population of that country or culture.

Health-related quality of life measures perceived health. Differences in self-rated functioning, which have often been found between urban and rural residents [20], are determined not only by actual functioning but also by life expectations, the interpretation of study questions, and even the mode of completing the questionnaires [26]. Even when a questionnaire is claimed to have been “adapted” for a certain population, researchers should carefully consider the questionnaire items and the adaptation process before deciding whether to use the measure.