Introduction

Measuring the health status of the population is of great relevance for the planning, design, implementation, and evaluation of interventions, as well as in the estimation of their impact on different health outcomes such as disability, frailty and mortality (OECD, 2018). To seek comprehensive indicators of health status beyond the presence or absence of diseases and/or mortality, various conceptual models of health-related quality of life (Bakas et al., 2012; Fernández-Mayoralas & Rojo, 2005; Urzúa, 2010) and instruments to assess it have been developed. These incorporate the point of view or perception of health at the individual level, and individuals' feelings towards their life experiences and the impact of these on their health and well-being (Cooke et al., 2016; McDowell, 2006; Ware, 1995). In this sense, within the medical literature and measures of health status, the constructs of quality of life (QoL) and health-related quality of life (HRQoL) were created as multidimensional measures of self-perceived health status (Karimi & Brazier, 2016; McDowell, 2006; Oksuz & Malhan, 2006).

At the individual level, HRQoL has to do with those aspects related to the perception of health experienced and declared by the person, particularly in the physical, mental, and social dimensions, and their general perception of health and its associated factors including social support, socioeconomic status, and health condition, among others. On the other hand, at the context level, it includes resources, policies and strategies that impact people's perceptions of health and their functional capacity (Bowling, 2001; Bowling et al., 2002).

To measure HRQoL, numerous scales and questionnaires have been developed with different dimensions and scopes (McDowell, 2006; Velarde-Jurado and Avila-Figueroa, 2002; Yanguas Lezaun, 2006). The SF-36 Health Survey, developed as a multidimensional measure to assess health status and outcomes from the patient's point of view, is an example. Designed for use in clinical practice, research, health policy evaluation, and in general and population-specific surveys (McDowell, 2006; Ware & Gandek, 1994; Ware & Sherbourne, 1992), it has been one of the most widely used measures since 1985 (Hickey et al., 2005). Its predictive validity was first documented by the International Quality of Life Assessment Project (IQOLA), which translated, validated and adapted the SF-36 Health Survey in seven European countries, followed by its application in more than forty countries (Gandek & Ware, 1998; Ware & Gandek, 1998a, 1998b). In these studies, the SF-36 Health Survey showed high internal consistency and reliability and proved suitable for use with different age groups and populations (Aaronson et al., 1992; Anderson et al., 1993; Bullinger et al., 1998; Leplège et al., 1998; Sullivan et al., 1995; Vilagut et al., 2005; Ware et al., 1998).

After those initial works, a large number of studies have been conducted to determine the feasibility of translating the SF-36 into other languages and adapting it to other populations. Reliability and validity have been established for the populations of China (Lam et al., 1998; Li et al., 2003), Hong Kong (Lam et al., 2005), and New Zealand (Scott et al., 1999), for Mexican Americans (Peek et al., 2004) and Chinese Americans in the United States (Ren et al., 1998), and in Peru (Salazar & Bernabé, 2015), Jordan (Khader et al., 2011), Nigeria (Mbada et al., 2015), and ethnic minorities in the Netherlands (Hoopman et al., 2006), among others (Maciel et al., 2018; Hajian-Tilaki et al., 2017).

In more recent years, HRQoL has been recognised as an important outcome measure used in the field of geriatrics and gerontology (Hickey et al., 2005). Studies have evaluated the reliability and validity of the use of the SF-36 Health Survey in older persons, and reported adequate psychometric properties in studies with older persons in China (Azen et al., 1999; Ran et al., 2017), Korea (Kim et al., 2013), Vietnam (Ngo-Metzger et al., 2008), Spain (Alonso et al., 1995; López-García, et al., 2003), the United States (Barile et al., 2016; Gandek et al., 2004), as well as in Latin American countries, such as Costa Rica (Solano-Mora et al., 2015; Valdivieso-Mora et al., 2018), Brazil (Laguardia et al., 2011; Lima et al., 2009), Chile (Lera et al., 2013), and Colombia (Massa, 2010).

In Mexico, few studies have evaluated the psychometric properties and normative data of the SF-36 for Mexican population groups; those that did found optimal validity and reliability (Zúniga et al., 1999; Durán-Arenas et al., 2004). Additionally, the SF-36 has been used to measure the association between health-related quality of life and sarcopenia (Manrique-Espinoza, et al., 2017), the use of preventive health services and the practice of physical activity (Gallegos-Carrillo et al., 2019), social networks (Gallegos-Carrillo et al., 2009), and the effectiveness of health care interventions and programs in older persons (Gallegos-Carrillo et al., 2008). However, the use of the SF-36 in Mexican older persons has been limited. Only one study on the factor structure of the SF-36 Health Survey in Mexican older persons was identified (Aguirre et al., 2022) and, to our knowledge, previous studies did not provide extensive investigation of the overall psychometric properties of the SF-36 applied specifically to adults 60 years and older.

The present study aimed to evaluate the psychometric properties of the Short Form-36 Version 2 Health Survey, in a representative sample of older persons in Mexico living in the community.

Methods

Data collection

Data for the study were collected as part of a larger project, the Health and Living Conditions of Older Persons, conducted in Mexico City and Xalapa, Veracruz. Both cities are important metropolitan areas, in two of the federal entities in the country with the largest proportion of adults 60 years and older. In 2020, 16.2% and 14.4% of the total population in Mexico City and Veracruz, respectively, were aged 60 years and over (INEGI, 2021). The study used a probabilistic sampling design representative of the community-dwelling population aged 60 years and older in both cities. The sampling frame was the National Housing Framework 2016 of the National Statistics Institute (INEGI), developed from the cartographic and demographic information obtained in the 2010 Population and Housing Census (INEGI, 2016). A multi-stage clustered sample was used. The first stage consisted of a systematic random sample selection of census tracts (Areas Geográficas Estadísticas Básicas or AGEBs in Spanish) (primary sample units). The second stage consisted of selecting a systematic random sample of census blocks with probability proportional to the size of the AGEBs (secondary sample units). In the third and fourth stages, a simple random selection of (a) households (tertiary sample units) and (b) persons aged 60 years or above who lived in those households (quaternary sample units of the survey) was performed. The expected sample size (2,341 households), allowing for a 15% non-response rate, guaranteed 85% statistical power. The sampling design ensured a representative sample across both cities and across all socioeconomic strata. Data collection was carried out from September 2018 to January 2019 using face-to-face paper interviews conducted by trained interviewers in the respondents’ homes. Inclusion criteria to participate in the project were being 60 years and over, not having cognitive impairment and being a regular resident of the selected household.
The Spanish version of the Mini-Mental State Examination (MMSE) (Villaseñor-Cabrera et al, 2010) was administered to determine whether the participants showed signs of cognitive impairment. Those with a score < 24 were excluded from the study. A total of 2,024 direct in-person interviews were conducted (response rate = 85.4%). For the current analyses, the responses of 109 participants were excluded. Thus, the final analytical sample comprised 1,915 participants.

A comprehensive questionnaire was purposely developed to investigate the demographic, socioeconomic and health characteristics of the study population, their social support networks, and intra-family relationships. To evaluate Health Related Quality of Life, the questionnaire included the Short Form 36 Health Survey Questionnaire (SF-36) (Ware & Sherbourne, 1992). The SF-36 is a 36-item scale grouped in eight subscales, each representing a health-related domain: physical functioning (ten items); physical role limitations (four items); bodily pain (two items); general health perceptions (five items); energy/vitality (four items); social functioning (two items); emotional role limitations (three items) and mental health (five items) and a single item that provides an indication of perceived change in health and is excluded from the scoring (Ware et al., 1993). The SF-36 score was calculated using the standard approach consisting of the sum of the items on each subscale. Item responses were transformed so that each subscale presents values between 0 and 100, where 0 represents the worst possible state of health and 100 represents the best possible state of health. Missing values were not imputed as recommended in the SF-36 User Manual, to report actual scores (Ware et al, 1993; Ware and Gandek, 1994; Ware et al., 1998). After each item response was transformed, the eight SF-36 subscales were generated (Ware et al., 1994).
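As an illustration of the scoring step described above, the following minimal Python sketch rescales a raw subscale sum linearly to the 0 to 100 range. The function names, the assumption that all items in a subscale share the same response range after recoding, and the example values are hypothetical and are not taken from the study data or the SF-36 manual.

```python
def transform_scale(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a raw subscale sum so that it lies between 0 and 100."""
    score_range = highest_possible - lowest_possible
    if score_range <= 0:
        raise ValueError("highest_possible must exceed lowest_possible")
    return (raw_score - lowest_possible) / score_range * 100.0


def score_subscale(item_values, item_min, item_max):
    """Sum recoded item values and rescale the sum to 0-100.

    Assumption for this sketch: every item in the subscale has the same
    response range (item_min to item_max) after recoding.
    Returns None when any item is missing, i.e. no imputation is applied.
    """
    if any(value is None for value in item_values):
        return None
    n_items = len(item_values)
    raw = sum(item_values)
    return transform_scale(raw, n_items * item_min, n_items * item_max)


# Hypothetical respondent: ten items, each scored from 1 (worst) to 3 (best).
example_items = [3, 3, 2, 3, 1, 2, 3, 3, 2, 3]
print(score_subscale(example_items, 1, 3))  # raw sum 25 on a 10-30 range -> 75.0
```

Under this convention, a respondent with the lowest possible raw sum receives 0 and one with the highest possible raw sum receives 100.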

Analysis

Following the IQOLA Project process and standards for validation and psychometric testing of the SF-36 Health Survey (Ware & Gandek, 1998b; Ware et al, 1993; Ware and Gandek, 1994; Ware et al., 1998), and several studies using these methods (Leplège et al., 1998; Sanson-Fisher & Perkins, 1998), the following analyses were conducted.

We examined data distribution, completeness, and out-of-range values, and generated descriptive statistics to obtain the percentage of missing data for each SF-36 item and for the eight subscales. To evaluate response distributions, we calculated the mean, standard deviation, coefficient of variation, skewness, and kurtosis. The distribution of responses for each question was assessed visually, and the percentage of responses at the extreme values was examined for each subscale to detect floor or ceiling effects, which were considered present if at least 15% of respondents achieved the lowest or highest possible score, respectively (Terwee et al., 2007). Item-level analyses were conducted next to examine whether items could be aggregated into multi-item scales and whether the scale structure in our sample conformed to the assumptions underlying the original SF-36 studies. For each of the SF-36 subscales, the following item-level analyses were included: (1) item internal consistency, assessed through item-scale correlations. Results were considered substantial and satisfactory when the correlation between an item and its hypothesized scale was at least 0.40, the accepted standard (Perneger, et al., 1995); (2) item discriminant validity of each subscale, assessed by comparing the average item-total correlation with the average of the correlations of its items with the remaining subscales. Discriminant validity is considered successful when the correlation between an item and its own subscale is significantly higher, by two standard errors or more, than its correlation with other scales; (3) internal consistency reliability of the scale and the eight subscales, evaluated using Cronbach's Alpha coefficient (Cronbach, 1951; Henson, 2001).
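Two of the checks described above, the 15% floor and ceiling criterion and Cronbach's Alpha, can be sketched in Python as follows. This is only an illustrative sketch: the analyses in this study were run in Stata, and the function names and example data below are hypothetical.

```python
from statistics import variance


def floor_ceiling(scores, lowest=0.0, highest=100.0, threshold=15.0):
    """Percentage of respondents at the lowest and highest possible score.

    An effect is flagged when the percentage reaches the threshold
    (15% here, following the criterion cited in the text).
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no valid scores")
    floor_pct = 100.0 * sum(1 for s in valid if s == lowest) / len(valid)
    ceiling_pct = 100.0 * sum(1 for s in valid if s == highest) / len(valid)
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct >= threshold,
        "ceiling_effect": ceiling_pct >= threshold,
    }


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    rows: one list of k item scores per respondent (k >= 2).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical subscale scores: 2 of 10 at the floor, 3 of 10 at the ceiling.
scores = [0, 0, 50, 100, 75, 25, 100, 100, 60, 40]
print(floor_ceiling(scores))

# Hypothetical two-item subscale answered by four respondents.
print(cronbach_alpha([[1, 2], [2, 3], [3, 4], [4, 5]]))
```

In the second example the two items move together perfectly, so the coefficient is at its maximum of 1; real subscales would give lower values.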

In addition, we used factor analysis to explore the underlying structure of the SF-36 in our sample compared to the structure derived from the original US study and subsequent validation studies in other countries and population groups. The original US study and most studies validating the SF-36 through factor analysis have produced first-order eight-factor structures and two-factor structures of physical and mental health (Gandek & Ware, 1998; Reed, 1998; Ware & Sherbourne, 1992; Ware et al., 1998), while analyses in some non-Western and low- and middle-income countries have returned different factor structures (Kim et al., 2013; Suzukamo et al, 2011; AboAbat et al., 2020; Salazar & Bernabé, 2015). To evaluate the factor structure of the SF-36 in our study sample, we conducted the following analyses. To evaluate whether the data were appropriate, we estimated the correlation matrix among the eight subscales, the Bartlett sphericity test and the Kaiser–Meyer–Olkin (KMO) sample adequacy test (Campo-Arias et al., 2012). High correlation between the subscales, a KMO measure of sampling adequacy ≥ 0.7 and a p-value below 0.05 in Bartlett’s sphericity test were defined as the criteria for establishing the adequacy of the data. After verifying that the data were appropriate, we conducted an exploratory factor analysis. Factors were extracted using the principal components method, with eigenvalues ≥ 1 as the criterion for retaining factors, in addition to examination of the scree plot of the factor analysis (Guttman, 1954).
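For reference, the Bartlett sphericity statistic mentioned above can be computed from a correlation matrix R of p variables and a sample of n respondents as chi-square = -[(n - 1) - (2p + 5)/6] ln|R|, with p(p - 1)/2 degrees of freedom. The following Python sketch implements this formula with the standard library only; the helper names and the small example matrix are hypothetical and unrelated to the study data. The p-value would be obtained by comparing the statistic with a chi-square distribution, which is omitted here.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular (or numerically singular) matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test.

    corr: p x p correlation matrix as nested lists; n_obs: sample size.
    """
    p = len(corr)
    det = determinant(corr)
    if det <= 0.0:
        raise ValueError("correlation matrix must have a positive determinant")
    chi_square = -((n_obs - 1) - (2 * p + 5) / 6.0) * math.log(det)
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical example: two variables correlated at 0.5, 100 respondents.
print(bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 100))
```

A large statistic relative to its degrees of freedom indicates that the correlation matrix differs from an identity matrix, which is the condition needed before proceeding to factor extraction.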

Convergent validity of the SF-36 was tested by estimating the strength of the correlation between the SF-36 subscales and measures of economic situation, mental health, and perceived social support from spouse/partner, children, and friends. Depressive symptoms were measured by the 15-item Geriatric Depression Scale (GDS-15) (Sheikh & Yesavage, 1986; Yesavage et al., 1982), which has been previously validated in the Mexican population, showing reliability and validity (Acosta Quiroz et al., 2021). The GDS-15 scale includes dichotomous responses (Yes/No); a score equal to or greater than five positive items indicates depressive symptoms. Economic situation was self-reported by participants and categorised as Very Good/Good, Fair, and Poor/Very poor. To measure social support, we used a set of questions included in the Mexican Health and Aging Study (MHAS), a longitudinal and nationally representative study of adults 50 years and older living in the community. Social support is measured in the MHAS as the support respondents perceive from their spouse/partner, children, and friends in four different areas: whether they understand the respondent's feelings, whether the respondent feels he/she can confide in them, whether they listen to the respondent (a lot, little, nothing), and whether they disappoint the respondent (reverse score) (MHAS 2012). We hypothesised a modest to moderate correlation between the total score of each SF-36 subscale and the other measures.
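The convergent validity check described above amounts to correlating each SF-36 subscale score with an external measure. The sketch below uses a Pearson coefficient purely as an assumption for illustration, since the text does not state which correlation coefficient was estimated; the function names and the example scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def has_depressive_symptoms(gds15_score, cutoff=5):
    """GDS-15 cut-off used in the text: five or more positive items."""
    return gds15_score >= cutoff


# Hypothetical data: GDS-15 scores and Mental Health subscale scores (0-100).
gds15 = [1, 3, 5, 8, 10]
mental_health = [90, 80, 70, 50, 40]
print(pearson_r(gds15, mental_health))          # negative: more symptoms, lower score
print([has_depressive_symptoms(s) for s in gds15])
```

A negative coefficient in this setup corresponds to the expected direction: respondents with more depressive symptoms report lower health-related quality of life.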

Criterion-based discriminant validity was also assessed using known-groups validity. This allowed us to explore the ability of the questionnaire to discriminate among subgroups of respondents known to differ in criteria assessed independently, such as key clinical variables. Our hypothesis was that individuals in the group with good health conditions would have a higher score in each of the eight subscales of the SF-36 (better health-related quality of life) than those in the group with poor health conditions. Respondents were assigned to mutually exclusive groups differing in each of these characteristics. Participants with good health conditions were those who reported having none of the following health conditions: diabetes, hypertension, chronic respiratory illness, heart attack, stroke, disabling pain, urgency and stress-related urinary incontinence, and painful arthritis. Those in the bad health group presented at least one of these same health conditions. T-tests were performed to test the statistical significance of the observed differences between the subgroups. All statistical analyses were conducted using Stata v.14.0 (StataCorp, 2015).
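The known-groups comparison described above can be illustrated with a two-sample t statistic. The sketch below uses Welch's unequal-variance version, which is an assumption made for this example because the text does not specify the variant of the t-test; the function name and the subscale scores are hypothetical. The statistic and its degrees of freedom would then be compared with a t distribution to obtain the p-value.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom.

    group_a, group_b: lists of subscale scores for two independent groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = variance(group_a) / n_a  # squared standard error, group A
    var_b = variance(group_b) / n_b  # squared standard error, group B
    if var_a + var_b == 0:
        raise ValueError("both groups have zero variance")
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical subscale scores for the two health groups.
good_health = [80, 85, 90, 75, 95]
bad_health = [60, 65, 70, 55, 75]
print(welch_t(good_health, bad_health))  # t = 4.0 with 8 degrees of freedom
```

A positive statistic here means the good health group scored higher on the subscale, which is the direction stated in the hypothesis.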

Ethical considerations

The protocol was evaluated and approved by the Ethics in Research Committee and the Research Committee of the National Institute of Geriatrics (registration DI-PI-007/2018). Each participant who was invited to participate received a fact sheet about the project and had the opportunity to ask questions about it. Informed consent was obtained in writing from all participants who agreed to participate. The project was funded by the National Council of Science and Technology of Mexico (CONACyT, for its acronym in Spanish) under the Call for Basic Scientific Research (CB-2016/287302).

Results

Our working sample consisted of 1,915 Mexican older persons, of whom 64% were women, with a mean age of 72 years (SD 8) and generally low educational attainment: 52% of the sample had completed only primary-level education or had no schooling.

Item Descriptive Statistics

The completion rate was high (> 97%); the domain with the highest proportion of missing data was Physical Functioning, with 2.9% (results not shown). Table 1 presents the description of each of the SF-36 domains. Mean scores were lowest in the dimensions of General Health (58.6), Social Functioning (58.9), Role Physical (66.5) and Vitality (67.8), and the highest mean scores were obtained in Role Emotional (82.2), Mental Health (76.9), Bodily Pain (72.2) and Physical Functioning (70.3).

Table 1 Descriptive data of the eight scales of the 36-item Short Form Health Survey in Mexican Older Persons (n = 1,915)

Skewness calculations showed all subscales were negatively skewed with the Role-Emotional, Mental Health, Physical Functioning, Bodily Pain and Role-Physical subscales presenting a moderately skewed distribution. The highest floor effects were observed for the Role-Physical and Role-Emotional subscales, at 25.9% and 12.4%, respectively. Regarding ceiling effects, these were found also in these two subscales (58.9% and 76.5%, respectively), while moderate effects were present for the Bodily Pain (41.5%) and Physical Functioning (26.1%) subscales (Table 1).

Scaling Assumptions, Validity, and Reliability

Table 2 presents the psychometric properties of the SF-36 subscales. All items passed the test for discriminant validity, with correlation coefficients between items and their hypothesised scale greater than or equal to 0.40, except for the Social Functioning and Role Emotional subscales. The highest correlation coefficient was between the Mental Health and the Vitality subscales, which measure the emotional dimension of health, while Social Functioning presented the lowest correlation coefficients. On the other hand, the correlations between three of the dimensions of physical health (Physical Functioning, Role Physical, and Bodily Pain) were also high (> 0.50). Cronbach alpha coefficients ranged from 0.79 to 0.87, indicating good reliability (Table 2).

Table 2 Internal validity, reliability, and scale properties of the 36-item Short Form Health Survey in Mexican Older Persons

Tests of the adequacy of the data for factor analysis showed that the determinant of the correlation matrix was equal to 0.000, with a Bartlett test of sphericity Chi-square = 40,678.335 (p < 0.001) and a value of 0.951 in the Kaiser–Meyer–Olkin Measure of Sampling Adequacy, confirming that the variables are sufficiently intercorrelated, the data are adequate, and one can proceed with factor analysis estimations. The exploratory factor analysis on all the items of the SF-36 resulted in the extraction of six factors with eigenvalues equal to or greater than one: Physical Function, Vitality and Mental Health, Role Physical, Role Emotional, General Health, and Bodily Pain. Factors 1 and 2 account for almost half of all the variance (47%), with eigenvalues of 13.4 and 2.9, respectively (Table 3).

Table 3 Principal component analysis (six factor) of the 36-item Short Form Health Survey in Mexican Older Persons

Physical and health items loaded within their hypothesized scales (General Health, Physical Function, Role Physical and Bodily Pain), but some items in the Role Physical and Bodily Pain present cross-loadings. Mental Health and Vitality items loaded on the same factor instead of loading on a separate factor each and several items show cross-loadings. The Social Function items showed problematic results with the first item (Q. 20) presenting low factor loadings (< 0.40) and cross-loading among three factors, and the second item showing even lower factor loadings (< 0.30) and large uniqueness (86%), showing it does not fall into any of the factors (Table 3).

Convergent Validity

Table 4 presents the correlation coefficients between the eight SF-36 subscales and the variables mental health, economic situation, and social support. Correlations were all negative between depressive symptoms and economic situation and all SF-36 subscales. The hypothesis that respondents presenting higher depressive symptoms and worse economic situation would present lower health-related quality of life was confirmed. All coefficients were statistically significant (p < 0.001) apart from economic situation and Social Functioning.

Table 4 Correlation coefficients of mental health, economic situation and social support and the 36-item Short Form Health Survey in Mexican Older Persons

For the three social support groups, spouse/partner, children, and friends, correlations were negative for the questions on how much participants perceived they understood their feelings, that they can confide in them, and that they listen when they need to talk about their worries. This shows that the less social support participants perceive (higher score), the lower their perceived health related quality of life is (lower score). These coefficients are moderate to low and statistically significant (p < 0.05 or lower) for all subscales, except for the Social Function subscale that was only significant for how much participants perceive friends, but not their spouse or children, understand their feelings and listen when they need to talk about their worries. For all three social support groups, as participants reported less disappointment (reverse score), the higher they reported their perceived health related quality of life. As with the rest of the social support questions, the correlation coefficients are moderate to low and statistically significant apart from the Social Functioning subscale (Table 4).

Discriminant validity (known-groups approach)

The results of the known-groups validity analysis are presented in Table 5. The second hypothesis, that respondents in good health would report better scores in the SF-36 subscales, was confirmed for all subscales except Social Functioning. Those in the bad health subgroup (n = 1,303) show consistently lower means in most subscales than those in the good health subgroup (n = 610), and these differences are statistically significant. The only subscale with a minor difference was Social Functioning, and this difference was not statistically significant.

Table 5 Difference between known groups in mean scores of the 36-item Short Form Health Survey in Mexican Older Persons

Discussion

This article describes the results of the study of the psychometric properties of the 36-item Short Form Health Survey (SF-36) in a sample of Mexican older persons. The Spanish SF-36 scale previously translated and tested in other Spanish-speaking populations (Durán-Arenas et al., 2004; Vilagut et al., 2005) was administered as part of a comprehensive questionnaire. This paper shows that psychometric properties of the Spanish version of the SF-36 Health Survey in older persons were satisfactory according to the criteria set by original studies, showing the validity and convenience of its use in community dwelling older people in Mexico.

Previous studies of normative data of the SF-36 Health Survey in Mexico have been conducted in samples of adults (Durán-Arenas et al., 2004; Zúniga et al., 1999), and few studies have explored the association of HRQoL with use of health services, depressive symptoms, chronic diseases and sarcopenia (Gallegos-Carrillo et al., 2009; 2008; Manrique-Espinoza et al., 2017) in adults 60 years and older. To our knowledge, there is only one study that explores the factor structure of the SF-36 Health Survey in older adults (Aguirre et al., 2022), but no previous studies conducted a comprehensive validation of the SF-36 survey in this population group. Therefore, this is the first comprehensive study validating the SF-36 Health Survey, using a representative sample of Mexican older persons. The acceptability of the questionnaire was considered high given the missing responses were less than 3%. Floor and ceiling effects were comparable to those found in other studies, with highest scores observed for the Role Physical and Role Emotional subscales (Garratt & Stavem, 2017; Salazar & Bernabé, 2015; Sanson-Fisher & Perkins, 1998). High positive mean scores have been found in previous studies with older persons, mainly on the Role Emotional subscale (Meng et al., 2013; Hoopman et al., 2006).

The internal consistency coefficients are satisfactory and within expected range for all the subscales according to results from the original validation studies and previous studies in other countries (item-test ≥ 0.70, item-total ≥ 0.50, and inter-item correlation 0.35–0.48). Highest correlations were observed between the subscales measuring physical health aspects. The Role Emotional subscale showed slightly lower correlation with subscales measuring physical health aspects, but higher correlations with most of the subscales measuring mental aspects. In addition, our findings also show the highest correlation between the Mental Health and Vitality subscales (0.70), given their strong association with mental health rather than with physical health (Fuh et al., 2000; Lam et al., 1998; Leplège et al., 1998; Lewin-Epstein et al., 1998; Lim et al., 2008; McHorney et al., 1994; Montazeri et al., 2005; Ngo-Metzger et al., 2008; Ren et al., 1998).

The Social Functioning subscale had significantly lower correlations with all subscales, showing low reliability. Interestingly, low or unsatisfactory internal consistency reliability of the Social Functioning subscale was previously noted in several studies in Asia (Azen et al., 1999; Fuh et al., 2000; Li et al., 2003; Lim et al., 2008; Ngo-Metzger et al., 2008; Tseng et al., 2003) and in Brazil (Laguardia et al., 2011).

These studies note that different cultures give different meanings to physical and mental health constructs, and that there are cultural differences in the concept of social functioning as well as in perceptions of concepts such as pain and vitality (Fuh et al., 2000; Lim et al., 2008; Ngo-Metzger et al., 2008; Tseng et al., 2003). For example, it has been noted that the traditional health beliefs of many Asian populations do not involve a strict dichotomization of physical versus mental health; instead, health is viewed as a balance of yin and yang (hot and cold) principles. It has also been noted that Asian populations may conceptualize mental and physical health differently than Western populations (Fuh et al., 2000; Ngo-Metzger et al., 2008; Tseng et al., 2003).

With respect to convergent validity, all subscales of the SF-36 were negatively correlated with depressive symptoms, and these were statistically significant (p < 0.01). This negative and statistically significant association was also found in previous studies (Azen et al., 1999). The SF-36 also shows good discrimination between groups of people with and without chronic diseases as found in previous studies (Hoopman et al., 2006; Laguardia et al., 2011; Tyack et al., 2018), also suggesting good construct validity. There was a statistically significant difference between the scores of all scales between the two groups, except for the Social Functioning scale where the scores were almost equal, and the difference was not statistically significant.

Exploratory factor analysis of individual items resulted in a partial correspondence of the items to their hypothesized scales, with exception of the Vitality, Mental Health, and Social Functioning subscales, resulting in a six-factor solution. Variations in the factor structure of the SF-36 survey have been noted previously. The results of a recent study validating the factor structure of the SF-36 in a sample of older persons in two states in Northern Mexico support a four-factor structure (Physical Function, Body Pain, Role Physical and Psychological Health), with a reduced number of items in each of these subscales (Aguirre et al., 2022). A study in general population in Korea produced a six-factor solution (Kim et al., 2013) where, as with our study, the Social Function items loaded with the factors that include the Mental Health and Vitality items. In our study, the Social Function items also loaded with the factors that include the Role Emotional and Role Physical items. These similar results show an important point about how different populations perceive and value their physical and mental health and how these impact their overall wellbeing and engagement. The way cultural systems influence and shape physical and mental health, health behaviours, perceptions, how people cope and seek help has been well documented (Angel & Thoits, 1987; Hwang et al, 2008; Lora, 2012).

Of particular interest is the Social Function subscale, where some authors have noted difficulties as a result of the existence of only two items in this subscale (Walters et al., 2001), the high difficulty in translating its two questions (Wagner et al., 1998), or due to problems with the conceptualization of social function, the local understanding of the meaning of social activities, as well as how well individuals understand the differences between the extent of time included in the questions (i.e. all of the time, most of the time) (Kim et al., 2013; Ngo-Metzger et al., 2008; Tseng et al., 2003; Wang et al., 2008, 2015).

Following the results obtained with the Social Function scale in our study, we suggest an adaptation of the two items included in the survey so that they better represent local cultural norms. First, in Mexico, the notion of social activities is frequently interpreted as relationships with, or help with daily activities provided to, people from other households. Therefore, the term "social activities" should be replaced by wording that conveys contact between the people interviewed and others close to them, specifically by including references to participation in family gatherings and visits to friends, neighbours or relatives. The second recommendation is to change the structure of the questions in Spanish so that they first ask about the difficulty that the person has had in making contact, visiting or participating in gatherings with family or other acquaintances as a result of their health or emotional problems. The recommended wording in Spanish for these two items is as follows:

  • ORIGINAL: Durante las 4 últimas semanas, ¿hasta qué punto su salud física o los problemas emocionales han dificultado sus actividades sociales habituales con la familia, los amigos, los vecinos u otras personas?

  • MODIFIED: Durante las 4 últimas semanas ¿ha tenido dificultad para participar en reuniones familiares o visitar amigos, vecinos o familiares, por problemas de salud física o emocional?

  • MODIFIED (English translation): In the past four weeks, have you had difficulties participating in family gatherings or visiting friends, neighbours or relatives due to physical or emotional health problems?

  • ORIGINAL: Durante las 4 últimas semanas, ¿con qué frecuencia la salud física o los problemas emocionales le han dificultado sus actividades sociales (como visitar a los amigos o familiares)?

  • MODIFIED: Durante las 4 últimas semanas, ¿con qué frecuencia se le ha dificultado participar en reuniones familiares o visitar amigos, vecinos o familiares, por problemas de salud física o emocional?

  • MODIFIED (English translation): In the past four weeks, how often have you had difficulty participating in family gatherings or visiting friends, neighbours or relatives due to physical or emotional health problems?

Given the importance of assessing older persons’ wellbeing beyond their chronic conditions, there is still a growing need for the development or adaptation of optimal outcome measures that are culturally appropriate. Moreover, by using adequate tools, researchers and health practitioners can promote better quality of life among older persons.

This study advances knowledge on the topic by presenting the psychometric properties of the SF-36 Health Survey and its validity, and it has important strengths. First, compared to previous studies in Mexico that have focused on the general population (Zúniga et al., 1999; Durán-Arenas et al., 2004) or on few aspects of the scale validation (Aguirre et al., 2022), this is, to our knowledge, the first to include a comprehensive evaluation of the psychometric properties of the SF-36 Health Survey using a representative sample of older persons, allowing for a better understanding of the scores and the application of the SF-36 Health Survey in Mexico. Second, its findings highlight the relevance of cultural factors in generating, adapting, validating, and applying health questionnaires. Third, it has methodological strengths: a large random representative sample of older persons, fieldwork conducted under strict standardisation of all staff, and very high response rates and data completeness.

Notwithstanding, we identify two limitations in the study that should be considered when interpreting the results. First, while the study uses a representative sample of older persons in Mexico, the localities are mostly urban and therefore, results cannot be generalised to rural areas. Second, this study did not include test–retest reliability data, as at this time the full questionnaire was only administered once. However, a second wave of the larger project is currently underway and so we expect to add this to the validation tests.

In the future, research should extend to other regions of the country, including rural areas, to elicit similarities and possible differences in validating scales such as the SF-36. To fully understand how Mexican culture and values influence the scores of this and other quality of life measurements, future studies should consider including qualitative methods that explore these factors in depth. In turn, this would also allow for a better understanding of the few results that diverge from the original IQOLA project studies and previous studies validating the SF-36 Health Survey, particularly regarding the role of mental health and social functioning in the country. These differences do not undermine the validity and reliability of the SF-36 as a measure of health-related quality of life in Mexican older persons, but they point to the need to investigate which tools are most appropriate for local use.