Background

Children’s health and health behaviours are a global public health issue. Health-related habits that consolidate in adolescence often persist into adulthood and influence adult wellbeing, the development of health complaints, tobacco use, diet, physical activity levels, and alcohol use [1, 2]. Thus, it is highly important to use valid and reliable research instruments to assess children’s and adolescents’ health and health-related behaviour.

One of the largest international surveys on adolescent health and health-related behaviours is the WHO collaborative Health Behaviour in School-Aged Children (HBSC) study, which focuses on 11-, 13- and 15-year-old boys’ and girls’ health and well-being in their social context (school, family, and friends) [3]. The HBSC questionnaire is currently used in 50 countries, mostly in Europe and North America [4]. To date, the HBSC questionnaire has also been used within linked projects in Australia, Brazil, Chile, China, Hong Kong, Lebanon, Mozambique, Sao Tome and Principe, Saudi Arabia and Taiwan. Thus, there is a strong need to examine the validity and reliability of the survey instrument in different continents and cultures [5].

In the past 30 years, the Vietnamese economy has grown rapidly, and Vietnam, which had been recognized as one of the world’s poorest countries, is now considered a lower-middle-income country. Since the 1990s, 30 million people have risen above the official poverty line, and by 2015, GDP per capita had increased from $100 to $2,300 [6]. The GDP growth rate in 2018 was the highest since 2008 [7]. Despite these developments, there is significant economic inequality affecting opportunity, health, and education [6, 8]. The population of Vietnam was 94.7 million people (49.4% male) in 2018. There are 16.5 million pupils in general schools, of whom 5.4 million are in Vietnamese lower secondary schools (aged 11–15). Approximately a third of lower secondary school pupils are in urban areas (1.8 million) while the remaining pupils are in rural areas (3.6 million) [8].

The rapid socio-economic development has brought new challenges in lifestyle and health behaviour. Weiss et al. [9] reported that approximately 12% of Vietnamese children experienced significant mental health problems. In 2013, one in five (20%) lower secondary school pupils in Ho Chi Minh City was classified as overweight, with a further 8% classified as having obesity [10, 11]. Child obesity continues to increase [12] at the same rate at which more families attain improved economic situations [10, 13]. Vietnamese families with a higher socioeconomic status can also provide their children with a modern lifestyle, such as computers, the internet, and television, which makes these children less physically active because more of their leisure time is spent on screen-time activities [14]. Overall, the majority of adolescents in Ho Chi Minh City spent ≥ 2 h/day in screen-time activities, which are also highly associated with the increase in their overweight and obesity prevalence (21%) [15]. Eating behaviour has also changed over the last decade, which contributes to childhood overweight and obesity [16,17,18]. The traditional diet based on cereals, tubers, and vegetables has expanded to include meat, eggs, milk, fat, and sugar. Fast foods and drinks, animal-based foods, and refined carbohydrates (sugars, sweets) represent the most significant phenomenon of the nutritional transition [16].

In this context, valid and reliable research tools for adolescents are highly relevant. Although some test–retest studies of selected HBSC questionnaire items have been carried out [19,20,21,22,23,24], the number of studies assessing the reliability of research instruments in Asian conditions is still limited [5]. Thus, the aim of the present study was to examine the test–retest reliability of selected well-being, physical and screen-time related sitting activities, and eating behaviour items of the HBSC questionnaire in a sample of Vietnamese adolescents.

Methods

Sample and procedure

The data for this test–retest study were collected between November and December 2018 in Vietnam. The study is based on the HBSC study and its methodology [25].

Four elementary schools located in rural as well as urban areas in Hồ Chí Minh City, Đồng Nai Province, and Tiền Giang Province were randomly selected for the study. Trained research assistants administered the questionnaires during regular class time in the 6th, 7th, 8th, and 9th grades (615 registered pupils) without the presence of the teacher.

In the first part of the study (Test), we collected data from 525 adolescents (response rate 85.37%). Non-participation in this study was due to illness (2.76%) and other reasons that the research team was not able to identify (11.87%). The second part of the study (Retest) was conducted 3 weeks after the test. In the retest, we collected data from 504 adolescents (response rate: 81.95%). Nineteen adolescents who had participated in the first part of data collection (Test) dropped out. Ninety-four adolescents could not be paired by ID codes. The final sample consisted of 410 adolescents (boys 40.2%) and was stratified by grade, sex, and place of residence (Table 1).
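The reported rates can be reproduced from the counts given above. The following is a minimal sketch that uses only the numbers stated in the text; the variable and function names are illustrative and not part of the study materials.

```python
# Counts taken from the text; names are hypothetical, for illustration only.
registered = 615   # registered pupils in the selected classes
test_n = 525       # respondents in the first round (Test)
retest_n = 504     # respondents in the second round (Retest)
unpaired = 94      # retest questionnaires that could not be paired by ID code


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


test_rate = pct(test_n, registered)                      # Test response rate
retest_rate = pct(retest_n, registered)                  # Retest response rate
non_participation = pct(registered - test_n, registered) # all non-participants
final_n = retest_n - unpaired                            # paired final sample

print(test_rate, retest_rate, non_participation, final_n)
```

Under these counts, the total non-participation share in the Test round equals the sum of the two reported reasons (2.76% + 11.87%).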

Table 1 Demographic characteristics of sample

Questionnaire items

All items were part of the 2018 HBSC questionnaire and had been previously tested in HBSC countries [19, 23]. Before the start of the study, the questions were translated and back-translated, and a focus group with students was held to ensure that the questions were comprehensible.

Self-rated health was assessed using the question: “Would you say your health is…?” Possible responses included: Excellent; Good; Fair or Poor. The cut-off point for self-rated health was excellent [4, 26].

Life satisfaction was measured by the question: “In general, where on the ladder do you feel you stand at the moment? Tick the box next to the number that best describes where you stand.” Responses ranged on a scale from 0 to 10. The question was preceded by the explanatory text: “Here is a picture of a ladder. The top of the ladder “10” is the best possible life for you and the bottom “0” is the worst possible life for you.” Higher scores indicate a greater level of perceived satisfaction. The cut-off point for the dichotomization of life satisfaction was 8 and more [4, 27].

Individual health complaints (headache, stomach ache, backache, feeling low, feeling irritable, feeling nervous, sleep difficulties, and feeling dizzy) were measured by asking a question for each individual health complaint: “In the last 6 months: how often have you had the following….? Please tick one box for each line”. Responses were rated on a five-point scale: About every day; More than once a week; About every week; About every month; Rarely or never. The cut-off point for the dichotomization of individual health complaints was more than once a week [4].

Moderate to vigorous physical activity (MVPA) was measured by the question: “Over the past 7 days, on how many days were you physically active for a total of at least 60 min per day? Please add up all the time you spent in physical activity each day.” Possible responses were selected on a scale from 0 to 7 days. The question was explained by text: “Physical activity is any activity that increases your heart rate and makes you get out of breath some of the time. Physical activity can be done in sports, school activities, playing with friends, or walking to school. Some examples of physical activity are running, brisk walking, rollerblading, biking, dancing, skateboarding, swimming, soccer, basketball, football, & surfing”. The cut-off point for the dichotomization of MVPA was at least 60 min of MVPA on all 7 days [4, 21, 28].

Vigorous physical activity (VPA) was measured by asking the question: “Outside school hours: how often do you usually exercise in your free time so much that you get out of breath or sweat?” The following multiple-choice answers were offered: Every day; 4–6 times a week; 2–3 times a week; Once a week; Once a month; Less than once a month; Never. The cut-off point for VPA was 4 or more times a week [4].

Screen-time related sitting items were evaluated by asking the question for each item: “In your leisure time, how many hours a day do you spend…: a) … playing games on a computer, game console, tablet, smartphone, or smart TV?; b) … using a computer or another electronic device (for example smartphone or tablet) for a different purpose (for example, social and communication networks – Instagram, Twitter, Snapchat, Facebook, etc., chatting or surfing the internet)?; c) … watching internet videos (for example on YouTube or Twitch)?; d) … watching TV, DVDs or videos (do not include internet videos on websites such as YouTube)?; Please tick one box in each row.” Possible multiple-choice responses: Not at all; About half an hour a day; About 1 h a day; About 2 h a day; About 3 h a day; About 4 h a day; About 5 h a day; About 6 h a day; About 7 or more hours a day. The cut-off point for screen-time related sitting items (gaming, social media, internet videos, TV or DVDs) was 2 h a day [4, 19, 29, 30].

Breakfast consumption on school days was assessed using the question: “How often do you usually have breakfast (more than a glass of milk or fruit juice)?” Response options included: Never; One day; Two days; Three days; Four days; Five days. The cut-off point for breakfast consumption on school days was every school day [4].

Eating behaviour (Fruit, vegetables, sweets, and sugared soft drinks consumption) was evaluated using the question for each item: “How many times a week do you consume….? Please tick one box in each row.” Possible response options were: Never; Less than once a week; Two to four times a week; Five to six times a week; Once a day; More than once a day. The cut-off point for eating behaviour items (Fruit, vegetables, sweets, and sugared soft drinks consumption) was daily [4].
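The cut-off rules described in this section can be expressed as simple predicates. The sketch below covers four of the items; it assumes that each cut-off includes the named category and everything more frequent or longer (for example, "2 h a day" is read as 2 h or more), and all function and variable names are hypothetical.

```python
# Ordered response options copied from the item descriptions above.
COMPLAINT_OPTIONS = [
    "About every day",
    "More than once a week",
    "About every week",
    "About every month",
    "Rarely or never",
]
SCREEN_OPTIONS = [
    "Not at all",
    "About half an hour a day",
    "About 1 h a day",
    "About 2 h a day",
    "About 3 h a day",
    "About 4 h a day",
    "About 5 h a day",
    "About 6 h a day",
    "About 7 or more hours a day",
]


def life_satisfaction_high(score):
    """Cantril ladder score (0-10) at or above the cut-off of 8."""
    return score >= 8


def mvpa_daily(days):
    """At least 60 min of MVPA on all 7 of the past 7 days."""
    return days == 7


def complaint_frequent(response):
    """Complaint reported more than once a week or about every day (assumed reading)."""
    cut = COMPLAINT_OPTIONS.index("More than once a week")
    return COMPLAINT_OPTIONS.index(response) <= cut


def screen_time_high(response):
    """Screen time of about 2 h a day or more (assumed reading of the cut-off)."""
    cut = SCREEN_OPTIONS.index("About 2 h a day")
    return SCREEN_OPTIONS.index(response) >= cut
```

Keeping the ordered option lists alongside the predicates makes the direction of each cut-off explicit, which matters because the complaint scale runs from most to least frequent while the screen-time scale runs from least to most.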

Statistical analyses

Descriptive statistics and the Chi-square test were used to characterize the responses in the test and retest (Table 2). To assess the test–retest reliability of the selected items, the single measure of the Intraclass Correlation Coefficient (ICC) was used. ICCs were computed from a two-way mixed-effects ANOVA model for absolute agreement of single measures [31]. All of the selected items were stratified by grade, sex, and residence. The strength of the test–retest agreement was evaluated according to Koo and Li [31]: larger than 0.90 indicates excellent agreement; 0.75–0.90 indicates good agreement; 0.50–0.75 represents moderate agreement; and below 0.50 is classified as poor agreement.
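For readers who wish to see how the coefficient is obtained, the following is a minimal sketch of the single-measure, absolute-agreement ICC computed from two-way ANOVA mean squares, followed by the thresholds quoted above. It is not the SPSS routine used in the study; the function names are hypothetical, and the handling of values that fall exactly on a boundary (0.50, 0.75, 0.90) is an assumption.

```python
import numpy as np


def icc_absolute_single(ratings):
    """Single-measure ICC for absolute agreement from a two-way ANOVA.

    `ratings` has one row per respondent and one column per occasion
    (here: test and retest).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between respondents
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between occasions
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k / n * (ms_c - ms_e))


def koo_li_label(icc):
    """Agreement label using the thresholds quoted in the text."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:   # boundary handling is an assumption
        return "good"
    if icc >= 0.50:   # boundary handling is an assumption
        return "moderate"
    return "poor"


# Made-up illustration data, not study data: every retest answer is one
# point higher than the test answer.
shifted = [[1, 2], [2, 3], [3, 4]]
print(icc_absolute_single(shifted), koo_li_label(icc_absolute_single(shifted)))
```

The made-up example shows why absolute agreement is the stricter choice: a uniform shift between occasions leaves the rank order intact but still lowers the coefficient, because the between-occasion mean square enters the denominator.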

Table 2 Rates of dichotomous health behaviour responses and means during Test–Retest

All of the selected items were dichotomized, and cut-off points for the dichotomization were set according to the international HBSC report, “Spotlight on adolescent health and well-being” [32]. Then, Cohen’s simple Kappa statistic was used to measure the agreement between binary variables. According to Cohen [33], Cohen’s Kappa is classified as follows: greater than 0.5 indicates large agreement; 0.3–0.5 indicates moderate agreement; 0.1–0.3 indicates small agreement; and less than 0.1 is classified as trivial. For all statistical analyses, IBM SPSS 20 for Windows was used.
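The kappa statistic for a dichotomized item can be written out directly from the 2 × 2 test–retest table. The sketch below is an illustration under stated assumptions rather than the SPSS procedure used in the study: the function names are hypothetical, values exactly on a threshold are assigned to the higher category, and the case in which expected agreement equals 1 (all answers identical on both occasions) is not handled.

```python
def cohen_kappa(test, retest):
    """Cohen's kappa for two paired sequences of binary (True/False) values."""
    pairs = list(zip(test, retest))
    n = len(pairs)
    a = sum(1 for t, r in pairs if t and r)          # above cut-off both times
    b = sum(1 for t, r in pairs if t and not r)      # above at test only
    c = sum(1 for t, r in pairs if not t and r)      # above at retest only
    d = sum(1 for t, r in pairs if not t and not r)  # below cut-off both times
    p_observed = (a + d) / n
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def kappa_label(kappa):
    """Agreement label using the thresholds quoted in the text."""
    if kappa > 0.5:
        return "large"
    if kappa >= 0.3:   # boundary handling is an assumption
        return "moderate"
    if kappa >= 0.1:   # boundary handling is an assumption
        return "small"
    return "trivial"


# Made-up illustration data, not study data.
test_round = [True, True, False, False]
retest_round = [True, False, True, False]
print(cohen_kappa(test_round, retest_round))
```

In the made-up example, half of the respondents give the same answer on both occasions, which is exactly what chance would produce with these marginals, so the observed and expected agreement coincide.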

Results

In the final sample of 410 Vietnamese adolescents (mean age = 12.6, SD = 1.2), there were more girls (60%) than boys (40%) and slightly more respondents were from urban areas (54%) than rural areas (46%). The differences in the rates of each of the health behaviour item cut-offs between test and retest were not statistically significant (Table 2). The proportions of respondents with no response shift between test and retest varied from 37 to 75% (Fig. 1), implying good stability of items among Vietnamese adolescents. The highest rate was watching TV or DVDs for 2 or more hours (71%) while the lowest rate was stomach ache more than once a week (10–11%).

Fig. 1 Percentages of test–retest response shifts on selected items of HBSC questionnaire
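A response-shift distribution of this kind can be tabulated as follows. This is a sketch that assumes a shift is defined as the difference between the retest and test category codes on the original ordinal scale, which the text does not spell out; the function name and the data are hypothetical.

```python
from collections import Counter


def shift_percentages(test_codes, retest_codes):
    """Percentage of respondents at each shift size (retest minus test)."""
    shifts = Counter(r - t for t, r in zip(test_codes, retest_codes))
    n = len(test_codes)
    return {size: 100 * count / n for size, count in sorted(shifts.items())}


# Made-up illustration data, not study data: ordinal category codes for one item.
test_codes = [0, 1, 2, 3]
retest_codes = [0, 1, 3, 2]
print(shift_percentages(test_codes, retest_codes))
```

The entry for a shift of 0 corresponds to the proportion of respondents with no response shift.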

Based on the total sample, there was poor to good agreement on the use of the full scale measured by ICC (Table 3) and moderate to large agreement for binary variables (Table 4).

Table 3 ICC for selected items of HBSC questionnaire by sex, grade and residence
Table 4 Cohen’s kappa for selected items of HBSC questionnaire by sex, grade and residence

The ICCs of self-rated health and life satisfaction items ranged from 0.65 to 0.85, depending on sex, age, or place of residence. After testing for changes based on the cut-off values, there was also large agreement with kappa ranging from 0.56 to 0.81, depending on sex, age and place of residence.

There was poor to moderate agreement in the health complaint items with ICCs ranging from 0.24 (boys feeling dizzy) to 0.70 (8th graders experiencing sleep difficulties) after stratifying by sex, age and place of residence (Table 3). The Kappas for the dichotomized items ranged from large to almost trivial (Table 4). For example, items with large agreement were stomach ache for boys (k = 0.58), 6th graders (k = 0.58), 8th graders (k = 0.53) and urban residents (k = 0.51). After dichotomizing the item for feeling dizzy, there was trivial to small agreement among boys (k = 0.1), yet there was moderate agreement among girls (k = 0.42). Furthermore, there was small agreement for 7th graders (k = 0.25) and urban residents (k = 0.28) for the feeling dizzy item.

There was also poor to moderate agreement in physical activity items as a scale (ICC = 0.45–0.68) depending on sex, age or place of residence. When treating the items as dichotomous variables, there was moderate (boys, girls, 6th, 9th graders, and urban) to large agreement (7th, 8th graders, and rural) for MVPA. Using the cut-off for at least 4 times a week of vigorous exercise, there was small agreement for girls (k = 0.22), large agreement for 9th graders (k = 0.51) and rural residents (k = 0.56), as well as moderate agreement for boys, other age groups and urban residents.

Depending on sex, age, or place of residence, in screen-time items there was poor to good agreement (ICC = 0.45–0.77), with the exception of 7th graders reporting TV or DVD time (ICC = 0.34).

Based on the cut-off for 2 h per day per purpose, the stability was greater with moderate to large agreement depending on sex, age or place of residence.

There was poor agreement in breakfast consumption for boys (ICC = 0.40) and for 6th graders (ICC = 0.45), and also in sweets consumption for boys (ICC = 0.42) and for 6th graders (ICC = 0.40). Otherwise, there was moderate to good agreement for other sex, age, or place of residence analyses in eating behaviours, with eating vegetables (ICC = 0.57–0.76) being the most stable eating behaviour across the different strata. There was small agreement in consuming sweets daily among boys (k = 0.22) and urban residents (k = 0.29) as well as 6th graders reporting daily soft drink consumption (k = 0.26). The other daily food consumption items were interpreted as having moderate to large test–retest agreement (k = 0.35–0.65) depending on sex, age, or place of residence.

Discussion

The present study provided a snapshot of health behaviours among Vietnamese adolescents and examined the test–retest reliability of selected well-being, physical and sedentary activity, as well as eating behaviour items of the Health Behaviour in School-Aged Children (HBSC) survey. If these results were representative of Vietnamese adolescents and were compared with the other countries in the HBSC study [32], Vietnamese adolescents would have the highest rates of daily feeling low (44%) and the lowest rates of vigorous exercise on at least 4 days a week (22–23%). Given that the majority of the items dealing with self-rated health, life satisfaction, physical activity, screen time and eating behaviour were interpreted as having poor to good agreement, there is confidence that these items are sufficiently reliable for use in population-based studies in Vietnamese conditions. The health complaints items seem to have borderline reliability, and their use should be further researched alongside the reasons for the low demonstrated reliability. Our results also indicate the stability of the items on the population level, while shifts in responses could be seen on the individual level. To our knowledge, this is the first study in the Vietnamese environment assessing the test–retest reliability of well-being, physical and sedentary activity, as well as eating behaviour related items that are commonly used in large population-based studies such as the HBSC or the Global School-Based Student Health Survey (GSHS).

Similarly to earlier studies that tested these measures in different contexts such as China [5] and central Europe [19], the findings from this study continue to demonstrate the versatility of some measures in different cultural contexts. In some cases, low ICC and Cohen’s Kappa values were reported. One reason for this might be that the stratification yielded a low number of responses per category, which indicates the need to plan future studies that can detect sufficient counts of low-incidence cut-off values, or to limit the choice of statistical methods. Secondly, the low ICC and Cohen’s Kappa values also indicate that, in a specific set of questions, cultural context plays an important role and the use of such questions should be carefully considered. We believe this to be another important result of our study, because the number of self-reported studies in Asia is growing, similar sets of questions are commonly used in other population-based studies, and self-reported measures are often the only feasible method for the measurement of health behaviour in developing countries [34]. On the other hand, our study followed the Koo & Li [31] guideline for reporting ICC while many other studies [5, 19, 20, 23] used milder criteria. In this context, it should be pointed out that other significant studies in the field recommend even stricter criteria [35].

The results from this study concerning instruments such as self-rated health and life satisfaction were similar to previous studies in the field [36, 37]. The good to large level of agreement for the self-rated health item extends to the Vietnamese adolescent population the existing knowledge of it being a valid and reliable measure of physical and mental health across 19 countries in Europe [38]. Moreover, the Cantril ladder used to measure life satisfaction in the adolescent population has been shown to be valid and reliable [36], seems to be stable over time [39], and shows very good repeatability, with a Pearson correlation of r = 0.70, p < 0.001 [37].

The reliability of the health complaint items seems to be the lowest in our study. This is in contrast to a previous study conducted in Norway [22] in which the ICC coefficient showed substantial to almost perfect (ICC = 0.61–0.81) agreement. There was more stability across the items among girls than boys for both the full-scale response and the use of cut-off values, particularly for the nervous and feeling dizzy items. As the scores were the lowest, it should be questioned whether answering the set of questions on health complaints might be culturally determined or inappropriate in some countries which may reduce children’s willingness to answer consistently. The Vietnamese culture has been influenced by the Confucian tradition characterized by affective control (group-disturbing emotions such as anger), harmony, and self-control in interpersonal relationships [9, 40]. This might suppress certain child behaviours (e.g., aggression) and support others (through modelling and empowerment), with the result that the relative levels of different types of child psychopathology vary across cultures [9, 41, 42].

Similar results concerning the measures of MVPA and VPA were also reported by other studies based in European countries such as the Czech Republic, Slovakia, Poland (ICC = 0.60–0.62) [19] or Finland (ICC = 0.69–0.77) [43]. Moreover, improved agreement was reported by other studies across the globe with ICC ≥ 0.72 [5, 21, 44]. Ng et al. [43] highlighted that the test–retest period for the MVPA item should be considered when comparing results from different studies. Shorter periods between test and retest can cause respondents to remember their previous answers, which may result in higher reliability, but it also depends on the cognitive load between testing. It is also necessary to take sociocultural differences between Europe and Asia into account. These differences may increase or decrease item reliability in some countries. It might be important to know the number of physical education classes per week and other culturally determined facts that could affect physical activity levels and thus the answering of repeated questions in the test–retest procedure. Also, the seasonal character of some sports activities, sociocultural issues, current morbidity (seasonal epidemics), and other specific circumstances, such as different levels of demands placed on students during the school year, might influence the results. Countries with a more stable climate and weather could note greater agreement in any season, as this element is eliminated and does not interfere with the results.

Overall, screen time measures showed poor to good ICC and moderate to large (Kappa) agreement for the test–retest reliability. Similar results were reported by another study based on adolescents in the Central European region [19]. In other contexts, Liu et al. [5] examined the reliability of similar screen-time-related sitting items in Asia on a sample of 91 Chinese school-aged children and reported a wider variance in the results. As new technologies keep on evolving and spreading quickly (smartphones, tablets, etc.), we modified and restructured the question to a single set of questions and focused only on weekdays to minimize the suspected overestimation of the previously used HBSC screen-time related sitting questions.

The reliability of eating behaviour items reached moderate to good test–retest agreement in most of the categories. The questions on eating behaviours have been previously recognized as simple and easy to understand in the adolescent population [24]. Most of the studies in the field across the countries in the HBSC network reported similar results. The responses from adolescents in European countries showed very good repeatability [24, 45, 46]. A study from New Zealand [47] showed good to excellent reliability and reasonable validity. Speck, Bradley, Harell, & Belyea [48], who used a 24-h diet recall for assessing the validity of the questionnaire, reported confidence in the reliability and validity of the measures to assess the majority of adolescent food intakes. Thus, we assume that eating-related questions have substantial reliability in the identification of basic dietary habits such as breakfast, fruit and vegetable, sugar-sweetened beverage, and sweets consumption among adolescents in Vietnam. These eating habits can be further associated with non-communicable diseases such as obesity, cardiovascular disease, cancer, and diabetes [49].

Strengths and limitations

To the best of our knowledge, this is the first study assessing the test–retest reliability of well-being, physical and screen-time related sitting activities, as well as eating behaviour related items among adolescents in Vietnam. However, there are also some limitations of the study to be mentioned. This test–retest reliability study evaluates only the stability of the questions over time and does not investigate the construct validity of the measures. Although there are some ways to evaluate against a criterion from other clinical surveys, other health behaviours may be more complex and may require a separate study, for example device-based measurements of physical activity behaviour.

The sample contained 40.2% boys and 59.8% girls, although the average proportion of female students at the lower secondary level is around 48.5% [50]. The reason for this is the higher rate of incomplete questionnaires that could not be paired across the test and retest rounds, as well as the higher rate of absence, among boys.

When taking into account the economic status and cultural background, it seems to be challenging to interpret the results without a diverse sample in Asian countries. This can undermine representativeness and population generalizability, so both should be carefully considered when interpreting our results. Also, the results should not be generalized to all Asian cultures.

Although a three-week interval was used before the administration of the retest, which seems to be sufficiently long to avoid the retention of previously chosen answers, it can also be a limitation. Some changes may have occurred in the behaviour of some participants between the test and retest administrations. Another limitation is that screen-time related sitting activities can be pursued at the same time (e.g., social media while watching TV). However, the items presented in the study do not take this into account.

Conclusions

The self-reported health, life satisfaction, physical and screen-time related sitting activities, as well as eating behaviour items from the HBSC study were seen to have acceptable test–retest reliability for use among Vietnamese adolescents. The health complaints items showed borderline reliability; their use should be further researched and the socio-cultural context should be carefully considered. As the HBSC study aims to understand adolescent health and health behaviours in a cross-national way as well as within countries, data collection in Vietnam is desirable. The outcomes of such work could shed light on further health inequalities and allow for the monitoring of the sustainable development goals agenda.