Introduction

Accurately measuring dietary intake is considered one of the greatest challenges for epidemiological purposes [1,2,3]. In children, the collection of dietary data is particularly difficult due to the requirement of using a proxy respondent. Considering this, the evaluation of the performance of dietary intake assessment methods among children is essential to obtain high-quality data on food intake in this population.

Dietary methods commonly used to assess the diets of children include respondent-based methods, such as food diaries (FD) and food frequency questionnaires (FFQ) [3]. Specifically, FFQs are widely used in large-scale prospective studies investigating links between diet and disease in both adults and children [4,5,6,7], because they are easy and simple to administer, relatively inexpensive and have a low respondent burden. Even though FFQs are broadly used, few brief FFQs are valid and reliable for measuring whole-of-diet intake in young children [8].

Multiple 24-h recalls and FD are usually used to evaluate the validity of an FFQ [9, 10]. When available, biomarkers may be an alternative or supplementary reference method for the validation of some nutrient intakes, since their measurement errors are independent of those of the FFQ [11]. A combination of self-reported methods and biomarkers (for specific nutrients) improves the process of validating an FFQ, and has been used in several large prospective studies [12,13,14]. In a given population, one of the main components of the adaptation of an FFQ is the portion size assessment. Inappropriate portion sizes can lead to significant inaccuracy in dietary intake estimation. The classical method that uses a “standard” portion applied equally to all respondents might reduce sensitivity if portion sizes vary within the population [15]. Researchers have tested other methods to handle missing portion size in FFQ, in both adults and adolescents, such as stochastic methods [16], or a combination of FFQ with FD or 24-h recall data [17,18,19]. New methods to calibrate FFQ portion size data need to be tested in children.

Improving children’s diet requires accurate measurement of their current consumption. Moreover, the use of methods with low validity seriously attenuates the associations between nutritional intake and health outcomes [20]. Therefore, it is necessary to develop and validate age-specific instruments for the evaluation of usual food and nutrient consumption.

This study aimed to evaluate the performance of a short FFQ to assess diet of children at 4 and 7 years of age, against 3-non-consecutive-day FD and serum biomarker measurements, using two methods to convert the FFQ to nutrients:

  a. the standard method: the classical method of multiplying the frequency response option by a standard mean portion specified for each item;

  b. the z-score method: the FFQ adjusted with data from the 3d FD, at 4 and 7 years of age.

Subjects and methods

The present study was based on the population-based birth cohort Generation XXI (G21), previously described [21]. Briefly, newborns and their mothers were recruited during 2005–2006 at the five level-III maternity units of the metropolitan area of Porto, Portugal. Recruitment was conducted according to the following eligibility criteria: mothers living in one of the six municipalities of the metropolitan area of Porto; delivering at the public hospitals covering those municipalities; and giving birth to live babies with gestational age > 24 weeks. At enrolment, these maternity units were responsible for 91.6% of the deliveries in the whole eligible population. Mothers were invited to participate 24–72 h after delivery, and of the invited mothers, 91% agreed to participate. A total of 8647 children and 8495 mothers were enrolled at baseline. Data were collected by trained interviewers using structured questionnaires that gathered information on sociodemographic, clinical and behavioral characteristics. Anthropometric measures were also performed. At 4 and 7 years of age, the entire cohort was invited to participate in the follow-up evaluation, and 86 and 81% of the children were reevaluated, respectively. Trained interviewers, in face-to-face interviews and using structured questionnaires, were responsible for data collection on demographic and social conditions, lifestyles (including dietary intake), child’s health status and objective anthropometric measures, at baseline and follow-up evaluations. Children’s body mass index (BMI) was classified according to age- and sex-specific BMI standard z-scores developed by WHO [22].

The study was approved by the University of Porto Medical School/S. João Hospital Centre Ethics Committee. Signed informed consent, according to the Helsinki Declaration, was required for all participants and was obtained from the legal representative of each child. The present analysis includes children who had data from the FFQ and 3d FD in each follow-up, yielding a sample of 2482 children at 4 years and 3511 at 7 years of age.

Food frequency questionnaires (FFQ)

At the 2-year follow-up evaluation of the G21 cohort, a 17-item FFQ was developed to evaluate the consumption of energy-dense foods that are not consumed often [23]. At 4 and 7 years the FFQs were developed with the aim of assessing the habitual dietary intake of children and queried the frequency of intake for 35 food items at 4 years and 38 food items at 7 years. At 4 years the FFQ was based on the 2-year FFQ and on the information collected from the 2d FD at 2 years.

At 7 years, a few items were added to the FFQ, taking into consideration information reported in the 3d FD from the 4-year evaluation and difficulties in reporting children’s dietary intake in the 4-year FFQ. The items added were milk with chocolate, breakfast cereals and fresh fruit juice; some alterations were also made, such as separating salty pastry based on fish from that based on meat, and grouping together all meat (red and white meat) as well as yogurts with or without sugar. We decided not to separate the yogurts due to their low consumption at 4 years and caregivers’ difficulty in distinguishing the type of yogurt. As previously described [24], parents or another caregiver were asked how many times on average the child had consumed each food item in the previous 6 months. The nine frequency response options, ranging from “never” to “4 times or more per day”, were transformed into daily frequency of consumption. At both ages, a standard mean portion was defined for each food item (Supplemental Table 1) and similar food groups were created: “Dairy”, “Cereals, cereal products and potatoes”, “Fruit & Vegetables”, “Meat, fish and eggs”, “Drinks”, “Fat spread”, “Sweets”, and “Salty snacks”.
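As a minimal sketch of the conversion just described, the snippet below turns a frequency response into a daily frequency and multiplies it by a standard portion. Only the two extreme response labels are given in the text; the intermediate labels, the daily-frequency values and the portion sizes are hypothetical placeholders, not the study's values (the actual portions are in Supplemental Table 1).

```python
# Hypothetical mapping of the nine response options to times per day.
# Only "never" and "4 times or more per day" are quoted in the text;
# every other label and every numeric value here is an assumption.
DAILY_FREQUENCY = {
    "never": 0.0,
    "less than once per month": 0.5 / 30,
    "1-3 times per month": 2 / 30,
    "once per week": 1 / 7,
    "2-4 times per week": 3 / 7,
    "5-6 times per week": 5.5 / 7,
    "once per day": 1.0,
    "2-3 times per day": 2.5,
    "4 times or more per day": 4.0,
}

# Hypothetical standard mean portions in grams (not from Supplemental Table 1).
STANDARD_PORTION_G = {"vegetable soup": 250.0, "yogurt": 125.0}


def standard_method_grams(item: str, response: str) -> float:
    """Grams per day = daily frequency x standard mean portion for the item."""
    return DAILY_FREQUENCY[response] * STANDARD_PORTION_G[item]
```

Under these placeholder values, a child reported as eating yogurt "once per day" would be assigned 125 g/day, and every child with the same response receives the same amount, which is the limitation the z-score method is meant to address.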

Three-day FD

As previously explained [25], when the child was 4 and 7 years of age, parents or another main caregiver were asked to complete an estimated FD covering 3 non-consecutive days, before the face-to-face interview. Oral and written instructions were given to parents on the correct use of the FD and on how to quantify food portions; they were also instructed to let the children follow their usual diet. Parents were asked to provide a detailed description of each food and drink consumed by the child, including the method of preparation, recipes and place of consumption, whenever possible. A team of trained nutritionists was responsible for reviewing and coding the FD, using an age-specific food coding manual previously developed by our research team. The proportion of reported days was similar across different days of the week and weekend, at both ages.

Nutrient conversion

Nutrient intake was estimated using the software Food Processor SQL (2004–2005 ESHA Research, Salem, Oregon), based on the Food Composition Table of the United States of America Department of Agriculture [26]. For typically Portuguese foods or culinary dishes, new codes were created with national nutritional information, as previously described [25, 27].

Biomarkers

A fasting blood sample was collected on the morning of the evaluations, at 4 and 7 years of age. Serum samples were stored in aliquots of approximately 500 µL at −80 °C until analysis. Using a subsample of 160 children (50% from each follow-up evaluation), measurements of vitamin A and folate were conducted by S. João Hospital Center. Folate was measured by chemiluminescence using the immunoassay analyzer Architect i2000 SR (Abbott, USA) and vitamin A was measured by high-performance liquid chromatography (HPLC; Gilson, USA). UV detection was performed by a detector model 116 (Gilson, USA).

Statistical analysis

All statistical analyses were performed using the SPSS statistical software package version 22.0 (SPSS inc., Chicago IL., USA) and R 3.01. A significance level of 5% was adopted.

At both ages, to estimate daily consumption from the FFQ (as grams per day), two methods were performed and tested:

  a. Standard method: each frequency response option was multiplied by a standard mean portion (Supplemental Table 1) specified for each item; or

  b. z-score method: adjustment of each food item with the overall sample mean and standard deviation (SD) of that food item from the FD, applying the formula:

$$\frac{y - \bar{y}}{S_y} = \frac{x - \bar{x}}{S_x}$$

$$\Leftrightarrow \quad y = \frac{x - \bar{x}}{S_x} \cdot S_y + \bar{y}$$

Legend:

\(y\) = grams from food diaries, per each food item

\(x\) = frequency from FFQ, per each food item

\(\bar{y}\) = mean of grams from food diaries, per each food item

\(\bar{x}\) = mean of frequency from FFQ, per each food item

\(S_y\) = standard deviation of grams from food diaries, per each food item

\(S_x\) = standard deviation of frequency from FFQ, per each food item
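The calibration formula can be illustrated with a short sketch for a single food item. The function name, the use of the sample SD (`ddof=1`), the handling of zero variance and the data are all assumptions made for illustration; the truncation of negative values to zero follows the description in the text.

```python
import numpy as np


def zscore_calibrate(ffq_freq, fd_grams):
    """Map FFQ frequencies onto the food-diary gram scale for one food item.

    Applies y = (x - mean_x) / S_x * S_y + mean_y and then sets negative
    predictions to zero. Sample SDs (ddof=1) are an assumption.
    """
    x = np.asarray(ffq_freq, dtype=float)
    y_ref = np.asarray(fd_grams, dtype=float)
    s_x = x.std(ddof=1)
    if s_x == 0:
        # No spread in reported frequency: assign the diary mean to everyone.
        return np.full_like(x, y_ref.mean())
    y = (x - x.mean()) / s_x * y_ref.std(ddof=1) + y_ref.mean()
    return np.clip(y, 0.0, None)


# Made-up data: daily frequency from the FFQ and grams/day from the diary.
ffq = [0.0, 0.14, 0.43, 1.0, 2.5]
fd = [0.0, 20.0, 60.0, 110.0, 180.0]
calibrated = zscore_calibrate(ffq, fd)
```

Before truncation, the calibrated values have exactly the mean and SD of the diary data for that item, while each child keeps the relative position given by the FFQ frequency. Truncating negative values at zero shifts the mean slightly upward whenever it is triggered.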

Overestimation of dietary intake by FFQ is widely described in the literature [12, 28,29,30], and it is theoretically possible to calibrate for this bias. In the present study the FD data were used to calibrate the dietary information from the FFQ.

To evaluate absolute agreement at both ages, the mean food intake obtained from the FFQ (standard and z-score methods) was compared with that from the FD using intra-class correlation coefficients and respective 95% confidence intervals [ICC (95%CI)]. In the z-score method, all negative values were transformed into zero. Guidelines for interpreting ICC statistics suggest that values between 0.81–1.00 indicate almost perfect agreement, 0.61–0.80 substantial agreement, 0.41–0.60 moderate agreement, 0.21–0.40 fair agreement, and values less than 0.21 indicate poor or slight agreement [31].
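The text does not state which ICC form was used. As an illustration only, the sketch below assumes a two-way random-effects, absolute-agreement, single-measure ICC computed from ANOVA mean squares, and adds a helper that applies the interpretation bands quoted above. Function names are hypothetical.

```python
import numpy as np


def icc_absolute_agreement(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `ratings` is an (n_subjects, k_methods) array, e.g. columns [FFQ, FD].
    The choice of this ICC form is an assumption, not taken from the text.
    """
    data = np.asarray(ratings, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)  # between methods
    ss_error = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(icc):
    """Interpretation bands quoted in the text."""
    if icc >= 0.81:
        return "almost perfect"
    if icc >= 0.61:
        return "substantial"
    if icc >= 0.41:
        return "moderate"
    if icc >= 0.21:
        return "fair"
    return "poor or slight"
```

With an absolute-agreement form, a constant offset between the two methods lowers the ICC even when the ranking of children is identical, which is why a systematic overestimation by one method is penalized.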

At the nutrient level, several statistical analyses were performed. To evaluate the strength of association at the individual level, Pearson’s correlation coefficients were calculated to estimate the association between nutrient intake derived by the FFQ (standard and z-score method) and those obtained through FD, or serum biomarkers. Pearson’s correlation coefficients were de-attenuated using the following formula:

$$C_r = r_o \sqrt{1 + \frac{\lambda_x}{n_x}}$$

In the formula, \(C_r\) is the corrected correlation coefficient, \(r_o\) is the crude correlation coefficient, \(\lambda_x\) is the ratio of within-person variance to between-person variance for each nutrient, and \(n_x\) is the number of reports per child.
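A minimal sketch of this correction follows. How the variance ratio was estimated is not stated in the text; the one-way variance-components estimator used here, and both function names, are assumptions for illustration.

```python
import numpy as np


def variance_ratio(daily_intake):
    """Within- to between-person variance ratio for one nutrient.

    `daily_intake` is an (n_children, n_days) array of diary values.
    The estimator is an assumed one-way variance-components approach.
    """
    data = np.asarray(daily_intake, dtype=float)
    n_days = data.shape[1]
    within = data.var(axis=1, ddof=1).mean()
    # The variance of person means still carries within / n_days of noise.
    between = data.mean(axis=1).var(ddof=1) - within / n_days
    if between <= 0:
        raise ValueError("between-person variance estimate is not positive")
    return within / between


def deattenuate(r_observed, lam, n_reports):
    """Corrected correlation: r_o * sqrt(1 + lambda_x / n_x)."""
    return r_observed * np.sqrt(1.0 + lam / n_reports)
```

For instance, with a crude correlation of 0.40, a variance ratio of 1.5 and 3 diary days, the correction factor is the square root of 1.5, so the corrected coefficient is about 0.49. The larger the day-to-day variability relative to the differences between children, the larger the correction.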

To evaluate the presence, direction and extent of bias at the group level, Bland and Altman’s statistical method [32] was applied to nutrient data, including the mean difference, limits of agreement and correlation between mean and mean difference. We also calculated the paired t-test for the difference in all nutrients, to assess agreement at the group level. However, because the sample was large, any difference was statistically significant (p < 0.05), so we calculated the Cohen effect size [33] to understand how substantial the differences were. The effect size was calculated by dividing the mean difference in nutrient data (between FFQ and food diaries) by the SD of the difference. Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8). The Bland and Altman analysis [32] was generated for the difference between the mean obtained with 3d FD and FFQ, using the equation [mean of FD − mean of FFQ], plotted against the average of the 2 methods ([mean of FD + mean of FFQ]/2). Based on the Bland and Altman methodology, two methods are considered comparable if 95% of data points lie within the limits of agreement (mean difference ± 1.96 SD of the difference). Regarding total energy intake, Bland and Altman’s plots were also generated (for standard and z-score methods).
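The group-level quantities described above can be sketched as follows. The function name, the dictionary keys and the energy values are hypothetical; the sign convention (FD minus FFQ) and the effect size definition (mean difference divided by the SD of the difference) follow the text.

```python
import numpy as np


def bland_altman_summary(fd, ffq):
    """Bland-Altman statistics and Cohen effect size for paired intakes."""
    fd = np.asarray(fd, dtype=float)
    ffq = np.asarray(ffq, dtype=float)
    diff = fd - ffq              # FD minus FFQ, as in the text
    avg = (fd + ffq) / 2.0       # average of the two methods
    mean_diff = diff.mean()
    sd_diff = diff.std(ddof=1)
    return {
        "mean_diff": mean_diff,
        "sd_diff": sd_diff,
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
        # A non-zero correlation suggests bias that changes with intake level.
        "r_mean_vs_diff": np.corrcoef(avg, diff)[0, 1],
        # Signed effect size: negative means the FFQ gives higher values.
        "cohen_d": mean_diff / sd_diff,
        "share_within_loa": np.mean(np.abs(diff - mean_diff) <= 1.96 * sd_diff),
    }


# Made-up energy intakes (kcal/day) for four children.
summary = bland_altman_summary(
    fd=[1500, 1600, 1400, 1700],
    ffq=[1700, 1900, 1500, 2100],
)
```

In this made-up example the mean difference is −250 kcal/day, indicating that the FFQ gives higher values than the diary, and the absolute effect size exceeds 0.8, which would be classed as large.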

Results

Table 1 compares the individual and socio-demographic characteristics of the children with complete data on food intake through FD and FFQ (our sample), with those without FD information, at both ages. No significant differences were found for child’s sex and BMI. At both ages, mothers in our sample were slightly older and more educated. At 4 years, our sample had a higher mean daily intake of “Fruit & Vegetables” (p < 0.001) and a lower intake of “Drinks” (p = 0.007). At 7 years, the same trend was observed. Furthermore, at 7 years our sample also had a higher mean intake of “Meat, fish and eggs” (p = 0.006) and a lower mean intake of “Sweets” (p = 0.001) and “Salty snacks” (p < 0.001).

Table 1 Participants’ characteristics, comparing the sample of children with FFQ data plus food diaries (study sample) with children with only FFQ, at 4 and 7 years of age

At 4 years, compared with the FD, the standard method seemed to overestimate food consumption more than the z-score method. Overall, the mean daily food intake obtained using the z-score method agreed more closely with that from the FD than the standard method did (Table 2).

Table 2 Mean daily food intake and intraclass correlation coefficients (FFQ and food diaries) at 4 years of age

The lowest ICC obtained was 0.048 (95%CI: 0.002,0.094) for “Carbonated soft drinks (except colas)” in the standard method, while the same food item using the z-score method had an ICC of 0.139 (95%CI: 0.102,0.178). The highest ICC was obtained for “Vegetable soup” (ICC:0.536; 95%CI: 0.508,0.564), using the z-score method, compared to an ICC of 0.373 (95%CI: −0.013, 0.673), using the standard method (Table 2).

At 7 years, using the same methodology, the z-score method was still the best method to estimate food consumption through the FFQ, compared with the consumption obtained through the FD. Similar to the results obtained at 4 years, the standard method seemed to overestimate consumption in comparison with the z-score method. The highest ICC obtained was also for “Vegetable soup” (ICC = 0.539; 95%CI: 0.515, 0.562), using the z-score method, compared to an ICC of 0.430 (95%CI: 0.080, 0.637) using the standard method (Table 3).

Table 3 Mean daily food intake and intraclass correlation coefficients (FFQ and food diaries) at 7 years of age

Table 4 presents the de-attenuated correlation coefficients between the FFQ and FD, for daily energy and nutrient intake, at both ages. Significant correlation coefficients were observed for all nutrients and were similar using the standard or z-score method. However, the conversion of the FFQ using the z-score method gave mean nutrient intakes more similar to those from the FD than the conversion using the standard method. At 4 years, using the z-score method, the correlation coefficients ranged from r = 0.112 for vitamin B12 intake to r = 0.565 for total fat intake. The average of the correlation coefficients was 0.39. At 7 years of age, using the same methodology, the average of the correlation coefficients was 0.42.

Table 4 Mean daily intakes of nutrients and de-attenuated Pearson’s correlation coefficients (FFQ and food diaries) at 4 and 7 years of age

For the subsample of 160 children (80 at 4 years and 80 at 7 years of age), the de-attenuated correlation coefficients between FFQ standard method and plasma concentration of vitamin A [r = 0.531 (p = 0.008), at 4 years and r = 0.282 (p = 0.120), at 7 years] and folate [r = 0.176 (p = 0.365), at 4 years and r = 0.425 (p = 0.027), at 7 years] were similar to the correlation coefficients between FFQ z-score method and plasma concentration of vitamin A [r = 0.552 (p = 0.005), at 4 years and r = 0.269 (p = 0.187), at 7 years] and folate [r = 0.183 (p = 0.345), at 4 years and r = 0.340 (p = 0.079), at 7 years]. The ICCs between plasma concentration and FFQ were similar for z-score and standard method, at both ages, although not statistically significant.

Table 5 presents ICC, mean differences, limits of agreement, correlation between mean and mean difference and Cohen effect size, for daily energy and nutrients obtained from FFQ z-score method and FD, at 4 and 7 years. The same information for the standard method is shown in Supplemental Table 2.

Table 5 Intraclass correlation coefficients (ICC), Bland Altman analysis and Cohen effect size between daily intakes of energy and nutrients from the food diaries and FFQ (z-score method) at 4 and 7 years of age

At 4 years, the ICC ranged from 0.036 for omega 6 to 0.350 for calcium using the z-score method, and from 0.013 for vitamin E to 0.265 for sodium using the standard method. At 7 years, the ICC ranged from 0.032 for iron to 0.328 for calcium using the z-score method, and from 0.041 for iron to 0.313 for vitamin B6 using the standard method. Overall, the correlation between mean and mean difference, as well as the Cohen effect size, was lower using the z-score method (Table 5) than using the standard method (Supplemental Table 2), at both ages. The limits of agreement were wide for most nutrients and ranged from positive to negative values. The Bland-Altman plots for energy intake at 4 years (Fig. 1a, b) and 7 years (Fig. 1c, d) indicated that around 95% of data points fell within the limits of agreement, at both ages and using both methods. However, the plots suggested a lower concordance using the standard method (Fig. 1a, c), with a more marked trend of overestimation of energy intake.

Fig. 1

Bland-Altman plot of the difference between energy intake estimated by the FFQ, using the standard method at 4 years (a) and 7 years (c) or the z-score method at 4 years (b) and 7 years (d), and by the 3d food diary

Discussion

The present results showed that the FFQ performed reasonably well in estimating intake of a number of food items and nutrient intake, at 4 and 7 years of age, using a z-score calibration.

Considering the definition of ICC values [31], the results showed that 17 out of 30 food items had “fair” to “moderate” agreement with FD using the z-score method, against only 12 food items using the standard method. At 7 years, using the same z-score equations, the results were similar: the z-score method had a higher agreement with the FD. Low agreement was observed for food items such as “Meat”, “Sweets” and “Salty snacks”. Because our reference method was a 3d FD collected over a 1-week period, items eaten more often, such as “Fruits & Vegetables” or “Dairy products”, are expected to be more strongly correlated than food items eaten less often, such as candy or fast food [34]. A sensitivity analysis was performed in a random sample at 4 years: the z-score equations were calculated using 70% of the sample and applied to the remaining 30%. The ICCs obtained for the major food groups were similar to those obtained using the whole sample.

The differences between the two methods (standard and z-score) regarding food intake translated into differences in nutrient intake. For most of the nutrients, the z-score method obtained a higher correlation with FD than the standard method. Previous studies reported similar correlation coefficients between nutrient intake estimated by FFQ and FD or 24-h recall [12, 35]. In addition, at 4 years the correlation between FFQ and plasma concentration of vitamin A and folate was also higher for the z-score method in comparison to the standard method. At both ages, for the majority of nutrients the correlation between mean and mean difference, as well as the Cohen effect size, was much lower using the z-score method in comparison with the standard method. This shows a decrease in the proportional bias using the z-score approach. The limits of agreement were wide for most nutrients and ranged from positive to negative values, implying that both overestimation and underestimation occurred in children’s dietary intake estimation from FFQ, compared with FD. Although the z-score method performed better, the limits of agreement for some nutrients fell outside the dietary reference intakes (DRI). For example, the estimated energy requirement at 4 years ranged from 1113 to 1629 kcal in girls, and from 1195 to 1763 kcal in boys [36]. The mean intake obtained from the z-score method was 1741 kcal (LOA: 870–2385 kcal) and from the standard method was 2470 kcal (LOA: 689–2567 kcal). Regarding the calcium intake (recommended dietary allowance: 1000 mg/day, 4–8 years [37]), the LOA (using the z-score method) ranged from 403 to 1764 mg/day at 4 years and from 299 to 1754 mg/day at 7 years. Although the lower limit was below the estimated average requirement (800 mg/day [37]), the upper limit was below the tolerable upper intake level (2500 mg/day). On the other hand, some nutrients had the LOA within the recommendation. For example, the LOA of protein (DRI: 10–30%) ranged between 9 and 27% of total energy intake at 4 years and between 10 and 28% at 7 years.

The validation of measurements of usual food consumption is an essential part of large-scale epidemiological studies, especially prospective studies in which it is possible to relate food habits with health outcomes. Accordingly, the impact of measurement error on measures of association is of the greatest relevance. In the present study, we observed that a z-score calibration approach estimated mean food and nutrient intakes more similar to the means obtained through a reference method (i.e., FD), which supports a reduction in the exposure measurement error.

Our classical method of converting an FFQ, using specific standard portions, showed an overall overestimation of dietary intake compared with the FD. This tendency of the FFQ to overestimate dietary intake was also described in previous studies, in both adults and children [28, 29, 38,39,40]. Such overestimation has been explained by the large number of foods asked about in the FFQ, which provides a broader selection of options than other methods, or by inaccurate reporting of the frequency of consumption of commonly consumed foods. However, it is possible that the portion sizes assumed in the FFQ are too high, or that increasing frequency of consumption translates into decreasing portion size. Young children seem to self-regulate their energy intake by adjusting their portion sizes depending on the number of eating occasions per day [41]. On the other hand, the caregiver could decrease the portion size with increasing frequency.

A previous study [16] among adults also handled missing portion size in the FFQ. The authors reported a bias when using median imputation and described advantages in using stochastic methods to substitute missing portion size values instead of using standard portions or medians. Although the amounts consumed by individuals are considered an important component in estimating food intake, whether to include portion size questions in the FFQ remains controversial. As the frequency of consumption has been found to be a greater contributor than portion size to the variance in intake of most foods, some researchers prefer to use FFQs without the additional respondent burden of reporting portion sizes [12].

Previous studies have shown that portion size estimation is difficult for the majority of people, varying with personal characteristics, such as appetite status, sex, age, and BMI [28, 42,43,44], or with food characteristics, such as energy density and number of standard food units (e.g., 2 apples) [29]. Furthermore, portion size may be intentionally misreported due to the social desirability effect [45]. A qualitative study among mothers of 6–7-year-old and 10–11-year-old children in the UK showed that mothers have difficulty perceiving what the recommended age-appropriate serving sizes are for their children [46]. These results raise questions regarding the quality of parents’ reports of children’s portion sizes.

The small number of days included in the FD is one of the limitations of this study. We used only 3 days over a period of 1 week to represent a period of 6 months (FFQ). Capturing the day-to-day variability of some nutrients may require more than 3 days; between 2 and 6 days have been predicted to be needed to estimate nutrient intake with good accuracy (r = 0.8), and even more days to estimate food intake [47]. In the present study, however, it was not possible to collect more than three report days per child (2 weekdays and 1 weekend day), as that would have increased the burden on the caregiver and could have resulted in more losses to follow-up. Since the food collection was performed in every season and on both weekdays and weekends, the average of 3 days might be an accurate estimate of long-term usual intake for frequently consumed food groups. However, we could not exclude a less precise estimation for food groups not expected to be consumed daily. This may have contributed to the low ICCs obtained, for example, for sweets and salty snacks.

As is usual in studies using FD as the method of food assessment [3], more educated individuals are more prone to participate. In our sample, at both ages, statistical differences were not found for children’s sex or BMI, but mothers were slightly more educated and older. The socio-demographic characteristics of our sample seem to influence the consumption of particular food items, increasing the intake of fruit and vegetables and decreasing the intake of energy-dense foods. This might not influence the internal validity, since we expected to have this effect in both assessment methods and at both ages. Lastly, with the proposed method (z-score calibration), we obtained the same unconditional mean and variance as the 3d FD, and thus estimated the population distribution of the 3d FD, not the distribution of the true usual intake.

In conclusion, the short FFQ used to evaluate dietary intake among children performed reasonably well and seems to be a useful instrument for evaluating a wide range of food groups and key nutrient intakes in children at 4 and 7 years of age. These results also support that adjusting the portion size when converting an FFQ, by using a z-score method, increases the accuracy of dietary data in young children.

Supplementary information is available at European Journal of Clinical Nutrition’s website.