Introduction

Assessment of food habits and nutrients and their associations with a specific disease outcome can be determined based on pre-existing evidence, that is, a priori methods. This is usually done by constructing scores and indices based on food guidelines and nutritional recommendations [1]. This method is useful to evaluate adherence and the magnitude of the effect of dietary recommendations on disease outcomes [2]. However, because it is based only on a prior selection of foods and nutrients, it does not consider or describe the overall dietary patterns of the population group under study [2, 3]. Therefore, methods that explore the association between overall dietary intake and disease outcomes through a systematic consideration of the correlations between components are increasingly used [4]. Such methods are referred to as a posteriori, that is, data-driven methods based on data collected in a specific population group.

There are two main approaches to a posteriori methods [4]. In the first approach, the dietary variables are combined into fewer variables (or factors) based on their correlation, and latent variables are constructed to represent the original dietary variables [5]. Principal component analysis (PCA) and exploratory factor analysis (EFA) are examples of this approach [4]. The second approach is cluster analysis, in which, unlike PCA and EFA, non-overlapping clusters of individuals are constructed [6].

Another approach in dietary data analysis is a “hybrid” of the a priori and a posteriori methods. In this approach, response variables that mediate dietary risks and outcomes are determined based on a priori knowledge. These variables can be biomarkers, nutrient intakes or an overall dietary quality measure known to be associated with the outcome of interest [3]. Mathematically, these methods work by creating a linear combination of the predictors (food groups) and response variables [3, 7]. The two most common examples of these methods are partial least-squares (PLS) and reduced-rank regression (RRR). Both are considered alternatives to PCA [3].

Studies have reported different recommendations in terms of the utility of the methods [3, 8]. When investigating the association between dietary patterns and bone mass, most studies have used a posteriori methods, although hybrid methods have been increasingly used in recent years [9, 10]. However, the relative advantages of the methods used to identify dietary patterns associated with bone mass have not been thoroughly evaluated. Thus, in this study, we evaluated, for the first time, three dietary analysis methods (PCA, PLS and RRR) for determining dietary patterns associated with bone mass in an ageing Australian population.

Methods

Detailed methods have been presented previously [11]; the aspects most relevant to this specific study are highlighted below.

Study design and population

The study population was selected from participants of the North West Adelaide Health Study (NWAHS), a community-based cohort study. Three major stages of data collection were conducted: 1999–2003, 2004–2006 and 2008–2010. In the cohort, data were collected using self-completed questionnaires, Computer-Assisted Telephone Interviews (CATI) and clinical assessments. Adults (both sexes, aged 18 years and above; n = 4056) from randomly selected households were recruited at the inception of the study [12]. The focus of this specific study is the bone mass data [bone mineral density (BMD) and bone mineral content (BMC)] collected at Stage 2 from those aged 50 years and over (2004–2006, n = 1588). Data on BMD and BMC were available for a total of 1182 adults (545 males, 45.9%) aged 50 years and above. Dietary data were collected at Stage 3 (2008–2010, n = 2500) (Fig. 1).

Fig. 1 Sampling description of study participants with dietary intake and bone mass records, the North West Adelaide Health Study, South Australia

Diet and other covariates assessment

Dietary intake was assessed using the Dietary Questionnaire for Epidemiological Studies (DQES-V3.1) from Cancer Council of Victoria [13]. The questionnaire assesses intake of foods and beverages over the previous 12 months. Analyses of total daily intake of food items and nutrients were performed using the Australian NUTTAB 95 (Food Standards Australia New Zealand, Canberra, 1995) food composition database. For each study participant, the amount of food items consumed per day was calculated in grams and aggregated into 39 food groups [14].

At Stage 2 of the NWAHS, sex, age and family history of osteoporosis were assessed. Annual household income was determined and categorized as up to $20,000, $20,001–$40,000, $40,001–$60,000, and more than $60,000. Marital status was classified as married or living together with partner (in union), separated/divorced, widowed or never married. Alcohol risk was assessed using frequency and number of standard drinks [15]. Smoking was defined as non-smokers, ex-smokers and current smokers. Height and weight of the study participants were measured during the clinic assessment. Body mass index (BMI) was calculated and classified based on the World Health Organization (WHO) standard [16]. Participants with diabetes were identified by either self-report or laboratory diagnosis using blood samples collected at the clinic visit.
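The BMI step can be sketched as follows. This is a minimal illustration, assuming the widely published WHO adult cut-offs (18.5, 25 and 30 kg/m2); the exact categories used in the study are not stated in the text, and the function names and the example participant are hypothetical.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2 from measured weight (kg) and height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_bmi_category(bmi: float) -> str:
    """Classify adult BMI using the standard WHO cut-offs (assumed here)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical participant (not study data): 80 kg, 1.75 m
bmi = body_mass_index(80.0, 1.75)
print(round(bmi, 1), who_bmi_category(bmi))
```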

Assessment of leisure time physical activity levels (PAL) was undertaken using the Australian National Health Survey (NHS) questions [17], considering the number of times a person exercised in the last two weeks and the total amount of time spent walking or doing moderate or vigorous exercise at Stage 2. Job-related PAL was also assessed based on the type of profession. Detailed methods of both forms of PAL are published elsewhere [18]. Medication use (for hypertension, high cholesterol, mental health problems and asthma) and sunlight exposure were also assessed at this stage.

At Stage 3, health literacy was assessed using the Newest Vital Sign test tool [19]. For 31 cases with missing values, we used data collected from the short Test of Functional Health Literacy in Adults (sTOFHLA) tool [20], which was also collected at this stage. Health literacy was classified as limited or adequate.

Missing values were recorded for smoking status (n = 5), alcohol risk (n = 39), diabetes (n = 5), family history of osteoporosis (n = 4), marital status (n = 4), leisure time PAL (n = 128), job-related PAL (n = 129), total energy (n = 48) and health literacy (n = 31). We excluded from the analysis all cases that had at least one missing value for these variables (n = 388, 32.8%).

Prodigy and DPX+ dual-energy X-ray absorptiometry (DXA) scanners (GE Lunar, Madison, WI) were used to assess whole body BMD/C as part of the clinic visit at Stage 2. Details of the DXA measurement procedures can be found elsewhere [21]. BMD and BMC were reported in g/cm2 and g, respectively. T-scores for BMD were also reported for each study participant. Study participants who were osteopenic or osteoporotic (T-scores of less than −1) [22] were classified as having low BMD.
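The low-BMD rule above amounts to a single threshold on the T-score. The sketch below is illustrative only: the function name and the example T-scores are hypothetical, and a T-score of exactly −1 is treated as not low, following the wording "less than −1".

```python
def has_low_bmd(t_score: float, threshold: float = -1.0) -> bool:
    """True when the T-score is below the threshold (osteopenic or osteoporotic)."""
    return t_score < threshold


# Hypothetical T-scores (not study data)
t_scores = [0.4, -0.9, -1.0, -1.7, -2.8]
flags = [has_low_bmd(t) for t in t_scores]
print(flags)
```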

Response variables for PLS and RRR analyses

To identify potential response variables, we reviewed previously published studies and chose the dietary intake of four nutrients (protein, calcium, potassium and vitamin D). These nutrients have been strongly linked with bone mass [23,24,25,26,27]. Diet was also found to be a considerable source of vitamin D in the study population [mean intake = 3.5 mcg/day (140 international units/day)]. We calculated the percentage of energy from total protein intake and the calcium, potassium and vitamin D densities, and used these values as response variables. The percentage of energy from total protein intake was calculated as follows: total energy intake from protein (kJ) divided by total energy intake, multiplied by 100. Calcium, potassium and vitamin D densities were expressed as absolute intakes of calcium (mg/day), potassium (mg/day) and vitamin D (ng/day), respectively, divided by total daily energy intake (kJ/day) [10].
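The response-variable definitions above can be written out as a short sketch. The function names and the example intakes are hypothetical; the vitamin D conversion uses the standard factor of 40 international units per microgram, which reproduces the 3.5 mcg/day = 140 IU/day figure quoted in the text.

```python
def protein_energy_percent(protein_energy_kj: float, total_energy_kj: float) -> float:
    """Percentage of total energy intake that comes from protein."""
    return protein_energy_kj / total_energy_kj * 100.0


def nutrient_density(intake_per_day: float, total_energy_kj: float) -> float:
    """Nutrient intake per kJ of daily energy (e.g. mg/kJ or ng/kJ)."""
    return intake_per_day / total_energy_kj


def vitamin_d_mcg_to_iu(mcg: float) -> float:
    """Convert vitamin D from micrograms to international units (1 mcg = 40 IU)."""
    return mcg * 40.0


# Hypothetical participant (not study data)
total_energy = 8000.0      # kJ/day
protein_energy = 1440.0    # kJ/day from protein
calcium_mg = 900.0         # mg/day
vitamin_d_ng = 3500.0      # ng/day, i.e. 3.5 mcg/day

print(protein_energy_percent(protein_energy, total_energy))  # percent of energy
print(nutrient_density(calcium_mg, total_energy))            # mg/kJ
print(nutrient_density(vitamin_d_ng, total_energy))          # ng/kJ
print(vitamin_d_mcg_to_iu(3.5))                              # IU/day
```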

Statistical analysis

Dietary analysis

To reflect dietary intake in the larger population, factor scores and dietary patterns were derived for 2453 study participants after excluding 47 participants who had a significant amount of missing dietary data. Data reduction techniques (PCA, PLS and RRR) were used to identify dietary patterns from the 39 food groups. PCA produced as many factors as there were food groups (39); however, we retained four factors, of which the first two were chosen based on the scree plot, eigenvalues (>1) and interpretability. These two factors were used to investigate the association between dietary patterns and bone mass, as only these gave meaningful interpretations of the dietary groups [11]. Varimax rotation was applied, and sampling adequacy was checked using the Kaiser–Meyer–Olkin (KMO) test. Linear regression of the response variables described above (percentage of energy from total protein intake, and calcium, potassium and vitamin D densities) on the factor scores was used to obtain the variance of the response variables explained by the two PCA factors. Explained variance measures the proportion of variation in the food groups or response variables (in this case, nutrients) that can be attributed to a dietary pattern.
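The eigenvalue criterion and the notion of explained variance can be illustrated with a small sketch. It assumes PCA on standardized variables, where each variable contributes one unit of variance, so an unrotated component's share of the total is its eigenvalue divided by the number of variables. The eigenvalues below are hypothetical, and the rule does not replace the scree plot and interpretability checks described above; rotation also redistributes variance among the retained factors.

```python
def retained_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Indices of components whose eigenvalue exceeds the cutoff."""
    return [i for i, ev in enumerate(eigenvalues) if ev > cutoff]


def explained_share(eigenvalue, n_variables):
    """Share of total variance explained by one unrotated component
    when the PCA is run on standardized variables."""
    return eigenvalue / n_variables


# Hypothetical eigenvalues for five standardized variables (they sum to 5)
eigenvalues = [2.0, 1.5, 0.8, 0.5, 0.2]
kept = retained_by_eigenvalue(eigenvalues)
shares = [explained_share(ev, len(eigenvalues)) for ev in eigenvalues]
print(kept)    # components passing the eigenvalue > 1 rule
print(shares)  # proportion of variance per component
```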

PROC PLS in SAS (SAS Institute Inc., Cary, NC, USA) was used to conduct both the PLS and RRR analyses, specifying each in turn in the “method=” option [3]. In the analysis, we used a dietary data file containing the 39 food groups coded as fg1, fg2…fg39 and the four response variables. Four factors were specified and retained in each method.

Different algorithms are applied to construct the scores in each of the three methods. For each method, we calculated continuous factor scores [linear functions of the food groups (predictors)] and response scores, which were used in the subsequent statistical analyses and interpretations.
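To make "linear functions of food groups" concrete, the sketch below computes the first component of a PLS fit for a single response, where each food group's weight is proportional to its covariance with the response. This is a deliberately simplified illustration and not the SAS procedure used in the study: the study used four responses and four factors, scaling of the predictors is omitted here, and all names and data are hypothetical.

```python
from math import sqrt


def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def first_pls_weights(food_groups, response):
    """Weights of the first PLS component for ONE response (simplified).

    food_groups: list of columns, one list of intakes per food group.
    Each weight is proportional to the covariance between that food
    group and the response; the weight vector is scaled to unit length.
    """
    y = center(response)
    cov = [sum(x * yi for x, yi in zip(center(col), y)) for col in food_groups]
    norm = sqrt(sum(c * c for c in cov))
    if norm == 0:
        raise ValueError("no food group covaries with the response")
    return [c / norm for c in cov]


def factor_scores(food_groups, weights):
    """Factor score per participant: weighted sum of centred intakes."""
    centred = [center(col) for col in food_groups]
    n = len(centred[0])
    return [sum(w * col[i] for w, col in zip(weights, centred)) for i in range(n)]


# Hypothetical intakes for four participants (not study data)
dairy = [1.0, 2.0, 3.0, 4.0]
soft_drinks = [4.0, 3.0, 2.0, 1.0]
calcium_density = [1.0, 2.0, 3.0, 4.0]

weights = first_pls_weights([dairy, soft_drinks], calcium_density)
scores = factor_scores([dairy, soft_drinks], weights)
print(weights)  # dairy weight is positive, soft drink weight is negative
print(scores)   # one score per participant
```

In this toy example the participant with the most dairy and the fewest soft drinks receives the highest score, which is the sense in which a factor score summarises a dietary pattern.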

In PCA, the factors explain as much variation as possible in the food groups [7]. Unlike PCA, RRR uses a covariance matrix of responses and predictors (food groups) in calculating the scores. PLS combines the two methods and produces scores considering both the predictor (food group) and response matrices simultaneously [3]. In this case, the explained variance of both the response variables and food groups is expected to lie between those of the other two methods. Tertiles [T1 (lowest intake), T2 and T3 (highest intake)] of each of the factor scores were constructed. Factor loadings of each food group on the factors were also calculated. Factor loadings are the correlations between the factors and food groups. For all three methods, the proportion of variance in the response variables and food groups explained by each factor and by all factors combined was also determined. Correlations (response scores) between the factors of each method and the response variables were computed. Pearson correlation coefficients for the response variables were also calculated.
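Two of the quantities described above, tertiles of factor scores and correlation-based loadings, can be sketched as follows. The rank-based tertile rule and the way ties are broken are assumptions made for illustration; statistical packages may place the cut-points differently. The scores are hypothetical.

```python
from math import sqrt


def assign_tertiles(scores):
    """Return 1, 2 or 3 for each score by rank (T1 = lowest third).

    Ties are broken by input order, which is an illustrative choice.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    tertiles = [0] * n
    for rank, idx in enumerate(order):
        tertiles[idx] = rank * 3 // n + 1
    return tertiles


def pearson(x, y):
    """Pearson correlation, e.g. between a factor score and a food group."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical factor scores for six participants (not study data)
factor_score = [0.3, -1.2, 2.1, 0.0, -0.4, 1.5]
tertile = assign_tertiles(factor_score)
print(tertile)
```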

Descriptive analysis and modelling

Mean and standard deviation (for continuous normally distributed variables), median and interquartile range (for continuous non-normally distributed variables) and proportions (for categorical variables) were calculated. The tertiles of factor scores produced by PCA, PLS and RRR analyses were used to assess the association of dietary patterns with bone mass. We applied linear regression models to evaluate the associations of the tertiles of each factor score with BMD and BMC. The initial models (model 1) were adjusted for sex and age. Model 2 was additionally adjusted for socio-economic and lifestyle factors (income, marital status, smoking, alcohol intake, health literacy, leisure time and job-related PAL), chronic conditions (diabetes mellitus, family history of osteoporosis and BMI) and height (for BMC). The last models (model 3) were additionally adjusted for total energy intake to assess the potential confounding effect of energy intake on the associations. To compare the relative quality of the models, Akaike’s information criterion (AIC) was determined for each model.
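For readers unfamiliar with AIC, the sketch below shows how it can be computed for a linear model with normally distributed errors from the residual sum of squares. It is a generic illustration, not output from the study's software; how the number of parameters is counted (for example, whether the error variance is included) varies between packages and is an assumption here, although comparisons made under one consistent convention are unaffected. The numbers are hypothetical.

```python
from math import log, pi


def gaussian_aic(rss, n_obs, n_params):
    """AIC = -2 ln(L) + 2k for a linear model with normal errors.

    rss: residual sum of squares; n_obs: number of observations;
    n_params: number of estimated parameters k (counting convention assumed).
    """
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    log_likelihood = -0.5 * n_obs * (log(2 * pi) + log(rss / n_obs) + 1)
    return -2 * log_likelihood + 2 * n_params


# Hypothetical fits of two candidate models to the same 100 observations
aic_a = gaussian_aic(rss=50.0, n_obs=100, n_params=3)
aic_b = gaussian_aic(rss=49.5, n_obs=100, n_params=6)
print(aic_a < aic_b)  # the small drop in RSS does not offset three extra parameters
```

A lower AIC indicates a better trade-off between fit and complexity among models fitted to the same data.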

Trend of associations across tertiles of each factor was assessed by entering the tertiles of factor scores as continuous variables in the models. Additional adjustments for medications, season of DXA measurement, sunlight exposure and dietary supplements did not materially affect estimates and were not retained in the final models. PLS and RRR analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, North Carolina). All other statistical analyses were conducted with Stata/SE version 14.1 (StataCorp, College Station, TX, USA).
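Entering tertiles as a continuous variable means that the tertile codes (1, 2, 3) are used as a single numeric predictor, so the slope estimates the average change in the outcome per tertile step. The sketch below shows this for an unadjusted model only; the study's models included covariates, and the function name and data are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for one numeric predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical data (not study data): tertile code and BMD in g/cm2
tertile_code = [1, 1, 2, 2, 3, 3]
bmd = [1.10, 1.12, 1.14, 1.16, 1.18, 1.20]

slope, intercept = ols_slope_intercept(tertile_code, bmd)
print(slope)  # average change in BMD per one-tertile increase
```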

Comparison of methods

Previous studies have used different approaches to evaluate and compare dietary assessment methods [3, 8]. In this study, we compared PCA, PLS and RRR methods mainly based on the relative loading of food groups within each dietary pattern and its association with bone mass [28]. We additionally evaluated the methods based on the magnitude of variances of each method that explained the response variables and food groups.

Ethical considerations

Ethics approval for the NWAHS was provided by The Queen Elizabeth Hospital, Ethics of Human Research Committee. Participants provided written informed consent.

Results

A total of 1182 study participants (45.9% males) provided dietary and BMD data. In the multivariable analysis, we excluded those who had missing covariate data, leaving a total of 794 (67.2%) participants. The median age of the participants was 62 years (interquartile range = 56.0, 69.0). One-fifth (19.2%) of the participants reported a family history of osteoporosis (Table 1).

Table 1 Participants’ characteristics of adults aged 50 years and above, the North West Adelaide Health Study, South Australia (n = 1182)

Dietary patterns

Food groups are provided in the supplementary material. Factor loadings (standardized correlations of the food groups with the dietary patterns) derived by PCA, PLS and RRR methods are shown in Fig. 2. The first factor (dairy, vegetables and fruit pattern; the “prudent” pattern) was similar across the three methods and was characterized primarily by high intakes of medium fat dairy, vegetables and fish, and low intakes of soft drinks, processed meat and take away foods. Factor 2 of each of the three methods was also similar in terms of the constituents of the food groups. Factor 2 (“western” pattern) of the PCA method was characterized by high intake of processed meat, take away foods, white bread, red meat and soft drinks and low intake of dairy products and nuts. Factor 2 (“western” pattern) of the other two methods was characterized by high consumption of animal foods (poultry, eggs, red and processed meat, fish and high fat dairy) and low intakes of medium fat dairy, fruit and nuts.

Fig. 2 Factor loadings of food groups in dietary patterns identified using principal component analysis, partial least-squares and reduced-rank regressions, the North West Adelaide Health Study, South Australia (n = 2453) (The colour gradation denotes the strength and direction of the correlation between the food groups and the dietary patterns. Deep green represents a relatively higher correlation (a higher intake) of the food groups with the corresponding dietary patterns. Deep red represents a relatively lower correlation (a lower intake) of the food groups with the corresponding patterns.)

Factor 3 (dairy pattern) was generally characterized by high intake of dairy products; however, a slight difference in food groups was identified using PLS and RRR. Factor 4 (vegetables and fruit pattern) was primarily characterized by low intake of dairy products and high consumption of vegetables (Fig. 2). The intake of major foods and nutrients across tertiles of dietary patterns is provided in the Supplementary material.

Explained variations in response variables and food groups

The two factors of PCA explained 37.1% of the response variable variation (proportion of energy from protein, calcium, potassium and vitamin D densities). Both PLS (75.5%) and RRR (70.6%) explained a larger amount of variation in the response variables. In PLS and RRR, the largest explained variations of responses were observed in vitamin D (65.2%) and calcium (80.0%) densities, respectively. In the “prudent” pattern, potassium density was the most explained response in PCA (22.7%) and PLS (43.4%), whereas calcium density was the most explained response (56.5%) in RRR (Table 2).

Table 2 Explained variation in responses and food groups by dietary patterns identified using principal component analysis, partial least-squares and reduced-rank regressions, the North West Adelaide Health Study, South Australia

Using PLS, 21.1% of the variation in predictors (food groups) was explained, compared with 16.7% for PCA and 14.0% for RRR. Whereas factor 1 explained 10.3% of the variation in predictors in PCA, it explained only 3.4% and 7.3% for RRR and PLS, respectively (Table 2).

The correlations (response scores) between factors and response variables estimated using PCA, PLS and RRR methods are depicted in Fig. 3. Factor 1 of the PCA was positively correlated with protein energy, calcium and potassium densities. Using PLS, the proportion of energy from protein was positively correlated with all factors. RRR analysis estimated a positive correlation between calcium density and factors 1 and 3. We also found that the correlations among the proportion of energy from protein, calcium density and potassium density were positive and high (Supplementary material).

Fig. 3 Correlation (response scores) between factors and response variables obtained from principal component analysis, partial least-squares and reduced-rank regressions, the North West Adelaide Health Study, South Australia

Dietary patterns and bone mass

Table 3 provides the different associations of factors identified by PCA, PLS and RRR with BMD and BMC. In the more fully adjusted models, none of the factors of PCA was significantly associated with bone mass; and more dietary factors determined by RRR compared to PLS were found to be associated with BMD and BMC. However, in all methods, the coefficients increased across tertiles of the “prudent” and dairy patterns, and decreased across the tertiles of vegetables and fruit pattern. Participants in T3 of vegetables and fruit pattern determined by PLS had a 17.3 mg/cm2 (β = −17.29; 95% CI −34.0, −0.58) decrease in BMD compared to those in T1 of model 3. No significant association of this dietary pattern determined by PLS with BMC was observed.

Table 3 Coefficients (β) and 95% confidence intervals for bone mineral density and content and tertiles of factor scores derived using principal component analysis, partial least-squares and reduced-rank regressions, the North West Adelaide Health Study, South Australia (n = 794)

In model 3, the “prudent” and dairy patterns of RRR were significantly and positively associated with BMD and BMC. Participants in T2 and T3 of the “prudent” pattern had a 21.4 mg/cm2 (β = 21.36; 95% CI 5.70, 37.02) and 27.0 mg/cm2 (β = 26.99; 95% CI 10.94, 43.04) higher BMD than those in T1, respectively. Those in T3 of the dairy pattern had a 24.6 mg/cm2 (β = 24.58; 95% CI 8.44, 40.72) higher BMD than those in T1. Compared to those in T1 of the “prudent” and dairy patterns, participants in T3 had a 69.7 g (β = 69.65; 95% CI 16.67, 122.63) and a 55.5 g (β = 55.49; 95% CI 2.26, 108.73) higher BMC, respectively. The vegetables and fruit pattern was negatively and significantly associated with BMC. Participants in T2 of the vegetables and fruit pattern had a 52.8 g (β = −52.79; 95% CI −104.10, −1.47) lower BMC than those in T1. The AIC was comparable across the corresponding dietary patterns of each of the dietary analysis methods (Table 3).

Discussion

We identified and compared dietary patterns (PCA = 2; PLS = 4; RRR = 4 patterns) using three analysis methods. The first pattern (“prudent” pattern) of all methods was characterized by high intake of dairy products, vegetables and fruit. The second pattern (“western” pattern) was characterized by high intake of fish, poultry, high fat dairy, processed and red meat and low intake of medium fat dairy, vegetables and fruit. In assessing the association between factors and bone mass, RRR identified more (plausible) factors which were significantly associated with bone mass than the other two methods.

Whereas the “prudent” pattern of RRR was significantly and positively associated with bone mass, those computed by PCA and PLS were not. This dietary pattern was characterized by high intake of vegetables, fruit and dairy products. In numerous studies, intake of these food groups has been linked with a decreased risk of reduced bone mass [27, 29,30,31]. However, despite the similarity in contents of the food groups, only the “prudent” pattern determined by RRR was significantly and positively associated with bone mass. In line with this finding, an absence of association between a Mediterranean dietary pattern derived by PCA and indices of bone mass was reported [32]. Furthermore, in the RRR analysis, the correlation of the factor scores of the “prudent” pattern with calcium density, which has an indispensable role as a component of bone mass, was the highest (0.71) compared to the other two methods (PCA = 0.19 and PLS = 0.50). As there was a low correlation between the “prudent” pattern and protein in PCA and PLS, this may also explain the absence of a significant positive association, as evidence suggests that the role of calcium in bone mass is enhanced when there is an adequate intake of protein and vice versa [33]. In addition, RRR extracts dietary patterns that combine eating behaviours and the pathway to the outcome (through the response variables), taking into account their physiological importance.

Our findings show that the dairy pattern of RRR was positively associated with bone mass. However, there was a non-significant positive association across the tertiles of the dairy pattern and bone mass with PLS. This could be due to the following reasons. First, a careful observation of the factor loadings of vegetables and fruit showed that the intake of vegetables and fruit in the RRR analysis was not as low as that in PLS. Second, we also found an inverse correlation between potassium and vitamin D densities and the dairy pattern of PLS. With regard to this, evidence has shown a significant positive role of vegetables and fruit [30] as well as potassium [34] and vitamin D in bone mass. Third, although both methods use existing knowledge of the association between nutrients and diseases, the fact that RRR mainly focuses on explaining variation in the response nutrients [3] rather than the food groups can partly explain why the dairy pattern of the RRR analysis is significantly associated with bone mass.

Dairy products are the most important food groups which assist in the prevention of osteoporosis [29]. In line with this, our finding also supports the importance of dairy products in building bone mass. The vegetables and fruit pattern of RRR, which is characterized by low consumption of dairy products and high consumption of vegetables and fruit, was negatively and significantly associated with bone mass, highlighting the imperative role of dairy on bone mass. In our previous study, we have also highlighted the importance of dairy products as part of “prudent” dietary pattern [11].

Information obtained by PCA can give clearer understanding of dietary patterns within a specific population which helps in the formulation of tailored nutrition interventions [35, 36]. However, PCA does not necessarily explain the variation and amount of nutrient intake in the identified patterns, rather it explains the cultural and behavioural aspects of food [37]. The effects of diet could be also mediated through specific nutrients which cannot be captured by this method [28] and could create difficulty in providing a plausible interpretation of findings. In line with this, our results showed that although PCA explains the highest variation in food groups (considering all four factors), no factor was significantly associated with bone mass in the most adjusted models. This supports the view that PCA is unlikely to identify dietary patterns associated with bone mass. The selection of the dietary patterns in PCA is subjective, although aided by methods such as eigenvalues and scree plots. However, these subjective decisions could introduce a bias in identifying the optimal number of dietary patterns. Without due consideration of selecting the optimal number of factors, investigators could also miss disease-related dietary patterns. Thus, it is important to note that critical evaluation is required when selecting the number of patterns using this method.

PLS, a method mathematically thought to lie between PCA and RRR, is an alternative method for deriving dietary patterns. In this method, the covariance matrices of both response (nutrients) and predictors (food groups) are explained in the latent variables [3]. In the current study, none of the factors identified by PLS was significantly associated with BMD and BMC. Although no study has evaluated dietary analysis methods in association with bone mass, some studies have used these types of analyses for different outcomes. For instance, DiBello et al. claimed that PCA and PLS were found to be more appropriate in identifying dietary patterns associated with cardiovascular diseases [8]. However, the differences between the findings of our study and that study may reflect the disease outcome used and the types of response variables.

In the current study, we found more dietary patterns associated with bone mass using RRR, and these are plausible in the context of existing evidence. In line with our findings, a study by Hoffmann et al. compared PCA, PLS and RRR in identifying dietary patterns associated with diabetes and concluded that RRR is the most appropriate method for extracting dietary patterns that are significantly associated with diabetes [3]. RRR is also the most commonly used hybrid method in nutritional epidemiology [9, 10]. The method is better at explaining the dietary patterns in the responses [8], and dietary patterns can be evaluated based on the response variables for the plausibility of their association with disease outcomes. Although most previous studies used a posteriori methods [38, 39], in recent years RRR has been increasingly used to identify plausible dietary patterns associated with bone mass [9, 10, 27, 40].

Some limitations should be acknowledged when interpreting the findings. First, dietary information was collected between 2008 and 2010 while bone mass was determined between 2004 and 2006 with a 4.3-year median difference (minimum = 2.8 and maximum = 6.1 years). Although habits of elderly people in relation to the choice of the food groups have been found to be stable over years [41], eating behaviours of the study participants, particularly change of behaviours towards a healthy pattern among participants diagnosed with chronic diseases, could exist. In addition, since study participants were told the result of DXA measurements, those who knew they had low BMD could also change their behaviour towards a favourable diet. Thus, the association between dietary pattern and bone mass in our study may be underestimated. To investigate the effect, we did a sensitivity analysis by dividing study participants into two groups based on the median gap of time (i.e. early and late measures of dietary data after bone mass measurement). The estimates of associations for the early measures were either consistent or stronger compared with the whole sample. On the other hand, estimates of participants with late measures were attenuated, further highlighting the underestimated associations between dietary patterns and bone mass.

Although food frequency questionnaires (FFQs) have limitations in providing valid dietary information, they are commonly used to measure usual dietary habits [42]. In this regard, measurement error for every diet component will tend to underestimate the effects in the statistical analysis [43]. However, in the presence of correlation between dietary variables, the direction of bias associated with measurement error is unknown [44, 45]. Furthermore, in ranking intake levels of dietary components, the FFQ is relatively robust [37]. Recall bias is another potential limitation associated with the FFQ.

In conclusion, although PCA, PLS and RRR are similar in terms of their mathematical foundations (use of covariance matrix to reduce dimensionality) and extraction of factors that are not correlated, studies have reported different recommendations regarding their utility. In this particular study, RRR was found to be more appropriate in identifying dietary patterns that are associated with bone mass than the other two methods. Nevertheless, the advantage of RRR over the other two methods (PCA and PLS) should be confirmed in future studies in different settings, population groups, response variables and disease outcomes.