Introduction

The ramifications of osteoporotic bone fractures on public health are well-recognized [1]. Postmenopausal osteoporosis is a primary risk factor and an important predictor of bone fractures [2]. There are defined and well-known risk factors for postmenopausal osteoporosis such as age, sex, white race, family history of osteoporotic fractures, smoking, diabetes mellitus, low BMI, and reduced calcium consumption [1]. In contrast, the effect of other factors on the development of osteoporosis is not clear-cut. One of those factors is primary adult lactase deficiency (PLD).

Milk is one of the main food sources of energy, protein, carbohydrates, vitamins, and calcium. The milk of ruminants contains 5% lactose, a disaccharide that consists of glucose and galactose. The absorption of lactose in the intestines requires its hydrolysis into these monosaccharides by the enzyme lactase [3]. Lactase secretion reaches its peak following birth. Over the first month of life, the level of lactase activity begins to drop. In most people, the lactase level becomes very low soon after weaning, a phenomenon called PLD, which is one of the most common health problems in the adult population. Two thirds of the world’s population are affected [4]. The prevalence of PLD varies widely among populations: 99% in China, 70% in some regions of Italy, 20% in the USA, and less than 10% in Scandinavia and Holland [5, 6].

These differences in prevalence rates have been explained by historical evidence showing that the descendants of European nations, who survived by domesticating cattle, maintained their capacity to digest milk up to adulthood [6]. An autosomal dominant gene is responsible for lactase persistence throughout life [7].

There are several methods to diagnose lactose intolerance. The standard reference for evaluating lactose intolerance is the assessment of lactase activity in mucosal biopsies. This method has several limitations, including invasiveness and the heterogeneous expression of lactase in the mucosa of the gastrointestinal tract [6].

Genetic testing could be beneficial in the detection of lactase deficiency since homozygosity (CC) for the single nucleotide polymorphism (SNP) C/T-13910 is responsible for 86–98% of non-lactase persistence in most regions of the world. However, genetic testing is not practical for widespread testing. Another test is the lactose tolerance test, which is based on measurements of blood glucose levels after a lactose challenge and is affected by swings in blood glucose levels [6]. The H-2 breath test, which is considered the test of choice today, also has its limitations. False positive results can arise from intestinal bacterial overgrowth and rapid GI transit, and false negative results can arise in people whose gut bacteria do not produce hydrogen [5, 6]. There are reports of a good correlation between genetic diagnosis and the breath test [8, 9].

In clinical practice, PLD can be asymptomatic. When lack of absorption of lactose causes GI symptoms, the condition is called lactose intolerance (LI) [10]. Reports vary on the percentage of patients with PLD who suffer from LI: some found a relatively good correlation between diagnostic tests for PLD and symptoms [11], while others found a poor correlation [12, 13]. Furthermore, many patients who report LI do not have lactose malabsorption according to various diagnostic tests [14]. Many studies have shown that self-reported LI, and not lactose malabsorption, is associated with reduced consumption of dairy products [15,16,17]. However, one study showed that in people with self-reported LI, daily consumption of calcium was not affected because they were aware of their situation and took calcium supplements [17]. In addition, most individuals with PLD are capable of consuming 250 ml of milk without GI symptoms [10]. A study that compared absorption of calcium from milk and yogurt in individuals with and without lactase deficiency showed good and identical absorption of calcium in both groups for both sources of calcium [18].

This literature review points to complex and even contradictory information on the association between PLD, clinical symptoms, and consumption of dairy products and calcium. The latter are important nutritional factors for the prevention of osteoporosis [19]. In addition, relatively low BMI scores have been reported in individuals who suffer from PLD [20], and low BMI is a well-known risk factor for osteoporosis. There may be additional mechanisms by which PLD affects the development of postmenopausal osteoporosis.

Studies conducted to date have produced contradictory results on associations between lactase deficiency, bone mineral density (BMD), and osteoporosis. Some studies found a significant association [21, 22], while others found no association at all [23, 24]. There were methodological differences among these studies. They were conducted in countries with different prevalence rates for lactase deficiency and included mixed groups of men and women and individuals in a broad range of ages. The diagnostic methods of lactase deficiency included self-report, H-2 breath tests, tolerance tests, and genetic testing. The methods used to measure BMD and diagnose osteoporosis were also varied: x-rays, histomorphometric studies, biochemistry tests, a history of bone fractures and osteoporosis, and dual energy X-ray absorptiometry (DXA). To our knowledge, no review has been published that conducted a systematic analysis of the association between PLD, BMD, and osteoporosis in postmenopausal women. That was the aim of the present study.

Methods

Data sources and searches

We searched the PubMed, Scopus, and Web of Science electronic databases in July 2017 to identify studies that addressed associations between PLD and bone density/osteoporosis in postmenopausal women. We did not limit the date of publication. Two investigators conducted the search for relevant studies. Additional studies were identified by reviewing the bibliographies of the full-text papers that were included in the systematic review. No search software was used.

The search was conducted using the following key words: lactase deficiency and osteoporosis, lactose intolerance and osteoporosis, lactose intolerance and bone mineral density, and lactase deficiency and bone mineral density.

To ensure that the studies introduced into the review were consistent and reliable, we decided to focus on studies that used the following diagnostic methods:

  1. Methods for the diagnosis of PLD. We included studies that used hydrogen breath testing or genetic testing since these tests are very reliable and studies have shown that the correlation between them is good [4, 7, 8].

  2. Methods to measure BMD and diagnose osteoporosis. To ensure comparability and reliability among the studies in the review, we selected studies that used modern, reliable methods for measuring BMD, including dual energy X-ray absorptiometry (DXA), single energy X-ray absorptiometry, quantitative computerized tomography (QCT), quantitative ultrasound (QUS), and digital X-ray radiogrammetry [25].

Study selection

The search for suitable studies was conducted by two investigators (YTG and RP) in two phases. In the first phase, all the abstracts were evaluated in relation to the a priori inclusion and exclusion criteria:

  1. The inclusion criteria were original research that assessed associations between PLD, BMD, and/or osteoporosis with a study population of postmenopausal women.

  2. The exclusion criteria were studies that were not original research, e.g., reviews or case reports, and/or included a study population that was not comprised exclusively of postmenopausal women.

All the abstracts were assessed independently by the two investigators and were either included into or excluded from the study. In cases of disagreement, the abstract was discussed until a joint decision was reached.

In the second phase, the investigators read the full texts of the selected abstracts chosen in the first phase and reviewed the bibliographies to identify relevant papers. In the event that a paper could not be obtained, the investigators contacted the authors by email and requested the full article.

The following criteria were used to include papers into the review:

  1. It included data on associations between PLD, BMD, and/or osteoporosis,

  2. The diagnosis of PLD was made by genetic testing or H-2 breath tests,

  3. The measurements of BMD and/or the diagnosis of osteoporosis were based on the tests detailed above in the “Methods” section on the diagnosis of osteoporosis.

The following criteria were used to exclude papers from the review:

  1. The diagnosis of PLD was based on self-report or a lactose tolerance test,

  2. The diagnosis of osteoporosis was based on less reliable methods: self-report without a record of diagnostic testing, plain X-ray, histomorphometric studies, or a history of bone fracture.

Each investigator (YTG and RP) conducted a comprehensive, independent review of all the papers. In cases of disagreement, the paper was discussed until a joint decision was reached.
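The inclusion and exclusion criteria listed above can be read as a single eligibility check. The following is a minimal sketch only: the `Paper` record, its field names, and the method labels are hypothetical encodings chosen for illustration, and the review itself was carried out manually by two investigators.

```python
from dataclasses import dataclass

# Hypothetical labels for the diagnostic methods accepted by the review.
ACCEPTED_PLD_METHODS = {"genetic_test", "hydrogen_breath_test"}
ACCEPTED_BMD_METHODS = {"DXA", "SXA", "QCT", "QUS", "digital_xray"}


@dataclass
class Paper:
    original_research: bool
    only_postmenopausal_women: bool
    reports_pld_bmd_association: bool
    pld_method: str
    bmd_method: str


def is_eligible(paper: Paper) -> bool:
    """Return True when a paper passes both screening phases."""
    # Phase 1: abstract-level criteria.
    passes_abstract_phase = (
        paper.original_research and paper.only_postmenopausal_women
    )
    # Phase 2: full-text criteria on reported data and diagnostic methods.
    passes_full_text_phase = (
        paper.reports_pld_bmd_association
        and paper.pld_method in ACCEPTED_PLD_METHODS
        and paper.bmd_method in ACCEPTED_BMD_METHODS
    )
    return passes_abstract_phase and passes_full_text_phase
```

Under this encoding, a paper that diagnosed PLD by self-report or a lactose tolerance test fails the check because its method label is not in the accepted set.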

Data extraction

The two investigators independently extracted data relevant to the study purpose. All discrepancies were resolved by discussion. The data recorded included the authors and the year of publication, the age and origin of the participating patients, the sample size that was calculated for the study population, the number of absorbers and non-absorbers, the method of diagnosis for lactase deficiency, the site of measurement of BMD, and the results of the measurements as median (range) or mean (± standard deviation) for absorbers and non-absorbers. Only homozygosity for CC defined PLD and only these participants were classified as non-absorbers. Participants who were homozygotes for TT and heterozygotes for TC were classified as absorbers.
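The genotype rule stated above can be written as a small mapping. This is an illustrative sketch; the function name and the string labels are assumptions, not part of the original data extraction.

```python
def classify_genotype(genotype: str) -> str:
    """Map a C/T-13910 genotype to the absorber status used in the review."""
    # Normalize case and allele order so that "TC" and "CT" are treated alike.
    normalized = "".join(sorted(genotype.strip().upper()))
    if normalized == "CC":
        return "non-absorber"  # homozygous CC defines PLD
    if normalized in {"CT", "TT"}:
        return "absorber"  # heterozygotes and TT homozygotes
    raise ValueError(f"unexpected genotype: {genotype!r}")
```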

Quality assessment

Observational studies with cross-sectional and cohort designs are the primary study designs that facilitate the evaluation of associations between diseases and exposure. All the papers that were identified in the search and included in the study were of this type. Randomized controlled studies, which hold a preeminent position in the hierarchy of evidence-based medicine, are not applicable for this purpose [26]. We used an adapted Newcastle-Ottawa scale to assess the risk of bias in the included studies [27]. The Newcastle-Ottawa scale is a simple, convenient tool for the quality assessment of non-randomized studies. It uses a star system and is based on three domains: selection, comparability, and outcome [28]. The Newcastle-Ottawa scale, as adapted for case-control studies, contains five criteria. There are three criteria for selection bias assessment with a maximum of one star for each, one criterion for comparability assessment with a maximum of two stars, and one criterion for outcome bias assessment with a maximum of one star. The risk of bias assessment was carried out by the two investigators in a blinded process and, in cases of disagreement, a consensus process was used.
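The star allocation described above gives a maximum of six stars per study (three for selection, two for comparability, one for outcome). A minimal sketch of the tally follows; the dictionary layout and function name are assumptions for illustration, and no cut-off between low and intermediate risk is implied because the text does not state one.

```python
# Maximum stars per domain in the adapted scale, as described in the text.
MAX_STARS = {"selection": 3, "comparability": 2, "outcome": 1}


def total_stars(stars: dict) -> int:
    """Sum the stars across domains after checking each against its maximum."""
    for domain, maximum in MAX_STARS.items():
        awarded = stars.get(domain, 0)
        if not 0 <= awarded <= maximum:
            raise ValueError(f"{domain}: {awarded} stars outside 0..{maximum}")
    return sum(stars.get(domain, 0) for domain in MAX_STARS)
```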

The study characteristics of the full-text articles included in the review were described to gain insight into the homogeneity of the study populations. The decision to include only studies that used breath tests and genetic tests for PLD diagnosis and reliable tests for the diagnosis of BMD/osteoporosis reduced the degree of heterogeneity between the studies significantly.

Data synthesis and analysis

Meta-analysis was performed to assess differences between absorbers and non-absorbers using the random effects model. In some of the studies, BMD was expressed as g/cm2, in others as a Z-score, and in some as both. Mean differences (MD) with 95% confidence intervals (CIs) were calculated for BMD in both the g/cm2 and Z-score expressions. Additionally, the mean differences for each site of bone density measurement were calculated. The analysis was performed with the STATA software version 12.1. Heterogeneity across the studies was assessed using the I2 measure, which describes the percentage of the variability of the effect that is due to heterogeneity. In studies where the results were presented as a median (range), they were converted to mean (±SD) by the method described by Hozo et al. [29]. In studies where absorbers comprised both TT and TC genotypes, the subgroups were combined and compared to non-absorbers (CC genotype).
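The two data-preparation steps above can be sketched as follows. The first function applies the commonly cited rules of thumb from Hozo et al. for approximating a mean and SD from a median and range; which sample-size branch applied to each included study is not stated in the text. The second function merges two subgroups with the standard pooled mean and SD formula; the text does not say which combining formula was used, so this choice is an assumption.

```python
import math


def hozo_mean_sd(minimum, median, maximum, n):
    """Approximate mean and SD from a median and range (Hozo et al. rules of thumb)."""
    # Mean: weighted formula for small samples, the median itself otherwise.
    if n <= 25:
        mean = (minimum + 2 * median + maximum) / 4
    else:
        mean = median
    # SD: variance formula for very small samples, range/4 or range/6 otherwise.
    spread = maximum - minimum
    if n <= 15:
        variance = ((minimum - 2 * median + maximum) ** 2 / 4 + spread ** 2) / 12
        sd = math.sqrt(variance)
    elif n <= 70:
        sd = spread / 4
    else:
        sd = spread / 6
    return mean, sd


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two subgroups (e.g. TT and TC) into one group with pooled mean and SD."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-group variation plus the variation between the two subgroup means.
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + n1 * n2 / n * (mean1 - mean2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)
```

The combined SD is larger than either subgroup SD whenever the subgroup means differ, because the between-subgroup spread is added to the within-subgroup spread.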

Role of the funding source

No funding.

Results

Selection of studies included in the review

Eight hundred fourteen studies were identified in the databases using the key search words. Of these, 598 were duplicates. Two hundred sixteen abstracts and paper titles were evaluated in the first phase of the review. One hundred ninety were disqualified since it was clear from the abstract or the paper title that they did not meet the study inclusion criteria. Twenty-six papers were evaluated in the second phase of the review. All of them were in English. We could not obtain three of the papers through the electronic databases, so we asked the authors by email to send a full copy, but only one of them complied with this request. A review of the bibliographies of the full papers that were included in the second phase of the review did not produce any other studies that fulfilled the study inclusion and exclusion criteria. Thus, at the end of the screening process, five papers were entered into the systematic review and meta-analysis. The process of paper selection is described in the flow diagram in Fig. 1.

Fig. 1 Flowchart of review process

Study characteristics

The characteristics of the studies that were selected for meta-analysis are shown in Table 1. All were case-control studies published between 1995 and 2009. The total number of participants was 2223, of whom 765 were diagnosed with PLD. All the women were postmenopausal, and their mean age ranged from 57 to 70 years in the different studies. The mean age was presented separately for absorbers and non-absorbers in three studies and there were no statistically significant differences between the groups [30, 33, 34]. The studies included women from Finland, Austria, Italy, and Spain. All the studies except one [31] used genetic testing to diagnose PLD. All the studies used DXA to measure BMD/osteoporosis. Two papers [30, 34] presented their data on BMD in both g/cm2 and Z-scores. Two papers presented only Z-scores [31, 33] and one presented only g/cm2 [32]. Two studies assessed BMD at only one skeletal site [30, 31], while the other three used multiple sites [32,33,34]. One study measured BMD in the heel [30], four in the lumbar spine [31,32,33,34], three in various sites in the femur and in the femoral neck [32,33,34], and one in the radius [33].

Table 1 Studies included in the meta-analysis on PLD and bone mineral density in postmenopausal women

Outcomes

Differences in bone mineral density in g/cm2 between lactose absorbers and lactose non-absorbers

A forest plot of the meta-analysis is shown in Fig. 2. Two studies compared BMD in the femoral neck between absorbers and non-absorbers [32, 34] and found no significant difference between them (MD [95% CI] = 0.10 [−0.14, 0.35], P = 0.412), with no significant heterogeneity between the studies (I2 53.1%, P = 0.144). The study in which BMD was measured in the heel showed a statistically significant higher level in non-absorbers compared to absorbers (MD [95% CI] = −0.94 [−1.19, −0.69], P < 0.001) [30]. Two studies compared BMD (g/cm2) in the lumbar spine [32, 34]. There were no statistically significant differences between the two groups (MD [95% CI] = 0.05 [−0.11, 0.21], P = 0.551), without significant heterogeneity between the studies (I2 24.6%, P = 0.249). In the one study that compared BMD in the total hip between absorbers and non-absorbers, there was a significant difference between the groups (MD [95% CI] = 0.36 [0.07, 0.65], P = 0.014) [34]. However, the same study did not find any difference in BMD (g/cm2) in Ward’s triangle (MD [95% CI] = 0.25 [−0.03, 0.54], P = 0.084) [34].

Fig. 2 Forest plot of meta-analysis of difference in bone mineral density (g/cm2) between lactose absorbers and non-absorbers

Overall, across all the skeletal sites that were tested in the studies, the difference in BMD (g/cm2) between absorbers and non-absorbers was not significant (MD [95% CI] = 0.01 [−0.28, 0.30], P = 0.935), with significant heterogeneity between the studies (I2 92.1%, P < 0.001).
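A pooled mean difference and an I2 value of this kind come from a random-effects model. The sketch below uses the DerSimonian-Laird estimator, which is an assumption: the text names only "the random effects model" in STATA, not the estimator. The inputs are per-study mean differences and their standard errors, which appear in the forest plot rather than in the text.

```python
import math


def random_effects(effects, std_errors):
    """DerSimonian-Laird pooled estimate, 95% CI, and I2 (in percent)."""
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures the weighted spread of study effects around it.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    # I2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2
```

When one study points in the opposite direction from the others, Q grows far beyond its degrees of freedom, I2 approaches 100%, and the pooled confidence interval widens.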

Differences in bone mineral density in Z-score between lactose absorbers and lactose non-absorbers

A forest plot of the meta-analysis is shown in Fig. 3. The results of one study showed a significantly higher Z-score in the femoral head for absorbers compared with non-absorbers (MD [95% CI] = 0.23 [0.06, 0.39], P = 0.006) [33]. Another study using the femoral neck found a significantly higher Z-score in absorbers compared to non-absorbers (MD [95% CI] = 0.35 [0.06, 0.63], P = 0.019) [34]. The one study that calculated the Z-score in the heel did not find any statistically significant difference between absorbers and non-absorbers (MD [95% CI] = −0.07 [−0.31, 0.17], P = 0.550) [30]. Three studies compared the Z-score in the lumbar spine between absorbers and non-absorbers [31, 33, 34]. The meta-analysis showed that the Z-score in absorbers was significantly higher than in non-absorbers (MD [95% CI] = 0.18 [0.04, 0.31], P = 0.012), without significant heterogeneity between the studies (I2 0.0%, P = 0.729). The study that used the radius found a significantly higher Z-score in absorbers (MD [95% CI] = 0.23 [0.06, 0.39], P = 0.007) [33].

Fig. 3 Forest plot of meta-analysis of difference (Z-score) between lactose absorbers and non-absorbers

A meta-analysis of the two studies that compared total hip Z-scores did not show any statistically significant difference between absorbers and non-absorbers (MD [95% CI] = 0.22 [−0.01, 0.45], P = 0.059), without significant heterogeneity between the studies (I2 51.5%, P = 0.151) [33, 34]. A meta-analysis of the two studies that compared Z-scores in Ward’s triangle showed a significantly higher Z-score among absorbers (MD [95% CI] = 0.27 [0.13, 0.42], P < 0.001), without significant heterogeneity between the studies (I2 0.0%, P = 0.549) [33, 34].

In an analysis of all the studies, the Z-score in absorbers was significantly higher than in non-absorbers (MD [95% CI] = 0.20 [0.14, 0.27], P < 0.001), without significant heterogeneity among the studies (I2 3.7%, P = 0.408).
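A reported P value can be cross-checked against its confidence interval. The sketch below back-calculates the standard error from a 95% CI and derives a two-sided P value, assuming a normal approximation with a symmetric interval; the function name is illustrative, and small discrepancies from the published values are expected because the interval bounds are rounded to two decimals.

```python
import math


def p_from_ci(md, lower, upper):
    """Two-sided P value for a mean difference, back-calculated from its 95% CI."""
    # A 95% CI spans 1.96 standard errors on each side of the estimate.
    se = (upper - lower) / (2 * 1.96)
    z = abs(md) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))


overall_p = p_from_ci(0.20, 0.14, 0.27)     # overall Z-score analysis
total_hip_p = p_from_ci(0.22, -0.01, 0.45)  # total hip Z-score analysis
```

For the overall analysis the result is far below 0.001, and for the total hip it lies just above 0.05, in line with the interval including zero.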

Risk of study bias

The results of the adapted Newcastle-Ottawa scale for the risk of bias assessment in the studies are shown in Table 1. Most of the studies had a high score representing a low risk of bias, whereas one study had an intermediate risk for bias [29].

Discussion

The present study is the first meta-analysis to compare BMD in postmenopausal women with and without PLD. The clinical significance of osteoporosis lies in the increased risk for bone fractures. There are other, non-skeletal, factors that increase the risk for fractures, but the measurement of BMD is a recognized quantitative tool for assessing the risk of future bone fractures and is a baseline value for monitoring treated and untreated patients.

Various methods have been used for the diagnosis of osteoporosis, but DXA is the most prevalent method in use today. DXA is the gold standard for the diagnosis of osteoporosis according to the WHO and is also a method of choice for the measurement of BMD in clinical trials and observational studies [25]. The technique has been validated in many studies [25]. The test is easy to conduct and can estimate BMD in multiple anatomic sites, including the sites most susceptible to fractures. Thus, it is not surprising that this method was used to determine BMD in all the studies that were suited for the meta-analysis.

The operational definition of osteoporosis is based on the T-score, which describes the number of SDs by which the BMD in the individual differs from the mean value expected in young healthy individuals. However, all of the studies in our meta-analysis presented the results of BMD either in absolute values of g/cm2 or in Z-scores. The Z-score describes the number of SDs by which the BMD in the individual differs from the mean value expected for the individual's age and sex, and is valuable for the assessment of secondary osteoporosis.
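The T-score and the Z-score share one formula and differ only in the reference population. The sketch below makes this explicit; the BMD value and the reference means and SDs are hypothetical numbers chosen for illustration, not data from the included studies.

```python
def sd_score(bmd, reference_mean, reference_sd):
    """Number of SDs by which a BMD value differs from a reference mean."""
    return (bmd - reference_mean) / reference_sd


# Hypothetical example values in g/cm2.
measured_bmd = 0.80
# T-score: reference is the young healthy adult population.
t_score = sd_score(measured_bmd, reference_mean=1.00, reference_sd=0.10)
# Z-score: reference is the population of the same age and sex.
z_score = sd_score(measured_bmd, reference_mean=0.85, reference_sd=0.10)
```

Because the age-matched reference mean is lower than the young adult mean in older women, the same measurement yields a Z-score closer to zero than its T-score.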

In this meta-analysis, we found no significant differences in BMD by g/cm2 between lactose absorbers and non-absorbers. There was a statistically significant difference in the meta-analysis using Z-scores, with lower scores for women with PLD. The Z-score, not BMD by g/cm2, is the accepted measure in clinical practice to assess the risk for bone fractures [35].

The capacity of DXA to predict fractures depends on the anatomic site tested. For example, the chance of predicting fractures in the femoral neck is higher when BMD is measured specifically at that site [36]. One important strength of our study is that we evaluated BMD in separate anatomic sites and not just a total BMD score.

When we compared BMD by g/cm2, there were no significant differences between absorbers and non-absorbers in the femoral neck [32, 34], the lumbar spine [32, 34], and Ward’s triangle [34], but a significant difference was found in the total hip [34] where the BMD was higher in absorbers by 0.36 g/cm2. The only study that compared BMD in the heel area [30] showed a paradoxically higher BMD in non-absorbers compared to absorbers, so there was significant heterogeneity among the studies.

Peripheral DXA of the heel was shown to have a good correlation with central DXA with a good capacity to predict fractures among women from different geographic and ethnic origins [37]. It is difficult to explain the paradoxical result in this one study [30], which apparently made a large contribution to the heterogeneity found in the g/cm2-based meta-analysis.

In the meta-analysis based on Z-scores, we found a significant difference between absorbers and non-absorbers in BMD in multiple sites (femoral head, femoral neck, lumbar spine, radius, and Ward’s triangle) as well as in the total Z-score. The Z-score in absorbers was higher by a mean score of 0.20 compared to non-absorbers. The differences in Z-score between absorbers and non-absorbers ranged from −0.07 in the heel to 0.35 in the femoral neck. It is noteworthy that differences between absorbers and non-absorbers in several sites (femoral head, femoral neck, and radius) were each assessed in a single study. The Z-score was reported from the lumbar spine in the largest number of studies [31, 33, 34]. In the only study that reported BMD by Z-score in the heel, there was no difference between absorbers and non-absorbers. Although the difference in Z-score in the total hip was not significant, there was a clear trend for a higher score in the absorbers group. There was no significant degree of heterogeneity among the studies for individual anatomic sites or for the overall score. This consistency among the studies lends support to our findings, especially considering that the studies were conducted in different countries. It should be noted that while, for the most part, no statistically significant differences were found between absorbers and non-absorbers in the separate results, the greater statistical power of the joint meta-analysis yielded statistically significant results. The clinical ramifications of the findings in this study are important since a large number of older adults suffer from PLD, with prevalence rates reaching 50–90% in some countries.

One of the strengths of this study is the inclusion of postmenopausal women from countries with varying rates of PLD. Another strength is the rigid definition of precise diagnostic methods for both lactase deficiency and osteoporosis. Most of the studies, with one exception [31], were based on genetic testing as the basis for lactase deficiency and all the studies used DXA to measure BMD and diagnose osteoporosis. The evaluation of the quality of the studies and the risk of bias revealed a very low risk of bias and the use of multiple anatomic sites for the determination of BMD is very important from the clinical perspective.

A limitation of this study is the low number of studies that were included in the meta-analysis. BMD data were not available in all of the studies for some of the anatomic sites. However, when we consider the consistency of the Z-score results, one can reasonably conclude that there is a greater risk for osteoporosis in lactose non-absorbers. Since there is a paucity of studies on BMD at some anatomic sites, future research on these sites is needed.

Summary and conclusion

The results of this meta-analysis support the contention that PLD is associated with a lower Z-score at multiple anatomic sites. The identification of another risk factor for osteoporosis in postmenopausal women is important because of the morbidity and mortality associated with the disease, and because existing therapy can change its natural history.