Introduction

Non-Hodgkin lymphomas (NHL) are expected to be the seventh most commonly diagnosed cancer in both women and men in the United States in 2017 [1]. Because the symptoms of lymphoma are nonspecific and there are no widely available screening methods, lymphoma is usually diagnosed at a late stage. Several clinical studies and case reports, dating back to the 1970s, suggest an association between hypocholesterolemia (i.e., low serum cholesterol) and lymphoid malignancies [2,3,4,5]. Deregulation of cholesterol homeostasis has been suggested to play a role in cancer development [6]. A relationship between low cholesterol and lymphoma, as either cause or effect, has not been established. However, low cholesterol could reflect an altered systemic metabolic profile during the pre-diagnostic natural history of lymphoma and thus serve as a marker of subclinical disease.

We retrieved information from the automated data systems of six United States health plans belonging to the Cancer Research Network. Records from routine medical care encounters were used to ascertain incident lymphoma cases and up to five matched controls per case, and the data were pooled to create one of the largest single studies of lymphoma to date. We assembled this cohort as a resource for pharmacoepidemiological studies of lymphoma, with a focus on statin exposure. In our first analyses, we found that the relationship between statin use and lymphoma was strongly confounded by cholesterol levels (i.e., cholesterol was associated in an inverse direction with both statin use and lymphoma), so we turned our attention to the relationship between cholesterol and lymphoma. Here, we present the first data on serial changes in total serum cholesterol (TSC), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) over the 10 years leading up to a lymphoid malignancy diagnosis, offering insight into the relationship of cholesterol to lymphomagenesis.

Methods

Study design and population

This study was conducted among participating health plans of the Cancer Research Network, an NIH National Cancer Institute-funded consortium of research organizations affiliated with non-profit integrated health systems. Six Cancer Research Network sites participated in this study: Henry Ford Health System-Health Alliance Plan in the Detroit area and Kaiser Permanente sites in Northern California, Southern California, Colorado, Georgia, and Seattle. The study was approved by the institutional review boards at each of the participating sites.

As part of a pharmacoepidemiological nested case–control study of risk factors for lymphoma (with an emphasis on statin exposure), we identified all 1998–2008 enrollees of each participating health plan, a base population of over 21 million people. Members with a history of human immunodeficiency virus, cancer (except lymphoma or non-melanoma skin cancer), or organ transplant were excluded. Using each site’s tumor registry and International Classification of Diseases for Oncology (ICD-O) codes, we identified all incident lymphoma cases diagnosed between 1998 and 2008 in members who had prescription drug benefits and who were aged 40 years or older at diagnosis (younger members were excluded because the research emphasis was on medication exposures). Lymphoma categories were defined using the ICD-O codes; several authors, including an experienced hematologist/oncologist (NJ), reviewed the codes and determined the categories. For these analyses, the lymphoma categories assessed were classical Hodgkin lymphoma (Hodgkin’s), all non-Hodgkin lymphomas combined (NHLcombined), and six NHL subtypes with enough cases to permit reliable statistical modeling: plasma cell neoplasms; chronic lymphocytic leukemia/small lymphocytic lymphoma; diffuse large B-cell lymphoma; follicular lymphoma; marginal zone lymphoma; and all T-cell non-Hodgkin lymphomas. The ICD-O codes used to define the lymphoma categories are provided in the Table 1 footnotes. A number of rare lymphoma ICD-O codes with few cases that did not fit these categories, described in Online Resource Table S1, were excluded.

Table 1 Characteristics [N (percentage) or mean ± SD, (min, max) median] of the lymphoma cases and their matched controls, by lymphoma category and type of cholesterol measure

Controls were selected from the base population of health plan members aged 40 years or older and were matched to cases at the time of diagnosis on age (2-year strata), sex, duration of health plan membership (2-year strata), and health plan. Each control was assigned an index date equal to the diagnosis date of its matched case. We selected up to five controls for each case. Health plan enrollment of at least 2 years before the cancer diagnosis (or index date, for controls) was required to ensure a minimum of 2 years of exposure observation time.
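As an illustration of how matching on these criteria can be implemented, the following minimal sketch draws up to five controls for a case from members sharing the case's 2-year age stratum, sex, 2-year membership-duration stratum, and health plan at the case's diagnosis date. It is not the study's code: the `Member` record, the fixed 2-year bins, the requirement that controls be lymphoma-free at the index date, and random sampling within the stratum are all our assumptions, and disenrollment is not modeled.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Member:
    member_id: str
    birth_date: date
    sex: str
    plan: str
    enroll_start: date              # start of continuous enrollment (assumed field)
    dx_date: Optional[date] = None  # lymphoma diagnosis date; None if never diagnosed

def years_between(start: date, end: date) -> float:
    # Approximate elapsed years, adequate for 2-year strata in a sketch.
    return (end - start).days / 365.25

def stratum(m: Member, index_date: date) -> tuple:
    # Matching key: 2-year age stratum, sex, 2-year membership stratum, health plan.
    age = years_between(m.birth_date, index_date)
    tenure = years_between(m.enroll_start, index_date)
    return (int(age // 2), m.sex, int(tenure // 2), m.plan)

def eligible(m: Member, index_date: date) -> bool:
    # Aged 40+ and enrolled at least 2 years at the index date
    # (disenrollment before the index date is not modeled here).
    return (years_between(m.birth_date, index_date) >= 40
            and years_between(m.enroll_start, index_date) >= 2)

def match_controls(case: Member, population: list[Member],
                   k: int = 5, seed: int = 0) -> list[Member]:
    """Randomly draw up to k controls sharing the case's stratum at its diagnosis date."""
    index_date = case.dx_date  # controls inherit this as their index date
    key = stratum(case, index_date)
    risk_set = [
        m for m in population
        if m.member_id != case.member_id
        and (m.dx_date is None or m.dx_date > index_date)  # lymphoma-free at index (assumed)
        and eligible(m, index_date)
        and stratum(m, index_date) == key
    ]
    return random.Random(seed).sample(risk_set, min(k, len(risk_set)))
```

Fixed bins are the simplest reading of "2-year strata"; with them, two members less than 2 years apart in age can still fall into different strata if they straddle a bin boundary.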

Cases and controls without at least one cholesterol measure in the 10 years prior to diagnosis were necessarily excluded. To avoid the influence of drug treatment on cholesterol measures, we restricted the primary analysis to the time periods prior to the first filled prescription for a cholesterol-altering medication (i.e., statins, niacin, bile salts, ezetimibe). These restrictions left some controls without a matched case, because the case had no cholesterol measures, was already on a cholesterol-altering medication at the onset of observation, or both; such unmatched controls were dropped. Likewise, some cases were left without a matched control; only cases with at least one remaining control were retained. Sensitivity analyses were conducted using only subjects who had never been exposed to statins.
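The windowing and re-matching rules above can be sketched as follows, assuming simple in-memory records. The names `analysis_window`, `usable_measurements`, and `prune_matched_sets` are hypothetical, and treating both window boundaries as inclusive is our assumption; the text does not say how same-day measurements were handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Measurement:
    subject_id: str
    measured_on: date
    value_mg_dl: float

def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)

def analysis_window(index_date: date, first_lipid_rx: Optional[date]) -> tuple[date, date]:
    """Opens 10 years before the index date; closes at the index date or at the
    first fill of a cholesterol-altering drug, whichever comes first."""
    end = index_date if first_lipid_rx is None else min(index_date, first_lipid_rx)
    return years_before(index_date, 10), end

def usable_measurements(ms: list[Measurement], index_date: date,
                        first_lipid_rx: Optional[date]) -> list[Measurement]:
    start, end = analysis_window(index_date, first_lipid_rx)
    # Inclusive bounds are an assumption. If a drug was filled before the window
    # opens, end < start and nothing survives, so the subject drops out.
    return [m for m in ms if start <= m.measured_on <= end]

def prune_matched_sets(matched_sets: dict[str, list[str]],
                       has_data: set[str]) -> dict[str, list[str]]:
    """Drop cases without usable data (their controls go with them), drop controls
    without usable data, then drop cases left with no control."""
    pruned = {}
    for case_id, control_ids in matched_sets.items():
        if case_id not in has_data:
            continue  # excluded case: its controls become unmatched
        kept = [c for c in control_ids if c in has_data]
        if kept:      # a case needs at least one remaining control
            pruned[case_id] = kept
    return pruned
```

Pruning in a single pass is sufficient here because removing a control never makes another control ineligible; only the case's own status and the count of its remaining controls matter.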

All study information was obtained from the automated data systems of the participating centers, without any direct contact with patients or manual review of paper or electronic medical records. The reimbursement structure at each plan ensures that data were captured for services both within and outside plan facilities, except for services paid out of pocket. Data used for this analysis included demographic data (age, sex, health plan, years of enrollment, and prescription drug coverage), tumor characteristics (histology, morphology, and date of diagnosis), and laboratory values for all TSC, HDL, and LDL measures in the 10 years prior to the diagnosis/index date. The distribution of each cholesterol measure was examined for normality, and conservative plausibility bounds were set; values outside these bounds were considered probable errors and dropped.

When available, the tumor registry general stage variable was used to classify cases as early or late stage. Its categories are 0, in situ; 1, localized; 2, regional by direct extension; 3, regional lymph nodes involved; 4, direct extension and lymph node involvement; 5, regional, not otherwise stated; 7, distant metastasis; and 9, unknown/missing. Cases with unknown or missing stage were excluded from this analysis only. Only NHLcombined, as a broad category, had a stage distribution wide enough to support separation into early (0–2) and late (3–7) stage groups.

Statistical analysis

Characteristics of cases and controls were compared within each lymphoma category, with p values calculated using t tests (continuous variables) or chi-square tests (categorical variables).

Multilevel longitudinal modeling (MLM, also known as mixed effects modeling) [7] was chosen because it accommodates numbers and timing of observations that vary greatly from subject to subject. MLM also allows more flexible, subject-specific fitting than population-average approaches such as generalized estimating equations. It separates within-person variation (residual variation around each subject's own trajectory) from between-person variation (subject-specific random effects), while the fixed effects describe the average trajectories. The case–control matching criteria (sex, health plan site, age at diagnosis/index date, and length of enrollment) were accounted for by maintaining the matched clustering. Age at the time of each cholesterol measure and year of diagnosis were included as covariates in the individual-specific component of the multilevel models.
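For readers who want the model written out, one common form of a polynomial growth-curve model consistent with this description is shown below. It is an illustrative specification in our own notation, not the authors' published equation: we assume subject-level random effects on each polynomial term, a case indicator interacting with each time term, and age at measurement and index year as covariates. The matched-set clustering could be added as a further level.

```latex
y_{ij} = \sum_{p=0}^{P} \bigl(\gamma_{p0} + \gamma_{p1}\,\mathrm{case}_i + u_{pi}\bigr)\, t_{ij}^{\,p}
       + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{year}_i + \varepsilon_{ij},
\qquad (u_{0i},\dots,u_{Pi})^{\top} \sim N(\mathbf{0},\mathbf{G}),
\quad \varepsilon_{ij} \sim N(0,\sigma^{2})
```

Here $y_{ij}$ is the $j$th cholesterol value for subject $i$, $t_{ij}$ is its time in years before the diagnosis/index date, $\mathrm{case}_i$ is 1 for cases and 0 for controls, and $P$ is the polynomial order (1 to 4 in this study). The residual variance $\sigma^2$ captures within-person variation, $\mathbf{G}$ captures between-person variation in trajectories, and the $\gamma$ terms define the average case and control curves.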

We fit separate models for each cholesterol measure (TSC, HDL, LDL) within each lymphoma category. Polynomial functions of time were used to model the trajectories of cholesterol measures for cases and controls prior to the diagnosis/index date. To select the polynomial order, increasingly complex multilevel models were fit, with orders ranging from linear to quartic. We also performed sensitivity analyses related to the number of cholesterol measures available, restricting the analysis to subjects who had at least two cholesterol measures in the 10 years prior to the diagnosis/index date. Models were fit with the SAS procedure PROC MIXED (SAS version 9.4; SAS Institute, Inc., Cary, NC) using full maximum likelihood estimation. The goodness of fit of nested models was compared using the difference in model deviance (−2 log likelihood), also called the deviance statistic. After the initial models were fit, the model with the lowest deviance statistic was selected. Graphical displays of the predicted population-mean trajectories, with 95% confidence intervals, were produced for each cholesterol type (TSC, HDL, and LDL) separately for Hodgkin’s, NHLcombined, and the six NHL subtypes. The same modeling approach was used to assess differences in TSC, HDL, and LDL in NHLcombined by stage, and in the sensitivity analyses restricted to subjects who had never used a cholesterol-lowering drug.
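The deviance comparison can be sketched with the Python standard library alone. Under full maximum likelihood, the drop in deviance between two nested models is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. This is a generic likelihood-ratio sketch, not SAS output; the helper names and the deviance values and parameter counts in the example are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        term = math.sqrt(half) / math.gamma(1.5) if i == 1 else term * half / (i - 0.5)
        total += math.exp(-half) * term
    return total

def deviance_test(dev_simple: float, k_simple: int, dev_complex: float, k_complex: int):
    """Deviance statistic, its df, and p value for two nested ML fits."""
    stat = dev_simple - dev_complex
    df = k_complex - k_simple
    return stat, df, chi2_sf(stat, df)

def compare_sequence(fits, alpha: float = 0.05):
    """fits: (label, deviance, n_params) tuples in increasing complexity;
    compares each fit with the one before it."""
    results = []
    for (l0, d0, k0), (l1, d1, k1) in zip(fits, fits[1:]):
        stat, df, p = deviance_test(d0, k0, d1, k1)
        results.append((l0, l1, stat, df, p, p < alpha))
    return results

# Hypothetical deviances and parameter counts, for illustration only.
for row in compare_sequence([("linear", 1100.0, 5), ("quadratic", 1050.0, 8),
                             ("cubic", 1000.0, 12), ("quartic", 990.0, 15)]):
    print(row)
```

Because deviance cannot increase when parameters are added to a nested model fit by maximum likelihood, the test asks whether the reduction is larger than chance alone would produce. When the added parameters include variance components, the plain chi-square reference tends to be conservative, because the null value lies on the boundary of the parameter space.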

Results

There were 15,430 lymphoma cases and 77,146 matched controls in the original study population. Omitting low-frequency lymphoma subtypes removed 107 cases (Online Resource Table S1) and their 535 controls. Another 2,124 cases and 16,515 controls could not be included because they had no cholesterol measurements or had measurements only outside the analysis window, which began 10 years prior to the diagnosis/index date and ended at that date (or at the start of statin use). Excluding these 2,124 cases left an additional 7,136 controls without a matching case, and these controls were also omitted. Likewise, omitting the 16,515 controls without cholesterol measures left 42 cases without any matching control, and these cases were excluded. (The high ratio of unmatched controls to cases arises because every excluded case leaves ~5 controls unmatched, whereas an excluded control leaves a case unmatched only if all of that case's other ~4 controls are also excluded, which is rare.) Finally, 3 cases and 41 controls were excluded because of out-of-range cholesterol values, leaving a total of 13,154 cases and 52,919 controls included in at least one analysis. Compared with the included cases, the 2,169 excluded cases of the studied subtypes had a shorter average enrollment duration (7.7 versus 9.6 years), were on average 1.1 years older, and more often reported white race (84% versus 81%).
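As a quick arithmetic check, the exclusion steps reported above reproduce the final analytic counts. The dictionary labels are ours; the numbers are those given in the text.

```python
start_cases, start_controls = 15_430, 77_146

case_exclusions = {
    "rare subtypes (Table S1)": 107,
    "no usable cholesterol measure": 2_124,
    "left without a control": 42,
    "out-of-range values": 3,
}
control_exclusions = {
    "controls of rare-subtype cases": 535,  # 5 controls x 107 cases
    "no usable cholesterol measure": 16_515,
    "left without a case": 7_136,
    "out-of-range values": 41,
}

final_cases = start_cases - sum(case_exclusions.values())
final_controls = start_controls - sum(control_exclusions.values())
# Excluded cases among the studied subtypes: every exclusion except the rare subtypes.
excluded_studied = sum(case_exclusions.values()) - case_exclusions["rare subtypes (Table S1)"]

print(final_cases, final_controls, excluded_studied)  # 13154 52919 2169
```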

Because the three cholesterol measures were not always measured together over the study period, the number of cases and controls available varies by cholesterol measure (TSC, HDL, LDL). The number of cases and controls for each lymphoma group is reported in Table 1. Online Resource Table S2 shows the numbers of patients and cholesterol measurements included in the analysis for each of the 10 years prior to the diagnosis/index date for Hodgkin’s and NHLcombined.

Table 1 shows the distribution of cases and controls by lymphoma category for several demographic and clinical factors, including sex, age at first cholesterol measure, age at diagnosis (index date), cholesterol measure (TSC, HDL, or LDL), number of cholesterol measures, and years of enrollment. The largest case group is for the TSC model in NHLcombined (12,573 cases) and the smallest is for LDL in Hodgkin’s (434 cases). The mean number of TSC, HDL, and LDL measures was at least 3, and the average length of enrollment was 9 years or longer. Because some lymphoma categories were large, case–control comparisons of demographic variables yielded significant p values even when the differences in means were quite small. Cholesterol levels were significantly lower in cases than in controls for all measures in all but two instances, which were of borderline significance.
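To illustrate why large groups yield small p values for trivial differences, the following sketch applies a normal approximation to a two-sample t test. It uses the sizes of the largest (12,573 cases) and smallest (434 cases) analysis groups. All other inputs are illustrative assumptions, not values from Table 1: a hypothetical 0.5-year difference in mean age, an assumed common SD of 10 years, and about four controls per case.

```python
import math

def two_sample_p(mean_diff: float, sd: float, n_cases: int, n_controls: int) -> float:
    """Two-sided p value from a normal approximation to a two-sample t test,
    assuming a common standard deviation in both groups."""
    se = sd * math.sqrt(1 / n_cases + 1 / n_controls)
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

# Hypothetical inputs: 0.5-year difference in mean age, SD 10 years, 4 controls per case.
p_largest = two_sample_p(0.5, 10.0, 12_573, 4 * 12_573)  # size of TSC in NHLcombined
p_smallest = two_sample_p(0.5, 10.0, 434, 4 * 434)       # size of LDL in Hodgkin's
print(f"largest group p = {p_largest:.1e}; smallest group p = {p_smallest:.2f}")
```

With these assumed inputs, the same small difference is highly significant in the largest group but not in the smallest. For that reason, the size of each difference matters as much as its p value when reading the demographic comparisons.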

For all but one model, a quartic fit with full-term random effects had the smallest deviance statistic. The exception, reported in the supplemental data, was the sensitivity analysis of LDL in Hodgkin’s among the subpopulation never exposed to statins, for which the quartic model failed to converge and a cubic fit was used. Figure 1 shows the MLM trajectories of TSC for Hodgkin’s (Fig. 1a), NHLcombined (Fig. 1b), and the six NHL subtypes (Fig. 1b.1–1b.6); Figs. 2 and 3 show the corresponding trajectories for HDL and LDL, respectively (differences in adjusted means for each cholesterol measure by lymphoma category are given in Online Resource Table S3). Case and control curves differed significantly in all eight lymphoma categories, beginning several years prior to diagnosis. Differences in TSC were detectable 5–10 years prior to diagnosis for NHLcombined and its subtypes and 2 years prior for Hodgkin’s, with the slope declining markedly as diagnosis approached. HDL curves separated similarly, with statistically significant differences in all 10 years for NHLcombined, at 5–10 years prior to diagnosis for the NHL subtypes, and from 8 years prior to diagnosis for Hodgkin’s. The patterns for LDL were very similar to those for TSC.

Fig. 1

(1a–1b.6) Trajectories for total cholesterol for cases compared to controls, adjusting for age at cholesterol measure and index year, by lymphoma category

Fig. 2

(2a–2b.6) Trajectories for HDL for cases compared to controls, adjusting for age at cholesterol measure and index year, by lymphoma category

Fig. 3

(3a–3b.6) Trajectories for LDL for cases compared to controls, adjusting for age at cholesterol measure and index year, by lymphoma category

Overall, the graphs show lower cholesterol levels in cases than in controls before diagnosis, especially in the years closest to diagnosis. When we repeated the analysis requiring at least two cholesterol measures in the 10 years prior to the diagnosis/index date, the results (based on 10,087 cases and 38,205 controls for total cholesterol in NHLcombined) were essentially unchanged.

For NHLcombined, the 1,372 cases classified as stage 1 were combined with 41 stage 2 cases to form the “early” stage group (n = 1,413). The late-stage group included general stages 3 (n = 5), 4 (n = 3), 5 (n = 695), and 7 (n = 4,934), for a total of 5,637 cases, so the stage analysis included 7,050 cases in all. The 5,444 cases with unknown or missing stage were not included in this analysis. Figure 4a–c shows the changes in cholesterol for NHLcombined by stage (differences in adjusted means for each cholesterol measure are given in Online Resource Table S4). Late-stage cases showed a more dramatic drop in cholesterol prior to diagnosis.
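The early/late grouping defined in the Methods can be expressed as a small lookup, and applying it to the reported stage counts reproduces the totals above. This is a sketch; for simplicity, the 5,444 unknown or missing cases are tallied under code 9, although in the registry some may simply lack a code.

```python
from typing import Optional

def stage_group(general_stage: int) -> Optional[str]:
    """Registry general stage -> analysis group: 0-2 early, 3-7 late, else excluded (None)."""
    if general_stage in (0, 1, 2):
        return "early"
    if general_stage in (3, 4, 5, 7):
        return "late"
    return None  # 9 = unknown/missing

# NHLcombined stage counts reported above; unknown/missing tallied under 9 for simplicity.
stage_counts = {1: 1_372, 2: 41, 3: 5, 4: 3, 5: 695, 7: 4_934, 9: 5_444}
totals: dict[Optional[str], int] = {"early": 0, "late": 0, None: 0}
for code, n in stage_counts.items():
    totals[stage_group(code)] += n

print(totals["early"], totals["late"], totals["early"] + totals["late"])  # 1413 5637 7050
```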

Fig. 4

(a–c) Trajectories for total cholesterol, HDL, and LDL for early- and late-stage NHLcombined, adjusting for age at cholesterol measure and index year

The case–control analyses were repeated for subjects who never had a prescription for a cholesterol-lowering drug during the observation period, and very similar results were obtained (details in Online Resource Tables S5 and S6 and Figs. S1–S3). The cholesterol trajectories for the six NHL subtypes were also very similar to those for Hodgkin’s and NHLcombined. We also repeated the analyses for Hodgkin’s excluding those under 50 years of age (n = 560, or 22% of the 2,529), because Hodgkin’s in 40- to 49-year-olds may predominantly reflect a different etiology; the results (Online Resource Fig. S4 and Table S7) did not materially change.

Discussion

To our knowledge, this is the first study to use multiple cholesterol measurements to assess within-person changes in cholesterol over the 10 years prior to lymphoma diagnosis. Cholesterol is present in all human tissues and is central to cellular function, with roles in the structural maintenance of cell membranes, intracellular and intercellular signaling, and metabolism. We found that cholesterol, in particular HDL, appears to decrease progressively beginning as early as 6–8 years prior to the diagnosis of lymphoma.

Several cohort studies have related a single baseline cholesterol measure, taken at study entry, to subsequent cancer [8,9,10,11,12,13,14,15]. The type of cholesterol measured (TSC, HDL, LDL) varied between studies, as did the average length of follow-up (7–17 years). Results were mixed and depended on the timing of cholesterol measurement [10], extent of disease [8], sex [11, 12], and cholesterol type [9, 13,14,15]. In the Alpha-Tocopherol Beta-Carotene Cancer Prevention Study, Lim et al. [13] reported that baseline HDL, but not TSC or LDL, was inversely correlated with risk of an NHL diagnosis, particularly in the first 4 years after baseline. In the Me-Can study [9], baseline TSC was inversely associated with cancers of the lymph-hematopoietic tissue in both men and women. The association varied with the time between cholesterol measurement and diagnosis, with diagnoses closer to the measurement more strongly associated with lower cholesterol. These studies support our findings that the timing of cholesterol measurement relative to diagnosis is integral to characterizing the relationship and that the inverse relationship is most striking in the 3–4 years prior to diagnosis. Our study overcomes the limitation of relying on a single measurement to assess the association of cholesterol with lymphoma, allowing a first look at the continuous relationship of cholesterol to lymphoma over the 10 years leading to diagnosis.

We examined cholesterol in a number of major lymphoma categories, and the overall pattern was relatively consistent across them. NHLcombined represents a heterogeneous group of diseases arising from many different cell types. Despite this heterogeneity, TSC, HDL, and LDL dropped consistently in this combined category prior to diagnosis, and the drop was reflected in all subtypes studied. The most pronounced divergence in NHLcombined was for HDL, in line with Lim et al. [13]. The prolonged and consistently lower HDL (present even 10 years prior to diagnosis) could reflect underlying lymphomagenic processes, such as chronic inflammation. HDL is known to have anti-inflammatory activity [16, 17], as well as antioxidative, cytoprotective, vasodilatory, anti-infectious, and antithrombotic activities [18]. Low HDL may therefore be a marker of the severity of systemic inflammation and perhaps of inflammation-induced lymphoma risk. Alternatively, HDL itself may protect against lymphoma. The anti-inflammatory activity of HDL is multifaceted, including inhibition of cytokine-induced expression of endothelial cell adhesion molecules and suppression of the chemotactic activity of monocytes and lymphocytes [19, 20]. In addition, HDL may protect lymphocytes from oxidative damage [21, 22].

LDL delivers cholesterol, an essential component of cell membranes, to cells, supporting membrane structure and integrity [23]. Demand for LDL-derived cholesterol therefore increases during cell proliferation. In addition, increased oxidation of LDL during an inflammatory state would lower circulating LDL, because oxidized LDL is taken up by macrophages at sites of inflammation. A recent review supports a role for the cholesterol pathway in cancer development, suggesting that disruptions in cholesterol homeostasis, particularly increased intracellular cholesterol, are related to cancer incidence and survival [6]. Increased intracellular cholesterol suppresses cholesterol synthesis, which occurs mainly in the liver, from which cholesterol is transported through the bloodstream [24, 25]. These mechanisms suggest that cholesterol may be important as a risk factor or biomarker in cancers other than lymphoma. However, such analyses need to account for the complicating factor of lipid-lowering medication use as well as for trajectories over time.

Interestingly, two clinical studies with only a handful of lymphoma cases have shown serum cholesterol levels returning to normal with treatment of the cancer [3, 19, 20, 26, 27], suggesting that cholesterol could be a marker of tumor burden, tumor activity, lymphomagenesis, or some combination of these. In a case report of a rare lymphoma (disseminated intravascular large B-cell lymphoma) diagnosed postmortem, Garg et al. noted that the patient’s TSC, LDL, and HDL decreased over time until death, when they were 97 mg/dL, less than 33 mg/dL, and less than 4 mg/dL, respectively [2]. Our study design limits our ability to determine whether lower cholesterol represents a true “risk factor” for lymphoma or reflects some level of pre-diagnosis disease activity. However, the analysis of NHLcombined cases by stage at diagnosis shows lower serum cholesterol at diagnosis in late-stage than in early-stage disease, supporting the idea that the disease process itself drives the drop in serum cholesterol.

Because this was an observational study, cholesterol measures were neither equally distributed between cases and controls nor evenly spaced throughout the observation period. Our statistical approach models both between-individual and within-individual variation to provide the best estimate of the true relationship. To test whether subjects contributing only one cholesterol measure may have skewed our analysis, we ran a sensitivity analysis for Hodgkin’s and NHLcombined requiring every subject to contribute at least two measures. The results were essentially unchanged.

Because our study relied on existing large data sources, we were limited to the covariates and potential confounders that were recorded electronically and available across all health plans. We therefore could not include some potential confounders known to be associated with both serum cholesterol and lymphoma risk, particularly smoking status, race, and body mass index. Long-term smoking has been suggested to increase the risk of lymphoma [28], and within the United States, whites are more likely than African Americans or Asian Americans to develop lymphoma [29]. Some studies have shown an increased risk of lymphoma with high body mass index [30,31,32]. Our study also lacked information on diet, physical activity, and alcohol intake. However, Strohmaier et al. [9] in the Me-Can cohort and Lim et al. [13] in the Alpha-Tocopherol Beta-Carotene Cancer Prevention Study showed that low cholesterol remained associated with cancers of the lymph-hematopoietic tissue even after adjustment for age, body mass index, smoking status, diet, alcohol, and activity level, suggesting that these unmeasured confounders are unlikely to account for our observed differences. Another limitation is that, even with this large dataset, we could not assess a number of exceedingly rare lymphoma histologies as separate subtypes, and we lacked some tumor-related data, such as EBV status for the Hodgkin cases. In addition, histology was not centrally reviewed, although lymphoma is especially difficult to classify. Finally, because data are available only for periods of health plan enrollment, exposure data are left-censored.

Lymphoma cases were identified from tumor registries with standardized protocols and comprehensive case ascertainment. The data came from electronic sources amassed during clinical care, independent of the study’s conduct. For biologically based reasons, we excluded from observation person-years during which an individual was taking a cholesterol-lowering prescription drug, so the cholesterol levels used in our study should represent values unaltered by such treatment. We cannot know the full impact of this methodological decision, although our sensitivity analyses of “never statin users”, which showed similar results, partly address this concern. The analysis of cholesterol for NHLcombined by stage was limited by the number of cases missing stage information.

The results of our study suggest that cholesterol metabolism is associated with lymphomagenesis. However, further studies are needed to determine whether the periods of lower cholesterol prior to diagnosis observed in our study reflect a systemic milieu that increases the risk of lymphoma, result from reverse causality, or both.