Introduction

Carotid intima-media thickness (cIMT) measured by ultrasonography has been used in many studies to assess factors contributing to atherosclerosis [1]. Carotid IMT has therefore been used to assess whether HIV infection is associated with increased atherosclerosis [2–8], but few studies included adequate control groups. While some of those studies found an independent association of HIV infection with carotid lesions [2–4], others did not [5–8]. To better determine whether HIV infection is associated with increased pre-clinical atherosclerosis after adjusting for traditional cardiovascular disease (CVD) risk factors, cIMT was measured in the second examination of the Study of Fat Redistribution and Metabolic Change in HIV Infection (FRAM). The results in 433 HIV-infected subjects were compared to those from a large number (5,749) of similarly aged control subjects from two studies: Coronary Artery Risk Development in Young Adults (CARDIA; the FRAM controls) and the Multi-Ethnic Study of Atherosclerosis (MESA). Results from the FRAM study demonstrated that, even after adjusting for traditional CVD risk factors and ultrasound reader, HIV infection remains associated with increased atherosclerosis [9].

There is always some degree of error in any measurement made by a specific reader or laboratory [10–13]. Within a single study, the assignment of patients to a particular reader or laboratory is unlikely to depend on patient characteristics. This within-cohort variability due to reader measurement error will therefore be non-differential and will lead to a small attenuation of the association. As a result, most studies of cIMT reasonably choose to ignore within-cohort measurement error effects [14].
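The attenuation produced by non-differential error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the error variance is deliberately large so that the effect is easy to see, and the names `ols_slope` and `simulate_attenuation` are hypothetical helpers, not part of any study analysis.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, true_slope=2.0, error_sd=1.0, seed=1):
    """Compare the slope estimated from an error-free exposure with the slope
    estimated from the same exposure measured with random (non-differential) error."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_x]
    # Measurement error is independent of both the true value and the outcome.
    measured_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    return ols_slope(true_x, outcome), ols_slope(measured_x, outcome)


slope_true_x, slope_measured_x = simulate_attenuation()
# Classical result: the slope shrinks by var(x) / (var(x) + var(error)).
expected_attenuated = 2.0 * 1.0 / (1.0 + 1.0 ** 2)
print(f"slope with true exposure:     {slope_true_x:.2f}")
print(f"slope with measured exposure: {slope_measured_x:.2f}")
print(f"expected attenuated slope:    {expected_attenuated:.2f}")
```

With equal variances for the true exposure and the error, the slope is expected to shrink by about half. A smaller error variance gives a correspondingly smaller attenuation.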

However, when cohorts are compared (between-cohort variability), the effect of reader is unclear because each study has different participants. It is therefore no longer reasonable to assume that measurement differences will be unrelated to patient characteristics. Between-cohort variability may either increase or decrease the magnitude of the association under study. Under these conditions, bias due to differential or systematic measurement error is possible and the resulting changes in associations may be large, so study validity may be more seriously affected. To ensure that this between-cohort variability did not affect the validity of the reported results, the FRAM study adjusted for reader effects to remove this important source of potential bias [9].

This paper has two objectives, both concerned with quantifying the impact of neglecting variation due to differences in reader, ultrasound machine, or both. First, we present data from the FRAM study demonstrating that confounding by reader or machine type may bias a comparison of carotid IMT between cohorts if one does not properly account for between-study variability due to systematic measurement error (SME). To this end, we compared cIMT in the MESA and FRAM cohorts before and after adjustment. Second, we explored within-study measurement error in the MESA study by demonstrating how the association between cIMT and CVD endpoints changes when reader effects are not accounted for within a single cohort.

Methods

MESA is a population-based, prospective cohort study designed to determine risk factors for the development and progression of subclinical and clinical CVD [9]. The MESA study consisted of 6,814 participants between the ages of 45 and 84 years old at baseline. Participants were recruited from six field centers across the United States: Baltimore, MD; Chicago, IL; Forsyth County, NC; Los Angeles County, CA; New York, NY and St. Paul, MN.

The FRAM study was initially designed to evaluate the prevalence and correlates of changes in fat distribution, insulin resistance, and dyslipidemia in HIV-infected individuals. At the follow-up exam, FRAM also included measurements of cIMT to study sub-clinical CVD. FRAM HIV-infected participants were initially recruited from 16 HIV or infectious disease clinics or cohorts; FRAM control participants were recruited from two centers of the Coronary Artery Risk Development in Young Adults (CARDIA) study. The FRAM HIV-infected and FRAM control participants have been previously discussed in detail elsewhere [9].

The MESA, CARDIA and FRAM studies were approved by the institutional review boards of the participating study sites and the data coordinating center.

For all analyses, age was restricted to 37–78 years old and ethnicity was restricted to Caucasians, Hispanics and African-Americans. This restriction ensured comparability among FRAM HIV-infected subjects, FRAM controls and MESA participants. Participants from the FRAM cohorts with any prior history of CVD (physician-diagnosed heart attack, angina, stroke, TIA, heart failure, cardiac arrest or having undergone procedures related to CVD) were excluded as this was an exclusion criterion for the MESA cohort [9].

Carotid IMT assessment

Technicians, trained by Tufts Medical Center, at each of the 22 MESA or FRAM study sites performed B-mode ultrasonography of the right and left near and far walls of the internal carotid and common carotid arteries. A single ultrasound reading center (Department of Radiology, Tufts Medical Center) measured maximal intima media thickness of the internal (including the bulb) and common carotid sites as the mean of the maximum intima media thickness of the near and far walls of the right and left sides [15].

Ultrasound readers

Measurements of cIMT in the FRAM and MESA cohorts were taken at different time periods, ~2 years apart. There were six different ultrasound image readers for the MESA study, two for the FRAM HIV-infected cohort and three for the FRAM controls. One of the readers was in common between the FRAM controls and the MESA participants. For quality control, one of the readers for the FRAM HIV-infected subjects reread 134 MESA ultrasound images to create overlap between these populations as well. It was therefore possible to directly compare MESA to the FRAM HIV-infected cohort and MESA to the FRAM controls. However, because the FRAM HIV-infected cohort and FRAM controls had no ultrasound readers in common, it was not possible to directly calibrate these two cohorts. Instead, we assumed that the cohorts should yield equal mean cIMT conditional on demographic characteristics, traditional CVD risk factors and reader variation—a generally reasonable assumption.
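The role of overlapping reads can be sketched with a simple paired calculation. This is not the regression-based adjustment used in the analysis, only an illustration of why a common reader or rereads make calibration possible. All readings below are hypothetical values in mm, and `reader_offset` is a hypothetical helper.

```python
from statistics import mean


def reader_offset(reference_reads, other_reads):
    """Mean paired difference (other reader minus reference reader)
    on the same set of ultrasound images."""
    if len(reference_reads) != len(other_reads):
        raise ValueError("paired reads must cover the same images")
    return mean(o - r for r, o in zip(reference_reads, other_reads))


# Hypothetical rereads: the same five images measured by both readers (mm).
reference_reads = [0.80, 0.90, 1.00, 0.70, 0.85]
other_reads = [0.85, 0.96, 1.04, 0.76, 0.89]
offset = reader_offset(reference_reads, other_reads)

# Hypothetical cohorts, each measured by only one of the two readers (mm).
cohort_a = [0.82, 0.88, 0.91]  # read by the reference reader
cohort_b = [0.93, 0.95, 0.97]  # read by the other reader

naive_difference = mean(cohort_b) - mean(cohort_a)
# Remove the part of the difference attributable to the reader.
calibrated_difference = naive_difference - offset

print(f"reader offset:         {offset:.3f} mm")
print(f"naive difference:      {naive_difference:.3f} mm")
print(f"calibrated difference: {calibrated_difference:.3f} mm")
```

Without any images read by both readers, the offset cannot be estimated from the data, and the reader effect is indistinguishable from a true cohort difference.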

Ultrasound machines

All MESA readings were made using a single machine (GE Logiq 700 ultrasound device). However, the FRAM cohort used a total of six different machines (GE Logiq 9, GE Logiq 700, Philips (ATL) HDI 5000, Acuson Sequoia, HP 7500, Siemens). The acquisition protocol and sonographer certification process were the same for FRAM and MESA, with certification by a single author [JP]. As different machines may produce slightly different images, machine type was controlled for using indicator variables with the GE Logiq 700 as reference. Because each site used only one type of machine, which was used by all sonographers at that site, it was not possible to account for site or sonographer effects at the same time as machine effects. Some of the variability associated with machine type could therefore be due to differences in the sonographers using the machines.
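Indicator (dummy) coding with a reference category can be sketched as follows. The machine labels are taken from the text above, but the per-scan assignments are hypothetical and `indicator_columns` is a hypothetical helper, not a function from the software used in the study.

```python
def indicator_columns(values, reference):
    """Build one 0/1 column per non-reference level.
    Rows at the reference level are 0 in every column."""
    if reference not in values:
        raise ValueError("reference level does not occur in the data")
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical machine used for each of five scans.
scan_machines = [
    "GE Logiq 700",
    "GE Logiq 9",
    "Acuson Sequoia",
    "GE Logiq 700",
    "HP 7500",
]

columns = indicator_columns(scan_machines, reference="GE Logiq 700")
for machine, column in columns.items():
    print(f"{machine:>15}: {column}")
```

Each coefficient on an indicator column is then interpreted as the mean difference in cIMT for that machine relative to the reference machine, holding the other covariates fixed.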

MESA endpoints

The MESA cohort was followed for incident CVD events for a median of 4.6 years (max 6.5 y). At intervals of 9–12 months, a telephone interviewer contacted each participant to inquire about all interim hospital admissions, CVD outpatient diagnoses and procedures, and deaths. In addition, MESA occasionally identified additional medical encounters through cohort clinic visits, participant call-ins, medical record abstractions or obituaries. In order to verify self-reported diagnoses, copies of all death certificates and medical records were requested for all hospitalizations and selected outpatient CVD diagnoses and procedures. Next of kin were interviewed for out of hospital CVD deaths. Hospital records were obtained for an estimated 98% of hospitalized CVD events, and at least some information was obtained for 95% of outpatient diagnostic encounters.

Trained personnel abstracted any hospital records that suggested possible CVD events. Abstractors recorded symptoms, history, biomarkers, scanned ECGs, echocardiograms, catheterization reports, outpatient records, and other relevant diagnostic and procedure reports, and transmitted these to the data coordinating center. The coordinating center collated the abstracted or original endpoint records and sent them to two paired physicians for independent endpoint classification and assignment of incidence dates. Cardiologists or cardiovascular physician epidemiologists reviewed non-neurovascular endpoints; neurologists reviewed all neurovascular endpoints. If the reviewing pair disagreed on classification, they adjudicated differences. If disagreements persisted, the full review committee made the final classification.

A composite endpoint of all CVD was defined to include Myocardial Infarction, Angina, Resuscitated Cardiac Arrest, Stroke (but not Transient Ischemic Attack), Coronary Heart Disease implicated Death, Stroke Death, Other Atherosclerotic Death, or Other Death related to CVD.

Statistical analysis

For the first study objective (comparing MESA and FRAM), we fit a multivariable linear generalized estimating equations regression model (to handle repeated measures of cIMT) in order to estimate the mean difference in cIMT associated with HIV infection. We compared estimates from models excluding covariates for both readers and machines to ones that included covariates for reader only, machine only, or both reader and machine. Reader and machine effects were included using indicator variables, with the common reader between FRAM HIV-infected subjects and MESA participants used as the reference category for reader. Using indicator variables for reader and machine clusters is the simplest form of hierarchical model [16]. The goodness of fit of these models was evaluated using a likelihood-ratio test comparing differences in log-likelihoods between models with and without reader or machine effects.
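The likelihood-ratio comparison of nested models can be sketched in a few lines. This is an illustration only: the log-likelihoods and the number of added indicator variables are hypothetical, the sketch assumes both models were fit to the same data with log-likelihoods available, and `chi2_sf` and `likelihood_ratio_test` are hypothetical helpers written here to avoid external dependencies.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-y) * sum_{i < df/2} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from df = 1 (erfc) and add one term per extra 2 df.
    m = (df - 1) // 2
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (i + 0.5) / math.gamma(i + 1.5) for i in range(m))
    return tail


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Test statistic 2 * (ll_full - ll_reduced) and its chi-square p-value."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical values: a model without reader indicators versus a model
# that adds four reader indicator variables.
statistic, p_value = likelihood_ratio_test(
    loglik_reduced=-1250.0, loglik_full=-1238.0, extra_params=4
)
print(f"LR statistic = {statistic:.1f}, p = {p_value:.2g}")
```

The degrees of freedom equal the number of indicator variables added, that is, one fewer than the number of reader (or machine) levels.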

For the second objective (MESA only), we performed a time to event analysis (Cox proportional hazards model) to estimate the risk of CVD events by quartile of common and internal cIMT. We then compared models in which the differences due to reader effects are accounted for with indicator variables and those models where they are not. Pseudo-bias was calculated as the difference in estimates between models with and without reader adjustment, divided by the estimate from the model with reader adjustment.

All analyses were conducted using the SAS system, version 9.2 (SAS Institute, Inc., Cary, North Carolina, USA).

Results

The characteristics of the participants are presented in Table 1. FRAM HIV+ participants differed in many important respects from FRAM and MESA controls, as previously reported [9]. Because of these imbalances, unadjusted differences in mean levels of cIMT should be interpreted with caution and attention should focus on estimates adjusted for differences in demographic factors.

Table 1 Descriptive statistics [either mean (standard deviation) or percentage] for the multi-ethnic study of atherosclerosis (MESA), the study of fat redistribution and metabolic change in HIV infection (FRAM) and the FRAM controls for all participants with at least one valid carotid intimal medial thickness ultrasound reading

In models controlling for both reader and machine, estimates of the mean difference in cIMT associated with HIV infection were 0.037 mm (95% Confidence interval (CI): 0.003–0.072) for common cIMT (Table 2) and 0.192 mm (95% CI: 0.076–0.308) for internal. In models that failed to account for reader or machine, the HIV effect was inflated (pseudo-bias: 116% for common, 32% for internal). Models that controlled for machine but not reader also produced inflated HIV effects (151% for common, 67% for internal). Finally, when reader effects were incorporated but machine was ignored, the HIV effects appeared to be slightly attenuated (−14% for common, −23% for internal) but this attenuation was much less than sampling error.

Table 2 Adjusted estimates of the mean difference in cIMT associated with HIV infection accounting for the two sources of principal measurement error (machine type and ultrasound reader) among 433 HIV positive participants and 5,749 controls: data from FRAM, MESA and CARDIA

The addition of machine to models containing reader did not significantly improve goodness of fit for either common (P = 0.8) or internal (P = 0.2) cIMT. In contrast, adding reader to models containing only machine resulted in a statistically significant improvement in goodness of fit for both common (P < 0.0001) and internal (P < 0.0001) cIMT. Adding both machine and reader to the model at once also improved goodness of fit for both common (P < 0.0001) and internal (P < 0.0001) cIMT.

Within the MESA cohort, we considered the association of internal and common cIMT (quartiles) with CVD events (Tables 3, 4) in models excluding and including reader effects (the same machine was used for all MESA readings). In both cases, the association of cIMT with CVD events was weaker in models that excluded reader effects. For common cIMT, the estimated effect of being in the fourth quartile (compared with the first quartile) on the rate of all CVD events increased from a hazard ratio (HR) of 1.62 (95% CI: 1.02–2.58) without reader adjustment to 1.84 (95% CI: 1.12–3.03) with adjustment, a modest strengthening of the association due to properly accounting for within-study measurement error. For internal cIMT, the corresponding HR was 1.97 (95% CI: 1.32–2.95) when adjusted for reader and 1.89 (95% CI: 1.28–2.80) with no adjustment, which is consistent with the effect seen for common cIMT.
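The size of these shifts can be expressed as a relative difference between the unadjusted and reader-adjusted estimates. The sketch below uses the hazard ratios reported above. The scale on which the relative difference is taken is not stated here, so both the log hazard ratio and the excess hazard ratio (HR minus 1) are shown as assumptions, and `pseudo_bias` is a hypothetical helper with the sign convention (unadjusted minus adjusted) divided by adjusted.

```python
import math


def pseudo_bias(unadjusted, adjusted):
    """Relative difference of the unadjusted estimate from the adjusted one.
    Negative values mean the unadjusted estimate is attenuated."""
    return (unadjusted - adjusted) / adjusted


# Fourth versus first quartile hazard ratios: (unadjusted, reader-adjusted).
hazard_ratios = {"common": (1.62, 1.84), "internal": (1.89, 1.97)}

results = {}
for site, (hr_unadjusted, hr_adjusted) in hazard_ratios.items():
    results[site] = {
        # Scale assumption 1: compare log hazard ratios (model coefficients).
        "log_hr": pseudo_bias(math.log(hr_unadjusted), math.log(hr_adjusted)),
        # Scale assumption 2: compare excess hazard ratios (HR - 1).
        "excess_hr": pseudo_bias(hr_unadjusted - 1.0, hr_adjusted - 1.0),
    }

for site, values in results.items():
    print(
        f"{site:>8}: log-HR scale {values['log_hr']:+.1%}, "
        f"excess-HR scale {values['excess_hr']:+.1%}"
    )
```

On either scale the attenuation for common cIMT exceeds 20% in magnitude, while the attenuation for internal cIMT is below 10%.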

Table 3 Association between quartiles of common carotid intima-media thickness (cIMT) and the 243 reported “all cardiovascular” (CVD) events in the MESA study after median of 4.6 years (max 6.5 years) of follow-up with adjustment for reader effects
Table 4 Association between quartiles of internal carotid intima-media thickness (cIMT) and the 243 reported “all cardiovascular” (CVD) events in the MESA study after median of 4.6 years (max 6.5 years) of follow-up with adjustment for reader effects

Discussion

Measurement error (either from general variability or due to systematic differences between groups) can introduce an important amount of bias into estimates, even under ideal conditions. In this paper we demonstrated the bias introduced by the use of multiple readers or machines for studies using carotid ultrasonography. The effect of systematic measurement error due to use of multiple readers or machines is most important when comparisons are made between two different study populations that were measured separately. However, we have shown that even within a single cohort, the failure to account for measurement error resulted in a change in the estimate of the risk of serious CVD events associated with increased levels of cIMT.

Pooling separate study cohorts may be unavoidable in several different types of studies. One possibility, as seen in the first example presented here from the FRAM Study, is the need to estimate the difference in cIMT between cohorts with differing characteristics. In such a case, comparisons with existing cohort studies are an attractive alternative to recruiting a new control series in parallel, due to issues of cost and feasibility. The failure to account for reader and machine differences between cohorts could bias estimates obtained by pooling these studies.

We also show here that failure to account for reader measurement error within a single cohort can distort the association between cIMT and important endpoints. Typically, within-cohort measurement error is not associated with participant characteristics, but it dilutes associations. While the effect sizes seen in this example are modest, they do rise to the level of practical significance for common cIMT, where the difference in estimates (pseudo-bias) is greater than 20%. This could make the measure appear less or more predictive when compared to other candidate measures of sub-clinical atherosclerosis [14].

Our results are compatible with Espeland et al. who found that reader effects made an 11% contribution to variations in cIMT measurements in the ACAPS study [12]. They suggested that the true correlation of cIMT with risk factors is about twice what is commonly observed [12] but that this association is diluted by uncontrolled measurement error when reader effects are not accounted for. Controlling for reader effects in the analysis of any cIMT endpoint should improve estimates of association between cIMT and predictors as we are reducing one source of non-differential misclassification.

Measurements of cIMT in the FRAM and MESA cohorts were taken at different time periods, ~2 years apart. Temporal bias has been observed in previous studies, and the approach used here (having ultrasounds reread) has previously been used to correct these issues [17]. Temporal drift may explain why reader effects seen in this study are larger than those seen in studies such as ELSA (European Lacidipine Study on Atherosclerosis) where ultrasound scans were all read contemporaneously by multiple readers [8, 18].

In this study, the large differences observed in residual reader and machine effects demonstrate that pooling measurements from different readers and machines will result in misclassification of participants. That this type of systematic measurement error can lead to a modest dilution of the estimates of the association between cIMT and CVD endpoints is a known phenomenon [19–21], but often considered of little practical importance. However, the sheer magnitude of the effects seen when comparing two cohorts suggests that extra care should be taken in studies of cIMT when comparing cohort differences is the primary objective. Previous studies have often failed to account for reader effects [22], which can create vulnerability to high levels of systematic measurement error if, by bad luck, the readers in the two studies have very different mean estimates of cIMT. In the current study, a failure to account for reader effects would have dramatically overestimated the effect of HIV infection on cIMT due to the pattern of reader differences between the FRAM and MESA studies.

It is unclear if it is appropriate to adjust for machine type in this type of analysis and our previous FRAM comparative study only adjusted for reader [9]. Other studies have used careful protocols that avoid this issue entirely [23]. Unlike reader effects, which are strong and for which we have rereads to assess our model validity, machine effects are weaker and need to be estimated from data. However, there have been studies in which adjustment for a large number of machines was essential to seeing a clear result. For example, the EDIC study used 12 different ultrasound machines at 28 different sites and found it important to adjust for machine type when estimating progression of cIMT [24]. In the FRAM population, the effect of machine type is small and the benefits of including machine adjustments in the analysis are dubious. Not only is the effect small, but the MESA cohort used a single machine. This raises the concern that machine variability could be acting as a marker for HIV infection—a concern that could only be directly addressed by cross-calibration (which is not typically feasible). Therefore, the more conservative approach to presenting between cohort estimates of the HIV effect would adjust for reader but not machine type [9].

When modeling the association between cIMT and candidate risk factors, it is important to account for differences in readers as a form of measurement error. Current practice in studies that use IMT may be suboptimal in mitigating reader error. Failing to account for measurement error due to reader or machine differences can result in important levels of bias. When comparing cohorts, either a common reader or the use of rereads is essential to calibrating differences between readers.