Introduction

When measuring patient-reported outcomes, one of the most important elements in symptom and patient-perceived problem assessment is the recall period or time span that patients are asked to consider when responding to health-related questions. For example, whether the time span is limited to a 1-day or to a 1-year period may influence both the frequency and severity of reported symptoms.

For the popular oral health-related quality of life (OHRQoL) questionnaires, such as the Oral Health Impact Profile (OHIP) and the Oral Impacts on Daily Performances, commonly applied recall periods are lifetime [1], 12 months [2], 6 months [3], 3 months [4], and 1 month [2]. However, shorter recall periods, such as 7 days, might be necessary to capture rapid symptom relief after dental interventions. A 7-day timeframe is commonly used in health-related quality of life assessment in medicine [5]. For example, PROMIS® (Patient Reported Outcomes Measurement Information System), a system of highly reliable, precise measures of patient-reported health outcomes for physical, mental, and social well-being, frequently uses a 7-day recall period in its questionnaires and item banks [5]. Many PROMIS researchers contend that 7 days “is on the upper limits of ecological validity for specific events (especially for subjective symptoms), yet long enough to allow time for people to experience enough events” [5].

Although the original OHIP publication specified that all 49 items should refer to a fixed time period, it did not recommend a specific time period [6]. To our knowledge, a 7-day time period has never been used with the OHIP. To expand the applicability of the OHIP to periods of rapidly changing perceived oral health status, we argue that a 7-day recall period should supplement existing timeframes.

The aim of this study was to investigate and compare the relative validity and reliability of OHIP scores referencing 7-day and 1-month recall periods in prosthodontic patients from six countries.

Methods

Study setting, study design, and subjects

The study was an ancillary study initiated within the international Dimensions of Oral Health-Related Quality of Life (DOQ) Project [7]. The project analyzed 49-item OHIP [6] data from general population subjects and prosthodontic patients from six countries (Croatia, Germany, Hungary, Japan, Slovenia, and Sweden) with validated language-specific OHIP instruments [8–13]. The international collaborators of the DOQ Project came from the Department of Prosthodontics, University of Zagreb, Zagreb, Croatia; the Department of Prosthetic Dentistry, Center for Dental and Oral Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; the Department of Prosthodontics, University of Pécs, Pécs, Hungary; the Department of Prosthodontics, Showa University, Tokyo, Japan; the Department of Prosthetic Dentistry, University of Ljubljana, Ljubljana, Slovenia; and the Centre of Oral Rehabilitation, Prosthetic Dentistry, Norrköping, Sweden. In each participating center, the authors obtained study approval from the institutional medical ethics committee and targeted a consecutive sample of prosthodontic patients.

The intended sample size for each center was N = 50, based on previously reported OHIP-49 test-retest reliability coefficients [10]. These coefficients are commonly > .75, indicating that two OHIP scores with the same recall period are relatively stable and highly correlated. We therefore used .75 as the expected correlation between OHIP scores referencing different recall periods when calculating the sample size. With 50 subjects per country, a correlation of r = .75 could be estimated with a 95 % confidence interval of .63 to .87.

Patients were assessed on two occasions when their OHRQoL was assumed to be stable. Specifically, they were assessed either twice before the start of prosthodontic treatment or twice after the end of treatment. On average, 2 weeks elapsed between assessments. The order in which patients completed the two forms was randomly assigned within each center using block randomization performed with Stata [14]. The study design is similar to the test-retest studies performed to test the psychometric properties of the language-specific OHIP versions in these six countries [8–13]. However, instead of receiving two OHIPs with the same recall period, each patient received one OHIP form with the new 7-day recall period and one with the commonly used 1-month recall period. Not all subjects received questionnaires with different recall periods, and not all OHIP questionnaires were complete. We excluded seven subjects who completed both OHIPs with the same recall period and eight subjects who provided insufficient OHRQoL information (six or more missing items, a threshold used previously [15]). In questionnaires with five or fewer missing answers, missing item responses were imputed by median imputation within each person-by-occasion response vector. The final sample comprised N = 267 patients: 59 from Croatia, 37 from Germany, 49 from Hungary, 50 from Japan, 50 from Slovenia, and 22 from Sweden. Data management was performed with Stata/IC 13.1 [14], and data analyses were performed in R [16].
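The exclusion rule and imputation step can be sketched as follows. This is a minimal illustration, not the authors' Stata or R code, and it assumes that "median imputation within person and occasion response vectors" means replacing a missing answer with the median of the same respondent's observed answers on that occasion. The function name `impute_person_median` is hypothetical.

```python
from statistics import median

MAX_MISSING = 5  # questionnaires with 6 or more missing items are excluded

def impute_person_median(responses):
    """Complete one person-by-occasion OHIP response vector.

    responses: list of item scores (0-4), with None marking a missing answer.
    Returns None when more than MAX_MISSING items are missing (exclusion);
    otherwise returns a copy with missing items replaced by the median of
    this respondent's observed answers on the same occasion (assumption).
    """
    observed = [r for r in responses if r is not None]
    n_missing = len(responses) - len(observed)
    if n_missing > MAX_MISSING:
        return None  # insufficient OHRQoL information
    if n_missing == 0:
        return list(responses)
    fill = median(observed)  # can be a half-integer when len(observed) is even
    return [fill if r is None else r for r in responses]
```

Whether to round a half-integer median back onto the 0 to 4 response scale is a design choice the text does not specify; the sketch leaves it unrounded.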

Oral health-related quality of life assessment and global oral health status

For each of the 49 OHIP items, subjects rated how frequently they had experienced a certain impact on a 5-point scale (0 = “never,” 1 = “hardly ever,” 2 = “occasionally,” 3 = “fairly often,” 4 = “very often”). Whereas the original OHIP used a 12-month recall period, a 1-month recall period has been used more frequently to capture recent oral health impacts. Additionally, following a recent suggestion [12], the word “jaw” was added to each OHIP item ending with the phrase “… because of problems with your teeth, mouth, and dentures,” so that subjects referenced the entire stomatognathic system. On each occasion, subjects were also asked to rate their global oral health status on a 5-point scale (0 = “excellent,” 1 = “very good,” 2 = “good,” 3 = “fair,” 4 = “poor”). Because of a miscommunication, Japanese subjects rated their global oral health status on a 2-point scale (0 = “good,” 1 = “poor”).

Based on Locker’s conceptual model of oral health [17], OHIP-49 items were originally grouped into seven domains: Functional Limitation, Physical Pain, Psychological Discomfort, Physical Disability, Psychological Disability, Social Disability, and Handicap. Based on exploratory factor analytic results from 5173 international participants and confirmatory factor analytic results from 5022 participants, the DOQ Project [7] suggested that Oral Function, Orofacial Pain, Orofacial Appearance, and Psychosocial Impact are the four major aspects of patients’ self-perceived OHRQoL [17, 18] measured by the OHIP. However, the DOQ Project findings caution against using four dimension scores, because a large general factor accounts for most of the reported symptom/problem comorbidity. Instead, the project authors concluded that OHRQoL measured with the OHIP can be accurately described by a single summary score [18] that taps this general dimension. Therefore, in the present study, we used all items of the long OHIP (except the three items that reference dentures) as our OHRQoL measure.

Data analysis

Reliability assessment

For each recall period and country, Cronbach’s alpha [19] was calculated as a measure of the OHIP summary scores’ internal consistency reliability. These reliability coefficients estimate the proportion of observed score variance that is due to true individual differences in OHIP summary scores. We computed 95 % confidence intervals for the reliability coefficients using a method by Duhachek and Iacobucci [20].
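For readers who want to reproduce the reliability step, the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores), can be sketched as below. This is a minimal illustration on a respondents-by-items matrix; it does not implement the Duhachek and Iacobucci confidence interval, and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(data):
    """Cronbach's alpha for a respondents-by-items score matrix.

    data: list of rows, one per respondent, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*data))                      # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Sample variances (n − 1 denominators) are used for both items and totals, so the choice of denominator cancels in the ratio.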

Summary score analyses

OHIP summary scores were computed for each individual (at each occasion) as the sum of the 46 OHIP item scores (three items referencing dentures were not used in the analyses). The means and standard deviations of these total scores were computed for each country by occasion (first or second) and recall period (7 days or 1 month). In addition, we computed the means and standard deviations of the global oral health status indicator for each country, and we computed paired t tests of mean differences between the two OHIP form recall periods and between the two sets of global oral health status indicators.
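The summary score and the paired comparison described above can be sketched as follows. The indices of the three denture items are left as a parameter because their positions are not given here; any values passed in are hypothetical. The sketch returns only the paired t statistic and its degrees of freedom; a p-value would come from the t distribution (for example, `scipy.stats.ttest_rel` computes both in one call).

```python
from math import sqrt
from statistics import mean, stdev

def ohip_summary(item_scores, excluded=()):
    """Sum of OHIP item scores, skipping the 0-based indices in `excluded`
    (e.g., the three denture items; their positions are an assumption)."""
    skip = set(excluded)
    return sum(s for i, s in enumerate(item_scores) if i not in skip)

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for H0: mean(x - y) = 0."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1
```

Here `x` and `y` would hold one country's summary scores under the 1-month and 7-day forms, aligned by patient.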

To assess the convergent validity of OHIP summary scores for the two recall periods, we correlated the recall period-specific summary scores with the associated global oral health status scores. Confidence intervals for these Pearson correlations were constructed using Fisher’s r-to-z transformation [21]. For each country, we tested whether these correlations were significantly different from each other. Because the two correlations are based on the same subjects, we used Steiger’s method [22] for testing differences among dependent correlations.
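Both computations in this paragraph can be sketched with the Python standard library. `fisher_ci` applies the r-to-z transformation with standard error 1/sqrt(n − 3). `steiger_z_nonoverlapping` assumes the two correlations share no variable (each OHIP form is paired with the global rating from its own occasion) and uses Steiger's pooled Z statistic with the Pearson–Filon covariance term. Which Steiger variant the authors used is not stated, so this choice and the variable labeling are assumptions.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson r via Fisher's r-to-z transformation."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)

def steiger_z_nonoverlapping(r12, r34, r13, r14, r23, r24, n):
    """Steiger's pooled Z test of H0: rho12 == rho34 for two dependent,
    non-overlapping correlations measured on the same n subjects.

    Hypothetical labeling: 1 = OHIP 7-day score, 2 = global oral health on
    that occasion, 3 = OHIP 1-month score, 4 = global oral health on the
    other occasion. Returns (Z, two-sided p).
    """
    rbar = (r12 + r34) / 2  # pooled estimate of the common correlation under H0
    # Pearson-Filon covariance term, with rho12 and rho34 replaced by rbar
    psi = (0.5 * rbar * rbar * (r13**2 + r14**2 + r23**2 + r24**2)
           + r13 * r24 + r14 * r23
           - (rbar * r13 * r14 + rbar * r23 * r24
              + r13 * r23 * rbar + r14 * r24 * rbar))
    c = psi / (1 - rbar**2) ** 2  # correlation between the two Fisher z values
    z = (atanh(r12) - atanh(r34)) * sqrt((n - 3) / (2 - 2 * c))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

When all cross-correlations are zero, the covariance term vanishes and the test reduces to the familiar comparison of two independent correlations with equal n, which is a quick sanity check on the implementation.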

Structural equation models

A series of structural equation models (SEMs) [23] was fit to the data to evaluate the measurement invariance and convergent validity of the two OHIP forms. First, we tested the dimensional structure of each OHIP form using separate confirmatory factor analytic (CFA) models (see Fig. 1a). Next, we evaluated the structural invariance of the OHIP forms to determine whether the choice of recall period influenced the relationships between the 46 manifest OHIP items and the underlying common OHRQoL factor (see Fig. 1b). Third, we tested the convergent validity of each form by correlating the associated OHIP general factor with the global oral health status measure (see Fig. 1c).

Fig. 1 Path diagrams for one-factor (a), two-factor (b), and convergent validity (c) structural equation models. Rectangles denote manifest variables, circles denote latent variables, double-headed arrows denote correlations, and single-headed arrows denote regression pathways

Prior to conducting the SEM analyses, we centered the OHIP and global oral health status scores within each country to remove country-of-origin mean-level effects from the data. Specifically, for each variable (i.e., item), we subtracted the country-specific mean, thereby controlling for the sample differences in perceived oral health shown in Table 1.
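Within-country centering can be sketched as below: each item score has its country's item mean subtracted, so every item averages zero within every country. This is an illustration on plain lists, not the authors' R code; the function name and country labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def center_within_country(rows, countries):
    """Subtract each country's item means from that country's rows.

    rows: list of per-respondent item-score lists.
    countries: parallel list of country labels.
    Returns new rows whose item means are zero within each country.
    """
    by_country = defaultdict(list)
    for row, c in zip(rows, countries):
        by_country[c].append(row)
    # item means per country: one mean per column of that country's rows
    means = {c: [mean(col) for col in zip(*grp)] for c, grp in by_country.items()}
    return [[x - m for x, m in zip(row, means[c])]
            for row, c in zip(rows, countries)]
```

Because only means are removed, within-country variances and covariances, which the factor models analyze, are left unchanged.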

Table 1 Sociodemographic characteristics, denture status, proportion of follow-up (posttreatment) assessments, and OHRQoL impairment in prosthodontic patients from the six countries of the Dimensions of OHRQoL Ancillary Project

For the SEM analyses, all models were estimated using diagonally weighted least squares (DWLS) [24] estimation in the lavaan package [25] for the R software [16] programming environment. DWLS estimation has been shown to work well with ordinal data and to be robust to violations of multivariate normality [26].

Across the SEM analyses, model fit was evaluated using a standard collection of fit indices [23]. These indices included the log-likelihood chi-square test, the standardized root mean square residual (SRMR) [27], the root mean square error of approximation (RMSEA) [28], the comparative fit index (CFI) [29], the Tucker–Lewis index (TLI) [30], and the adjusted goodness-of-fit index (AGFI) [31]. To gauge the quality of our SEM results, we consulted Nye and Drasgow [26] who recently investigated the performance of these fit indices using DWLS estimation under a variety of sample sizes and variable skewness conditions. Results for data sets most like the ones in our study suggested the following guidelines for adjudicating adequate model fit: RMSEA ≤ .02, SRMR ≤ .05, CFI ≥ .99, and TLI ≥ .99. A conservative cutoff of .95 was chosen for the AGFI.
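The adjudication rules above can be written down as a small lookup table, which makes it easy to apply the same cutoffs to every model in Table 3. The sketch below is illustrative only: the lowercase index names are our own labels, and the index values would come from whatever SEM software produced the fit.

```python
# Cutoffs stated in the text: RMSEA and SRMR are upper bounds,
# CFI, TLI, and AGFI are lower bounds.
CUTOFFS = {
    "rmsea": ("max", 0.02),
    "srmr": ("max", 0.05),
    "cfi": ("min", 0.99),
    "tli": ("min", 0.99),
    "agfi": ("min", 0.95),
}

def check_fit(indices):
    """Return {index_name: passed} for each cutoff present in `indices`."""
    result = {}
    for name, (kind, bound) in CUTOFFS.items():
        if name in indices:
            value = indices[name]
            result[name] = value <= bound if kind == "max" else value >= bound
    return result
```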

Results

Characterization of prosthodontic patients from the six countries

The total sample included 267 adult prosthodontic patients from six international prosthodontic treatment centers. Summaries of the demographic variables, denture status, and proportion of follow-up assessments for these subjects are displayed in Table 1. In the aggregate data set, females represented 58 % of all respondents, and in each participating country, female prosthodontic patients outnumbered male patients. The respondents had a mean age (SD) of 54.0 years (17.2). Across countries, average subject ages varied from 40.0 years (16.1) for Slovenian patients to 68.6 years (8.7) for Japanese patients. Most (56.6 %) prosthodontic patients had no removable dentures. The proportion of subjects with no removable dentures ranged from 18 % in Japan to 96 % in Slovenia. Across all countries, 51 % of subjects completed the OHIP forms after treatment, though the proportion of posttreatment assessments ranged from 32 % in Slovenia to 68 % in Japan.

OHIP summary score analysis

As shown in Table 1, using data from all 267 patients, we found substantial differences in OHIP summary scores across countries, with average summary scores ranging from 16.2 in Sweden to 45.0 in Germany. Average OHIP summary scores were slightly higher for the 1-month recall period (34.9) than for the 7-day recall period (32.1). However, paired t tests showed significant mean score differences only for Croatia (t(58) = 6.5, p < .001) and Slovenia (t(49) = 2.0, p = .047). In contrast, none of the mean differences between the two sets of global oral health status scores (Table 2) reached statistical significance. For the 257 patients with complete OHIP and global oral health status data, summary scores from both OHIP forms were highly reliable in all country-specific samples (Table 2): Cronbach’s alpha ranged from .93 to .98 for the 7-day recall period and from .95 to .98 for the 1-month recall period. Table 2 also reports Pearson correlations between OHIP summary scores and global oral health status scores for each country. In most countries, these correlations were moderately high (median r = .52). For the Japanese subjects, these correlations were slightly lower, possibly because of the modified (2-point) response format of their global oral health status ratings.

Table 2 OHIP summary score and global oral health status analyses by country

Finally, we evaluated differences in convergent validity between the two OHIP recall periods. In none of the six country-specific samples did the OHIP-global oral health status correlations differ significantly between the 7-day and 1-month forms.

Structural equation models for oral health-related quality of life

Previous findings within the DOQ Project [18] have demonstrated that a one-factor model (1FM) fits the 46-item OHIP reasonably well when a 1-month recall period is used. To corroborate this result in the current data, a one-factor CFA model was fit separately to each test form (see Fig. 1a). Fit indices for these analyses are reported in the first two rows of Table 3. These findings suggest that the 1FM provides an accurate and parsimonious account of the latent structure of each OHIP test form.

Table 3 Fit statistics for structural equation models

Next, to investigate the effects of test form on the OHIP latent structure, we combined the two one-factor models into a two-factor CFA model (2FM) with correlated factors (see Fig. 1b). Because the two OHIP forms contain the same 46 items, the residuals of corresponding items were allowed to covary across forms. To test across-form measurement invariance [32], we fit several models that differed in the number of parameter equality constraints across test forms; below, $i$ indexes an item on one form and $i'$ the same item on the other form. In our first model, we allowed the factor loadings (Λ) and the residual variances (Θ) to vary across the two OHIP forms ($\lambda_i \neq \lambda_{i'}$, $\theta_{ii} \neq \theta_{i'i'}$ for $i = 1, \ldots, 46$). As expected, this model fit well (see row 3, Table 3). In our second model, to assess metric invariance [32], we constrained the corresponding factor loadings to be equal across the two test forms ($\lambda_i = \lambda_{i'}$ for $i = 1, \ldots, 46$). As reported in Table 3, the fit statistics for this model indicated excellent model-data fit. Finally, to assess strict factorial invariance [32], we constrained both the factor loadings and the residual variances to be equal across the two OHIP forms ($\lambda_i = \lambda_{i'}$, $\theta_{ii} = \theta_{i'i'}$ for $i = 1, \ldots, 46$). This model also fit the data well (see Table 3) and did not fit significantly worse than the metric invariance model ($\chi^2_{\mathrm{diff}} = 10.4$, df = 46, p > .99). In the strict factorial invariance model, the two latent OHRQoL factors correlated .93, indicating that OHIP latent factor scores are highly correlated across recall periods.

Finally, we added the global oral health status indicators to the strict factorial invariance model (Fig. 1c) and tested convergent validity invariance using two structural equation models. In the first (unconstrained) model, the correlations between the OHIP latent factors and the global oral health status scores were estimated separately for the two recall periods ($\phi \neq \phi'$). In the second (constrained) model, these correlations were required to be equal ($\phi = \phi'$). As shown in Table 3, both models fit the data well. In the unconstrained model, the OHIP-global oral health status correlations were .50 (1 month) and .43 (7 days); in the constrained model, both equaled .48. Constraining the correlations to be equal did not significantly worsen model fit ($\chi^2_{\mathrm{diff}} = 0.10$, df = 1, p = .76). Thus, our data suggest that the convergent validity of the OHIP latent factors with our global measure of perceived oral health is not significantly affected by moving from the 1-month to the 7-day recall period. For all models, the SRMR values were slightly higher than the SRMR cutoff (≤ .05) adopted for adequate model fit. Follow-up analyses of the residuals in our final model indicated that small amounts of item covariation remained in the data after fitting unidimensional OHIP latent factor models (see Footnote 1).
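The p-values attached to the chi-square difference tests in this section can be checked by referring each difference statistic to a central chi-square distribution with the stated df. The sketch below uses closed-form upper-tail expressions for integer df (a Poisson sum for even df, a normal tail plus a finite series for odd df). It assumes the plain reference distribution; any scaling or adjustment that lavaan may apply to DWLS-based difference tests is not reproduced, and the reported statistics are rounded, so recomputed p-values can differ slightly from Table 3.

```python
from math import exp, sqrt, pi
from statistics import NormalDist

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a central chi-square, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: e^{-x/2} * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = exp(-x / 2)
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return min(total, 1.0)
    # Odd df: 2 * (1 - Phi(sqrt(x))) plus
    # sqrt(2/pi) * e^{-x/2} * sum_{i=1}^{(df-1)/2} x^{i - 1/2} / (1*3*...*(2i-1))
    total = 2 * (1 - NormalDist().cdf(sqrt(x)))
    term = sqrt(2 * x / pi) * exp(-x / 2)  # i = 1 term of the series
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)  # ratio of consecutive series terms
    return min(total, 1.0)
```

For example, `chi2_sf(10.4, 46)` and `chi2_sf(0.10, 1)` correspond to the metric-versus-strict and unconstrained-versus-constrained comparisons described above.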

Discussion

In this study, analyzing data from 267 prosthodontic patients from six countries, we found that using a 7-day recall period instead of a 1-month recall period did not impact test score reliability and validity of the long OHIP. In the country-specific analyses, OHIP summary scores had very similar reliability coefficients and slightly higher convergent validity coefficients with the 7-day recall period. Using a SEM approach, we found measurement invariance for OHIP item responses from the two recall periods. Furthermore, the correlations between self-reported global oral health and OHRQoL measured by the OHIP using either the 7-day or 1-month recall periods were not significantly different.

When data from the two OHIP forms were compared, patients reported slightly more OHRQoL impairment (about 3 OHIP points) for the 1-month recall period than for the 7-day recall period. In none of the six countries did this difference reach 6 OHIP points, the minimally important difference for the OHIP [33].

We assessed convergent validity of the OHIP summary scores by correlating data from both forms with a single-item measure of global oral health. These correlations were substantial for both recall periods, but the magnitudes of the correlations varied across countries. As expected, they were higher for the 7-day recall period in all countries except for Slovenia, and the magnitudes of these convergent correlations were similar to those found in previous studies [34, 35]. All Cronbach’s alpha reliability coefficients were very high, and only small differences were observed across the two recall period test forms. Again, these reliability coefficients were similar to previously reported OHIP reliability coefficients [34, 35].

A small number of studies have previously considered the recall period for the OHIP. For example, a Finnish study [36] compared a 12-month with a 1-month recall period for the 14-item OHIP. This study recruited adults from two sources: the first sample included patients awaiting orthognathic surgery (N = 104), and the second was a convenience sample of workers drawn from various workplaces in North Finland (N = 111). As in our study, the Finnish study did not find substantial differences between OHIP-14 summary scores referencing the two recall periods. Similarly, a German study [10] using the German-language OHIP (OHIP-G) compared OHIP-G scores for lifetime, 12-month, and 1-month recall periods. These researchers found that the 1-month recall period had the highest internal consistency reliability among the three test forms. No differences of clinically relevant magnitude were observed among the three recall periods, but one statistically significant difference was detected when the 1-month and lifetime recall periods were compared [10]. Considered in aggregate, the findings of these studies [10, 36] are in line with the results of our analyses. Specifically, all three studies found that modifying OHIP recall periods did not produce clinically significant differences in OHIP reliabilities and validities. Nevertheless, the interpretations of these studies have not been consistent. For instance, in the Finnish study, Sutinen et al. concluded that “although a standardized reference period of 12 months is recommended, in population surveys the use of a shorter (one-month) reference period does not appear to influence responses” [36]. In the German study [10], based on the expectation that memory is more accurate over shorter time periods, the authors recommended a 1-month recall period over longer reference periods.

Synthesizing the results of the two previous studies with those of the present study, we conclude that there is no “correct” recall period for self-assessed oral health and that “[t]he recall period must correspond to the characteristics of the phenomenon of interest and the purpose of the assessment” [37]. More specifically, the choice of recall period should be guided by the purpose and intended use of the OHIP scores, the patient population, the patients’ disease or condition, the treatment or device, and the study design [38]. Similarly, as noted by Norquist et al., “(1) recall depends on what the patient-reported outcome measure captures, its intended use, and attributes of the disease and study; (2) within the same disease area, recall can vary depending on the concept or phenomenon of interest; (3) recall must consider patient burden and their ability to easily and accurately recall the information requested; and (4) recall must be consistent with the duration of the trial and the scheduled clinic visits” [39]. Although OHIP recall periods such as 1 month and 7 days may not differ substantially in patient burden, the phenomenon being measured (e.g., current perceived oral health versus average disease impact over a given period) may be assessed more accurately with a particular recall period.

In measures of other health-related outcomes, researchers often prefer shorter recall periods to longer ones. For instance, Acaster et al. point out that “[i]n general, shorter recall periods (e.g., 24 h, or 1 week at most) can be preferable to longer recall periods mainly because longer recall data can be heavily biased by current health and any significant events” [38]. In cancer patients, a study comparing a 1-day with a 7-day recall period found comparable symptom reporting [40], and another study led to the recommendation of a 7-day recall period [41]. In a study aimed at assessing the accuracy of pain and fatigue items across different recall periods, recall periods of 3, 7, and 28 days generated similar ratings of pain and fatigue levels, suggesting that these recall periods may be interchangeable [42].

Strengths and limitations

This is the first international, multicenter study to compare 7-day and 1-month recall periods for the OHIP questionnaire, which is the most widely used OHRQoL instrument in dentistry. Because of the modest country-specific sample sizes, country estimates for the correlations of OHIP summary scores with global assessments of oral health had relatively wide confidence intervals. However, these coefficients were moderately high for all countries. Moreover, while correlations across test forms (i.e., recall periods) were high among OHIP summary scores, the absolute levels of OHRQoL measured with the two test forms were not necessarily similar.

Prosthodontic patients from Croatia, Germany, Hungary, Japan, Slovenia, and Sweden represented quite different prosthodontic populations in terms of their age, denture status, proportion of posttreatment (follow-up) OHRQoL assessment (as compared to pretreatment, baseline assessment), and OHRQoL impairment. However, samples did not vary substantially in their OHIP mean values (approximately 1 OHIP point) when the first assessment was compared to the second, providing evidence that average OHRQoL remained stable over the study period. Although the initially planned sample sizes were 50 patients per country, the exclusion of some questionnaires, due to missing data and a job position change for the Swedish collaborator, led to smaller samples for some countries.

Generalizability of results to OHRQoL dimensions, OHIP short forms, and other OHRQoL instruments

According to the DOQ Project [7], an individual’s overall OHRQoL burden can be sufficiently summarized by a single, higher-order score despite the multidimensional nature of OHRQoL [17, 18]. Thus, we modeled the OHIP item responses using single latent factors. Nevertheless, further methodological work may provide more informative OHRQoL measures that better capture individual differences in the conceptually separable domains of oral health.

Many versions of the OHIP have been reported in the literature, including an abbreviated 14-item short form [43], a 5-item short form [44], and several condition-specific versions [45–47]. Summary scores from these alternate forms are known to correlate highly with summary scores from the long-form OHIP [48–50]. These findings suggest that our results are likely to generalize to other OHIP versions. Finally, because the OHIP shares many similarities with other OHRQoL questionnaires, we believe that our recall period results are relevant for other OHRQoL measures.

Conclusion

The present study confirmed that recall periods do not have a large influence on OHIP scores or on the correlations of these scores with global measures of perceived oral health. In settings in which oral health changes quickly, we believe that a 7-day recall period is a valuable option in OHRQoL measurement for two reasons:

  • A 7-day recall period unifies the measurement timeframe that is used to assess other oral and medical conditions. This unification facilitates an integrated approach to the assessment of oral and general health.

  • Short recall periods are conceptually appealing: All things considered, short recall periods should produce more valid and reliable results when health changes rapidly.

While we acknowledge that recall periods are situation-specific, we believe that, to achieve better global standardization, the 7-day timeframe should be OHIP’s preferred recall period in clinical settings.