Introduction

In clinical studies, missing data are common despite the best efforts of the investigators [1, 2]. Although missing data are virtually unavoidable, their existence is not always reported, and in many studies it is simply ignored. Another common strategy, which is the default setting of most statistical software, consists of deleting observations that have missing values.

A missing value occurs when data for a variable or question are not collected, not available or not appropriate, leaving an empty cell in the data set. Sometimes, missing information may itself constitute a meaningful value (as when the response is “I don’t know”) that deserves to be analysed. Ignoring or inadequately handling missing values can lead to bias and to loss of statistical power [3–5]. In randomised controlled trials (RCTs), patients with adverse events often discontinue their treatment and therefore fail to undergo an evaluation of the primary end-point. Consequently, the population of patients with data on the primary end-point is not representative of the initial population. In extreme cases, the bias and loss of power can cancel out or even reverse the treatment effect. Also, sample size calculations are often performed without accounting for incomplete observations, so that the sample included in the statistical analysis is smaller than planned. Missing data affect not only RCTs but also observational studies.

The missing value issue was first investigated in depth in the 1970s, most notably by Rubin, who developed multiple imputation in 1987 [6]. As the need for practical solutions to handle missing values became recognised, research in the field expanded and the number of theoretical and applied studies increased at a fast pace. Today, strategies for handling missing data in clinical research continue to generate vigorous controversy.

The patterns of missing values in a data set can be classified into three main categories [4] depending on the relationship between the process causing the absence of data and the measured or unmeasured variables. The effect of ignoring the missing data on the reliability of the statistical analysis varies across the three categories.

In the missing completely at random (MCAR) pattern, the probability that a value is missing is independent of all patient characteristics. When the data set is represented as a large matrix, the missing values are scattered randomly throughout the matrix. Consequently, the subsample of observations without missing data is representative of the original sample. This pattern is rarely observed in practice. In the missing at random (MAR) pattern, the probability of data being missing depends on other observed covariates. In a survey, for instance, young adults are more likely not to declare their income; income is then MAR given age. In the third pattern, not missing at random (NMAR), the probability of data being missing cannot be explained solely by other observed covariates but depends on the unobserved value itself. For example, persons with high incomes may be less likely to communicate their incomes; the non-response probability then depends on the value that is missing.
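As a minimal sketch of how a MAR mechanism biases a complete-case analysis, the age/income survey example above can be simulated (entirely hypothetical data; all variable names and parameter values are illustrative):

```python
import random
import statistics

random.seed(42)

incomes, observed = [], []
for _ in range(20000):
    age = random.randint(18, 80)
    income = 20000 + 500 * age + random.gauss(0, 5000)
    # MAR: the probability of missingness depends only on the observed
    # covariate (age), not on the unobserved income value itself
    p_missing = 0.5 if age < 30 else 0.05
    incomes.append(income)
    observed.append(None if random.random() < p_missing else income)

true_mean = statistics.fmean(incomes)
cc_mean = statistics.fmean(v for v in observed if v is not None)
# The complete-case mean is biased upward: young, low-income rows are dropped
print(f"true mean: {true_mean:,.0f}  complete-case mean: {cc_mean:,.0f}")
```

Because missingness here depends only on an observed covariate, an imputation model that conditions on age can recover an unbiased estimate; under NMAR this would not be possible without modelling the missingness mechanism itself.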

Analyses such as missing indicator modelling can be performed to distinguish between the two random mechanisms (MCAR and MAR) but do not allow testing of NMAR assumptions.

The objectives of this study were to assess the frequency of missing data reporting in the published critical care literature and to illustrate, on the basis of a large data set from the Conflicus study [7], the consequences of handling versus ignoring missing data and the impact of various techniques for handling missing data. Recommendations on how to handle missing data are provided.

Methods

Review of the ICU literature

To evaluate the reporting and handling of missing data in clinical studies, we reviewed the articles published in the October 2010 issues of three major critical care journals, Intensive Care Medicine (ICM), American Journal of Respiratory and Critical Care Medicine (AJRCCM) and Critical Care Medicine (CCM). All articles providing a statistical analysis on a patient population were reviewed, regardless of study design or size (observational study or clinical trial). Each article was carefully reviewed by two readers (AV and JFT) working independently of each other. The evaluation was standardised by having the readers answer three questions: (a) Were there any missing values? (b) Was information on missing values provided in the methods or results section? (c) Was a specific strategy used to handle the missing data and, if yes, which one? The readers then met to discuss their reviews and to reach a consensus.

The Conflicus study

The Conflicus study is a 1-day cross-sectional study designed in 2006 by the European Society of Intensive Care Medicine (ESICM) and conducted in 306 ICUs in 26 countries. The ESICM ethics section prepared a questionnaire to collect data on ICU conflicts, to be completed by all staff members working in each participating ICU on 7 December 2006. The primary objective of the study was to examine the frequency, characteristics and risk factors of conflicts in the ICU [7]. As job strain was known to be strongly associated with conflicts in the ICU, an ancillary objective was to identify risk factors for job strain in ICU workers.

Job strain was measured using a 12-item scale derived from the Job Content Questionnaire [8]. This scale explores three domains (job demand, job control and social support). The score is the difference between well-being (the job control score plus the social support score) and job stress (the job demand score). A high score indicates a low level of stress. In addition to respondent characteristics, each ICU was asked to provide detailed information on the ICU and hospital. Country characteristics were retrieved from the World Health Organisation website (http://www.who.int/research/en/). For the sake of simplicity, we decided to focus only on respondents’ characteristics.
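The score construction described above can be sketched as follows (the numeric inputs are hypothetical; the actual domain scoring follows the Job Content Questionnaire [8]):

```python
def job_strain_score(job_control, social_support, job_demand):
    """Job strain score as described in the text: well-being
    (control + support) minus job stress (demand).
    A higher score indicates a lower level of stress."""
    return (job_control + social_support) - job_demand

# Hypothetical respondent with high control/support and moderate demand
print(job_strain_score(job_control=30, social_support=25, job_demand=20))  # 35
```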

Among the 7,771 respondents initially included, 7,209 (93 %) completed the job strain questionnaire. For the remaining questionnaires, the level of missingness of the other variables exceeded 94 %, making imputation of job strain hazardous.

Because the questionnaires were self-completed by the ICU workers, missing data were expected. As reported in Table 1, the frequency of missing data for each of the 17 items ranged from 0 to 8 %. The first step in dealing with missing data is to identify the nature of the missing information.

Table 1 Characteristics of respondents when information was available and frequencies of missing values per variable

Statistical analysis of Conflicus study data

Respondent characteristics were described using n (%) for qualitative variables, and median (interquartile range, IQR) or mean ± SD for quantitative variables. The frequency and percentage of missing values for each variable were collected.

A missingness indicator was built for each item by assigning a value of 1 in the event of missing data for a respondent and of 0 otherwise. We then used the Wilcoxon non-parametric test to assess the relationship between the job strain score and the missingness indicator for each item.
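The indicator construction can be sketched as follows (toy data; the actual comparison used the Wilcoxon test on the full Conflicus sample):

```python
def missingness_indicator(values):
    """1 if the item is missing for a respondent, 0 otherwise."""
    return [1 if v is None else 0 for v in values]

# Toy example: an 'age' item and the job strain scores of five respondents
age_item = [34, None, 51, None, 29]
job_strain = [12, 18, 9, 21, 11]

indicator = missingness_indicator(age_item)
print(indicator)  # [0, 1, 0, 1, 0]

# Scores split by indicator; the two groups' distributions would then be
# compared with a Wilcoxon rank-sum test (e.g. scipy.stats.mannwhitneyu)
missing_grp = [s for s, m in zip(job_strain, indicator) if m == 1]
observed_grp = [s for s, m in zip(job_strain, indicator) if m == 0]
print(missing_grp, observed_grp)  # [18, 21] [12, 9, 11]
```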

Respondent-related variables associated with the job strain score were identified using a linear mixed model including random effects for country and ICU and discarding missing values. We then built a multivariate model including all respondent-related variables yielding p values smaller than 0.10 by univariate analysis. Observations with at least one missing value for any included variable were excluded from model fitting (this corresponds to the default behaviour of SAS software, i.e. “complete-case analysis” or “listwise deletion”). The multivariate model was then rebuilt twice: first by replacing missing values with the median value for quantitative variables or the most frequent modality for qualitative variables, and second by using multiple imputation with IVEware. Multiple imputation creates several copies of the original data set, in which missing values are imputed by values that differ slightly across the copies; this approach reflects the uncertainty regarding the imputed values. IVEware uses sequential regression multiple imputation (SRMI). Briefly, it specifies an imputation model for each incomplete variable, using the other variables as predictors. The imputed variables are used in subsequent imputation models, and the process is repeated until convergence (see electronic supplementary material 1 and 2 for details).

We did not check for interactions among risk factors in the multivariate model and did not address the issue of multiple testing. Post-imputation diagnostics were used to assess the multiple imputation process [9]. We assumed that our data were MAR. This was a compromise between plausibility (multiple imputation methods behave well under MAR, which is more realistic than the MCAR assumption) and complexity (handling NMAR would require more complex modelling, beyond the scope of this article). For both imputation techniques, the percentage of variation of the estimates [and standard errors (SEs)] relative to the complete-case analysis was computed.
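The comparison metric can be sketched as follows (the helper name is ours, and the example values are hypothetical):

```python
def percent_variation(complete_case, imputed):
    """Percentage of variation of an estimate (or SE) relative to
    its complete-case value."""
    return 100 * abs(imputed - complete_case) / abs(complete_case)

# Hypothetical coefficient: 0.21 under complete-case, 0.11 after imputation
print(round(percent_variation(0.21, 0.11), 1))  # 47.6
```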

All statistical analyses were performed using the MIXED and MIANALYZE procedures in SAS software v9.3 (SAS Institute, Cary, NC, USA) and IVEware v0.1 (Survey Research Center, Institute for Social Research, University of Michigan, MI, USA).

Results

Literature review

The detailed characteristics of the articles are reported in electronic supplementary material 6. Of the 44 articles, 4 (9 %) were clinical trials and the remaining 40 (91 %) were observational studies. The median (IQR) number of analysed subjects was 749 (138–3,213). Of the 44 articles, 16 (36.4 %) provided no information on whether missing data occurred, 6 (13.6 %) declared having no missing data, 20 (45.5 %) reported that missing values occurred but did not handle them (complete-case analyses) and only 2 (4.5 %) used sophisticated statistical methods to manage missing data (multiple imputation and missing variable indicator, respectively).

Conflicus data analysis

The characteristics of respondents when information was available and the frequencies of missing values per variable are described in Table 1. The proportion of missing data for the independent variables ranged between 0 and 8.4 % (median 2.9 %). Given this missing-data pattern, a multivariate analysis using the software default complete-case approach and including all 18 variables would have excluded 24 % of the observations in the data set (data not shown).

When studying the relationship between the missing-data pattern and job strain, we found that numerous questionnaire items with missing values were associated with the job strain score. Significantly higher job strain scores (i.e. lower stress) were noted in respondents with missing values for the following items: age (p < 0.0001), number of children (p = 0.003), degree of religiosity (p = 0.007), average number of work hours per week (p = 0.001), number of weeks since last vacation (p = 0.02), number of patients cared for during life-sustaining treatment withdrawal within the last week (p = 0.038), number of patients who died within the last week (p = 0.019) and use of antidepressant therapy by the respondent (p = 0.043) (Table 2). Also, the number of missing items per respondent was significantly associated with the respondent’s job strain score (Fig. 1). The final multivariate model obtained for the complete-case analysis and the same models built using the two different imputation methods are shown in Table 3. Diagnostics analyses on the values imputed using multiple imputation were performed but did not reveal critical behaviour (electronic supplementary material 3 and 4).

Table 2 Differences in respondent job strain scores based on the missing values
Fig. 1
figure 1

Histogram of the number of missing variables per respondent according to the job strain score

Table 3 Risk factors for job strain in the Conflicus study ICU workers

Although the estimates obtained using the three methods seemed similar, estimates varied by 0.2–60 % between the complete-case analysis and the imputed-data analyses. Similarly, SEs varied by 7.1–18.2 %. The variables showing the highest estimate variation were age (up to 60 % with multiple imputation and up to 39 % with the median/most frequent modality imputation method) and the number of children (45 and 52 %, respectively). All estimate and SE variation rates are shown in electronic supplementary material 5. The SEs were lowest with the median/most frequent modality imputation method, in keeping with the fact that this method artificially decreases the variability within a variable by using the same value to replace all missing data. SEs were largest with the complete-case model because of the smaller respondent sample (24 % fewer respondents). Most of the multiple imputation SEs were intermediate between those of the other two methods, as 100 % of the observations were used and the uncertainty of the imputations was taken into account.

Discussion

A cross-sectional analysis of a representative subset of the ICU literature at one specific time point showed that more than one-third of the articles did not mention data completeness and that missing data were present in almost half of these studies. Moreover, missing data were only rarely handled. In our illustrative case, discarding incomplete observations would have led to the exclusion of 24 % of the responders, who also differed systematically in job strain score. We then observed changes in parameter estimates and SEs (and thus in the interpretation of the results) after missing-data imputation. Finally, we propose an algorithm to guide the clinician through a statistical analysis in the presence of missing data.

Our analysis had some limitations. First, the review of the critical care literature was based on three major critical care journals and therefore did not exhaustively cover the critical care literature. Regarding the statistical analysis, the assumption of MAR missingness may be unrealistic in our data. Also, we discarded the 7 % of persons who completed neither the job strain items nor the other questions, and who were probably very different from the 7,209 respondents in our analysis sample. Another limitation is that the multiple testing issue was not addressed in our article. Although this can (like missing data) lead to erroneous conclusions, this matter was beyond the scope of our objectives. We proposed multiple imputation using SRMI because it was adapted to the survey design of our data. However, this method may not suit other study designs in clinical research, for which other appropriate methods must be preferred. Finally, despite all efforts to handle missing data appropriately, imputed values will never be as accurate as the true data, highlighting that one should prioritise efforts on data collection rather than on post hoc data completion.

We could not find other estimates of the frequency of missing data in clinical studies in the literature. Our partial literature review was nevertheless sufficient to unmask the existence of this issue in many cases. Although awareness among clinicians of the impact of missing data has improved in recent years, little practical information is available about how to report and handle missing data or about the impact of inadequate reporting and handling on the validity of the results. In recent years, guidelines have been developed to help improve the reporting of missing data in health research (see the Equator Network website for reporting guidelines) and to address the missing value issue in RCTs [10, 11]. There is no unanimous agreement on a missing-data handling strategy. We therefore propose an algorithm, based on our experience, for analysing data sets with missing values (Fig. 2).

Fig. 2
figure 2

Suggested algorithm for analysing data sets with missing values

Our analysis shows that discarding observations with missing values raises two major problems: the induction of bias (leading to erroneous results) and the loss of statistical power (leading to loss of precision). As shown by our analysis of the Conflicus study, job strain was greater in respondents who supplied incomplete data and increased with the number of unanswered items. Thus, confining the analysis to complete cases would not only diminish statistical power by eliminating 24 % of the population but also bias the results by producing spuriously low job strain levels. More specifically, on the basis of a strict interpretation of the 5 % significance threshold, in the complete-case analysis ICU workers with at least two children showed a significant increase in job strain score of 0.21 points, whereas the multiple imputation analysis revealed a non-significant increase of 0.11 points (50 % less). Another important issue when considering the biases introduced by missing data in a clinical study such as Conflicus is the possible existence of specific interactions between missing data (i.e. respondents giving no reply for a given variable) and the outcome variable (i.e. the job strain level). Several levels of interaction may occur. First, the characteristics of respondents providing no information on their personal life (having children, being happy or living alone) are likely to differ between respondents with high versus low job strain, biasing the identification of job strain risk factors based on NMAR data. Second, the reasons for not replying to a question may be directly related to the outcome variable. Job strain may prevent clinicians from building a relationship with a partner and from having children; in this case, not having children is a consequence and not a cause of high job strain.

Many methods exist to cope with missing data; however, the simplest may result in unrealistic interpretations and the most sophisticated may appear daunting to use.

Complete-case analysis (or listwise deletion), the most widely used method, consists of deleting observations that have missing values; it is the default setting of standard statistical software. It produces unbiased estimates only with MCAR data, because only in that situation does the analysed subpopulation remain representative. Nevertheless, even with MCAR data, the loss of power induced by the deletion of observations may be problematic if the original data set is not large enough. Simple imputation methods (median or most frequent value, hot-deck imputation) artificially reduce data variability and thus artificially increase confidence in the results obtained. These methods should be avoided.
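A toy illustration of why single imputation by the median shrinks variability (hypothetical values):

```python
import statistics

def impute_median(values):
    """Replace every missing value with the median of the observed values."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

data = [3, None, 8, None, 5, 12, None, 7]
filled = impute_median(data)
observed = [v for v in data if v is not None]

print(filled)  # [3, 7, 8, 7, 5, 12, 7, 7]
# The spread shrinks: identical repeated imputations understate variability,
# which translates into spuriously small standard errors downstream
print(round(statistics.pstdev(observed), 2), round(statistics.pstdev(filled), 2))
```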

We encourage clinicians to use multiple imputation with SRMI (or chained equations) for several reasons: first, the overall principles of multiple imputation and SRMI are understandable; second, this approach uses well-known models (linear, Poisson and logistic regression models); third, the fitting of the imputation models follows common guidelines of regression modelling, as do the post-fit diagnostics; fourth, this approach allows one to specify bounds or to perform conditional imputation (e.g. the number of cigarettes smoked imputed only for ‘ever smokers’). Additional information and an introduction to alternative methods are available in electronic supplementary material 1.
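The conditional-imputation idea in the fourth point can be sketched as follows (toy data and a crude median fill for readability; SRMI software performs this step with full imputation models):

```python
import statistics

def impute_cigarettes(ever_smoker, cigs_per_day):
    """Impute cigarettes per day only for ever-smokers (here, crudely,
    with the smokers' median); for never-smokers the value is
    structurally zero, not missing."""
    smokers_obs = [c for s, c in zip(ever_smoker, cigs_per_day)
                   if s and c is not None]
    med = statistics.median(smokers_obs)
    return [0 if not s else (med if c is None else c)
            for s, c in zip(ever_smoker, cigs_per_day)]

print(impute_cigarettes([True, True, False, True], [10, None, None, 20]))
# [10, 15.0, 0, 20]
```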

In conclusion, missing data are a common problem in medical studies. Ignoring or not handling this problem, the behaviour most often encountered in the clinical literature, amounts to concealing part of the results; it can bias study results and reduce statistical power, possibly leading to erroneous study conclusions. We strongly encourage clinicians to report missing data and to use appropriate statistical techniques to handle them. To this end, we have provided a decision algorithm for handling missing data in a clinical study analysis.