Introduction

Length of stay (LOS) for mental disorders has fallen sharply over the last 30 years in the US, with mean LOS for psychiatric admissions to general hospitals falling from 12.1 to 9.6 days between 1988 and 1994 alone (Mechanic et al. 1998). These reductions have coincided with a scaling back of public mental health services and an expansion in care delivered in private psychiatric hospitals and general hospitals (Mechanic et al. 1998), an increasing proportion of which has been paid for by private insurers and Medicaid under managed care arrangements (Frank et al. 2009). However, inpatient services continue to account for 16% of US mental health spending, even after many years of cost reduction, both absolute and relative to increased prescribing costs (Frank et al. 2009), and LOS continues to be longer overall for mental disorders, especially schizophrenia, than for physical disorders (National Center for Health Statistics 2010), with variation in LOS therefore being more prominent than for other disorders. Quite apart from their costs, psychiatric admissions may be experienced as stigmatising and unpleasant; there are therefore important reasons for reducing any unwarranted variation in LOS, especially as randomised studies of short versus long psychiatric hospitalization have demonstrated no difference in readmission rate (Johnstone and Zolese 1999).

There is a substantial body of research into the causes of variation in LOS. In a minority of cases, these studies have directly analysed changes in LOS over time alluded to above (Freiman et al. 1989), but overall they provide greater insight into influences on LOS at a point in time, and especially those that operate at the individual level, than into longer term trends towards reduction in LOS. Much of this research has been performed by health economists, who have been concerned with estimating the effects on LOS of primary payer (Hendryx and DeRyan 1998), payment mechanism (Frank and Lave 1985a, 1986, 1989; Frank et al. 1987; Freiman et al. 1989; Lave and Frank 1988, 1990a, b; Rupp et al. 1985) or provider characteristics (Wallen 1987). Many authors during the 1980s were concerned with the design of per case prospective payment systems, a project whose feasibility depends centrally on the extent to which LOS may be predicted (Ashcraft et al. 1989; English et al. 1986; Mitchell et al. 1987; Siegel et al. 1986; Taube et al. 1984a). Some economic studies have been inspired by economic models of the behaviour of healthcare providers (Ellis and McGuire 1986; Seidman and Frank 1985). Studies from a sociological perspective similarly relate variation in LOS to generalised models of the behaviour of doctors and carers (Gruber 1982) or patients (Ortega and Rushing 1984). Clinically-oriented researchers have suggested contrasting practical reasons for studying LOS. Predictions of LOS might, for example, be used to guide appropriate choices of care early in admission (Goldstein et al. 1988). It has also been suggested that evidence of practice pattern variation based on case-mix adjusted models might be fed back to clinicians (Huntley et al. 1998; Lyons et al. 1991).

Although researchers have sometimes been optimistic about the possibility of predicting LOS (Anderson et al. 2004; Ashcraft et al. 1989), the decision to develop a prospective per diem payment system for Medicare funded patients in speciality psychiatry hospitals and certified psychiatric units in part reflects a general consensus that this may be impractical based on existing data sources (Cotterill and Thomas 2004; Lave 2003). There have been correspondingly fewer studies performed in the US in recent years. We suggest however that further original research is justified. Firstly, there is an important scientific justification: Psychiatric hospitals remain important social institutions and a central focus of the activity of psychiatrists and other professionals; from this perspective the determinants of LOS document the ways in which these institutions and professionals distribute their activity. We therefore suggest that up to date studies of LOS are a key component of any attempt to describe and account for the operation of psychiatric services and are an important counterpart to better known sociological and anthropological approaches to the psychiatric hospital (Goffman 1968; Stanton and Schwartz 1954). Secondly, despite their weaknesses, payment models based on per case prospective payment persist in the US (Lave 2003; Quinn 2008) and similar systems are being introduced elsewhere, including the UK (Evans-Lacko et al. 2008). Payment systems combining per case and per diem payment, which similarly necessitate analyses of LOS, may furthermore have the advantage of better balancing cost and quality of care by combining the incentive to increase LOS associated with per diem payment with the incentive to reduce LOS associated with per case payment (Ellis and McGuire 1986). Thirdly, it remains possible that the identification of modifiable determinants of LOS could lead to improvements in the quality of care. Unwarranted practice pattern variation (Huntley et al. 1998; Lyons et al. 1991) is one such influence. In our own research into long psychiatric admissions in England, we have been interested in the role of homelessness and rehousing (Tulloch et al. 2008); this has also been a topic of interest in the US (McGuire and Mares 2000; Salit et al. 1998).

The essential foundation for any future research into LOS is a sound understanding of previous research. There are previous non-systematic reviews which relate to the literature on LOS (Caton and Gralnick 1987; Hermann et al. 2007; Jencks et al. 1987; McCrone 1995). We aimed to perform a review that would be systematic, up to date, and, in particular, would be of practical assistance to researchers wishing to update and improve the literature on LOS both in the US and, as in our case, in countries such as the United Kingdom where research into LOS is much less well developed. The review was performed at the planning stage of an analysis of the determinants of LOS in a single English mental health care system using a novel anonymised clinical data repository (Stewart et al. 2009). For these purposes, the greatest practical benefit would derive from a summary of core replicated findings, especially relating to case-mix variables which should not be omitted from new analyses, and any indication of appropriate sample size. In order to be able to draw such conclusions it is important that there be limited heterogeneity between included studies and that there be sufficient screened studies to permit meaningful quality appraisal. As differences in practice between countries are likely to be substantial, we restricted our review to studies from the US, which vastly outnumber studies from other countries. We therefore did not consider important non-US studies such as Boot et al. (1997) (Australia), Stevens et al. (2001) (Germany) and Hodgson et al. (2000), McCrone and Phelan (1994) (England).

Methods

Search Strategy

MEDLINE, PsycINFO, EMBASE and EconLit databases were searched on 30th July, 2008. The search strategy (available from the corresponding author on request) was intended to locate any regression based study of the determinants of LOS among psychiatric patients using a mixture of indexed terms including “Length of Stay”. The reference lists of those studies proceeding to quality appraisal were also examined in order to find any additional possibly relevant studies: these were then screened for inclusion and quality appraised where appropriate.

Inclusion Criteria and Screening Procedures

The aim of the review was to synthesise relatively recent evidence regarding the determinants of LOS among US acute psychiatric patients aged 21–64. Looking at titles, abstracts and authors’ addresses and, if necessary, the full text of articles, the following inclusion criteria were applied:

  (a) A multivariable linear regression or generalised linear model, e.g. survival analysis, was used to estimate the effects of exposure variables on LOS, ln(LOS) or log10(LOS). (Studies reporting only analysis of variance (ANOVA) or covariance (ANCOVA) were excluded as these techniques do not yield estimates of the direct effect of exposures on LOS.)

  (b) The study was non-duplicated and published between 1978 and 2007 in an academic journal, using data no earlier than 1976.

  (c) There was no exclusion based on psychotic or non-psychotic diagnosis or legal status.

  (d) The sample was neither restricted to child and adolescent patients nor to patients over 65.

  (e) The sample was composed of consecutive admissions or a random sample of admissions or a combination of these.

  (f) The analysis was not stratified by diagnosis. (Such studies produced estimates of effect which were not generalizable to the entire population of interest.)
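
The criteria above can be read as a simple screening predicate. The following Python sketch is illustrative only: the class name, field names and method labels are hypothetical and were not part of the review procedure, which was applied by reading titles, abstracts and, where necessary, full texts.

```python
from dataclasses import dataclass

# Techniques accepted under criterion (a); these labels are illustrative only.
ACCEPTED_METHODS = {"linear_regression", "generalised_linear_model", "survival_analysis"}


@dataclass
class ScreenedStudy:
    """Hypothetical record of the facts needed to screen one analysis."""
    method: str
    publication_year: int
    earliest_data_year: int
    is_duplicate: bool
    in_academic_journal: bool
    excludes_on_diagnosis_or_legal_status: bool
    restricted_to_children_or_over_65: bool
    consecutive_or_random_sample: bool
    stratified_by_diagnosis: bool


def meets_inclusion_criteria(study: ScreenedStudy) -> bool:
    """Return True only when every criterion (a) to (f) is satisfied."""
    checks = [
        study.method in ACCEPTED_METHODS,                    # (a)
        not study.is_duplicate
        and study.in_academic_journal
        and 1978 <= study.publication_year <= 2007
        and study.earliest_data_year >= 1976,                # (b)
        not study.excludes_on_diagnosis_or_legal_status,     # (c)
        not study.restricted_to_children_or_over_65,         # (d)
        study.consecutive_or_random_sample,                  # (e)
        not study.stratified_by_diagnosis,                   # (f)
    ]
    return all(checks)
```

Under this sketch, a study reporting only ANOVA fails criterion (a) because its method label is not in the accepted set.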

Quality Appraisal

The quality of analyses was appraised using a modification of an existing procedure for studies of prognosis (Hayden et al. 2006) in which each study is rated on how well it avoids bias due to each of the following: study participation, study attrition, measurement of the exposure of interest, measurement of potential confounders, outcome measurement and analysis. The rating is on a four-point scale—either yes, partly, no or unsure. Any study rated as “no” on any domain was excluded.
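
The exclusion rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name and the domain labels in the example are hypothetical, and the sketch covers only the final decision rule, not the judgement needed to assign each rating.

```python
from typing import Mapping

# The four points of the rating scale described in the text.
ALLOWED_RATINGS = {"yes", "partly", "no", "unsure"}


def passes_quality_appraisal(ratings: Mapping[str, str]) -> bool:
    """Apply the exclusion rule: a single "no" on any domain rejects the study."""
    for domain, rating in ratings.items():
        if rating not in ALLOWED_RATINGS:
            raise ValueError(f"Unknown rating {rating!r} for domain {domain!r}")
    return all(rating != "no" for rating in ratings.values())


# Hypothetical ratings for one study, keyed by bias domain.
example = {
    "study participation": "yes",
    "exposure measurement": "partly",
    "confounder measurement": "unsure",
    "outcome measurement": "yes",
    "analysis": "yes",
}
```

Note that "partly" and "unsure" do not trigger exclusion in this sketch; only an explicit "no" does.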

Some modification and operationalisation of these guidelines was necessary in order to account for the specific features of LOS studies. Study attrition was not relevant. A particular difficulty regarding the appraisal of study participation was that identical data or data derived from the same hospital or hospitals during the same period in some cases gave rise to more than one individual regression analysis. In order not to bias the results of the review, it was necessary to ensure as far as possible that data taken from a hospital or set of hospitals at one time only contributed one regression analysis to the review. Therefore multiply analysed data were identified and analyses based on fewer observations or those using less standard statistical methods were rejected. The results of quality appraisal are tabulated according to the full operationalised criteria (see Table 2).
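
The rule for multiply analysed data can be sketched as follows. This is a hypothetical illustration: the text does not state how a conflict between sample size and statistical method was resolved, so the ordering used here (standard method first, then number of observations) is an assumption, as are all names.

```python
from typing import Dict, Iterable, List, NamedTuple


class Analysis(NamedTuple):
    publication: str
    data_source: str        # hypothetical key: hospital(s) plus admission period
    n_observations: int
    standard_method: bool   # False for a less standard statistical technique


def select_one_per_source(analyses: Iterable[Analysis]) -> List[Analysis]:
    """Keep a single analysis for each data source."""
    best: Dict[str, Analysis] = {}
    for analysis in analyses:
        current = best.get(analysis.data_source)
        # Assumed ordering: prefer a standard method, then the larger sample.
        rank = (analysis.standard_method, analysis.n_observations)
        if current is None or rank > (current.standard_method, current.n_observations):
            best[analysis.data_source] = analysis
    return list(best.values())
```

Each data source then contributes exactly one regression analysis to the synthesis.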

Data Extraction

Meta-analysis was not attempted because variables were measured in many different ways, analyses were performed using several different techniques and techniques for the meta-analysis of regression slopes are relatively undeveloped.

The essential study characteristics of each included study were tabulated. Where applicable, the following numerical data were also extracted: (a) the total variance explained (r²); (b) an overall measure of LOS for the sample (arithmetic mean, geometric mean or median); and, for any variable estimated in at least four analyses, (c) the direction and statistical significance of reported effects.

For those analyses based on linear regression of log10(LOS) or ln(LOS), regression coefficients for the variables in (c) above were also tabulated in their exponentiated form, i.e. the form in which they are multiplicative on LOS. This was done in order to provide some indication of the size of these effects. (Coefficients from the smaller number of studies using linear regression of LOS or survival analysis do not have multiplicative effects on LOS and are therefore not directly comparable with these; for clarity these were omitted.) In the case of continuous variables, effects were appropriately scaled in order to demonstrate the effect of different values that might be observed in practice rather than the effect of a single unit (see notes to Table 5).
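
The transformation described above can be sketched in Python. The function name and the `delta` parameter are illustrative assumptions rather than anything taken from the reviewed studies; `delta` stands for the difference in a continuous exposure over which the effect is to be expressed.

```python
import math


def multiplicative_effect(coef: float, delta: float = 1.0, log_base: str = "e") -> float:
    """Factor by which LOS is multiplied when the exposure rises by `delta`.

    `coef` is the regression coefficient from a model of ln(LOS)
    (log_base="e") or log10(LOS) (log_base="10").
    """
    if log_base == "e":
        return math.exp(coef * delta)
    if log_base == "10":
        return 10.0 ** (coef * delta)
    raise ValueError("log_base must be 'e' or '10'")


# A coefficient of 0.01 per year of age on ln(LOS), expressed over 20 years.
effect_over_20_years = multiplicative_effect(0.01, delta=20)
```

A factor above 1 indicates longer LOS and a factor below 1 shorter LOS. Coefficients from models of untransformed LOS are additive in days and cannot be converted in this way.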

Results

After removal of duplicate records, 5,371 database records were screened. All but 106 publications were excluded based on the review of title, abstract and author’s address. Of these 106 publications, 76 were rejected at the screening stage and are listed in Table 1.

Table 1 Studies rejected before quality appraisal

The remaining 30 publications contributed at least one analysis which proceeded to quality appraisal. One of these publications (Schumacher et al. 1986) included an alternative analysis stratified by diagnosis and another (Lave and Frank 1990b) included other analyses that were published elsewhere. These analyses were rejected before quality appraisal. Review of the references of the 30 studies meeting inclusion criteria led to screening of a further 10 publications. One of these (Freiman et al. 1989) yielded an analysis of the 1984 and 1985 PATBILL data which met inclusion criteria and also proceeded to quality appraisal.

A total of 31 publications were quality appraised with 13 being rejected (see Table 2 for reasons for rejection tabulated according to quality appraisal criterion—full details available on request from the corresponding author). Data from 18 publications were extracted (Bezold et al. 1996; Blais and Baity 2005; Boelhouwer and Rosenberg 1983; Brock and Brown 1993; Compton et al. 2006; Frank and Lave 1986; Freiman et al. 1989; Greenfield et al. 1989; Huntley et al. 1998; Kato et al. 1995; Kirshner and Johnston 1985; Lave and Frank 1990a, b; Lyons et al. 1991; McLay et al. 2005; Rupp et al. 1985; Stern et al. 2001; Wallen 1987). Three publications (Compton et al. 2006; Frank and Lave 1986; Huntley et al. 1998) contained two separate analyses based on patients admitted at the same time to the same hospitals and the largest was chosen from each pair of analyses. A further two publications (Freiman et al. 1989; Lave and Frank 1990a) analysed different categories of hospital separately and two regressions were extracted from each. Therefore data from a total of 20 regression analyses were extracted.
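
The counts reported in this section can be checked with a short piece of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Publications remaining after review of title, abstract and address.
remaining_after_initial_review = 106
rejected_at_screening = 76
proceeding_from_databases = remaining_after_initial_review - rejected_at_screening

# One further publication (Freiman et al. 1989) came from reference lists.
added_from_reference_lists = 1
quality_appraised = proceeding_from_databases + added_from_reference_lists

rejected_at_appraisal = 13
publications_extracted = quality_appraised - rejected_at_appraisal

# Two publications each contributed two regressions, one per hospital category.
publications_with_two_regressions = 2
regressions_extracted = publications_extracted + publications_with_two_regressions

print(proceeding_from_databases, quality_appraised,
      publications_extracted, regressions_extracted)
```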

Table 2 Results of quality appraisal

Study Settings and Overall LOS

The basic characteristics of the 20 included analyses are tabulated in Table 3. There were 11 analyses of single hospitals (with sample size ranging from 41 to 6,366) and 9 analyses based on multiple hospitals (with sample size ranging from 3,118 to 58,821). Most studies were exclusively of general hospitals or at most included a small proportion of private hospital patients (Lave and Frank 1990b). Exceptions were Greenfield et al. (1989) who studied a private psychiatric hospital, and Stern et al. (2001) and Huntley et al. (1998) who studied state psychiatric services. Some measure of overall observed LOS was available for all but two analyses. The 16 reported values for arithmetic mean LOS ranged from 6.8 to 95.4 days (interquartile range 7.4–21.3). With the exception of Stern et al. (2001), whose mean LOS of 95.4 days was an outlier and presumably related to the special nature of the state hospital population studied, the maximum arithmetic mean LOS was 24.9 days. The overall variance explained was reported by all but one of the analyses based on linear regression. It ranged from 0.11 to 0.49, so a substantial proportion of variance remained unexplained. There was no significant relationship between log sample size and r² after omission of one or two influential observations (the two smallest studies).

Table 3 Characteristics of included studies

Exposure Variables and Their Effects on LOS

One analysis used survival analysis (Stern et al. 2001); all other studies used linear regression of LOS, log10(LOS) or ln(LOS). Overall, the reporting of results was incomplete—only two studies fully reported the distributions of all exposure variables and a full set of regression coefficients and intercept value (Kato et al. 1995; McLay et al. 2005). One study author was able to provide all of the additional information not included in the original publication (Compton et al. 2006).

The included studies estimated the effects of a total of 82 distinct exposure variables. The effects of 66 of these were estimated by 3 or fewer studies. Table 4 summarises the direction and significance of the effects of the 16 variables whose effects were estimated by four or more analyses. As outlined above, Table 5 is restricted to the subset of analyses based on regression of log10(LOS) or ln(LOS) and regression coefficients for the variables above are tabulated in their exponential form, i.e. the form in which they are multiplicative on LOS.

Table 4 Direction and significance of effects
Table 5 Multiplicative effects of exposure variables on LOS

Age

All selected analyses included age as an exposure variable. Summary of the effect of age is complicated (a) by the variety of ways in which it was modelled (linear, log-linear, quadratic or categorical), (b) the evidence that there may be non-linear effects of age with lowest LOS in middle age (Lave and Frank 1990a; Rupp et al. 1985), and (c) the varying age composition of different samples, with two large samples being composed largely of individuals over 65 years old. Despite these complications, Table 4 demonstrates that significant associations with age were found by 8 of 10 studies with sample size 3,118 or more, and by a smaller proportion of smaller studies. Referring to Table 5, it may be seen that coefficients for age when appropriately scaled were comparable to the effect of psychosis.

Diagnosis

All selected analyses also included diagnosis as an exposure variable. A significant effect of diagnosis was found by all studies with sample size 760 or more (see Table 4). Where the direction of effects was reported, psychosis was in all cases associated with increased LOS compared with a reference non-psychotic category. Exponentiated coefficients were similar for psychotic diagnosis (1.12–1.31) and large versus small hospital size (1.05–1.37) and were comparable with effects seen for age, but were greater than those seen for female gender (1.01–1.07).

Female Gender

Female gender was associated with increased LOS in each of the nine studies with sample size 3,118 or more, but was unassociated with LOS in 6 of 8 smaller studies. As stated above, the size of coefficients was consistently smaller than for other frequently measured variables.

Other Variables

Of the remaining variables, several were consistently or near consistently associated with shorter LOS: AMA discharge, prospective payment, selective contracting, being married and being detained. The effect of AMA discharge was largest (exponentiated coefficients 0.52–0.75) with the size of the exponentiated coefficients for the other variables being in the range 0.75–0.92. Larger hospital size was associated with increased LOS in six of seven analyses and the size of these effects is discussed above.
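
To make the reported ranges easier to compare, exponentiated coefficients can be restated as percentage differences in LOS. The sketch below uses only ranges quoted in the text; the function and dictionary names are illustrative.

```python
def percent_change(factor: float) -> float:
    """Percentage difference in LOS implied by a multiplicative effect."""
    return (factor - 1.0) * 100.0


# Ranges of exponentiated coefficients as reported in the Results.
reported_ranges = {
    "psychotic diagnosis": (1.12, 1.31),
    "large versus small hospital": (1.05, 1.37),
    "female gender": (1.01, 1.07),
    "AMA discharge": (0.52, 0.75),
}

for name, (low, high) in reported_ranges.items():
    print(f"{name}: {percent_change(low):+.0f}% to {percent_change(high):+.0f}%")
```

On this reading, a factor of 0.52 for AMA discharge corresponds to LOS roughly 48% shorter than in the reference group, other things being equal.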

Substance abuse comorbidity and GAF/GAS score were examined only by smaller studies, with limited evidence of effect. The effects of medical comorbidity, the number of previous admissions and payer were also chiefly estimated by smaller studies, but one larger study demonstrated an association for each of these. Overall, the paucity of larger studies measuring these five variables makes it difficult to draw firm conclusions.

Ethnicity was associated with LOS in only 2 of 11 analyses, and notably 5 larger analyses demonstrated no effect. Similarly, teaching hospital status and number of psychiatrists per capita were not consistently associated with variation in LOS.

Discussion

This systematic review synthesises the findings of the highest quality US studies of LOS for patients with mental disorders published over the last 30 years. Strengths of the review are that it was systematic and that results were fully tabulated and ordered by sample size. A set of variables with generally consistent effects was identified, although, in line with previous research and commentary, studies based on linear regression generally reported modest explained variance (r²). This set of variables and the conclusions drawn regarding desirable sample size provide some basis for planning future analyses. Both for practical and scientific reasons there remains a case for further research into variation in LOS. Some further comments on the methodological aspects of reviewed studies are therefore combined with suggestions for future improvements.

Study Populations and Treatment of Transfers Between Units and Time Period

The included studies were largely based on discharges from general hospitals. Although these are plainly of great importance in US mental health services, it does seem that private psychiatric hospitals and state hospitals are relatively under-researched. Another obvious feature of the studies reviewed here is their age: most studies are now at least 10 years old, and most of the largest studies are based on data that are 20 or 30 years old. Even though the effects found were remarkably consistent, the applicability of these studies to current populations is inevitably somewhat uncertain. Overall, research into LOS in psychiatry needs updating.

Outcome Measurement

Although measurement of LOS appears unproblematic at first sight, there are some complications in relation to the treatment of leave periods and repeat admissions (Gurel 1966) and also in the treatment of transfers between units, which have been found to occur commonly (Lee et al. 1985). All of these phenomena have the potential to dramatically alter measured LOS and may bias it downwards relative to the overall LOS. In principle, this difficulty would vitiate any attempt to study the probably substantial effect of time since admission using survival analysis (Stern et al. 2001). It is also of clear relevance to researchers interested in the design of payment systems. We would suggest that future researchers should at least justify their approach to the treatment of repeat admissions and leave periods and should preferably perform appropriate sensitivity analyses. Transfers are less easy to deal with. The development of state level data warehouses (Coffey et al. 2008) might provide a means of performing such analyses, as might the use of identifiers related to primary payer. These strategies might also provide a means of combining analyses of general, state and private hospitals.

Exposure Measurement

Based on a large number of analyses with varying sample size there was clear evidence that low statistical power had affected some estimates of the effect of diagnosis, marital status, age and gender. It is also possible that inadequate statistical power explained the preponderance of negative results for medical comorbidity, payer, previous admissions, GAF/GAS scale score and substance abuse comorbidity. The smallest sample size above which there was a consistent effect of diagnosis was 760; for marital status it was 3,335 and for female sex it was 3,118. We suggest that future studies of LOS should have a sample size of at least 3,000, as studies which are underpowered to detect small but replicated effects, such as those of gender, will lose some credibility and will be unable to examine possible mechanisms such as gender differences in contribution to family income (Gruber 1982) or symptoms (Greenberg and Bornstein 1989). In view of their consistent effects, we also suggest that such studies should aim to estimate the effect of each of the following: age, diagnosis, sex, marital status, hospital size, AMA discharge, legal status and, where appropriate, payment method. In addition, age should if appropriate be modelled as a non-linear effect, and, because of the possibility of effect modification e.g. effects of age being restricted to those with depression (Ashcraft et al. 1989; Taube et al. 1984b), tests for interaction should be performed and interaction variables included where appropriate.
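
The dependence of required sample size on effect size can be illustrated with a standard normal-approximation power calculation. This sketch is not the basis of the recommendation above, which rests on the sample sizes at which effects became consistent across the reviewed studies. It assumes a simple two-group comparison of ln(LOS) with equal group sizes and no covariates, and the standard deviation of ln(LOS) used in the example (0.8) is a hypothetical value, not one taken from the reviewed studies.

```python
import math
from statistics import NormalDist


def n_per_group(ratio: float, sd_log_los: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group to detect a multiplicative effect on LOS.

    Uses the two-sided, two-sample z-test approximation on ln(LOS):
    n = 2 * ((z_(1 - alpha/2) + z_power) * sd / ln(ratio)) ** 2.
    """
    if ratio <= 0 or ratio == 1.0:
        raise ValueError("ratio must be positive and different from 1")
    delta = math.log(ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd_log_los / delta) ** 2)


# Hypothetical SD of ln(LOS); effect sizes chosen near the reported ranges.
small_effect_n = n_per_group(1.05, sd_log_los=0.8)   # e.g. a gender-sized effect
large_effect_n = n_per_group(1.20, sd_log_los=0.8)   # e.g. a diagnosis-sized effect
```

Under these assumptions, an effect of the size reported for gender requires a sample more than ten times larger than an effect of the size reported for psychotic diagnosis.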

It would of course be desirable to study additional exposure variables. Our suggestions below are restricted to variables which have already been studied to a limited extent, for which there is an obvious case to be made for inclusion and for which there are grounds for hoping that their inclusion will increase variance explained above the modest level typically found. The need for an adequate sample size largely precludes the use of purpose-collected data and therefore the suggestions made are restricted to techniques that could be applied to routine data, data collected in electronic patient records (Coffey et al. 2008) or data collected for quality control or payment purposes, e.g. the System for Classification of Inpatient Psychiatry (SCIPP) (Hirdes et al. 2003).

Firstly, we suggest that illness severity, as this relates to resource use rather than mortality (McMahon and Newbold 1986; Mezzich and Sharfstein 1985), should be given special priority. Severity plausibly relates to the physician’s notion of need for hospitalization and therefore quite directly to LOS. Among possible measures of severity, ADL scores and GAF are gamable, and therefore of doubtful utility for payment systems, but are plausibly related to LOS. The GAF is however now available within some routine data sets (Department of Veterans Affairs 2008), and severity-banded DRGs now form part of the Healthcare Cost and Utilization Project dataset (Agency for Healthcare Research and Quality 2009). Alternatives such as employment status, marital status and housing type, which relate to lifetime severity if not current functional impairment, are in contrast not gamable and are readily measured. Measures of historic resource use provide an alternative means of modelling severity, although again only relating indirectly, if at all, to current functional impairment. Length and number of previous admissions and a variable modelling individual level clustering of admission length were found to be highly significant and substantial by the study of Stern et al. (2001). The authors’ own research in the UK (Tulloch et al. 2008) suggests that the maximum length of previous admissions and a history of admission to psychiatric intensive care, even if limited to the preceding 2 years, are both strongly related to current LOS. These measures of resource use only require the existence of several years of administrative data and would perhaps be more acceptable to providers than supplementary data collection.

If severity not only exerts direct effects on LOS but mediates and confounds the effects of other variables, a failure to model its effect may not only reduce the variance explained in studies of LOS but also lead to confounded estimates of the effects of other variables. This applies particularly to the second set of variables which it is suggested merit further attention, namely variables for hospital, ward or unit, physician and hospital type. These variables are of central importance to understanding variation in LOS ascribable to organisational factors, represented in this review only by measures of hospital size. Quite apart from the bias that will result if severity confounds the effect of these variables, a failure to include it in models used to derive prospective payment tariffs may also lead to inequity across providers and encourage adverse selection (Hermann et al. 2007). The measurement of severity should therefore be supplemented by estimation of this second set of variables.

The Interpretation of Associations with LOS

While it is relatively straightforward to tabulate the quantitative findings of the studies reviewed here, it is more difficult to summarise authors’ interpretations of these effects. Indeed, the extent to which any interpretation is offered varies, and generally authors suggest possible mechanisms for at most a few of the exposure variables measured. In the discussion sections of the studies reviewed here, particular attention is paid variously to managed care (Bezold et al. 1996), payment method (Frank and Lave 1986; Freiman et al. 1989; Lave and Frank 1990a, b), active military service (Brock and Brown 1993; McLay et al. 2005), substance misuse (Compton et al. 2006), violence (Greenfield et al. 1989), cognitive impairment (Kato et al. 1995), physician practice (Lyons et al. 1991), previous service use and legal status (Stern et al. 2001) and illness stage (Wallen 1987). Some studies did not articulate any theory of the link between exposures and outcome. Diagnosis, age and gender were not discussed in detail by any author. It seems in general that many findings call for little or no explanation. This phenomenon may reflect the close relationship between clinical practice and clinical research—presumably, variables which are felt to possess face-validity as determinants of LOS (Fetter et al. 1980) are felt to have been satisfactorily dealt with as long as the associated effect has been quantified.

An alternative approach, followed by some sociological authors, is to explicitly theorise about the determinants of LOS (Gruber 1982; Ortega and Rushing 1984). However, the extent to which these abstract models relate to the micro level of clinical decision making may be questioned, and for this reason the intuitive selection of variables based on clinical knowledge may be preferable. Although in principle it would be possible to apply qualitative techniques to the study of psychiatric decision-making and attempt to use the findings of such studies to inform quantitative study design, the feasibility of this approach is uncertain. Overall it is to be hoped that future researchers will be more comprehensive in their discussions, and more explicit in their hypotheses, especially as standard epidemiological techniques could be applied to test possible explanations for some replicated findings, such as the possibility that medical comorbidity explains the association with old age.