Introduction

In 1996, when the Center for Mental Health Services and the Substance Abuse and Mental Health Services Administration (SAMHSA) began funding the Mental Health Statistics Improvement Program (MHSIP), one of the required tasks was to develop a consumer-oriented report card of mental health services that would establish national performance benchmarks.1 A task force of mental health agency representatives, clients, and advocates worked with the National Association of State Mental Health Program Directors to develop an adult consumer satisfaction survey as a source for many of the report card performance measures. The task force identified specific elements of mental health services essential to consumer satisfaction: access to services, quality and appropriateness of services, and outcomes of services, in addition to a global measure of satisfaction.2 These became the main domains of the MHSIP consumer satisfaction survey and were operationally defined as scales composed of individual items.

Access was defined as the quickness, ease, and convenience with which consumers received services. Quality-Appropriateness was defined as the quality and appropriateness of services the consumer received and how well the services matched his or her individual needs. Initially, the task force had two items defining a Participation domain reflecting clients' participation in treatment, but these items are highly similar to the Quality-Appropriateness domain and have generally been included in that scale. Outcomes was defined as the consumer's subjective perception of symptom reduction and improved functioning. Satisfaction was defined as how well consumers liked the services overall. The initial items for these scales were derived from previous satisfaction surveys or were developed by the task force. Consumers rate each survey item on a standard five-point Likert-type scale ranging from strongly agree to strongly disagree.

Previous factor analyses

Lebow reviewed factor analyses of a number of mental health satisfaction scales.3,4 Factor analysis of most of these scales resulted in a large general satisfaction factor. However, narrower, delimited factors were also frequently found, which is consistent with a hierarchical factor structure of narrow lower-order factors and a general higher-order satisfaction factor. Frequently noted problems were that reliability was rarely reported, unrepresentative client samples were used, and surveys often had poor response rates.

Four previous studies have factor analyzed the MHSIP consumer satisfaction survey. Several versions of the survey have been developed, containing 20, 28, 36, or 40 items.1,2,5,6,7,8 Because the items changed over time, the previous factor analyses rarely examined exactly the same set of items twice. However, the overlap in items was generally high because the different studies drew on the same pool of items, and the longer versions of the survey typically contain all of the items in the shorter versions.

Wackwitz conducted factor analyses of the original 40-item consumer satisfaction survey to examine its psychometric characteristics and factor structure and to explore the feasibility of creating a shorter version of the survey.5 The original 40-item survey produced clear factors for all four scales, with reliability ranging from 0.81 to 0.94. A shorter 21-item survey was derived to measure the same four scales.

Eisen and colleagues also conducted a factor analysis of the original 40-item consumer satisfaction survey excluding the three Satisfaction items.6 A three-factor oblique solution clearly identified factors corresponding to the scales of Access, Quality-Appropriateness, and Outcomes with reliability ranging from 0.81 to 0.91.

Lutterman et al. conducted factor analyses of the 28-item consumer satisfaction survey in the process of developing the Social Connectedness and Functioning scales, finding a seven-factor solution with strong support for Quality-Appropriateness and the new scales of Social Connectedness and Functioning.7 Based in part on the results of this study, the final scales for Functioning and Social Connectedness were selected.

Jerrell conducted a factor analysis of the 20-item consumer satisfaction survey and found three factors that defined Access, Quality-Appropriateness, and Outcomes with an average reliability as measured by Cronbach’s alpha of 0.77 and 2-week test-retest correlations of 0.55.8

The current study

In this study, the MHSIP consumer satisfaction surveys administered by the Texas Department of State Health Services, Mental Health and Substance Abuse division, from 2004 to 2012 were factor analyzed to determine the degree to which the original scales could be identified and the relation of the survey scales to external criteria. These annual surveys provide large, diverse random samples of adult public mental health service recipients diagnosed with a severe and persistent mental illness.

The consumer satisfaction surveys are used to evaluate the performance of state mental health services and are required reporting for the SAMHSA's Community Mental Health Services Block Grant. However, Kaufmann and Phillips found that only 14% of the states administering these consumer satisfaction surveys had information about the reliability of their surveys and that fewer than 7% had conducted validity studies such as factor analysis.9 Approximately $400 to $500 million is spent on the federal community mental health block grant annually, and since the consumer satisfaction survey is used for block grant reporting, it could be economically consequential if its results were used to help plan block grant spending priorities. Therefore, it is important to document the reliability and validity of the survey and to know the relation of the survey scales to other aspects of the treatment process.

Three previous factor analyses have called for additional research examining the relation of the consumer satisfaction survey to clinical symptoms, its criterion validity against data separate from the survey, and the utility of the Social Connectedness scale.6,7,8 The purpose of this study was to examine the factor structure and reliability of the MHSIP consumer satisfaction surveys and to assess the concurrent validity of the scales with data external to the survey, as suggested by previous research. The relation of the survey scales to external criteria, including symptom scales (as suggested by Jerrell and by Shafer and Temple) and outcome measures such as hospitalization, substance abuse, housing, criminal justice involvement, and employment, was used to examine the scales' concurrent validity.8,10 Clients indicating improved outcomes in the consumer satisfaction surveys should also exhibit improvements in clinical symptom scales and employment as well as decreased criminal justice involvement and hospitalization.

Method

Survey

A 28-item core survey was used in 2004 and 2005. The number of questions in each scale was Satisfaction (three), Outcomes (eight), Quality-Appropriateness (nine), Participation (two), and Access (six). In 2006, eight additional items for Functioning (four) and Social Connectedness (four) were added to the survey. Lower scores on the one-to-five Likert-type scale indicated greater satisfaction.

Sample

The 2004 to 2012 consumer satisfaction survey samples consisted of 4606 returned surveys, of which 4242 were retained for analysis. Overall, 76% of respondents had complete surveys, 14% were missing a single item, 2% were missing two items, and 8% were missing three or more items. Surveys with no more than two missing items (92% of all clients) were retained for analysis, and their missing items were replaced with the mean.
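As a sketch, the retention and imputation rule above amounts to the following. The response matrix and values are purely illustrative, and since the article does not state whether the item mean or the respondent's mean was substituted, the item mean is assumed here:

```python
import numpy as np

# Hypothetical response matrix: rows = returned surveys, columns = survey
# items, np.nan marks a missing item (data are illustrative only).
responses = np.array([
    [1.0, 2.0, 1.0, 3.0],
    [2.0, np.nan, 2.0, 2.0],
    [np.nan, np.nan, np.nan, 4.0],
])

# Keep only surveys with at most two missing items
n_missing = np.isnan(responses).sum(axis=1)
retained = responses[n_missing <= 2]

# Replace each remaining missing item with that item's mean across
# the retained surveys (assumed interpretation of "the mean")
item_means = np.nanmean(retained, axis=0)
filled = np.where(np.isnan(retained), item_means, retained)
```

Here the third survey (three missing items) is dropped and the single gap in the second survey is mean-filled.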

There were 2590 (61%) females and 1652 (39%) males in the sample. The racial-ethnic distribution of the clients in the sample was 2338 (55%) Whites, 889 (21%) Hispanics, 853 (20%) Blacks, and 162 (4%) of some other race or ethnicity. The average age of clients in the sample was 46.7 (SD = 11.8). The most frequent primary diagnoses of clients were schizophrenia (34%), major depression (31%), and bipolar disorder (29%). A new random sample was drawn every year for the annual surveys. Clients may have participated in the surveys during multiple years; 237 (5%) clients had 2 years of surveys and 17 (< 1%) had three surveys.

Procedure

The surveys used in this study were nine separate annual statewide random samples (2004 to 2012) designed to be analyzed at the state level with a 95% confidence interval. Surveys were mailed directly from state headquarters to the clients' last known addresses. Clients included in the sample had received services from their local community mental health clinic during the survey year. On average, 512 surveys (SD = 76) were returned each year, with an average raw return rate of 21% (SD = 6%).

Outcome data

In addition to the survey data, client records were extracted from mental health service databases in order to examine the relation of the survey to external criteria. External criteria data for the survey year itself were excluded, because the dates the surveys were conducted varied across years and the date a specific client's survey was returned was not available in the administrative database from which records were obtained. Data were extracted for each client for the 3 years prior to and the 3 years after the survey year and then recoded from calendar years into years relative to the survey, ranging from 3 years prior to 3 years after, for use in the analyses.
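The recoding described above reduces to a simple offset calculation; the function below is a sketch (the function name is ours, and the exclusion of the survey year and the three-year window follow the text):

```python
from typing import Optional

def relative_year(record_year: int, survey_year: int) -> Optional[int]:
    """Recode a calendar year into a year relative to the client's survey.

    The survey year itself is excluded (the exact survey completion date
    was not available), as are years outside the -3..+3 window used in
    the analyses; both cases return None.
    """
    offset = record_year - survey_year
    if offset == 0 or abs(offset) > 3:
        return None
    return offset
```

For example, a 2005 record for a client surveyed in 2008 recodes to -3 (three years prior), while a 2008 record for that client is dropped.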

The clients’ mental health service records included a variety of external criteria that were available for analysis. Symptom scales based on diagnoses (the Quick Inventory for Depressive Symptomatology (QIDS), the Positive and Negative Symptom Scales, and the Bipolar Depression Symptom Scale (BDSS)) were included. Records of state mental health hospitalizations were included in the study; however, hospitalizations in non-state hospitals were not available for inclusion. Additionally, a number of locally developed single-item rating scales for employment, housing, and other areas were included.

For clients diagnosed with depression, the 16-item self-report or clinician-rated QIDS scale was used.11 The QIDS was derived from the longer Inventory of Depressive Symptomatology (IDS). The QIDS assesses all symptom domains needed to diagnose depression. For clients diagnosed with bipolar disorder, the 10-item clinician-rated Brief Bipolar Disorder Symptom Scale (BDSS) was used. The BDSS was derived from the 24-item Brief Psychiatric Rating Scale (BPRS) to provide a short reliable measure of symptom severity.12 For clients diagnosed with schizophrenia, two scales, one measuring positive symptoms and one measuring negative symptoms, were used. The four-item Brief Negative Symptom Assessment (BNSA) instrument is a clinician rating scale that is derived from the Negative Symptom Assessment and designed to determine the severity of negative symptoms of schizophrenia.13 Similarly, the four-item Positive Symptom Rating Scale (PSRS) is a locally developed clinician rating scale that is derived from the BPRS and designed to determine the severity of positive symptoms of schizophrenia.14 The PSRS consists of four BPRS items which have been found in meta-analyses of the 18-item BPRS and 24-item BPRS to be good indicators of positive symptoms, including suspiciousness, unusual thought content, hallucinations, and conceptual disorganization.15,16 The clinicians completing these scales were all licensed mental health professionals.

The Texas Recommended Assessment Guidelines (TRAG) are single-item clinician rating scales, with scores ranging from one (least severe) to five (most severe).17 Scales are anchored with specific behaviors and events to measure the areas of: social support, substance abuse, medical treatment, housing instability, harm to self or others, functional impairment, employment problems, and criminal justice involvement.

Because clients could potentially have many symptom scales and TRAG assessments on record during any particular service year, the multiple assessments in each individual year prior to and after the survey were averaged. Additionally, since data from the survey year were excluded, analyses were conducted using data from the years either prior to or after the survey, which were available for the majority of clients. For example, a client may have had services the year before and after the survey but in no other years, or may have had services and data only for the year prior to the survey and none afterward. In the correlational analyses of outcome data with the satisfaction survey, pairwise deletion was used to handle missing data. For state mental health hospitalization, a categorical variable indicating any hospitalization was used for analysis. Although some clients may have died or otherwise been lost to follow-up, on average clients had almost 2 years of assessments in the years after their survey (M = 1.88), so the number potentially lost to follow-up was relatively small.
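The within-year averaging step can be sketched as follows, using hypothetical records (the actual analyses worked from administrative databases; the identifiers and scores here are illustrative only):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (client_id, year relative to survey, symptom score)
records = [
    (1, -1, 14.0), (1, -1, 10.0), (1, 1, 8.0),
    (2, -2, 20.0), (2, 1, 16.0),
]

# Collect every assessment belonging to the same client-year
scores = defaultdict(list)
for client_id, rel_year, score in records:
    scores[(client_id, rel_year)].append(score)

# One score per client per relative year: the mean of that year's assessments
yearly_means = {key: mean(vals) for key, vals in scores.items()}
```

Client 1's two assessments in the year before the survey collapse to a single mean of 12.0, so each client contributes at most one value per relative year to the later correlations.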

Results

Factor analysis and scale reliability of the consumer satisfaction survey

Exploratory, rather than confirmatory, factor analysis was used in this study because confirmatory factor analysis tests only the specific items and factors that are specified. Because of the variability in previous studies, exploratory factor analysis allowed the factors and the items defining them to emerge and be examined without constraints. While previous studies have generally supported most of the hypothesized structure of the survey scales, there has been enough variability, both in the failure of factors to emerge and in the details of the items defining the factors, to warrant further exploratory analyses.

Five factors were extracted using principal factors with squared multiple correlations as initial estimates of the communalities. Factors were rotated using direct oblimin (oblique) rotation. The first ten eigenvalues were 15.73, 3.94, 0.98, 0.73, 0.34, 0.26, 0.21, 0.18, 0.15, and 0.10. Given that the survey was designed around five to seven conceptual domains and that the differences between successive eigenvalues after the fifth factor alternate between 0.03 and 0.05, a five-factor solution was used. The five factors were the closest possible solution to the original domains, and the resulting pattern of factor loadings was very clear, with virtually no secondary factor loadings. The five-factor solution accounted for 60.3% of the variance. The results of the factor analyses are presented in Table 1.
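The extraction step can be illustrated with a small sketch. A full principal-factors analysis with direct oblimin rotation would normally use a dedicated package, but the eigenvalue inspection used to choose the number of factors reduces to an eigendecomposition of the item correlation matrix with squared multiple correlations substituted on the diagonal as initial communality estimates. The simulated two-factor data below are purely illustrative, not the study data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 200 respondents x 6 items driven by two latent factors
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.8, 0.1],
                     [0.1, 0.7], [0.0, 0.8], [0.1, 0.7]])
items = latent @ loadings.T + 0.5 * rng.normal(size=(200, 6))

R = np.corrcoef(items, rowvar=False)

# Principal-factors style: replace the diagonal with squared multiple
# correlations (communality estimates) before the eigendecomposition
R_inv = np.linalg.inv(R)
smc = 1.0 - 1.0 / np.diag(R_inv)
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

# Eigenvalues of the reduced correlation matrix, largest first; two
# dominant values are expected here, mirroring the scree-style inspection
# the article used to settle on a five-factor solution
eigvals = np.sort(np.linalg.eigvalsh(R_reduced))[::-1]
```

Rotation of the retained factors would follow as a separate step; packages such as `factor_analyzer` (Python) or `psych::fa` (R) implement direct oblimin.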

Table 1 Exploratory factor analysis of satisfaction items. Direct oblimin rotated factor pattern (standardized regression coefficients)

Factor one was defined by the eight Outcomes items and the four Functioning items (average factor loading = 0.73). Factor two was defined by 11 items: the nine Quality-Appropriateness items and the two Participation items (average loading = 0.63). Factor three was defined by the six Access items (average loading = 0.61). Factor four was defined by the four Social Connectedness items (average loading = 0.72). Factor five was defined by the three Satisfaction items (average loading = 0.64). Across all factors, the items defining them had large univocal factor loadings with no secondary salient loadings.

The factor analysis results described previously were for the 3282 clients whose surveys contained all 36 items from 2006 to 2012. Surveys in 2004 and 2005 did not contain the Functioning or Social Connectedness items. However, a factor analysis of the 28 items common across all years revealed that the four resulting factors of Quality-Participation, Access, Outcomes, and Satisfaction were almost exactly the same as in the five-factor solution described previously. Details are available from the authors.

The mean total score for the satisfaction survey was 57.4 (SD = 19.9). Temporal trends in the total scale scores were examined by year; an ANOVA revealed a significant difference among the annual mean scores, but Scheffé's post hoc test indicated that no specific pair of years was statistically significantly different.

Scales were constructed based on the factor analysis results and the conceptual scale domains. Descriptive statistics for the total of the raw scale scores and for the average scale score item (based on the five-point Likert-type question format) were calculated. Scale reliability was assessed both by coefficient alpha for items at a particular point in time as well as test-retest correlations for 125 pairs of surveys completed by the same clients across two survey sample years (90 pairs for Outcomes-Functioning and Social Connectedness), with an average time between surveys of 2.74 years (SD = 1.75). The descriptive statistics for the scales are summarized in Table 2.
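Coefficient alpha, the internal consistency measure used above, is computed from the item variances and the variance of the total score; a minimal sketch (the example data are illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Three perfectly parallel items (identical columns) give alpha = 1.0
perfect = np.tile(np.array([[1.0], [2.0], [3.0], [4.0]]), (1, 3))
```

Uncorrelated items drive alpha toward zero, which is why items with univocal loadings on the same factor yield the high scale alphas reported in Table 2.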

Table 2 Reliability estimates for satisfaction factor-based scales

Relation of the survey scales to outcome measures

The descriptive statistics for the external criteria (symptom and outcome measures) are displayed in Table 3. The means are arranged from 3 years before the satisfaction survey to 3 years after the survey. Because a client could have multiple assessments during any particular year (or none if they were not in service), the means in each column were calculated by first averaging each client's scores within the year and then averaging across all clients. Some clients may have had multiple assessments in a year and others only a single one. The vast majority (95%) of satisfaction survey clients had external criteria data in the 3 years prior to or after the survey, and, on average, the clients had 3.96 years of external criteria data (SD = 1.48) available for analysis. The general trend for the external criteria data was from higher scores (indicating more difficulties) prior to the satisfaction survey to lower scores (indicating improvement) after the survey.

Table 3 Outcome item descriptive statistics relative to years prior and after the clients’ satisfaction survey

However, the external criterion scores in Table 3 are group means; the correlations displayed in Table 4 should therefore be used to examine the relation of the satisfaction survey scales with outcomes and symptoms. The correlations are ordered from 3 years before to 3 years after the satisfaction survey, in subsets for each of the satisfaction scales. For these results, higher symptom scale scores, hospitalization, and higher TRAG scores indicate worse outcomes, and high satisfaction scale scores indicate lower satisfaction (the satisfaction items are scored from one = strongly agree to five = strongly disagree). Positive correlations therefore indicate that low satisfaction is associated with negative outcomes and that high satisfaction is associated with positive outcomes.

Table 4 Correlations of satisfaction scales with outcome items relative to years prior and after the client’s satisfaction survey

For the survey scales of Access, Quality-Participation, and Satisfaction, there were relatively few significant correlations, primarily with the outcome scales for depression and bipolar disorder (QIDS and BDSS). However, for the Outcomes-Functioning scale, there were a large number of significant correlations of larger magnitude than for the three previous scales, likely because the content of the Outcomes and Functioning items is more similar to the external criteria measures for symptoms and outcomes; all four symptom scales had relatively large significant correlations with it across all the years. The Social Connectedness scale likewise had a number of relatively large significant correlations with the external criteria measures, including all four symptom scales across all the years. The TRAG scales also had significant correlations across many years.

Follow-up regression analyses were conducted fitting a general model of the five satisfaction scales (Satisfaction, Access, Quality, Outcomes-Functioning, and Social Connectedness) as predictors of the individual symptom scales 3 years prior to and after the satisfaction survey. The results are summarized here; the detailed results are available from the authors. Overall, all of the regressions were statistically significant. However, the average adjusted R-squared was below 0.10, indicating a relatively small overall effect size. The most interesting results were, first, that the Outcomes-Functioning satisfaction scale was often (but not always) the largest significant predictor. Second, the Social Connectedness scale was equally often a significant predictor, with only a somewhat smaller magnitude than Outcomes-Functioning. The scales for Access and Quality were significant predictors slightly less than half as often as the scales for Social Connectedness and Outcomes-Functioning. The Satisfaction scale itself was a significant predictor only once. Interestingly, the average R-squared for predicting the scales for depression symptoms (0.09) was larger than for bipolar disorder symptoms (0.06), which in turn was larger than for both positive and negative schizophrenia symptoms (0.03).
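The adjusted R-squared reported for these regressions can be sketched with ordinary least squares; the simulated data below (five predictors standing in for the five satisfaction scales, with one weak true effect) are purely illustrative and not the study data:

```python
import numpy as np

def adjusted_r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Fit y = Xb + intercept by least squares; return adjusted R-squared."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    ss_res = (resid ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    # Penalize for the number of predictors, as in the reported analyses
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Illustrative: five "satisfaction scale" predictors, weak relation to a
# symptom score, with column 3 playing the Outcomes-Functioning role
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 0.3 * X[:, 3] + rng.normal(size=400)
```

With a weak single-predictor effect like this, the adjusted R-squared lands in the small range the article reports for the symptom-scale regressions.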

Discussion

The majority of the original hypothesized domain scales for the satisfaction surveys formed well-defined factors, and scales based on the items defining these factors had high internal consistency (average coefficient alpha = 0.91) and, for a small subsample, reasonable test-retest correlations (r = .53) across an average of 2.7 years. The results of this study, in conjunction with previous research, support the structure and reliability of most of the scales. The satisfaction survey scales also had a consistent pattern of modest correlations with the external criteria of symptom scales and other outcomes: lower client satisfaction was associated with higher symptoms and poorer outcomes, and higher client satisfaction with lower symptoms and better outcomes. This pattern also provides evidence for the satisfaction survey's construct validity and generalizability.

Similar to the findings of Wackwitz, Jerrell, and Eisen et al., the results of the factor analysis in this study clearly support the Access, Quality, and Outcomes factors.5,6,8 Also like Lutterman et al., the Social Connectedness scale was supported as an independent factor.7 The results from the present study indicated that the Functioning items were not clearly distinguishable from the Outcomes items, similar to the results found by Shafer and Temple for the youth and family version of these scales, which suggests they should be combined into a single scale.10 The two Participation items were not distinguishable from the Quality-Appropriateness items, and they should also probably be combined into a single scale.

The pattern of the clients' satisfaction scale correlations with outcomes was as predicted, with lower satisfaction scores associated with worse outcomes and higher satisfaction scores associated with better outcomes. The specific satisfaction scales of Outcomes-Functioning and Social Connectedness were the most strongly associated with outcomes. The Social Connectedness scale is especially interesting and important because this domain is one of SAMHSA's national outcome measures and has items reflecting both social support and loneliness, both of which are likely to affect mental health. The symptom scales for bipolar depression and major depression also generally had higher correlations with satisfaction than did the positive and negative symptom scales for schizophrenia. The majority of external criteria were completed by clinicians (with the exception of approximately half of the QIDS, which can be self-report or clinician rated). This pattern of correlations indicates that clinicians' ratings of symptoms and clients' perceptions of their own functioning and outcomes are systematically related in expected ways. Overall, although generally modest, these correlations with external criteria contribute to documenting the construct validity and generalizability of the satisfaction scales. Additionally, symptoms and outcomes were generally worse before the satisfaction survey and better after it, possibly suggesting that the overall length of treatment has a generally positive effect, with lower symptoms and more positive outcomes as the number of years of treatment increases.

The most objective outcome, psychiatric hospitalizations, did not have any large significant correlations with the survey scales. However, the administrative databases used included only hospitalizations in state hospitals and did not include other private or public hospitalizations and, therefore, limited the data available for analysis. In addition, psychiatric hospitalization is relatively rare, with only about 11% of the sample having any hospitalizations; hence, the base rate may have been too low to detect any effect.

This study had a number of limitations, one of which was the relatively low survey response rate. Another was that matching clients' assessment data to satisfaction surveys required excluding external data for the actual survey year due to the lack of exact survey completion dates. Consequently, symptom scales and outcomes were further removed in time from the satisfaction survey and were available only for the years before and after it. In addition, fewer symptom scales and outcomes were available for analysis the further removed a year was from the survey year. Both of these issues may have attenuated the correlations between the survey scales and the outcomes. Nevertheless, the sample was a large group of public mental health clients, typically with multiple years of data per client, and it yielded significant, if modest, correlations with external criteria.

Implications for Behavioral Health

The MHSIP consumer satisfaction surveys are required of states by SAMHSA as part of the Community Mental Health Services Block Grant. The survey is used to evaluate the performance of state mental health services statewide or by provider and to improve services. The surveys are necessary to provide the consumer's perspective on the quality and impact of services received. Based on the survey results, states provide additional assistance to improve services and clients' satisfaction with those services, where needed. Given the significance attached to the results of the survey at the state and national levels, it is important to understand the strengths and limitations of the survey.

The factor analysis found strong support for the predicted satisfaction scale structure, and scales based on these factors showed acceptable reliability. In conjunction with previous factor analyses, this finding supports the structure and reliability of the survey scales. In this study, the clients' personal assessments of their outcomes and functioning were generally congruent with objective assessments of client symptoms by clinicians. Moreover, symptom improvement was predicted not only by clients' assessments of Outcomes and Functioning but also by Social Connectedness, with small additional contributions from Quality and Appropriateness. This represents an important step forward in demonstrating the construct validity and generalizability of the consumer satisfaction survey as well as the potential utility of satisfaction surveys for assessing clients' treatment experiences and outcomes.

The Outcomes-Functioning scale is important, because while individual symptom data for clients is not collected at the national level, the survey scales are collected and reported by states. These results suggest that the clients’ Outcomes-Functioning scale scores could be used as a reasonably valid (although modest) indicator of the clients’ clinical outcomes. This might be of some assistance at the aggregate level for program evaluation and planning for the Community Mental Health Services Block Grant. Yet, more research on using this as an indicator of clinical outcomes would be needed to further establish its validity.

The Social Connectedness scale is unique among satisfaction survey scales and probably serves as a measure of social support. It should be investigated further in future research: social support plays an important role in mediating many health outcome processes and, given its unique independent contribution to symptom improvement, may be an area that some clients could strengthen to assist their recovery.