Introduction

Although the impact of disease on patients’ quality of life is recognised as important in health care, the impact of illness on those living with the patient has been largely overlooked. There are specialty- and disease-specific studies of the impact of illness on patients’ family members in dermatology [1], oncology [2] and physical and mental disability [3]. These have shown that the impact of illness on families is widespread and severe, and that few families are offered appropriate support. Poston et al. [3] explored the impact of childhood illness, particularly physical and mental disability, on the family and described the strong influence that individual family members can have on one another; this study of childhood illness was the first to explore family quality of life. In a study of family quality of life in dermatology, emotional impact was the most commonly affected area, with 98 % of family members interviewed reporting some degree of emotional distress as a result of the patient’s illness [4]. In some disease areas, the quality of life of patients’ family members can be more severely affected than that of the patients themselves [5, 6], and a link has been demonstrated between disease severity and the level of impact on the family member [7, 8]. Many of these studies focus on the carer, overlooking family members who are not carers but may still be affected by living with the patient. A recent literature review found no generic instrument, usable across the whole of medicine, that measures the impact of illness on the partner or other family members [9]. Such a measure is needed to investigate the impact of illness on patients’ families, to draw attention to this large secondary burden of disease, to assess the effect of interventions designed to support family members, and to inform and support family-orientated clinical decision-making.

Although many generic patient and population measures exist, previous work in dermatology [4] showed that life with an unwell family member raises a unique combination of quality of life issues. These issues may therefore be specific to family members across the whole of medicine. Because they are specific to family members, they are likely to differ from the items in patient-generated generic measures. We therefore judged that patient-generated measures, even those tested in community populations, were unsuitable for assessing these issues, as they would not accurately capture the specific impact of illness on family members. However, because quality of life is multidimensional, we hypothesised that there would be some overlap in the ways that illness affects family members and patients, and hence that scores on the new measure would correlate moderately with scores on patient measures such as the WHOQOL-BREF used in this study. A strong correlation would not be expected, as this would suggest that the two instruments measure identical constructs. We also chose to develop the new instrument from scratch, rather than by adapting an existing instrument or drawing on previous literature, so that its content would come directly from interviews with family members and their views would be embedded in the content and wording of its items.

Semi-structured interviews were conducted with family members of patients with diseases representing a wide range of medical specialties [10]. When we transcribed these interviews and coded the effects of illness on family members’ lives, similar themes emerged across all areas of medicine. Specifically, ten key areas of impact on family quality of life were identified: emotional impact, daily activities, family relationships, sleep and health, holidays, support and medical care, work and study, financial impact, social life and time planning. The data from this qualitative study were then used to develop and validate the Family Reported Outcome Measure (FROM), a self-reported instrument designed to measure the impact of illness on family members across all specialties and disease areas.

Methods

Ethical approval and patient consent

This study was approved by the South East Wales Research Ethics Committee (21/05/12) and the Research and Development department of the Cardiff and Vale University Local Health Board (05/06/10) and Velindre NHS Trust (13/07/10). All patients and family members gave written informed consent.

Item generation

The ten themes identified from previous qualitative interviews with 133 family members of patients from 26 specialties [10] were developed to form the questionnaire items. Any sub-theme mentioned by >5 % of interviewees was developed into an item. Each item measured one concept, and items were designed to be clear and concise. A 30-item draft questionnaire was developed (version 1), with a five-point Likert response scale: 0 = not at all, 1 = a little, 2 = moderately, 3 = a lot, 4 = extremely.

Participants

Participants were all family members of patients, selected from 26 clinical specialties of the Cardiff and Vale University Local Health Board. Patients were selected by purposive sampling, with a senior specialist from each specialty helping to choose a range of conditions that best represented that specialty. One accompanying family member of each patient was approached for recruitment after they had been seen in the clinic. Recruitment mostly took place in outpatient clinics, with a few family members of patients recruited from long-term rehabilitation wards. Participants were eligible if they were over 18 years old, a family member or partner of a patient, and fluent in English. Family members were recruited in two cohorts: one for item reduction and one for validation.

Participants cohort 1: item reduction

Two hundred and forty-five family members were approached to participate in this stage, of whom four declined because of time pressures. The remaining 241 family members were asked to complete version 2 of the FROM. One response was excluded because of incomplete answers, so the item reduction analysis used data from 240 family members from 26 specialties (Table 1). Their demographic characteristics are shown in Table 2.

Table 1 The 26 specialties included in the study (number of family members for cohort 1, number of family members for cohort 2)
Table 2 Demographics of the family members and patients in the study

Participants cohort 2: psychometric validation

One hundred and thirty-one family members were approached to participate, of whom nine declined (seven because they did not have time and two for personal reasons). One withdrew for personal reasons before completing the questionnaires, and his responses were excluded. One further subject was excluded because of incomplete answers. The final validation was therefore carried out using data from 120 family members of patients from 25 of the specialties in Table 1; mental health was not included because of practical recruitment difficulties. The demographic characteristics of the family members are shown in Table 2.

On recruitment, family members were given the FROM to complete, along with the WHOQOL-BREF, a validated generic quality of life measure [11] comprising 26 items across four domains, with higher scores indicating better quality of life (QoL). Family members also completed a Global Health Score (GHS), rating the patient’s overall health from 0 to 10 on a visual analogue scale, where 0 = worst possible health and 10 = perfect health. Family members were asked four cognitive debriefing questions about the face validity of the FROM: Is the questionnaire easy to complete? Are the response options straightforward? Are the instructions and statements clear? Do the questions cover all areas of your life which have been affected? To assess practicality, the investigator (CJG) timed family members completing the FROM. Readability and item length were also assessed [12]. Data were analysed using PASW Statistics 18©.

All family members were contacted via email or post after 1–2 weeks and asked to complete the FROM and the GHS again.

Measures

During this study, the questionnaires used included the FROM-16 (the newly developed measure), the WHOQOL-BREF and the GHS. The WHOQOL-BREF has been extensively psychometrically tested and has shown acceptable internal consistency (Cronbach’s α = 0.68–0.82), good construct validity (tested by correlation with a single-item quality of life measure) and good item-total correlation in previous studies [11]. The GHS was used successfully in a previous study of family quality of life in dermatology [1], where a correlation between the GHS score and the QoL of the family member was found.

Results

Content validity

The content validation of version 1 of the FROM used both qualitative and quantitative methods [13] with family members of patients and an expert panel of healthcare professionals including consultants, specialist nurses and academic experts.

The questionnaire feedback forms (n = 23) from the quantitative part of the content validation showed good agreement between judges: intraclass correlation coefficient (ICC) = 0.97 [p ≤ 0.001, confidence interval (CI) 0.94–0.99]. In all, 95 % of judges thought that the items were complete, clearly written, relevant to family members and well matched to the response options. The scale content validity index (CVI) was 0.88, indicating high content validity. In response to qualitative feedback from family members and healthcare professionals, the wording of items and instructions was revised and one item was split into two, giving the 31-item version 2.
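The text reports a scale CVI of 0.88 but not how it was computed. One common definition, sketched below, is the averaging approach (S-CVI/Ave): each item's CVI (I-CVI) is the proportion of judges who rate it as relevant, and the scale CVI is the mean of the I-CVIs. The 4-point relevance scale, the rule that ratings of 3 or 4 count as relevant, the function names and the example ratings are all assumptions for illustration, not details taken from this study.

```python
from statistics import mean


def item_cvi(ratings, relevant=(3, 4)):
    """I-CVI: proportion of judges rating the item as relevant.

    Assumes a 1-4 relevance scale on which 3 or 4 counts as relevant.
    """
    return sum(r in relevant for r in ratings) / len(ratings)


def scale_cvi_ave(ratings_by_item):
    """S-CVI/Ave: the mean of the item-level CVIs."""
    return mean(item_cvi(r) for r in ratings_by_item)


# Hypothetical ratings from four judges for two items.
ratings = [[4, 4, 3, 2], [4, 4, 4, 4]]
print(item_cvi(ratings[0]))    # 0.75
print(scale_cvi_ave(ratings))  # 0.875
```

An alternative "universal agreement" definition (S-CVI/UA) counts only items that every judge rated as relevant, so the value obtained depends on which definition is used.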

Item reduction

Item reduction was carried out using Rasch analysis, then factors were identified using factor analysis.

Rasch analysis

For Rasch analysis, RUMM2030 software was used. The overall fit of the data to the Rasch model was re-examined after each stage of the analysis [14]. Threshold ordering was examined to check whether the response categories progressed in a logical order, and items were rescored where necessary. Individual items were checked for fit to the model, and items with fit residuals beyond ±2.5 were considered for removal [15]. Individual person fit was also checked, and outliers were considered for removal if they were skewing the data. Items with low endorsement (a high percentage of family members scoring 0 or 1) were considered for removal, taking into account the results of further tests: items showing local dependency were combined, and items were tested for differential item functioning (DIF) by family member gender and age, which also informed decisions on removal. Problem items were removed stepwise, and the overall fit statistics were tested at each stage. Once the final set of items had been identified, the scale was tested for unidimensionality, and the targeting of the questionnaire was assessed to ensure that the items were representative of the target population. The person separation index (PSI) was calculated to determine whether the questionnaire could distinguish between different groups of family members.

The partial credit Rasch model was chosen for the FROM, as the likelihood ratio test in RUMM2030 showed that the data did not fit the rating scale model (p < 0.05). The initial summary fit statistics showed poor fit to the Rasch model [fit residual mean (SD) for items = −0.22 (1.97) and for persons = −0.19 (1.34); χ² p < 0.001], indicating that the FROM needed to be revised. Twenty-four of the 31 items showed disordered thresholds, so the five response categories were collapsed first into four categories and then into three, after which all 31 items had ordered thresholds. Figure 1 shows the ordered category probability curve for item 1, and Fig. 2 shows the disordered curve for item 4. The categories were collapsed in the same way for the whole questionnaire, giving a uniform scoring system that makes the FROM score much easier for investigators and clinicians to calculate.
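To make "disordered thresholds" concrete, the sketch below computes category probabilities for a single item under the partial credit model, in which the probability of scoring x is proportional to exp of the sum over j ≤ x of (θ − δ_j), where θ is the person location and δ_j are the item's threshold locations (both in logits). Thresholds are ordered when each δ_j lies above the previous one. This is a minimal illustration, not RUMM2030's estimation procedure; the function names and threshold values are hypothetical.

```python
import math


def pcm_category_probs(theta, thresholds):
    """Category probabilities P(X = 0..m) for one item under the partial credit model.

    theta: person location (logits).
    thresholds: the item's threshold locations delta_1..delta_m (logits).
    """
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True if each threshold lies strictly above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical three-category item with symmetric, ordered thresholds.
probs = pcm_category_probs(0.0, [-1.0, 1.0])
print([round(p, 3) for p in probs])              # [0.212, 0.576, 0.212]
print(thresholds_ordered([-1.2, 0.4, 0.1, 1.5]))  # False: a disordered five-category item
```

When thresholds are disordered, some middle category is never the most probable response at any person location, which is the pattern that collapsing adjacent categories is intended to remove.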

Fig. 1 The ordered category probability curve for item 1

Fig. 2 The disordered category probability curve for item 4

Six items were misfitting, with fit residuals outside ±2.5, and these were removed one at a time. Eleven individual person fit statistics were outside the acceptable range. However, as removing these respondents did not greatly improve fit to the model, and as removing misfitting respondents could have a negative effect on the construct of the FROM [15], they were retained in the further analysis. Two items showed a low level of endorsement, items 9 (I feel the burden of caring for my family member) and 29 (My work or study is affected), and both were considered for removal. Potentially problematic dependency between items can be identified when a residual correlation is 0.2–0.3 above the average of all item residual correlations [16]. The average residual correlation of the FROM items was −0.039, so item pairs with residual correlations above 0.16 were identified, for example items 12 (My family activities are affected) and 14 (My hobbies are affected).
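The local-dependency rule described above can be written as a short check: average the off-diagonal item residual correlations and flag any pair that exceeds that average by a chosen margin (0.2 here, the lower end of the 0.2–0.3 range). This sketch assumes a symmetric residual correlation matrix has already been exported from the Rasch software; the function name and example matrix are hypothetical.

```python
from itertools import combinations


def flag_local_dependency(resid_corr, offset=0.2):
    """Flag item pairs whose residual correlation exceeds the mean
    off-diagonal residual correlation by more than `offset`.

    resid_corr: symmetric square matrix (list of lists) of item residual
    correlations; only the upper triangle is read.
    Returns (cutoff, list of flagged (i, j) index pairs).
    """
    n = len(resid_corr)
    pairs = list(combinations(range(n), 2))
    average = sum(resid_corr[i][j] for i, j in pairs) / len(pairs)
    cutoff = average + offset
    flagged = [(i, j) for i, j in pairs if resid_corr[i][j] > cutoff]
    return cutoff, flagged


# Hypothetical three-item residual correlation matrix.
matrix = [
    [1.0, 0.3, -0.1],
    [0.3, 1.0, -0.2],
    [-0.1, -0.2, 1.0],
]
cutoff, flagged = flag_local_dependency(matrix)
print(round(cutoff, 3), flagged)  # 0.2 [(0, 1)]
```

With the average of −0.039 reported for the FROM, a margin of 0.2 gives the cut-off of about 0.16 used in the text.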

Items were removed or combined in a stepwise process, with the overall fit statistics consulted at each stage. Sixteen items were retained and tested for DIF by age and gender. Eight of the 16 items showed uniform DIF by gender, age or both. For example, item 22 (My family expenses are increased) showed DIF by gender, with males more likely to score highly. DIF by both gender and age cancelled out at the test level (p = 0.6 and p = 0.8, respectively). Only item 20 (My sex life is affected) showed non-uniform DIF, but it was retained because family members had identified it as a very important and frequently occurring theme during the interview stage. The 16-item FROM showed good targeting for the population (mean person location −0.622 logits), indicating that the sample is located slightly below the items. The PSI of the FROM was 0.88 (rounded to 0.9), meaning that the measure can distinguish between four statistically distinct groups of respondents [17]. The summary statistics for the final 16-item FROM indicate good fit to the Rasch model: the mean (SD) fit residual values were −0.36 (1.07) for items and −0.22 (1.01) for persons (total χ² = 56.6, df = 48, p = 0.18).
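The claim that a PSI of 0.88 distinguishes about four groups is consistent with the widely used strata formula: the separation index is G = √(R / (1 − R)) for a reliability R, and the number of statistically distinct strata is H = (4G + 1) / 3. We assume this is the calculation behind [17]; the text does not state it. The function name is hypothetical.

```python
import math


def separation_and_strata(psi):
    """Convert a reliability coefficient (here the PSI) into the separation
    index G and the number of statistically distinct strata H = (4G + 1) / 3."""
    if not 0 < psi < 1:
        raise ValueError("psi must lie strictly between 0 and 1")
    g = math.sqrt(psi / (1 - psi))
    return g, (4 * g + 1) / 3


g, h = separation_and_strata(0.88)
print(round(g, 2), round(h, 2))  # 2.71 3.94
```

Both the unrounded PSI (0.88, H ≈ 3.9) and the rounded value (0.9, H ≈ 4.3) lead to roughly four distinguishable groups.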

Factor analysis

Factor analysis was applied to determine whether the FROM should be given a single total summed score or separate scores for different factors. Exploratory factor analysis was performed to determine the loading of items onto factors, using oblique rotation (Oblimin with Kaiser normalisation [18]) because the factors were expected to be related. The number of factors was determined using Kaiser’s criterion [19] and Cattell’s scree plot [20]. The minimum threshold loading for items onto factors was 0.4 [21]. The final version of the FROM (version 3) was produced as a result of the factor and Rasch analyses. Data from 240 subjects were used for factor analysis of this 16-item measure, exceeding the guideline sample size [19]. Principal component analysis of the 16-item FROM revealed three factors with eigenvalues ≥1, which together explained 61 % of the variance. One of these factors was very close to the Kaiser criterion cut-off, with an eigenvalue of 1.008. The scree plot (Fig. 3) showed two dominant factors, which were taken forward for factor rotation. All items loaded above the minimum threshold of 0.4 [21], and each item was assigned to the factor on which it loaded most highly (Table 3). The items loaded onto the two factors in a logical way according to their theme or construct. The first factor, “Emotional”, comprised five items: worried, angry, sad, frustrated and difficult to talk to someone. The second factor, “Personal and Social Life”, comprised 10 items: hard to find time for self, travel, eating habits, family activities, holidays, sex life, work or study, relationships with other family members, family expenses and sleep. Item 9 loaded highly onto both factors, but its concept matched the “Emotional” factor better, which brought that factor’s total to six items.
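The decision rules above (Kaiser's criterion, variance explained, and assignment of each item to the factor with its highest loading subject to the 0.4 threshold) are sketched below. The sketch covers only these rules: it assumes eigenvalues and rotated loadings have already been produced by a statistics package such as PASW, and it does not perform extraction or Oblimin rotation. For a principal component analysis of a correlation matrix, the eigenvalues sum to the number of items, so dividing by 16 gives each component's share of variance. Apart from 1.008, the eigenvalues and loadings below are hypothetical.

```python
def kaiser_retained(eigenvalues, cutoff=1.0):
    """Number of components with eigenvalue >= cutoff (Kaiser's criterion)."""
    return sum(ev >= cutoff for ev in eigenvalues)


def variance_explained(eigenvalues, n_items):
    """Proportion of total variance for a PCA of a correlation matrix,
    whose trace (sum of eigenvalues) equals the number of items."""
    return sum(eigenvalues) / n_items


def assign_items(loadings, threshold=0.4):
    """Assign each item to the factor with its highest absolute loading;
    items whose highest loading falls below `threshold` get None."""
    assignment = []
    for row in loadings:
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        assignment.append(best if abs(row[best]) >= threshold else None)
    return assignment


eigs = [6.2, 2.55, 1.008, 0.9]                        # hypothetical, except 1.008
print(kaiser_retained(eigs))                           # 3
print(round(variance_explained(eigs[:3], 16), 2))      # 0.61
print(assign_items([[0.7, 0.2], [0.3, 0.5], [0.35, 0.1]]))  # [0, 1, None]
```

An item such as item 9 that loads highly on both factors would be assigned mechanically to its larger loading; the text instead assigned it on conceptual grounds, a judgement this sketch does not attempt.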

Fig. 3 Scree plot showing the variance in the components of the FROM

Table 3 The structure matrix of the FROM showing the loading of each item onto the two factors extracted

Measurement properties

FROM scores

The final version of the FROM (version 3, “FROM-16©”) (Fig. 4) has 16 items, each with three response options: Not at all (scored 0), A little (scored 1) and A lot (scored 2). The higher the total score, the greater the effect on the family member’s quality of life. The FROM-16 has two domains: Emotional (6 items, maximum score 12) and Personal and Social Life (10 items, maximum score 20). The two domain scores are positively correlated (r = 0.62, p < 0.001), and each shows a strong positive correlation with the total FROM score (Emotional: r = 0.85, p < 0.001; Personal and Social Life: r = 0.94, p < 0.001).
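Scoring a completed FROM-16 is simple summation. The sketch below assumes responses are already coded 0, 1 or 2 and takes the indices of the six Emotional items as a parameter, because the text does not list which numbered items belong to each domain; the function name and the index set shown are hypothetical and would need to be replaced with the published item allocation.

```python
def score_from16(responses, emotional_items):
    """Return (total, emotional, personal_social) for one completed FROM-16.

    responses: 16 item scores, each 0 (Not at all), 1 (A little) or 2 (A lot).
    emotional_items: indices of the six Emotional-domain items; the remaining
    ten items form the Personal and Social Life domain.
    """
    if len(responses) != 16 or any(r not in (0, 1, 2) for r in responses):
        raise ValueError("expected 16 responses scored 0, 1 or 2")
    if len(set(emotional_items)) != 6:
        raise ValueError("expected six distinct Emotional-domain item indices")
    emotional = sum(responses[i] for i in set(emotional_items))
    total = sum(responses)  # possible range 0-32
    return total, emotional, total - emotional


# Hypothetical domain allocation, for illustration only.
HYPOTHETICAL_EMOTIONAL = [0, 1, 2, 3, 4, 5]
print(score_from16([2] * 16, HYPOTHETICAL_EMOTIONAL))  # (32, 12, 20)
```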

Fig. 4 The Family Reported Outcome Measure (FROM-16)©. © S. Salek, A. Y. Finlay, M. K. A. Basra, C. J. Golics, May 2012

Total FROM-16 scores (possible range 0–32) ranged from 1 to 32 (median = 11.50, mean = 12.28, SD = 7.47). The mean domain scores were 5.6 (Emotional) and 6.7 (Personal and Social Life). There was no floor effect (no respondent scored 0), and only one subject scored 32, indicating a minimal ceiling effect. The items with the highest mean scores (possible range 0–2) were feeling worried (1.42), feeling frustrated (1.17), feeling sad (1.13), family activities being affected (0.93) and effect on sleep (0.90). There was no significant difference between the mean total FROM scores of males (11.83) and females (12.52) (p = 0.63). Spearman’s rank correlation coefficient showed no correlation between the family members’ age and total FROM score (r = 0.02, p = 0.80). The highest mean total scores were found in family members of neurology (19.8), oncology (17.6), haematology (16.6) and chronic pain (16.6) patients. The lowest mean scores were in family members of ophthalmology (4.25) and orthopaedic (5.80) patients.

Reliability

The reliability of the FROM was assessed using internal consistency and test–retest reliability. Internal consistency was measured using Cronbach’s alpha (α), which should exceed 0.7 to be considered adequate [22, 23]. Test–retest reliability was assessed by investigating whether the FROM produces the same results when administered on two occasions to stable subjects (those whose health had not changed). Patients were considered unstable if the GHS rating of their health had changed by more than one point at the 1–2-week follow-up, and their family members were excluded from the reliability analysis. In those considered stable, reproducibility was measured using the intraclass correlation coefficient (ICC).

The Cronbach’s alpha coefficient of the FROM was 0.91, suggesting high internal consistency. Alpha was not improved by deleting any individual item (range 0.90–0.91), demonstrating that all of the items contribute to the total FROM score [24]. The two domains also showed high internal consistency (Emotional = 0.80, Personal and Social Life = 0.89).
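Cronbach's alpha and the "alpha if item deleted" check reported above can be computed from a respondents × items score matrix as α = k / (k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below is a plain illustration, not the PASW procedure; the function names and example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)


def alpha_if_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [cronbach_alpha([list(r[:i]) + list(r[i + 1:]) for r in rows])
            for i in range(k)]


# Hypothetical scores for four respondents on three items.
scores = [[0, 1, 1], [1, 1, 2], [2, 2, 2], [0, 0, 1]]
print(round(cronbach_alpha(scores), 3))  # 0.915
print([round(a, 3) for a in alpha_if_deleted(scores)])
```

If deleting an item raised alpha noticeably, that item would be contributing little to the scale; the narrow 0.90–0.91 range reported for the FROM indicates that no single item stands out in this way.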

Seventy-four (61.2 %) family members returned the second set of completed questionnaires 7–14 days after first recruitment. Twenty-three of these were excluded because of a change in GHS of more than one point. The test–retest analysis was therefore based on data from the remaining 51 family members (43.0 %) of patients with stable health status. The ICC for the total FROM score was 0.93, indicating that the scale gives reproducible results in stable subjects. The two domains also showed high test–retest reliability (Emotional = 0.85, Personal and Social Life = 0.92).
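The study does not state which form of ICC was used. The sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measures, in the Shrout and Fleiss scheme) and applies the stability rule described in the methods: a family member is kept only if the GHS changed by no more than one point. The ICC form, the record layout and the function names are assumptions, and the example records are hypothetical.

```python
from statistics import mean


def icc_2_1(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    pairs: list of (score_time1, score_time2) tuples, one per respondent.
    """
    n, k = len(pairs), 2
    grand = mean(x for p in pairs for x in p)
    row_means = [mean(p) for p in pairs]
    col_means = [mean(p[j] for p in pairs) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def stable_pairs(records, max_ghs_change=1):
    """Keep FROM score pairs whose GHS changed by no more than max_ghs_change.

    records: list of (from_t1, from_t2, ghs_t1, ghs_t2) tuples.
    """
    return [(f1, f2) for f1, f2, g1, g2 in records
            if abs(g2 - g1) <= max_ghs_change]


# Hypothetical records: the second one is excluded (GHS fell by 3 points).
records = [(10, 11, 7, 8), (12, 20, 7, 4), (5, 5, 9, 9), (20, 18, 3, 3)]
pairs = stable_pairs(records)
print(len(pairs), round(icc_2_1(pairs), 3))  # 3 0.983
```

A consistency-type ICC would ignore a uniform shift between the two occasions, whereas the absolute-agreement form used here penalises it, which is the stricter choice for test–retest work.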

Validity

We hypothesised, first, that the impact of illness on family members’ QoL is correlated with family members’ overall QoL. This was assessed by comparing WHOQOL-BREF scores with FROM scores using Spearman’s rank correlation coefficient; a moderate correlation was expected. Second, we hypothesised that the impact of illness on family members’ QoL is correlated with the health of the patient. This was assessed by comparing the FROM score with the GHS using Spearman’s rank correlation coefficient; a strong correlation was expected.
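Both hypotheses were tested with Spearman's rank correlation, which is the Pearson correlation computed on ranks, with tied values given the average of their ranks. A minimal standard-library sketch follows; it returns the coefficient only and does not compute the p values reported in the Results. The function names and example scores are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for idx in order[i:j + 1]:
            ranks[idx] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical FROM totals (higher = more impact) and WHOQOL-BREF scores
# (higher = better QoL): opposite scoring directions give a negative rho.
from_scores = [4, 12, 20, 25]
whoqol = [90, 70, 72, 40]
print(round(spearman_rho(from_scores, whoqol), 2))  # -0.8
```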

Hypothesis 1

One subject was excluded because of an incomplete WHOQOL-BREF. Using Spearman’s rank correlation coefficient, a moderate correlation was found between the FROM scores and the WHOQOL-BREF scores (n = 119, r = −0.55, p < 0.001). The two domains also showed a moderate correlation with the WHOQOL-BREF score (Emotional r = −0.43, p < 0.001; Personal and Social Life r = −0.52, p < 0.001).

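The correlation is negative because the questionnaires are scored in opposite directions: higher FROM scores indicate greater impact, whereas higher WHOQOL-BREF scores indicate better QoL. The impact of illness on family members of patients is therefore correlated with their overall QoL.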

Hypothesis 2

The GHS correlated negatively with the total FROM score (Spearman’s r = −0.51, p < 0.001). The two domains also showed moderate correlations with the GHS (Emotional r = −0.48, p < 0.001; Personal and Social Life r = −0.45, p < 0.001). This shows that the poorer a patient’s health (as perceived by the family member), the greater the impact on the family member’s QoL. The score of each of the 16 FROM items was also compared with the GHS. The correlations were weak to moderate but significant for all items (r = −0.21 to −0.46). The items with the strongest correlations with the GHS concerned holidays (r = −0.46), difficulty caring (r = −0.45) and time for self (r = −0.44), and the two items with the weakest correlations concerned sex life (r = −0.21) and family expenses (r = −0.26).

Face validity and practicality

For the four cognitive debriefing questions, 99.2 % of family members thought that the FROM was easy to complete, 100 % thought the response options were straightforward, 99.2 % thought the instructions were clear and 87.4 % thought that the FROM items covered all areas of their life which had been affected. No new areas were suggested which were not covered by the FROM, so no changes were made. The mean completion time (n = 108) of the FROM was 115 s (range 55–272). The Flesch readability score for the FROM was 64.7. The mean length of the 16 items was 5.6 words (range 3–12).
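The readability figure above is a Flesch Reading Ease score, defined as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with higher scores indicating easier text. The text does not say how words or syllables were counted, so the sketch below takes the counts as inputs and adds only a rough vowel-group syllable heuristic, which is an assumption and will miscount some English words. The function names are hypothetical.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from word, sentence and syllable counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def naive_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels (including y).

    This heuristic is an assumption; it miscounts words such as "made".
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


print(round(flesch_reading_ease(100, 10, 130), 3))  # 86.705
print(naive_syllables("family"))                    # 3
```

Short items, like the FROM's mean of 5.6 words, push the words-per-sentence term down, so most of the score depends on how many syllables each word has.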

Discussion

This study indicates that the FROM-16 is reliable and valid in family members of patients from a variety of specialties and diseases. Other measures with “family” in their title are designed for use with specific patient groups or specific family members. The Family Quality of Life Survey for caregivers of people with disabilities [25] shares many concepts with the FROM. The Impact-on-Family scale [26] for parents of children with chronic illness was devised from interviews with family members, in the same way as the FROM. However, the FROM is much shorter and quicker to complete.

The FROM items were developed directly from interviews with family members of patients [27], with careful consideration of the themes and language used, which contributed to its high content validity. The number of judges used for content validity exceeded the minimum recommended number [28]. The qualitative and quantitative feedback came from a large number of judges from mixed backgrounds, increasing confidence that the changes made to the items were representative of the general population of family members and clinically relevant across a range of specialties.

Rasch analysis was used along with factor analysis for item reduction. The initial results showed disordered thresholds in 24 of the 31 items, indicating that subjects found it difficult to discriminate between the different response options [15]. The category probability curves were used to aid category collapsing. Collapsing response categories in the same way for all items should make the FROM straightforward to use. Furthermore, having the same scale scoring for each item avoids putting an inappropriate emphasis on certain items [29].

Although the numbers of family members recruited from each specialty were small, mean total FROM scores were still calculated for each specialty. These give a preliminary indication of the specialties in which family members are most affected, though disease-specific data will be needed before any clear conclusions can be drawn.

The reliability of the FROM was demonstrated by a high Cronbach’s alpha (0.91) and a high test–retest ICC (0.93). The minimum acceptable value for α varies between authors [30, 31], and it has been suggested that for a measure to be used clinically, α should be above 0.9 [32], a criterion the FROM fulfils. The choice of time interval for follow-up in a test–retest analysis is an important consideration [15, 28]; a retest interval of 2–14 days is usually considered acceptable [23]. The validity of the FROM was supported by two a priori hypotheses. The GHS proved useful in the FROM validation, and previous studies have found that family members are able to assess patient disease severity accurately [33], further increasing confidence in this result. The moderate correlation between the FROM and the WHOQOL-BREF (r = −0.55) suggests that the impact of the patient’s illness makes a substantial contribution to a family member’s general QoL, even allowing for other potential external influences.

The study had several limitations. First, the majority of the population studied were Caucasian, and the number of secondary relatives (e.g. grandparents, aunts) was low. Further studies with the FROM in more diverse populations are needed to determine whether the findings apply more widely. The small number of family members from each specialty could also be considered a limitation. However, although data saturation was reached by the 40th interview, interviews were continued to 133. These qualitative interviews demonstrated that family members’ lives are affected in a limited number of ways, with commonality across specialties. Furthermore, during face validity testing, 87.4 % of family members felt that the FROM covered all areas of their life which had been affected, and most of the comments given to explain this described individuals’ specific examples of items that were already covered more broadly. Further studies are required to gain more information about specific specialties. Another limitation is that the FROM’s sensitivity to change has not yet been tested.

It is likely that the FROM can be used with any family member of any patient across all medical specialties. However, its generalisability remains unproven: is it reliable in every disease, and in populations outside the UK? The FROM may need to be revalidated in specific populations, but our preliminary findings suggest that it has potential for wide and varied use. The FROM-16 contains a different combination of items from patient measures such as the WHOQOL-BREF, which suggests that using a patient measure to assess the impact of illness on family members may not be appropriate and supports the need for a family-member-focused measure.

Potential scope and usage

As the first generic family quality of life measure, the FROM-16 has the potential to be used in a wide variety of situations. Its items were devised directly from in-depth interviews with family members of patients, ensuring that the concepts measured are rooted in family members’ experience rather than in healthcare workers’ views. The development of the FROM combined traditional and modern psychometric techniques (classical test theory and Rasch analysis) with rigorous validation. The FROM is a generic family-member equivalent of generic patient measures such as the WHOQOL-BREF [11] and the SF-36 [34]. It is short, with a current recall period, and its simplicity lends itself to high-quality validated translations. Its 16 items describe fundamental concepts, making it likely that the FROM will be appropriate for use in other developed and developing countries.

The FROM may be used in clinical settings to improve communication between healthcare professionals when discussing the family and social situation of patients, and should encourage clinicians to consider the impact on the patient’s family when making treatment decisions. The FROM can be used to compare the impact of illness across different diseases, or the impact of different treatments on the family. There is also potential for wider use in clinical research, for example in evaluating new therapeutic interventions and identifying the needs of a community. The measure could also be used in disease education programmes for families of patients, both to inform content and to facilitate discussion. Although clinical cut-off points still need to be developed for the FROM, it could be used as a tool to identify “at-risk” family members who require referral to support groups or clinical services. Depending on the type of support required, family members could be referred to their GP, to counselling or financial services, or to support groups. Although there are few existing family support groups, the areas of family members’ lives captured by the FROM items could inform the content of new family-member-specific support groups. Greater involvement of family members in existing patient support groups should also be considered, to encourage open discussion between patients and family members. In the UK, most patient organisations and advocacy groups also support family members and organise regular activities that bring families of patients into contact with one another to share their experiences of living with a chronically ill family member. These activities include educational sessions as well as social events.

The FROM could be used as an aid in healthcare planning and could influence the culture of healthcare and how patients are managed. Until now, family quality of life could not be measured and compared between specialties and disease areas; being able to measure this impact should help to focus scientific interest on this major additional burden of disease, which is often overlooked in healthcare research.