Background

In the general population, stress has been linked to a variety of health complications, including hypertension, diabetes, and susceptibility to infection, and has been cited as causative of an immune response [1, 2]. More recently, some studies have found that the perception of stress or anxiety is associated with negative outcomes for mother and offspring, including preeclampsia, preterm birth (PTB), small for gestational age, congenital heart defects, and impaired cognitive development of the child [3–8]. However, studies of the effect of stress on pregnancy and birth outcomes have produced contradictory findings [9–13]. Copper et al. [14] found that stress was significantly related to both low birth weight and PTB. However, other studies, such as Neggers et al. [11, 13, 15], found significant associations only in certain subgroups such as smokers, underweight women, and persons of low socio-economic status. Still other studies have found no association after adjustment for all confounders [13]. Given the variation in study results, this construct continues to be investigated. Clearly delineating the effects of stress would allow practitioners to focus on specific risk factors and offer reasonable advice and interventions to improve pregnancy and birth outcomes.

Stress is a complex construct, and accurately measuring stress is important in many areas of sociologic and epidemiologic research. Stress has myriad causes and is experienced in innumerable ways by individuals. Some of the aspects commonly examined are physical stress, perceived stress, and stress reactivity. Because stress measurement has multiple aspects, a gold standard of measurement does not exist. Methods that have been used to measure the different facets of stress include psychologic measures, often assessed with a questionnaire, and physiologic markers, measured in saliva or blood. These biologic samples can be difficult to obtain, are costly to analyze, and are equivocally linked to stress [16–20]. Overall, questionnaires are a more cost-efficient way to measure stress; however, although questionnaires are developed and validated to measure specific aspects of stress in certain populations, it can be difficult to determine the actual construct being measured or the construct validity. Questionnaires are relatively easy to complete, can be interviewer- or self-administered, and can be captured remotely over the phone, on a computer, or in person. Instruments are typically short, balancing the time of the study staff, the time of the participant, and the need for an accurate assessment. Short instruments help manage the length of the full interview, especially when multiple constructs, predictors, and outcomes of interest are being measured. Minimizing the time of an interview is important because of the risk of respondent fatigue, which may cause participants to pay less attention to the questions and their answers [21].

Stress is not an isolated event; it interacts with other psychosocial and health constructs. The effects of other constructs modify how stress is perceived and felt by the individual [22, 23]. Other constructs that are frequently assessed with stress include both internal measures such as self-esteem and depression, and external measures such as social or material support and evaluation of life events. A variety of instruments exist to measure both internal and external influences. One commonly used instrument to measure depression in men and women, including pregnant, non-pregnant, and post-natal individuals, is the Edinburgh Depression Scale (EDS) [24, 25].

Another consideration for survey research is the reliability of the participants’ responses: the answers provided need to be consistent and replicable. High reliability ensures researchers are measuring the intended exposure/outcome, and allows for study replication in the same or other populations. To assure reliability, survey research often includes internal reliability checks, in which questions or constructs are measured multiple times in different ways. Previously validated instruments have been developed and validated distinctively; any alterations could potentially invalidate the measurement. One way to ensure reliability of construct measurement is to use two instruments and compare the results.

In practice, researchers often face the challenge of selecting the most appropriate instrument from several available choices. In this study, the choice was between two instruments to measure perceived stress among pregnant women [26, 27], both developed and previously validated to measure perceived stress. The Perceived Stress Scale (PSS) was developed by Sheldon Cohen in 1983 and validated in multi-racial, ethnic, and gender populations and in pregnant women [18, 28]. The second instrument, the Perceived Stress section of the Prenatal Psychosocial Profile (PPP), was developed by Mary Ann Curry in 1994 and validated in rural and non-rural populations among multi-racial and ethnic populations [29, 30]. The scales have two clear differences. The first difference is that the PPP was developed for use in pregnant women, while the PSS is applicable to both men and women at varying stages of their lives (including pregnant women). The second main difference is that the PSS, as an instrument, stands on its own, whereas the PPP is composed of multiple sections, including the assessment of support (from a partner and from others) and the assessment of self-esteem, which complement the perceived stress section. To the authors’ knowledge, no previous research has been conducted to compare these instruments. The goal of this study is to test the reliability of measurement of perceived stress by the two instruments. The conclusions of this analysis will provide researchers with important information on how the PSS compares to the PPP and will offer data for researchers to use when choosing an instrument to measure perceived stress for their research.

Methods

This study analyzes data from a completed prospective cohort study of 301 women from New Orleans and Baton Rouge who were pregnant during Hurricane Katrina or became pregnant in the 6 months after. Details about methods for data collection are described elsewhere [26, 27]. Briefly, in-person interviews were conducted at clinics between January 2006 and June 2007. All women spoke English, planned to deliver at the study hospitals, were over 18 years old, and were living in either New Orleans or Baton Rouge during Hurricane Katrina. Data for 258 women were available for analysis. Imputation by carrying the last answer forward was used when a participant had one answer missing from any of the scales used in this study (n = 17). Women who did not complete the PSS or the PPP (n = 39) were excluded, leaving 219 women for this analysis. The Institutional Review Boards of Tulane University and the participating hospitals approved this study.
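The imputation rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `impute_single_missing` is hypothetical, "last answer" is taken to mean the answer to the preceding item on the same scale, and a scale whose first item is missing is left unimputed. The study's actual procedure may differ in these details.

```python
from typing import List, Optional


def impute_single_missing(responses: List[Optional[int]]) -> Optional[List[int]]:
    """Fill exactly one missing item by carrying the previous answer forward.

    Returns the completed list of answers, or None when the scale cannot be
    imputed under the assumptions of this sketch.
    """
    missing = [i for i, r in enumerate(responses) if r is None]
    if not missing:
        return list(responses)  # nothing to impute
    if len(missing) > 1:
        return None  # more than one answer missing: not imputed
    idx = missing[0]
    if idx == 0:
        return None  # assumption: no earlier answer exists to carry forward
    filled = list(responses)
    filled[idx] = filled[idx - 1]  # carry the preceding answer forward
    return filled
```

For example, `impute_single_missing([1, 2, None, 3])` fills the third item with the second answer, while a scale with two missing answers is returned as `None` and would be excluded.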

The PSS is a 10-item scale; each item has five possible responses measuring the frequency of perceived stress over the last month: never, almost never, sometimes, fairly often, and very often. Items are general and assess stress due to events, feeling out of control, and feeling rushed or short on time. The scale assesses an individual’s reactions to and feelings about specific circumstances (e.g., “angered” or “upset”). Four of the items are positively stated, for example, “how often have you felt that you were on top of things?” The remaining items are negatively stated, for example, “how often have you been upset because of something that happened unexpectedly?” Each item contributes 0–4 points to the total score, resulting in a total score that ranges from 0 to 40, with a higher score indicating greater perceived stress. The Cronbach’s alpha for this instrument is between 0.84 and 0.86 [31].
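The scoring just described can be sketched as below. The sketch assumes that the four positively stated items are reverse-scored so that a higher total always means more perceived stress, and that they sit at positions 4, 5, 7, and 8 as in the commonly distributed 10-item ordering. Both points are assumptions not stated in this section, and `POSITIVE_ITEMS` and `score_pss` are hypothetical names.

```python
from typing import Sequence

# Assumption: 0-based positions of the four positively stated items
# (items 4, 5, 7 and 8 in the commonly distributed 10-item ordering).
POSITIVE_ITEMS = frozenset({3, 4, 6, 7})


def score_pss(responses: Sequence[int]) -> int:
    """Total PSS score (0-40) from ten answers coded 0 (never) to 4 (very often)."""
    if len(responses) != 10:
        raise ValueError("the PSS sketch expects exactly 10 responses")
    total = 0
    for position, answer in enumerate(responses):
        if not 0 <= answer <= 4:
            raise ValueError("each response must be coded 0-4")
        # Positively stated items are reversed so that higher always means more stress.
        total += 4 - answer if position in POSITIVE_ITEMS else answer
    return total
```

Under these assumptions, answering "never" to every item yields 16 rather than 0, because the four positively stated items each contribute 4 points after reversal.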

The PPP is an 11-item scale; each item has four possible responses measuring the amount of stress perceived for that item. The participant reports the “extent” of stress they have felt about “a current stressor/hassle”: no stress, some stress, moderate stress, or severe stress. Items are specific and include money, family, friends, work, stressful events, substance use, emotions, and the current pregnancy. The response to each item contributes 1–4 points to the total score, resulting in a total score that ranges from 11 to 44, with a higher score indicating greater perceived stress. The Cronbach’s alpha for this instrument ranges from 0.69 to 0.78 [29].

The Assessment of Support portion of the PPP is broken into two sections: partner support and support from other people. Each scale has 11 items, each item with six possible responses from very dissatisfied (1) to very satisfied (6). Scores range from 11 to 66, with a higher score indicating greater satisfaction with partner/other support.

The Assessment of Self Esteem portion of the PPP includes 11 items, each with four possible responses (strongly agree, agree, disagree, strongly disagree), with five positively stated and six negatively stated items. Each item contributes 1–4 points to the total score resulting in a total score that ranges from 11 to 44, a higher score indicating more self-esteem.

Because the PPP measures multiple constructs and the PSS only measures stress, we compared both to a validated instrument that was developed independently of the two scales and measures depression, a construct strongly related to stress [22]. The EDS is a 10-item scale, each item having four possible responses on a graduated, Likert-type scale. Each item contributes 0–3 points to the total score, resulting in a score that ranges from 0 to 30, with a higher score indicating greater levels of depression. A participant with a score >12 should be referred for clinical assessment.

Statistical Analysis

Distributions of the scale scores were assessed for normality. We found that the PSS was normally distributed while all other scales (PPP, Other support, Partner support, self-esteem, EDS) were not. We examined the distribution of the PPP after transformation and found that grouping participants into tertiles coincided with the distribution of scores (Fig. 1a, b). We first compared concordance between the two instruments across all three tertiles. Second, we examined concordance dichotomously, comparing the highest stress tertile to the two lower tertiles. Cohen’s Kappa was used to assess inter-scale agreement on perceived stress in tertiles and dichotomously [32, 33]. Because of the non-normal distribution of the data, Spearman’s rank correlation was used to assess the correlation of the scores on the Assessment of Partner/Other Support and Self Esteem scales from the PPP with the total scores of the PSS and the Perceived Stress score of the PPP. Two analyses for missing data were conducted: the first assessed the difference between women included in the study and women who were not included because of missing information, and the second compared women with imputed data to women with complete data. All p values were two-tailed, and the significance level selected was 0.05. Data were analyzed using SAS/STAT software, version 9.2 of the SAS System for Windows (SAS Institute Inc., Cary, NC, USA).
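The tertile grouping and agreement statistic can be illustrated with a minimal Python sketch. It is not the SAS procedure used in the study: the cut points here are simple rank-based thresholds (an assumption, since the exact tertile boundaries are not reported), and `tertile_labels` and `cohen_kappa` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Sequence


def tertile_labels(scores: Sequence[float]) -> List[int]:
    """Label each score 0 (lowest), 1 (middle) or 2 (highest tertile).

    Assumption: rank-based cut points; tied scores can make groups unequal.
    """
    ordered = sorted(scores)
    n = len(ordered)
    low_cut = ordered[n // 3]
    high_cut = ordered[(2 * n) // 3]
    return [0 if s < low_cut else 1 if s < high_cut else 2 for s in scores]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of participants placed in the same category.
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance from the marginal category frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

The dichotomous comparison would reuse the same function after collapsing the labels, for example `cohen_kappa([int(t == 2) for t in pss_t], [int(t == 2) for t in ppp_t])`, where `pss_t` and `ppp_t` are the hypothetical tertile labels for each instrument.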

Fig. 1

a Distribution of the total score on the Perceived Stress portion of the Prenatal Psychosocial Profile. Scores on the Perceived Stress portion of the Prenatal Psychosocial Profile range from 11 to 44, with a higher score indicating higher perceived stress. In the sample, women primarily scored towards the lowest range of the instrument, indicating lower amounts of perceived stress. Data in this figure have not been transformed. b Distribution of the total score on the Perceived Stress Scale. Scores of the scale range from 0 to 40, with a higher score indicating higher perceived stress. In this sample, women scored in a more moderate range of stress, with fewer women reporting the extremes of stress. Data in this figure have not been transformed.

Results

Almost seventy-nine percent of women were between the ages of 18 and 34. Women were mostly black or white, 43.4 and 51.2 % respectively. The majority of women (52.5 %) had some college education, and 16.0 % had either a college education or higher. At the time of the interview, 48.4 % were working for pay (Table 1). Sensitivity analysis comparing the 219 women with complete information on the PPP and PSS to the sample of 258 with demographic information found no difference based upon age, race, or education level (data not shown).

Table 1 Distribution of baseline characteristics of the study sample

For the PSS, the scores ranged from 0 to 39, the mode was 14, and the median score was 16. The lowest tertile of stress contained 33.8 % of women, followed by 35.2 % who ranked in the middle tertile and 31.1 % who ranked as having the greatest amount of perceived stress (Table 1). For the PPP, the untransformed scores ranged from 11 to 35, the mode was 13, and the median score was 17. Women who perceived the least amount of stress comprised 40.2 % of the sample, women with moderate stress 31.1 %, and women with the greatest amount of perceived stress 28.8 % (Table 1). The overall agreement between the two instruments was κ = 0.37 (95 % CI 0.27–0.47). When the tertiles were compared, a total of 92 women (42.0 %) were discordant. When the scores were dichotomized to above and below the third tertile, the group with the highest perceived stress comprised 71 women (32.4 %) on the PPP and 68 women (31.1 %) on the PSS. The application of Cohen’s Kappa resulted in κ = 0.61 (95 % CI 0.50–0.72, p < 0.01), with 37 women (16.9 %) classified discordantly.
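As a rough consistency check, the reported kappa values can be approximately reproduced from the discordant counts and marginal percentages above. The symbols p_o (observed agreement) and p_e (agreement expected by chance) are introduced here for illustration, and the arithmetic uses the rounded percentages as reported, so the final digits are approximate.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}

% Tertiles: 92 of 219 women discordant
p_o = 1 - \tfrac{92}{219} \approx 0.580, \qquad
p_e \approx (0.338)(0.402) + (0.352)(0.311) + (0.311)(0.288) \approx 0.335

\kappa \approx \frac{0.580 - 0.335}{1 - 0.335} \approx 0.37

% Dichotomized (highest tertile vs. the two lower tertiles): 37 of 219 women discordant
p_o = 1 - \tfrac{37}{219} \approx 0.831, \qquad
p_e \approx (0.311)(0.324) + (0.689)(0.676) \approx 0.567

\kappa \approx \frac{0.831 - 0.567}{1 - 0.567} \approx 0.61
```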

The Spearman correlation of the two tests was r_s = 0.71 (p < 0.01). Both the PPP and PSS were statistically significantly correlated with a higher score on the EDS (r_s = 0.72 for PSS and r_s = 0.76 for PPP, p < 0.01 for both). Both the PPP and PSS scores were inversely correlated with the Assessments of Support and Self Esteem portions of the PPP: Support from Partner, r_s = −0.46 for PSS and r_s = −0.47 for PPP; Support from Other, r_s = −0.32 for PSS and r_s = −0.31 for PPP; Self Esteem, r_s = −0.52 for PSS and r_s = −0.41 for the PPP; p < 0.01 for all (Table 2).
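For readers who want to see how a Spearman rank correlation of this kind is obtained, the following is a minimal sketch: scores are converted to ranks (tied scores share the average rank) and the Pearson correlation of the ranks is returned. It illustrates the statistic only, not the SAS procedure used here, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)
```

Handling ties through average ranks matters for questionnaire totals, where many participants share the same integer score.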

Table 2 Spearman correlation coefficients of Prenatal Psychosocial Profile, Perceived Stress Scale, and Edinburgh Depression Scale

Women who had any completed data but were not included in the study (n = 39) were compared to women who were included in the study (n = 219). We found no difference between the two groups based on education or race. We did find that women who were excluded had more than 50 % of all data points missing, including the age variable. When we compared the women with imputed values to those who had complete data, we found no difference between these two groups of women. No additional data were available for the 43 women who did not complete any part of the interview.

Discussion

In our study sample, the overall agreement between the PSS and PPP was moderate at κ = 0.37. When we compared the highest stress tertile to the lower stress tertiles, we found high agreement (κ = 0.61). The instruments demonstrate high reliability considering that the two scales do not use the same questions or wording. Some variation in classification occurred between the tertiles: the mode of the PPP appeared in the lowest tertile while the mode of the PSS appeared in the middle tertile. This difference is expected because the distributions of the two scales were different in this cohort: the PSS was normally distributed and the PPP was not. At lower levels of stress, more women were discordantly classified, suggesting that these instruments are not suitable for measuring lower levels of stress and would produce unstable results. Misclassification at the lower levels of stress would lack clinical significance.

Many instruments have been developed to measure the perception of stress. When choosing an instrument for research or clinical purposes, it is important that the instrument measures the construct of interest. This study compared the PSS and the perceived stress portion of the PPP to help researchers and clinicians differentiate between the two instruments. Both instruments have been previously validated to measure perceived stress. We found moderate agreement between the two scales, indicating that it would be possible to use the combination of the two scales as an assessment of the internal reliability of data, or to use a single scale to measure stress. However, some key differences exist between the two scales. The PPP may be considered a more comprehensive instrument since it includes components of support and self-esteem, while the PSS is more generalizable as it has been validated in men and in pregnant and non-pregnant women. The instruments themselves are also unique.

Items on the PSS refer to general feelings and factors that an individual has experienced in the past month. The broad scope of the questions makes it impossible to identify the potential causes of stress; however, the questions identify how well a woman is adapting to the stress in her life. Conversely, items on the PPP are highly specific and identify the specific causes of stress a woman is experiencing. The response scale used on the PPP also lets the woman indicate the amount of stress that she is experiencing for each specific item, allowing researchers and clinicians to identify the amount of stress a woman perceives from specific parts of her life.

Multiple limitations to this study exist. The sample size is small and has missing data. Due to the limited sample size and missing data, stratification by education, age, and race resulted in frequencies <5 persons in certain categories resulting in unstable estimates. Thirteen percent of participants with incomplete data were excluded from this analysis. These participants had a similar distribution of race and education as the analyzed population. Given this similarity, our conclusions are unlikely to change based on the missing data. Our study sample is not representative of the population. A convenience sample of women was used for this study; therefore women who did not complete any part of the interview are not likely to change the results of the study. Also, women were interviewed between 6 months and 1 year after Hurricane Katrina hit the Gulf Coast.

Future research should examine these results in different populations. Additionally, research should examine the relationship of these instruments to biologic markers of the stress response and their association with adverse pregnancy outcomes.

Conclusion

Total scores of the PSS and the Perceived Stress portion of the PPP were positively correlated. Additionally, the scores on each were positively correlated with the Edinburgh Depression Scale, and negatively correlated with the Assessment of Support and Assessment of Self Esteem portions of the PPP. Researchers may choose to administer a single instrument to participants, or to use the instruments in combination as an external reliability check.