In the past decade, there has been rapid growth in research on mindfulness, defined as a practice rooted in Buddhist principles that fosters awareness and acceptance of thoughts, emotions, and physical sensations to decrease suffering (Bodhi, 2011). Research indicates that mindfulness predicts lower anxiety, depression (Webb et al., 2019), rumination (Blanke et al., 2020), and stress (Chiesa & Serretti, 2009), and that it is associated with greater psychological well-being (Baer et al., 2008), satisfaction with life (Rogge & Daks, 2021), and self-compassion (Svendsen et al., 2017). Despite these promising outcomes, mindfulness research has focused on predominantly White samples. This leaves open important questions about whether the effects of mindfulness generalize to individuals from other racial and ethnic groups, particularly Black Americans, who experience disproportionately high levels of chronic and acute stress and associated adverse mental health outcomes (Williams et al., 1997). In a systematic review of 12,265 mindfulness studies published from 1990 to 2016, only 24 studies (0.2%) focused on minority representation, cultural adaptations of interventions, or ethnoracial group comparisons (DeLuca et al., 2018). Moreover, only 11 of these studies (0.1%) included predominantly Black American samples. Similarly, a review of mindfulness-related therapies (i.e., mind–body therapies) for cardiometabolic diseases found that only 5 of 425 trials (1%) targeted Black populations (Johnson et al., 2018). A necessary first step for mindfulness research with Black Americans is to evaluate the validity and reliability of a widely used mindfulness measure in this population.

The most widely used tool to measure mindfulness is the 39-item Five Facet Mindfulness Questionnaire (FFMQ-39; Baer et al., 2006). The abbreviated 15-item version (FFMQ-15; Baer et al., 2012) has been used less frequently; however, it is a valuable option for assessing mindfulness in time-limited settings (e.g., time-sensitive interventions, hospital settings) and for minimizing participant burden. Both the FFMQ-39 and FFMQ-15 have been validated in predominantly White samples (e.g., Baer et al., 2008; Christopher et al., 2012; Gu et al., 2016). The only study to validate the FFMQ-39 in a Black American sample used a low-income sample of Black Americans with recent suicidal ideation who were recruited from clinical settings (Watson-Singleton et al., 2018). Notably, this study removed 19 items to validate the FFMQ in Black Americans. To date, neither FFMQ version has been validated in a non-clinical Black American sample. Because mindfulness is a skill that can be useful regardless of underlying psychopathology, it is necessary to validate all available versions of mindfulness measures (e.g., long and short forms) in normative community samples to ensure generalizability. Below is a review of past work on the psychometric properties of the FFMQ-39 and the FFMQ-15, followed by a discussion of how the FFMQ may fall short for Black Americans.

Mindfulness is a complex and multifaceted construct that has sparked continued debate about its various components and how best to measure them. Within this discourse, 5 mindfulness factors (or “facets”) have received the most empirical attention — acting with awareness (awareness), describing (describe), non-judging of experience (nonjudgment), nonreactivity to inner experience (nonreactivity), and observing (observe) (Baer et al., 2006, 2008; Christopher et al., 2012; Gu et al., 2016; Sweeney et al., 2021). The focus on these 5 factors is largely due to the development and subsequent widespread implementation of the FFMQ, which assesses the 5 factors. Awareness involves maintaining a focus on actions, without distraction. Describe involves describing or labeling in words one’s internal experiences such as beliefs, opinions, emotions, and expectations. Nonjudgment involves being open-minded and curious about one’s internal experience. Nonreactivity involves experiencing emotions from a meta-cognitive or decentered perspective. Observe involves attending to internal experiences and external stimuli, such as smells, sounds, and sights (Baer et al., 2006).

Prior evidence from confirmatory factor analyses (CFAs) indicates that the 5-factor models of the FFMQ-39 and FFMQ-15 fit the data well (CFI > 0.90, TLI > 0.90, RMSEA < 0.06, and SRMR < 0.09; Baer et al., 2006, 2008; Christopher et al., 2012; Gu et al., 2016; Sweeney et al., 2021). Notably, these studies used samples that were predominantly White or did not report the racial composition of their samples. The factors in both questionnaires have displayed adequate to good internal consistency: alphas range from 0.75 to 0.91 for the FFMQ-39 (Baer et al., 2006) and from 0.64 to 0.80 for the FFMQ-15 (Gu et al., 2016). The test–retest reliability of the English FFMQ-39 and FFMQ-15 has been reported in only one study, which found weak stability over a 6-week period (correlations for each factor ranged from 0.22 to 0.54; Watson-Singleton et al., 2018). However, that study also delivered an intervention during the 6-week period, which may have interfered with assessing the stability of the measure; thus, the temporal stability of both the FFMQ-39 and FFMQ-15 remains largely unexamined.

In addition to these analyses of factor structure and reliability, previous research in predominantly White samples has established a nomological network for the FFMQ-39 and the FFMQ-15, such that the factors correlate with constructs conceptually related to mindfulness. For instance, awareness, describe, nonjudgment, and nonreactivity have correlated, as expected, with lower anxiety, depression (Webb et al., 2019), and rumination (Svendsen et al., 2017) and with greater life satisfaction, psychological well-being (Christopher et al., 2012), and mindfulness meditation experience (Baer et al., 2008). The observe factor has shown a more mixed pattern: it has correlated negatively with depression and rumination (Gu et al., 2016) and positively with self-compassion (Sweeney et al., 2021), satisfaction with life (Christopher et al., 2012; Rogge & Daks, 2021), and mindfulness meditation experience (Baer et al., 2008), but also positively with anxiety (Lee & Zelman, 2019). Furthermore, the correlation between the observe factor and psychological well-being has been nonsignificant (Baer et al., 2008). Notably, ethnic identity (the extent to which a person identifies with their ethnic group) has not been examined as a correlate of the FFMQ factors. Previous research has found ethnic identity to be positively correlated with race-related stress (Tovar-Murray, 2011). Because mindfulness is often used to mitigate stress, it is possible that mindfulness is negatively related to ethnic identity. Taken together, there is scientific interest in examining a nomological network that includes both psychological outcomes and other psychosocial constructs related to stress and coping.

Overall, the 39-item and 15-item FFMQs have demonstrated good internal consistency, validity, and invariance, at least in predominantly White samples. However, the FFMQ’s psychometric performance in a non-clinical Black American sample is unknown. Given that prior work has empirically questioned the validity of the 5-factor interpretation of the FFMQ across non-White cultures (Karl et al., 2020), its validity for Black Americans remains an open question.

There are several reasons to doubt the performance of the FFMQ among Black Americans. To begin, the language of the FFMQ items is not inclusive of all Black Americans. For example, an item in the observe factor states, “I pay attention to sensations, such as the wind in my hair or sun on my face.” Several hairstyles common among Black Americans, including afros, braids, and locs, do not blow in the wind, potentially making the item less reflective of how Black Americans experience mindfulness in everyday life. Moreover, including an unrelatable item in the FFMQ suggests that the inclusion of Black Americans was not prioritized when the FFMQ items were developed. This deprioritization may cause Black Americans to feel alienated, which could act as an unintended negative mood induction and thus influence responses to other items in the questionnaire.

Furthermore, there is reason to believe that the FFMQ factors may not adequately capture the most important facets of mindfulness for Black Americans, because the FFMQ does not account for racism-related stress. For instance, Womack and Sloan (2017) found a negative correlation between alertness to discrimination and the acceptance subscale of the Kentucky Inventory of Mindfulness Skills (KIMS; Baer et al., 2004). This finding supports the idea that heightened attention to potential discriminatory acts runs counter to the flexible, open, and non-judgmental attention that is congruent with mindfulness as operationalized in the KIMS, a measure that informed the items and factors of the FFMQ. Indeed, FFMQ items such as “When I’m walking, I deliberately notice the sensations of my body moving” (observe) and “I don’t pay attention to what I’m doing because I’m daydreaming, worrying, or otherwise distracted” (reverse-coded; awareness) do not appear to be sensitive to the lack of safety that Black Americans may feel when walking in public or to the worry and distraction that accompany interacting in a society built on structural and systemic racism. As such, Black Americans may not perceive the FFMQ items as relevant to their experiences, and the items may therefore poorly reflect mindfulness as practiced by Black Americans.

Beyond potential differences in how the FFMQ performs among Black Americans as a whole, it is important to consider the substantial heterogeneity within Black Americans across social domains. This heterogeneity motivates testing whether the FFMQ is invariant across specific sub-groups within a Black American sample. Which sub-groups (i.e., specific social categories) should be examined? Our approach is grounded in the observation that different social experiences may lead to different familiarity with, and access to, mindfulness, which may in turn produce differences in how the FFMQ items are understood. Such differences can be detected with invariance analyses. The infracategorical model of inequality (Monk, 2022) and prior research guided the choice of sub-groups. The infracategorical model argues that social inequality research should focus less on dominant, classical categories (e.g., race and gender) and more on contemporary (infracategorical) categories (e.g., perceived discrimination and skin tone) that deepen understanding of racialized social experiences and inequalities, such as health disparities. This infracategorical information allows practitioners and researchers to capture more specific and relevant information for addressing inequities (e.g., data to inform how best to culturally adapt clinical interventions).
Further support for the sub-groups we selected comes from prior inequality and mindfulness research, which points to categories associated with (1) ethnic culture (ethnic heritage and ethnic identity); (2) stress (perceived discrimination, perceived colorism, and depression); and (3) demographic and mindfulness-exposure variables frequently included in mindfulness research (gender, household income, and mindfulness meditation experience).

First, ethnic culture provides different social experiences within Black Americans that can lead to different understandings of the FFMQ items. Two key features of ethnic culture are ethnic heritage and ethnic identity. Ethnic heritages among Black Americans include, but are not limited to, African Americans who are descendants of people enslaved in the United States and immigrants from the Caribbean and/or Africa. These communities have related yet distinct cultures and thus may respond differently to mindfulness, a culturally derived practice. For instance, these communities have different histories of trauma and different lived experiences (e.g., enslavement, colonialism, and immigration) that may shape perceptions of stress and coping strategies. Previous research found that Nigerian Americans (versus African Americans descended from the enslaved) perceived more mental health problems in their communities (Adewale et al., 2016). Mindfulness, a construct related to mental health, may likewise be experienced and understood differently across these groups. Therefore, Black Americans of different ethnic heritages and with varying degrees of ethnic identity might interpret and respond differently to FFMQ items.

Second, categories associated with stress might also lead to different experiences with mindfulness. Black Americans have historically been subjected to unique forms of stress, including racial discrimination and colorism, defined as discrimination or prejudicial treatment that privileges people with lighter skin tones while denigrating people with darker skin tones (Oh et al., 2021). These experiences contribute directly to disproportionately high levels of chronic stress and associated outcomes (e.g., depression) among Black Americans (Williams et al., 1997). Furthermore, stress is inversely associated with mindfulness. For example, people with high (vs. low) levels of perceived stress have scored lower on the FFMQ (Carmody & Baer, 2008). Additionally, stress has been identified as a barrier to attending mindfulness programs (Schussler et al., 2020). It is therefore possible that stress from perceived discrimination, skin tone (a proxy measure of perceived colorism), and depression is associated with less engagement with mindfulness and, thus, with different understandings of the FFMQ items.

Finally, gender, household income, and mindfulness meditation experience are categories commonly examined in mindfulness research. For instance, research has observed gender differences in mindfulness practice, such that women (versus men) are more likely to meditate (Upchurch & Johnson, 2019). Household income poses a potential barrier to mindfulness: guided mindfulness practices are more available in higher-income areas and tend to require fees, and lower-income individuals typically have less time available to engage in meditation practices (Biggers et al., 2020). Meditation experience may also shape responses: meditating samples were more likely to endorse positively worded FFMQ items, whereas non-meditating samples were more likely to deny negatively worded items (Baer et al., 2011). Thus, differences in experience across gender, household income, and mindfulness meditation experience may lead to varied understandings of the FFMQ items.

The current study is the first to analyze the psychometric properties of the FFMQ-39 and the FFMQ-15 within a non-clinical Black sample in the United States. In this three-part study, participants completed either the FFMQ-39 or the FFMQ-15 at two time points separated by 1 month and were recontacted at a third time point approximately 2.5 years later to provide additional socio-demographic information. The validity of the FFMQ-39 and the FFMQ-15 among Black Americans was tested following the approaches of Baer et al. (2006) and Watson-Singleton et al. (2018). First, the factor structure and reliability of the scales were assessed. Then, the factor structure was further examined through measurement invariance tests across different social experiences among Black Americans. Finally, the nomological network of the 5 factors was examined with psychological constructs including anxiety, depression level, rumination, ethnic identity, psychological well-being, satisfaction with life, self-compassion, and mindfulness meditation experience. As the first study to validate the FFMQ-39 and the FFMQ-15 in a non-clinical Black sample, this study did not make specific predictions for the psychometric analyses. Rather, the goal was to explore the psychometric properties of the FFMQ and to determine whether it accurately measures mindfulness in Black Americans.

Method

Participants

A pool of 17,465 US participants recruited from Amazon Mechanical Turk Prime completed a brief screening survey in which they reported their racial identity, along with other demographics. Only participants who self-identified as Black (n = 1963) were invited to participate in the present study. Of those, 619 participants enrolled in the Time 1 assessment, 541 began the Time 2 assessment (one month later), and 263 began the Time 3 assessment (approximately two and a half years later). All participants lived in the United States and are referred to as “Black Americans.” To enhance data quality, attention check questions (e.g., “Please select the middle option.”) were used to screen out inattentive responders. Participants who failed attention checks were removed from analyses and were not invited to participate at future time points. Additionally, the surveys contained a reCAPTCHA button to screen out potential bots. Participants were removed from analyses based on the following pre-registered exclusion criteria: not identifying as Black at Time 1 (n = 14) and failing the attention checks at Time 1 (n = 20), Time 2 (n = 12, excluded only from Time 2 analyses), or Time 3 (n = 8, excluded only from Time 3 analyses). Some participants met more than one exclusion criterion, resulting in final sample sizes of 586 at Time 1, 520 at Time 2 (89% retention), and 251 at Time 3 (43% retention). Power analyses indicated that, for a CFA with 80% power, RMSEA = 0.05, and alpha = 0.05, a sample size of at least 58 participants was required to reject the model specified for the 39-item scale and at least 186 participants to reject the model specified for the 15-item scale. Table 1 shows the socio-demographic information for the participants.
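One common way to carry out an RMSEA-based CFA power analysis is the noncentral chi-square framework of MacCallum, Browne, and Sugawara (1996). The text does not state which procedure or which null and alternative RMSEA values were used, so the symbols ε₀ (null RMSEA) and εₐ (alternative RMSEA) below are placeholders, not the study's settings. The degrees of freedom also assume a basic five-correlated-factor CFA with factor variances fixed to 1 and no correlated residuals or higher-order factor, which may differ from the models actually specified.

```latex
% Noncentrality for a model with df degrees of freedom, sample size N, and RMSEA \varepsilon
\lambda_0 = (N - 1)\, df\, \varepsilon_0^{2}, \qquad \lambda_a = (N - 1)\, df\, \varepsilon_a^{2}

% Critical value under the null RMSEA, and power under the alternative RMSEA
c_{\alpha} = F^{-1}_{\chi^{2}_{df}(\lambda_0)}(1 - \alpha), \qquad
\text{power} = 1 - F_{\chi^{2}_{df}(\lambda_a)}(c_{\alpha})

% Assumed df: p(p+1)/2 observed moments minus free parameters
% (p loadings + p residual variances + 10 factor covariances among 5 factors)
df_{39} = \tfrac{39 \cdot 40}{2} - (39 + 39 + 10) = 780 - 88 = 692
df_{15} = \tfrac{15 \cdot 16}{2} - (15 + 15 + 10) = 120 - 40 = 80
```

Because the noncentrality grows with df, a model with more degrees of freedom reaches a given power at a smaller N, which is consistent with the direction of the reported minimums (58 for the 39-item model vs. 186 for the 15-item model).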

Table 1 Socio-demographic percentages of participants

Procedure

This multi-part longitudinal study was distributed online via Amazon’s Mechanical Turk. A brief screener survey of US-based participants determined whether participants identified as Black or African American (e.g., African American, Jamaican, Nigerian, Ethiopian). The screener also collected participants’ age and gender. The race inclusion criterion was not advertised in the screener survey. Participants who met the inclusion criteria and agreed to participate at Time 1 read information about the study and what participation involved, then provided informed consent on an online consent form. At Time 1, participants were randomly assigned to complete either the FFMQ-39 or the FFMQ-15, with half assigned to each version. Time 2 was scheduled for approximately one month after Time 1 (average interval = 32.6 days, range = 23.7–66.9 days). At Time 2, all participants completed the same version of the FFMQ they had completed at Time 1. At both time points, the FFMQ was administered before any other measures (except for several demographic questions asked first at Time 1). Anxiety, depression level, rumination, ethnic heritage, ethnic identity, psychological well-being, satisfaction with life, self-compassion, and mindfulness meditation experience were collected at Time 1. Lifetime discrimination and everyday discrimination were collected at Time 2.

Participants were invited to complete Time 3 approximately two and a half years later (average = 31.4 months, range = 30.3–32.1 months). Time 3 was conducted to collect additional demographic information not assessed at Time 1, including skin tone. Participants were recontacted through their MTurk Prime identification numbers. Collectively, participants had the opportunity to earn a total of $13.05 for completing the study ($0.05 for the screener, $3 for Time 1, $7 for Time 2, and $3 for Time 3; or ~ $12/hour).

Measures

Table 2 includes the mean, standard deviation, alpha, omega total, and omega hierarchical of all the measures in the present study at Time 1 (excluding everyday discrimination and lifetime discrimination which were collected at Time 2). The FFMQ questionnaires collected at Time 2 do not appear here because they were solely used for pre-registered test–retest analyses.

Table 2 Measure descriptives

Five Facet Mindfulness Questionnaire

Five Facet Mindfulness Questionnaire—Long form (FFMQ-39)

Half of the participants completed the original 39-item long form of the FFMQ (Baer et al., 2006). This questionnaire assesses the 5 facets of mindfulness: acting with awareness (awareness), describing (describe), non-judging of experience (nonjudgment), nonreactivity to inner experience (nonreactivity), and observing (observe). Items were rated on a 5-point Likert-type scale from 1 (never or rarely true) to 5 (very often or always true). Example items include: “I do jobs or tasks automatically without being aware of what I’m doing” (reverse-coded; awareness), “I’m good at finding words to describe my feelings” (describe), “I tell myself I shouldn’t be feeling the way I’m feeling” (reverse-coded; nonjudgment), “When I have distressing thoughts or images, I just notice them and let them go” (nonreactivity), and “I pay attention to sensations, such as the wind in my hair or sun on my face” (observe). Item responses, with reverse-coded items rescored, were summed within each factor to create a factor score. Possible factor scores ranged from 8 to 40 for all factors except nonreactivity (7 items), which ranged from 7 to 35.
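The scoring rule above (reverse-code flagged items, then sum within each facet) can be sketched as follows. This is a minimal illustration, not the study's scoring script: the facet-to-item mapping and the responses are a hypothetical toy example, not the official FFMQ scoring key. The same functions apply unchanged to the 3-item facets of the FFMQ-15.

```python
# Minimal sketch of FFMQ-style facet scoring: reverse-code flagged items, then sum per facet.
# The facet/item key used below is a HYPOTHETICAL toy example, not the official FFMQ key.
from typing import Dict, List, Set, Tuple

SCALE_MIN, SCALE_MAX = 1, 5  # 1 (never or rarely true) to 5 (very often or always true)


def reverse_code(response: int) -> int:
    """Reverse-code a 1-5 response: 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response out of range: {response}")
    return SCALE_MIN + SCALE_MAX - response


def score_facets(responses: Dict[int, int],
                 key: Dict[str, List[int]],
                 reversed_items: Set[int]) -> Dict[str, int]:
    """Sum item responses within each facet, reverse-coding flagged items first."""
    scores = {}
    for facet, items in key.items():
        total = 0
        for item in items:
            r = responses[item]
            total += reverse_code(r) if item in reversed_items else r
        scores[facet] = total
    return scores


def score_range(n_items: int) -> Tuple[int, int]:
    """Possible range of a summed facet with n_items on the 1-5 scale."""
    return (n_items * SCALE_MIN, n_items * SCALE_MAX)


# Hypothetical 3-item facets (illustration only).
toy_key = {"observe": [1, 2, 3], "awareness": [4, 5, 6]}
toy_reversed = {4, 5, 6}
toy_responses = {1: 4, 2: 5, 3: 3, 4: 2, 5: 1, 6: 2}
print(score_facets(toy_responses, toy_key, toy_reversed))  # {'observe': 12, 'awareness': 13}
print(score_range(8), score_range(7), score_range(3))      # (8, 40) (7, 35) (3, 15)
```

The `score_range` helper reproduces the possible ranges stated above: 8 to 40 for 8-item facets, 7 to 35 for the 7-item nonreactivity facet, and 3 to 15 for the FFMQ-15 facets.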

Five Facet Mindfulness Questionnaire-15 (FFMQ-15)

The other half of the participants completed the 15-item short form of the Five Facet Mindfulness Questionnaire (FFMQ-15; Baer et al., 2012). Each of the 5 factors contains 3 items, which were chosen to balance content validity and item factor loadings (Baer et al., 2012). Possible factor scores range from 3 to 15.

Nomological Network Measures

The nomological network analysis examines the relationships between the FFMQ factors and other constructs known to be related to mindfulness. Convergence with the mindfulness nomological network reported in previous research provides evidence of construct validity for the FFMQ-39 and FFMQ-15. Descriptives for the nomological network measures are in Table 2. Scores were averaged or summed according to the conventional use of each measure in previous research.

Anxiety

Anxiety was measured with the seven-item anxiety subscale of the Hospital Anxiety and Depression Scale (HADS-A) (Zigmond & Snaith, 1983). This subscale identifies anxiety severity in a non-psychiatric population. The items were measured on a 4-point frequency scale from 0 (not at all) to 3 (very often). The responses were averaged to create a composite score.

Depression Level

The Beck Depression Inventory (BDI-II; Beck et al., 1996) measures the severity of depression symptoms experienced during the past two weeks. The inventory contains 21 items rated from 0 (e.g., I do not feel sad) to 3 (e.g., I am so sad or unhappy I cannot stand it); item responses were summed to create a composite score. Due to Institutional Review Board concerns, the current study did not collect the item referencing suicidal ideation.

Rumination

Rumination was measured with the Rumination-Reflection Questionnaire (RRQ)—rumination subscale (Trapnell & Campbell, 1999). The RRQ—rumination subscale consists of 12 items measured on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree). The scores were averaged to form a composite score.

Ethnic Identity

The Multigroup Ethnic Identity Measure (MEIM; Roberts et al., 1999) was used to measure ethnic identity. The scale consists of 12 items rated on a four-point Likert scale from 1 (strongly agree) to 4 (strongly disagree). Scores were averaged to form a composite score. All MEIM analyses were added at reviewers’ recommendation and were not pre-registered.

Psychological Well-being

Psychological well-being was measured with the abbreviated version of the Ryff Psychological Well-being Scale (Ryff et al., 1995). The scale consists of 18 items rated on a 6-point Likert scale from 1 (strongly disagree) to 6 (strongly agree). Scores were summed to form a composite score.

Satisfaction with Life

The Satisfaction with Life (SWL) measure (Diener et al., 1985) is a 5-item scale that assesses positive evaluations of one’s life. Items were rated on a 7-point Likert scale from 1 (strongly disagree) to 7 (strongly agree). Scores were summed to form a composite score.

Self-compassion

The Self-Compassion Scale-Short Form (SCS-SF; Raes et al., 2011) was used to measure self-compassion. Derived from Neff’s (2003) Self-Compassion Scale (SCS), the SCS-SF is a 12-item scale measuring global self-compassion on a 5-point frequency scale from 1 (almost never) to 5 (almost always). Item responses were summed to create a composite score.

Mindfulness Meditation Experience

Mindfulness meditation experience was assessed with one item that asked about the frequency of mindfulness meditation on a 1-to-5 response scale: 1 (I have no experience), 2 (I tried it once or twice), 3 (I practice mindfulness meditation several times per year), 4 (I practice mindfulness meditation several times per month), and 5 (I practice mindfulness meditation once a week or more).

Invariance Measures

To test the consistency of the FFMQ-39 and the FFMQ-15 across groups within the sample, measurement invariance was assessed across the following variables: ethnic heritage, ethnic identity, skin tone, everyday discrimination, lifetime discrimination, depression level, gender, mindfulness meditation experience, and household income. Because the multigroup invariance analyses used here compare discrete groups, continuous measures had to be segmented into meaningful categories. The determination of these categories is described below.

Ethnic Heritage

Participants’ ethnic heritage was derived from their parents’ ethnic heritage, collected at Time 1. Participants indicated whether one or both parents identified as “African American,” “African Caribbean/Afro-Caribbean,” “Hispanic or Latino, including Mexican American, Central American,” “African,” “Mixed,” or “Other.” For the invariance analysis, the study contrasted participants who are likely descendants of people enslaved in the United States with those who are not, as assessed indirectly via parents’ ethnic heritage. Specifically, if at least one parent was identified as African American, ethnic heritage was scored as 1 (i.e., likely a descendant of the enslaved; n = 489); if neither parent was identified as African American, ethnic heritage was scored as 0 (i.e., likely not a descendant of the enslaved; n = 104). Ethnic heritage was derived from parents’ ethnic heritage because the more direct measure of participants’ own ethnic heritage, collected at Time 3, yielded subgroups too small for an invariance analysis. Specifically, at Time 3, 212 participants self-identified as a Black American Native/African American descendant of the enslaved, and 36 participants self-identified as an African immigrant, a descendant of African immigrants, and/or African Caribbean/Afro-Caribbean. Because the latter subgroup contained fewer than 50 participants, an invariance analysis could not be run on this variable (Tanaka, 1987).
To confirm that parents’ ethnic heritage was a valid marker of participants’ own ethnic heritage, parents’ ethnic heritage was used to predict participants’ ethnic heritage. It was a significant predictor: if at least one parent was identified as “African American” (as opposed to “African Caribbean/Afro-Caribbean,” “Hispanic or Latino, including Mexican American, Central American,” “African,” “Mixed,” or “Other”), the odds of a participant identifying as a descendant of people enslaved in the United States were 16.55 times higher (p < 0.001). This simple model predicted participants’ ethnic heritage with 88% accuracy in cross-validation analyses.
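The parent-based coding rule can be written as a one-line check. This is a sketch that assumes each parent's selections are stored as a set of the response-option strings, which is an assumption about the data format. The last lines also assume that the "simple model" was a logistic regression, in which an odds ratio equals exp(coefficient). Under that assumption, they show the coefficient implied by the reported odds ratio.

```python
# Sketch of the parent-based ethnic heritage coding rule.
# ASSUMPTION: each parent's selections are stored as a set of response-option strings.
import math
from typing import Iterable, Set

AFRICAN_AMERICAN = "African American"


def code_ethnic_heritage(parent_labels: Iterable[Set[str]]) -> int:
    """1 = at least one parent identified as African American (likely descendant of
    people enslaved in the US); 0 = neither parent identified as African American."""
    return int(any(AFRICAN_AMERICAN in labels for labels in parent_labels))


print(code_ethnic_heritage([{"African American"}, {"African"}]))                   # 1
print(code_ethnic_heritage([{"African Caribbean/Afro-Caribbean"}, {"African"}]))  # 0

# ASSUMPTION: logistic regression, so odds ratio = exp(coefficient).
# The reported odds ratio of 16.55 then corresponds to a coefficient of about:
print(round(math.log(16.55), 2))  # 2.81
```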

Ethnic Identity

The MEIM responses were transformed into a median split binary variable. The median was 2.09. Responses greater than or equal to 2.09 were coded as 1 (n = 307). Responses less than 2.09 were coded as 0 (n = 275).
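The median split can be sketched as below. This is an illustration with toy values, not the study's code. Because scores equal to the median are coded 1, ties at the median all fall in the upper group, which is one reason the two groups need not be equal in size.

```python
# Sketch of a median split in which scores >= the sample median are coded 1.
from statistics import median
from typing import List


def median_split(scores: List[float]) -> List[int]:
    """Code each score as 1 if it is >= the sample median, else 0.
    Ties at the median go to the upper group, so group sizes can differ."""
    cut = median(scores)
    return [1 if s >= cut else 0 for s in scores]


print(median_split([1.5, 2.0, 2.09, 2.5, 3.0]))  # [0, 0, 1, 1, 1]
```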

Everyday Discrimination

Everyday discrimination was measured with a modified version of the Williams et al. (1997) Everyday Discrimination Scale. This nine-item questionnaire measures perceived unfair treatment in everyday life and was modified in the present study to refer to experiences due to the participant’s race. Responses were rated on a 6-point Likert scale from 0 (never) to 5 (almost every day). Following previously established procedures (Lewis et al., 2010), responses were averaged into a composite. The composite was then dichotomized: 0 (n = 201) indicated a composite score below one (i.e., an average response below the second-lowest frequency option), and 1 (n = 318) indicated a composite score of one or more.

Lifetime Discrimination

The Lifetime Discrimination Scale (Kessler et al., 1999) measures the perceived experience of discrimination across nine domains such as education, employment, and housing over the course of one’s life. Items are rated 1 (yes) or 0 (no). The responses were summed into a single composite. The present study adapted the items of this scale to measure lifetime experiences of racial discrimination, in particular. Participants who did not report any lifetime discrimination were scored as 0 (n = 141) and participants who reported experiencing any amount of lifetime discrimination were scored as 1 (n = 379).
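The two discrimination cutoffs can be sketched as follows. For everyday discrimination, the nine 0–5 responses are averaged and the composite is dichotomized at 1. For lifetime discrimination, the yes/no items are summed and any endorsement is coded 1. Both functions are illustrations that assume one list of item responses per participant; the example data are made up.

```python
# Sketch of the everyday and lifetime discrimination dichotomizations.
# ASSUMPTION: each participant's item responses are available as a list of ints.
from statistics import mean
from typing import Sequence


def code_everyday(responses: Sequence[int]) -> int:
    """Average the 0-5 item responses; composite >= 1 -> 1, else 0."""
    if any(not 0 <= r <= 5 for r in responses):
        raise ValueError("everyday items are rated 0 (never) to 5 (almost every day)")
    return 1 if mean(responses) >= 1 else 0


def code_lifetime(responses: Sequence[int]) -> int:
    """Sum the yes (1) / no (0) items; any reported discrimination -> 1, none -> 0."""
    if any(r not in (0, 1) for r in responses):
        raise ValueError("lifetime items are rated 1 (yes) or 0 (no)")
    return 1 if sum(responses) > 0 else 0


print(code_everyday([0, 0, 1, 2, 0, 1, 3, 0, 2]))  # 1 (mean = 1.0)
print(code_everyday([0, 0, 0, 0, 0, 0, 0, 0, 5]))  # 0 (mean is about 0.56)
print(code_lifetime([0] * 9), code_lifetime([0] * 8 + [1]))  # 0 1
```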

Skin Tone

The complexion of the participants’ skin tone was collected following the method of Bond and Cash (1992). Participants were presented with nine colored squares. Each color represented a different skin tone, including three light, three medium, and three dark skin tones. Participants were asked to select the square with the color that most resembled the complexion of their faces. The responses were transformed into a binary variable where 0 (n = 57) indicated light complexions and 1 (n = 192) indicated medium and dark complexions.
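The skin-tone grouping reduces to a threshold on the selected square. This sketch assumes, hypothetically, that the nine squares are indexed 1–9 from lightest to darkest, with 1–3 light, 4–6 medium, and 7–9 dark. The source describes three squares per category but does not give an index order.

```python
# Sketch of the skin-tone grouping.
# ASSUMPTION (hypothetical): squares are indexed 1-9 from lightest to darkest,
# with 1-3 light, 4-6 medium, and 7-9 dark.


def skin_tone_category(square: int) -> str:
    """Map a 1-9 square index to 'light', 'medium', or 'dark'."""
    if not 1 <= square <= 9:
        raise ValueError(f"square must be between 1 and 9, got {square}")
    return ("light", "medium", "dark")[(square - 1) // 3]


def code_skin_tone(square: int) -> int:
    """0 = light complexion; 1 = medium or dark complexion."""
    return 0 if skin_tone_category(square) == "light" else 1


print([code_skin_tone(s) for s in range(1, 10)])  # [0, 0, 0, 1, 1, 1, 1, 1, 1]
```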

Depression Level

BDI-II scores were dichotomized using established cutoffs (Beck et al., 1996): 0 (score of 0–13; n = 352) represents minimal depression, and 1 (score of 14–60; n = 233) represents mild to severe depression. The maximum possible score was 60 rather than the standard 63 because the item referencing suicidal ideation was not administered.

Mindfulness Meditation Experience

Mindfulness meditation experience was dichotomized for the measurement invariance analyses. Participants who reported no mindfulness meditation experience were scored as 0 (n = 273), and participants who reported any level of mindfulness meditation experience were scored as 1 (n = 309).

Household Income

Participants reported their household income on a scale ranging from “$19,000 or below” to “$200,000 and above,” in $20,000 increments. Invariance was examined between participants below versus above an approximation of the United States household poverty line. Thus, household income was dichotomized such that 1 (n = 74) indicated a household income of $19,000 or below and 0 (n = 511) indicated a household income of $20,000 or above.
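The remaining three groupings (depression level, mindfulness meditation experience, and household income) are simple cutoffs and can be sketched together. The input encodings are assumptions: raw BDI-II totals with the suicidal ideation item omitted (so the maximum is 60), the 1–5 meditation experience item, and a hypothetical income bracket index where 0 denotes the “$19,000 or below” bracket.

```python
# Sketch of the depression, meditation experience, and income dichotomizations.
# ASSUMED encodings: BDI-II total (0-60, suicide item omitted), meditation item (1-5),
# and a HYPOTHETICAL income bracket index where 0 = "$19,000 or below".


def code_depression(bdi_total: int) -> int:
    """0 = minimal (0-13); 1 = mild to severe (14-60)."""
    if not 0 <= bdi_total <= 60:
        raise ValueError(f"BDI-II total out of range: {bdi_total}")
    return 0 if bdi_total <= 13 else 1


def code_meditation(experience: int) -> int:
    """0 = 'I have no experience' (option 1); 1 = any experience (options 2-5)."""
    if not 1 <= experience <= 5:
        raise ValueError(f"experience out of range: {experience}")
    return 0 if experience == 1 else 1


def code_low_income(bracket_index: int) -> int:
    """1 = lowest bracket ('$19,000 or below', assumed index 0); 0 = any higher bracket."""
    if bracket_index < 0:
        raise ValueError(f"invalid bracket index: {bracket_index}")
    return 1 if bracket_index == 0 else 0


print([code_depression(s) for s in (0, 13, 14, 60)])  # [0, 0, 1, 1]
print([code_meditation(e) for e in (1, 2, 5)])         # [0, 1, 1]
print([code_low_income(b) for b in (0, 1, 9)])         # [1, 0, 0]
```

Note that the income coding runs in the opposite direction from the others: 1 marks the lower-income group, matching the description above.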

Data Analyses

All pre-registered analyses are reported in either the manuscript or the Supplementary Material, and any deviations from the pre-registration are explicitly noted. At reviewers’ suggestion, results from some pre-registered analyses were moved to the Supplementary Material; these results did not contradict any analyses presented in the manuscript. All analyses were conducted in R, version 4.1.0 (R Core Team, 2020), using the psych (Revelle, 2021) and lavaan (Rosseel, 2012) packages. The FFMQ analyses that follow were conducted on the Time 1 data; the Time 2 FFMQ data were used only for the test–retest analyses. Full information maximum likelihood was used to handle missing data. No multivariate outliers were detected.

The analyses included the following:

(1) CFAs to confirm the factor structure and the strength of the item loadings.
(2) Invariance analyses to test the generalizability of the observed FFMQ models across identities and experiences within the Black community: ethnic heritage, ethnic identity, everyday discrimination, lifetime discrimination, skin tone, depression level, gender, mindfulness meditation experience, and household income.
(3) Cronbach’s alpha and McDonald’s omega to assess internal consistency.
(4) Test–retest analyses to assess temporal reliability.
(5) A nomological network examination of how the observed mindfulness factors correlate with related constructs: anxiety, depression level, rumination, ethnic identity, psychological well-being, satisfaction with life, self-compassion, and mindfulness meditation experience.

Analytic Criteria

CFA fit indices were used to test the structure of the FFMQ scales. Model fit relative to the model’s degrees of freedom was assessed with the chi-square ratio (χ2/df); a ratio below 3 indicated good fit (Schermelleh-Engel et al., 2003). Absolute fit was assessed with the root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR). An RMSEA < 0.06 (Browne & Cudeck, 1992) and an SRMR < 0.08 (Hu & Bentler, 1999) indicated acceptable fit, with lower values indicating better fit. Incremental fit was assessed with the comparative fit index (CFI) and the Tucker-Lewis index (TLI). Both had to exceed 0.90 to indicate good fit, with higher values indicating better fit. All correlations were tested at α = .05, and correlation comparisons were evaluated with Williams’ test (α = .05).
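The cutoffs above can be checked directly from the chi-square statistics that CFA software reports. The sketch below uses common textbook formulas for RMSEA, CFI, and TLI, and it rests on three assumptions:

- ML chi-squares are available for the target model and for the baseline (independence) model.
- The RMSEA denominator uses N − 1; some software uses N instead.
- SRMR is supplied as an input, because computing it requires the residual correlation matrix.

Values may therefore differ slightly from software output.

```python
import math
from dataclasses import dataclass


@dataclass
class FitIndices:
    chi2_df_ratio: float
    rmsea: float
    cfi: float
    tli: float
    srmr: float


def fit_indices(chi2_m, df_m, chi2_b, df_b, n, srmr):
    """Compute fit indices for a target model (m) against a baseline model (b)."""
    ratio = chi2_m / df_m
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    # CFI compares the noncentrality (chi2 - df) of the target and baseline models.
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    cfi = 1.0 if d_b == 0 else 1.0 - d_m / d_b
    # TLI compares chi2/df ratios; unlike CFI it is not bounded to [0, 1].
    tli = (chi2_b / df_b - ratio) / (chi2_b / df_b - 1.0)
    return FitIndices(ratio, rmsea, cfi, tli, srmr)


def meets_criteria(fit):
    """Apply the cutoffs described in the text."""
    return {
        "chi2/df < 3": fit.chi2_df_ratio < 3,
        "RMSEA < 0.06": fit.rmsea < 0.06,
        "SRMR < 0.08": fit.srmr < 0.08,
        "CFI > 0.90": fit.cfi > 0.90,
        "TLI > 0.90": fit.tli > 0.90,
    }


# Illustrative numbers only, not results from this study.
fit = fit_indices(chi2_m=200.0, df_m=100, chi2_b=2000.0, df_b=120, n=300, srmr=0.05)
print(round(fit.rmsea, 3), round(fit.cfi, 3), round(fit.tli, 3))  # 0.058 0.947 0.936
print(all(meets_criteria(fit).values()))  # True
```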

Next, invariance analyses were conducted to assess whether the observed models held equivalently across social categories within the Black American community. Following Hirschfeld and Brachel (2014), each social category was tested with four nested models of invariance: configural, weak, strong, and strict. Configural invariance tests whether the number of latent variables and the pattern of item loadings are the same across subgroups. Weak invariance adds to configural invariance by testing whether the magnitudes of these loadings are equal across groups. Strong invariance adds to weak invariance by testing whether the item intercepts are equal across groups. Finally, strict invariance adds to strong invariance by testing whether the item residual variances and residual covariances are equal across groups.

Chi-square difference tests and CFI comparisons were used to assess model invariance. Full invariance was achieved when the chi-square difference test was non-significant and the difference between the model CFIs (ΔCFI) was less than 0.01 (Hirschfeld & Brachel, 2014). Meeting both criteria indicates that the more constrained model did not fit significantly worse than the preceding model. Because fit indices typically worsen with smaller sample sizes, some fit indices were expected to miss conventional cutoff criteria. The analyses therefore focused on changes in indices across levels of invariance rather than on absolute fit. If full invariance was not observed, the model was examined to identify the specific parameters (i.e., factor loadings, intercepts, residuals, and residual covariances) that differed significantly between groups (α = .05). These parameters were then allowed to vary between groups, and the model was re-estimated. If the modified model achieved invariance, the result was labeled partial invariance; if it did not, invariance was not supported at that level.
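The decision rule for each step can be sketched as follows. The sketch assumes ordinary ML chi-square statistics, for which the difference between nested models is itself chi-square distributed; robust estimators would require a scaled difference test instead. The upper-tail probability is computed from the regularized incomplete gamma function so that only the standard library is needed (`scipy.stats.chi2.sf` returns the same quantity). The function names are hypothetical.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # log of z**a * exp(-z) / Gamma(a), shared by both branches below
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); the tail is Q = 1 - P.
        term = total = 1.0 / a
        k = 0
        while abs(term) > abs(total) * 1e-15:
            k += 1
            term *= z / (a + k)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper regularized gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def invariance_step(chi2_free, df_free, cfi_free,
                    chi2_constrained, df_constrained, cfi_constrained,
                    alpha=0.05, max_cfi_drop=0.01):
    """Compare a constrained model with the preceding, less constrained model."""
    d_df = df_constrained - df_free
    if d_df <= 0:
        raise ValueError("the constrained model must have more degrees of freedom")
    p_value = chi2_sf(chi2_constrained - chi2_free, d_df)
    cfi_drop = cfi_free - cfi_constrained
    holds = p_value > alpha and cfi_drop < max_cfi_drop
    return p_value, cfi_drop, holds


# Illustrative numbers only: a configural model vs. a weak (metric) model.
p, drop, holds = invariance_step(100.0, 50, 0.950, 110.0, 60, 0.945)
print(round(p, 3), round(drop, 3), holds)  # 0.44 0.005 True
```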

Results

Confirmatory Factor Analysis

Previous studies have modeled the FFMQ scales either as a 5-factor model or as a 5-factor hierarchical model (e.g., Baer et al., 2004; Watson-Singleton et al., 2018). Both models were compared, and the 5-factor model outperformed the 5-factor hierarchical model for both the FFMQ-39 and the FFMQ-15. Table 3 details the fit statistics for the FFMQ-39 and FFMQ-15. Additionally, the 5-factor model was compared with the 6-factor model proposed by Karl et al. (2020), which separates the awareness factor into two distinct but correlated factors. The 5-factor model also outperformed the 6-factor model (see Supplementary Material), so the analyses proceeded with the 5-factor model. The analyses were run at the item level (as opposed to parcels) to examine individual item properties and each item’s relationship to its factor (Christopher et al., 2012). See Table 4 for the absolute standardized factor loadings for the FFMQ-39 and FFMQ-15.

Table 3 Fit indices of FFMQ-39 and FFMQ-15
Table 4 Absolute standardized factor loadings of the FFMQ-39 and the FFMQ-15

Although the 5-factor model outperformed the alternatives for both versions, its CFI (0.86) and TLI (0.85) for the FFMQ-39 indicated inadequate fit. An exploratory inspection of modification indices showed that the largest contributors to misfit were four pairs of similarly worded items from the awareness factor (e.g., “I am easily distracted” and “When I do things, my mind wanders off and I’m easily distracted”). After the residuals of these four pairs were allowed to correlate and acquiescence was accounted for (Aichholzer, 2014), the CFI and TLI both increased to 0.92 (see Table 3).

Invariance

Invariance analyses examined subgroups defined by the following variables:

- Ethnic heritage: at least one parent identified as African American vs. neither parent identified as African American.
- Ethnic identity: responses at or above the median vs. below the median.
- Everyday discrimination: reporting discrimination less than once a day vs. more than once a day.
- Lifetime discrimination: no reported discrimination vs. any reported discrimination.
- Skin tone: light complexions vs. medium and dark complexions.
- Depression level: minimal vs. at least mild.
- Gender: men vs. women.
- Mindfulness meditation experience: no experience vs. any level of experience.
- Household income: $19,999 or below vs. $20,000 or higher.

Table 5 displays the results of the invariance analyses for the FFMQ-39 and the FFMQ-15. Following recommendations from Tanaka (1987), invariance analyses were conducted for all grouping variables with at least 50 participants per subgroup within each FFMQ questionnaire. Because half of the participants completed the FFMQ-39 and half completed the FFMQ-15, the skin tone and household income subgroups within each questionnaire did not meet the 50-participant criterion. To complete the invariance analyses for these variables, the FFMQ-15 items were extracted from the FFMQ-39 responses and pooled with the original FFMQ-15 responses, increasing the subgroup sample sizes. Previous research supports extracting the FFMQ-15 items from the FFMQ-39 (Gu et al., 2016).
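The sample-size rule and the pooling step can be sketched as follows. The sketch assumes each response is a dictionary of item scores plus a `"group"` code. It also assumes a mapping from FFMQ-15 item identifiers to the matching FFMQ-39 item identifiers is available. The identifiers in the usage example are placeholders, not the actual FFMQ item numbers.

```python
from collections import Counter

MIN_PER_SUBGROUP = 50  # Tanaka (1987) recommendation as applied in the text


def has_adequate_subgroups(rows, minimum=MIN_PER_SUBGROUP):
    """True if there are at least two subgroups and each has `minimum` or more rows."""
    counts = Counter(row["group"] for row in rows)
    return len(counts) >= 2 and min(counts.values()) >= minimum


def pool_short_form(long_rows, short_rows, short_to_long):
    """Extract short-form items from long-form rows, then append the short-form rows.

    `short_to_long` maps each FFMQ-15 item id to the matching FFMQ-39 item id.
    """
    pooled = []
    for row in long_rows:
        record = {short_id: row[long_id] for short_id, long_id in short_to_long.items()}
        record["group"] = row["group"]
        pooled.append(record)
    pooled.extend(dict(row) for row in short_rows)
    return pooled


# Placeholder item ids and toy data.
mapping = {"s1": "q1", "s2": "q2"}
long_rows = [{"q1": 3, "q2": 4, "q3": 2, "group": g} for g in [0] * 30 + [1] * 60]
short_rows = [{"s1": 2, "s2": 5, "group": g} for g in [0] * 25 + [1] * 60]
print(has_adequate_subgroups(short_rows))  # False (only 25 in group 0)
pooled = pool_short_form(long_rows, short_rows, mapping)
print(has_adequate_subgroups(pooled))  # True (55 and 120)
```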

Table 5 Measurement invariance

Full or partial invariance was observed in all models and across all subgroups. Collectively, the 5-factor structure held across ethnic heritage, ethnic identity, everyday discrimination, lifetime discrimination, skin tone, depression level, gender, mindfulness meditation experience, and household income. The parameters freed to achieve partial invariance are listed in the Supplementary Material.

Reliability

Internal Consistency

The internal consistency of the FFMQ-39 and the FFMQ-15 was assessed with Cronbach’s coefficient alpha, McDonald’s omega total (ω_t), and McDonald’s omega hierarchical (ω_h). The inclusion of the omega analyses was not pre-registered. As shown in Table 2, all internal reliability indices were acceptable for the FFMQ-39 (all alpha values > 0.81, all ω_t values > 0.86, all ω_h values > 0.66) and the FFMQ-15 (all alpha values > 0.65, all ω_t values > 0.67). No ω_h is reported for the FFMQ-15 because omega hierarchical cannot be calculated for factors with three or fewer items.
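Cronbach’s alpha for a factor follows directly from the item and total-score variances: alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items. Below is a minimal sketch for complete data (no missing responses). Omega requires a fitted factor model and is not reproduced here; in R, the psych package provides `alpha()` and `omega()`.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix with no missing values."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents answering two items.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```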

Test–Retest Reliability

To assess the temporal stability of the FFMQ-39 and the FFMQ-15, the 5 factors of each measure obtained at Time 1 were correlated with the parallel factors measured one month later at Time 2. As shown in Table 6, all test–retest correlations were strong (rs = 0.52–0.85) and significant at p < .01. The Supplementary Material contains a correlation table of all FFMQ-39 and FFMQ-15 factors across time.
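Test–retest reliability here is the Pearson correlation between each factor’s Time 1 and Time 2 scores. This is a minimal sketch that assumes paired, complete factor scores, with participants missing either wave dropped. The t statistic with n − 2 degrees of freedom is the usual basis for the significance test. Python 3.10+ also offers `statistics.correlation` for r.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired Time 1 and Time 2 scores."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three complete pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


time1 = [1, 2, 3, 4, 5]
time2 = [2, 1, 4, 3, 5]
r = pearson_r(time1, time2)
print(round(r, 3), round(r_to_t(r, len(time1)), 3))  # 0.8 2.309
```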

Table 6 Intercorrelations and test–retest reliabilities of the FFMQ-39 and the FFMQ-15

Intercorrelations

Table 6 reports how the 5 factors correlate with each other for both the FFMQ-39 and the FFMQ-15. Most factors were positively related, yet they remained distinct (rs = −0.17 to 0.54). Correlations between factors were similar across the FFMQ-39 and FFMQ-15, although generally smaller in magnitude for the latter, likely because of its fewer items. Importantly, these factor correlations were similar to those reported by Baer et al. (2006).

For the FFMQ-39, two (of 10) correlations differed from those previously reported:

(1) The previously demonstrated null association between observe and nonjudgment was significantly negative in the current study (r = −0.17, p < 0.01).
(2) The previously demonstrated positive correlation between awareness and observe was not significant (r = 0.10, p = 0.07).

For the FFMQ-15, three (of 10) previously demonstrated positive correlations were not significant in the present study:

(1) Awareness and nonreactivity (r = 0.10, p = 0.09).
(2) Awareness and observe (r = 0.02, p = 0.80).
(3) Nonjudgment and nonreactivity (r = 0.04, p = 0.51).

Williams’ test indicated that the correlation between the nonjudgment and observe factors differed significantly between the FFMQ-39 and the FFMQ-15. Additionally, all intercorrelations between the nonreactivity factor and the other factors were significantly stronger in the FFMQ-39 than the parallel intercorrelations in the FFMQ-15. Thus, the relationships among factors within the FFMQ-39 differ somewhat from those within the FFMQ-15, with some intercorrelations (e.g., between nonjudgment and observe) stronger in the longer version. A second Williams’ test, run on the latent factor correlations of the FFMQ-39 and FFMQ-15 to correct for unreliability, yielded results consistent with the first.
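In its usual form, Williams’ test compares two correlations computed in the same sample that share one variable, r_jk and r_jh. It also uses the correlation r_kh between the two non-shared variables. The sketch below implements the commonly used T2 statistic, which has n − 3 degrees of freedom. The text does not specify how the FFMQ-39 and FFMQ-15 correlations were mapped onto these three inputs, so the values in the example are illustrative only.

```python
import math
from typing import Tuple


def williams_t(r_jk: float, r_jh: float, r_kh: float, n: int) -> Tuple[float, int]:
    """Williams' T2 for H0: rho_jk = rho_jh (dependent correlations sharing variable j).

    Returns the t statistic and its degrees of freedom (n - 3).
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3 x 3 correlation matrix of j, k, and h.
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3


# Illustrative values only.
t, df = williams_t(0.50, 0.30, 0.40, n=100)
print(round(t, 3), df)  # 2.065 97
```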

Nomological Network: Predicting Psychological Experience

Table 7 presents the nomological network between the FFMQ factors and measures of anxiety, depression level, rumination, ethnic identity, psychological well-being, satisfaction with life, self-compassion, and mindfulness meditation experience. To the extent that the FFMQ-39 and FFMQ-15 accurately measure mindfulness in Black Americans, Black Americans’ scores on the 5 factors should correlate with these constructs in a pattern that parallels previous research. The current nomological network paralleled previous research, except for the correlations involving the observe factor and mindfulness meditation experience. Awareness, describe, nonjudgment, and nonreactivity correlated with better psychological well-being, whereas observe was generally unrelated to psychological well-being. Because no previous study has correlated ethnic identity with the FFMQ factors, the current ethnic identity correlations cannot be compared to previous research.

Table 7 Nomological Network of FFMQ-39 and FFMQ-15

Discussion

This study contributes to the generalizability of mindfulness research as the first study to validate the psychometric properties of the widely used FFMQ-39 and the FFMQ-15 within a non-clinical, Black American sample. The comprehensive examination included analyses of factor structure, measurement invariance, internal consistency, test–retest reliability, and nomological networks. Together, these analyses support the use of both the FFMQ-39 and the FFMQ-15 with Black Americans. Validating these two questionnaires within a Black American sample is important for several reasons. First, this research advances the scientific understanding, application, and measurement of mindfulness among Americans who have previously been excluded from mindfulness research yet represent a significant proportion of the U.S. population. Second, these findings support using both the original and the short version of the FFMQ as measures of individual differences in trait mindfulness to predict important health outcomes among Black Americans. Third, this research advances the field’s ability to verify the efficacy and effectiveness of mindfulness interventions within the Black American community. Mindfulness interventions targeting stress, pregnancy-related stress, emotion regulation difficulties, and blood pressure show promising results among Black Americans (Palta et al., 2012; Watson-Singleton et al., 2021; Zhang & Emory, 2015). However, without valid assessments of mindfulness, it is not possible to determine whether such interventions effectively target mindfulness skills. The present research increases confidence in researchers’ ability to assess whether such interventions are indeed improving mindfulness skills in Black Americans.
Finally, this research not only validates the original 39-item FFMQ but also supports the reliable use of the short form (FFMQ-15) in studies of Black Americans that require brief assessments. Because it is shorter, the FFMQ-15 allows future research to save time while maintaining the integrity of the questionnaire.

The CFAs indicated that the 5-factor structure fit both the FFMQ-39 and the FFMQ-15 well. The FFMQ-39 model required correlating the residuals of similarly worded items (e.g., “It seems I am ‘running on automatic’ without much awareness of what I’m doing.” and “I do jobs or tasks automatically without being aware of what I’m doing.”). Future researchers should consider specifying these residual correlations when testing the fit of the FFMQ-39. More broadly, this observation encourages researchers to attend closely to item similarity during scale development and validation.

Next, the measurement invariance analysis was one of this study’s strongest contributions. Full or partial invariance was observed across ethnic heritage, ethnic identity, everyday discrimination, lifetime discrimination, skin tone, depression level, gender, mindfulness meditation experience, and household income within Black Americans. This contribution is novel on two counts. It is the first study to assess the invariance of any mindfulness measure within Black Americans, and it is the first FFMQ invariance analysis to include household income, perceived discrimination, skin tone, and ethnic identity. Furthermore, the invariance analysis advances the general understanding of diversity within Black American samples, an area that is vastly under-researched in psychology, mental health, and medicine (Buchanan et al., 2021). Overall, the invariance analysis supports the generalizability of the FFMQ questionnaires across Black Americans.

As for the intercorrelations, the correlations of nonreactivity with awareness, observe, and nonjudgment were significant in the FFMQ-39 (rs = 0.24 to 0.47). In the FFMQ-15, these correlations were much smaller (rs = 0.04 to 0.24), and two of the three were non-significant. This contrast indicates a different relationship between nonreactivity and the other factors in the two questionnaires and suggests that the FFMQ-15 nonreactivity intercorrelations might be imbalanced.

Nevertheless, the reliability of both FFMQ scales was further supported by the test–retest analyses. These analyses indicated strong temporal consistency between Time 1 and Time 2 for both the FFMQ-39 and the FFMQ-15, making the current study the first English-language FFMQ analysis to report consistent test–retest reliabilities. This novel finding demonstrates that the observed FFMQ factors were stable over time in a Black American sample. It also supports the use of both FFMQ measures in longitudinal studies with Black American samples.

Continuing with construct validity, a nomological network was established to further test the validity of the observed FFMQ factors among Black Americans by relating them to constructs that are theoretically related to mindfulness. This was the first study to report correlations between ethnic identity and the FFMQ factors. Interestingly, most of the factors correlated negatively with ethnic identity. This is in line with previous research highlighting cultural barriers in the Black American community (e.g., perceived religious conflict and mental health stigma) that might limit a person’s level of mindfulness (Watson et al., 2016). Thus, it is possible that the more strongly a person identifies with Black culture, the lower their level of mindfulness. Further research is needed to better understand the relationships among ethnic identity, mindfulness, and the FFMQ scales.

Besides ethnic identity, the current nomological network was largely consistent with previous research, except for some correlations involving observe. In the FFMQ-39, observe showed no association with anxiety, depression, or satisfaction with life. Previous research, by contrast, has found observe to correlate positively with anxiety, negatively with depression, and positively with satisfaction with life, while being unrelated to psychological well-being (Baer et al., 2008; Christopher et al., 2012; Gu et al., 2016; Lee & Zelman, 2019; Rogge & Daks, 2021; Sweeney et al., 2021). Similarly, in the FFMQ-15, observe showed no association with anxiety, depression, or satisfaction with life. These divergent associations are understandable given that the observe factor has a history of varying associations with psychological assessments (Baer et al., 2008; Christopher et al., 2012; Gu et al., 2016; Rogge & Daks, 2021; Sweeney et al., 2021). This inconsistency might arise because all the nomological network constructs refer to internal experiences (e.g., thoughts, feelings, and cognitions). Observe, however, is the only factor that measures both internal sensations and external sensations (e.g., smells, sights, and sounds). Thus, the items that measure external sensations within the observe factor might weaken the relationship between the factor and theoretically related constructs.

Another notable pattern within the nomological network is the inconsistent relationship between the various facets and mindfulness meditation experience. In the FFMQ-39, there was an unexpected null association between mindfulness meditation experience and nonjudgment. In the FFMQ-15, there were unexpected null associations between mindfulness meditation experience and the describe, nonjudgment, and nonreactivity factors. These results might be explained by the low mean and limited variance of mindfulness meditation experience in the sample: on average, participants had meditated less than once or twice in their lifetime. A wider range of mindfulness meditation experience may be required to accurately observe its relationship with the FFMQ factors. Because mindfulness meditation is underutilized among Black Americans (Biggers et al., 2020), future research should recruit Black Americans with a diverse range of mindfulness meditation experience to better understand this relationship.

It is important to compare the current results to those of Watson-Singleton et al. (2018), the first FFMQ validation in Black Americans. Notably, Watson-Singleton et al. (2018) applied a stricter item selection criterion in their exploratory factor analyses, which produced a 20-item FFMQ from the original 39-item FFMQ. The Supplementary Material reports an analysis of the current sample under the stricter item selection criteria of Watson-Singleton et al. (2018). As for similarities, both studies found that the 5-factor model outperformed the hierarchical model. Additionally, depression and self-compassion correlated with the FFMQ factors in the same pattern in both studies. The studies differed, however, in test–retest reliability, which reached an acceptable threshold only in the current study.

Limitations and Future Research

The results of this study should be interpreted in light of several limitations that suggest useful directions for future research. First, recruiting online through MTurk Prime allowed this study to include a well-powered, diverse, and representative sample of Black Americans. Despite these benefits, MTurk Prime samples carry the risk of inattentive and/or automated responses. Attention checks and a reCAPTCHA were implemented to mitigate this possibility. Future research can better authenticate responses by conducting this research in person or online with synchronous interaction with an experimenter.

Second, the comprehensive nomological network supported the construct validity of the FFMQ in a Black sample using theoretically related constructs. Nonetheless, because Black people are underrepresented in psychology research, most of the measures in the nomological network have not been validated within a Black sample. The BDI-II (Dutton et al., 2004) and the Self-Compassion Scale (Zhang et al., 2019) are the only nomological network measures that have been. Future research should validate the remaining psychological measures to ensure their applicability to Black samples. Additionally, this research provided essential information on the relationship between mindfulness meditation experience and the FFMQ factors. However, mindfulness meditation experience was measured with a single item that may not have fully captured the diverse range of meditation experience (e.g., body scan, muscle relaxation, and breathing meditations). Future research could enhance this measure by collecting information such as the average duration and type of each meditation.

Third, invariance was observed in the FFMQ between Black Americans descended from enslaved people and Black Americans who immigrated from Africa or the Caribbean. However, the ethnic heritage variable used for these analyses was derived from the ethnicity of participants’ parents rather than from self-reported ethnic heritage. Parents’ ethnicity did predict participants’ ethnic heritage with high accuracy (88%). Still, future research should collect self-reported ethnic heritage during the initial stages of research.

Fourth, the detailed inclusion of heterogeneous demographics within this Black American sample is a key asset of this study because it supported the generalizability and invariance of the observed FFMQ model structure. Yet the subsamples for household income, ethnic heritage, and skin tone were not large enough to perform invariance analyses on the FFMQ-39 and the FFMQ-15 separately. This issue was ameliorated in the invariance analysis by extracting the FFMQ-15 items from the FFMQ-39 responses to adequately power the test. To avoid this issue in the future, researchers should ensure socioeconomic diversity within the sample and collect comprehensive within-group demographics during the initial stages of data collection.

Fifth, although the sample spanned a wide age range, 87.7% of participants were under the age of 51. This prevented the examination of age differences among middle-aged and older adults. Future research might deliberately sample adults over age 50 and would also do well to assess trait mindfulness across the life course to further examine age and cohort differences.

Finally, the nomological network and invariance analyses enhanced the understanding of how the FFMQ relates to several social experiences within Black Americans. Although a wide range of social experiences were included, this research did not include a direct analysis of spirituality, religiosity, or the perceived stigma of mindfulness—social experiences that have been linked to mindfulness within Black Americans (Watson et al., 2016). The Supplementary Materials contain an invariance analysis on an indirect measure of spirituality/religiosity. Future researchers should include direct measures of spirituality, religiosity, and the perceived stigma of mindfulness in additional invariance analyses.

Taken together, the results of this research are foundational to understanding and measuring mindfulness within a non-clinical Black sample. The psychometric properties of the FFMQ-39 and the FFMQ-15 indicate that both questionnaires can validly measure mindfulness within a Black sample. Given that mindfulness is a valuable practice and an effective treatment to improve mental and physical health, it is essential that mindfulness can be validly measured in all populations — particularly disenfranchised communities of color who experience disproportionately poorer health outcomes.