Due to the requirement of an identified stressor, or causal event, posttraumatic stress disorder (PTSD) is considered a “made-to-order” diagnosis for personal injury plaintiffs (Melton, Petrila, Poythress, & Slobogin, 2007; American Psychiatric Association, 2013). Stressors claimed as the precipitant and cause of PTSD symptoms can involve a variety of events, including motor vehicle accidents, industrial disasters, workplace discrimination and harassment, and physical and sexual assaults (Taylor, Frueh, & Asmundson, 2007). In most jurisdictions, damages can be awarded for emotional distress and injury in the absence of physical harm (Foote and Lareau, 2013). The utilization of PTSD as a basis of personal injury claims has become so commonplace as to support an industry of attorneys who solely litigate cases involving PTSD (Stone, 1993), and PTSD is more likely than other psychological conditions to be evaluated within the context of litigation (Demakis & Elhai, 2011; Wisdom et al., 2014).

Several aspects of trauma symptoms and posttraumatic conditions create obstacles to a clear diagnostic picture. PTSD is recognized as a highly subjective, heterogeneous, and often-comorbid disorder (Hall & Hall, 2007; Zoellner, Pruitt, Farach, & Jun, 2014). Galatzer-Levy and Bryant (2013) calculated as many as 636,120 combinations of symptoms that can comprise a PTSD diagnosis, illustrating the degree of variability possible for each individual presentation of the condition. Notably, the DSM-5 diagnostic criteria include both psychiatric symptoms (e.g., subjective emotional symptoms, such as feelings of estrangement) and potentially more objective, cognitive complaints (e.g., difficulty concentrating, memory deficits; American Psychiatric Association, 2013; Wisdom et al., 2014). This distinction suggests that evaluation of PTSD should address both emotional and cognitive functioning. In trauma, there is also the potential for comorbid physical injury, as may be common in instances such as combat, sexual violence, or natural disasters. In the case of personal injury litigation, comorbid physical and psychological injury is also common, such as head injury associated with a motor vehicle accident (Zatnick et al., 2004). Thus, clinicians may need to delineate psychogenic symptoms from those with organic origins. Further, clinicians in a psycho-legal context might be tasked with discerning the presence of trauma symptoms due to a specific etiological stressor (i.e., the alleged tort offense), while also acknowledging the past experience of trauma and other causal and contributory factors. Reports from the National Comorbidity Survey indicate that 34% of men and 25% of women endorsed a lifetime experience of more than one traumatic event (Kessler, Sonnega, Bromet, Hughes, & Nelson, 1995).

Adding to this inherent complexity, assessors of mental injury, in a legal context, must be especially attuned to the possibility of symptom exaggeration or falsification. Malingering, defined as the “deliberate fabrication or gross exaggeration of psychological or physical symptoms for the fulfillment of an external goal” (American Psychiatric Association, 2013), is presumed to be highly prevalent in contexts where individuals may garner financial, legal, or personal incentives (e.g., damages, conviction, or retribution; Peace & Masliuk, 2011). As noted by Resnick and colleagues (Resnick, West, & Payne, 2008), the identification of malingered, either falsified or exaggerated, PTSD symptomatology has become one of the most challenging tasks for clinicians, independent of legal context. The legal, financial, and societal implications for the successful malingering of psychological symptoms and conditions are substantial (Peace & Masliuk, 2011; Stone, 1993), thereby placing mental health practitioners at the forefront of legal cases to verify the presence and extent of psychological distress.

Estimating the prevalence of malingering in clinical and forensic populations is difficult. According to estimates by forensic practitioners, malingering likely occurs in 15–17% of forensic cases (Rogers & Bender, 2013); however, some estimates suggest that malingering occurs in up to 40% of civil litigation cases involving neuropsychological assessment (Larrabee, 2003; Young, Kane, & Nicholson, 2007). Other studies have found that 20–30% of results from psychometric testing on personal injury plaintiffs suggest that malingering had taken place (Taylor et al., 2007). Compared to other clinical settings, the base rate of symptom exaggeration and failure on performance-based tasks has been found to be higher in situations where compensation is sought (Frueh et al., 2005). Thus, there is a need for practitioners to be aware of the possibility of feigning and to have adequate tools to assess for it.

The subjectivity, heterogeneity, comorbidity, and prominence of PTSD increase the capacity for individuals undergoing an assessment to feign psychological injury (Purtle, Lynn, & Malik, 2016; Wisdom et al., 2014). The symptoms of PTSD can be believably feigned regardless of the veracity of their existence, and regardless of depth of psychological knowledge, direct coaching, or advance practice. Arce, Farina, and Buela (2008) demonstrated that a sample of naïve participants, neither trained in psychopathology nor previously exposed to potentially traumatic traffic accidents, was able to feign both direct and indirect symptoms of trauma on the MMPI-2 following a simulated motor vehicle accident. Lees-Haley and Dunn (1994), investigating whether naïve subjects could produce believable symptom profiles, found that up to 98.9% of subjects could successfully meet the requirements of PTSD based on self-report questionnaires, without coaching or practicing. It is evident that clinicians cannot rely on self-report of symptoms alone and, consequently, there is substantial pressure to identify methods of evaluation that are robust against threats of falsification.

Estimates of feigning in legal contexts may be further obscured by the extent to which persons may vary in their deception throughout the process of assessment and across testing periods. Persons who feign may be inconsistent in their false responding across tests, time, and symptom clusters, including psychiatric, physical/somatic, or cognitive/neuropsychological (Berry & Nelson, 2010; Boone, 2009). It cannot be assumed that known feigners will always falsify responses in assessment, or that individuals who feign one type of symptom will also feign others (Rogers, 2008a; Boone, 2009). Thus, malingering is best not perceived as a “monolithic” or stable construct or behavior, but rather one that is dynamic and multi-faceted. Due to this variability, it has long been recommended that mental health practitioners use a variety of methods and sources of information when conducting an assessment of malingering. In the general forensic literature, there are two primary types of malingering measures, symptom validity tests and performance validity tests.

Symptom validity tests (SVTs) aim to detect the exaggeration or fabrication of psychological symptoms through self-report measures of experience. SVTs utilize myriad detection methods, including capitalizing on the relative infrequency, atypical combination, or unusual severity of reported psychological symptoms (Rogers, 2008a). SVTs tend to exist in two forms: larger, multi-scale inventories of personality and psychopathology that include embedded validity scales and briefer, domain-specific measures (Guy, Kwartner, & Miller, 2006; Rogers & Bender, 2013). The most commonly used and recognized embedded SVT methods are the Minnesota Multiphasic Personality Inventory–2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kreammer, 1989) and the Personality Assessment Inventory (Morey, 2007; Boccaccini & Brodsky, 1999; Martin, Schroeder, & Odland, 2015). In these, examinees respond to a series of questions which garner information about symptom presentation. In addition to clinical scales, validity scales serve to assess atypical or severe symptom endorsement and inconsistent, defensive responding.

Bridging the gap between larger inventories and domain-specific tasks are measures such as the Trauma Symptom Inventory–2 (TSI-2; Briere, 2011). The TSI-2 holds a narrower focus on the symptoms and experiences common to traumatic stress disorders, while also including embedded validity subscales (i.e., the Atypical Responding [ATR] scale). As with other large-scale inventories, an advantage of the TSI-2 is the ability to identify atypical presentations based on comparisons with genuine clinical and community populations. While valuable in its relative focus on trauma reactions, Resnick et al. (2008) note that the ATR subscale of the original TSI (Briere, 1995) demonstrated only marginal ability to differentiate between genuine PTSD patients and malingerers, leading to a substantial false positive rate (Elhai, Gray, Naifeh, Butcher, Davis, Falsetti, & Best, 2005). Elhai and colleagues (2005) observed that the TSI ATR subscale was composed of items that addressed bizarre, unusual, and psychotic experiences, which were not selected based on an infrequency criterion (e.g., identified as occurring in less than 10% of the standardization sample). Clinicians thus were strongly cautioned against the sole use of the TSI ATR scale to determine malingered PTSD. While substantial limitations to the use of this measure were noted, Boccaccini and Brodsky (1999) found that in a survey of 80 psychologists involved in emotional injury cases, approximately 33% reported the use of the TSI in addition to other measures in their assessment batteries. A more recent survey of test usage found that in an international sample of 868 assessments conducted by 434 clinicians, approximately 21% involved the use of the TSI in civil tort cases, reflecting the second most commonly used measure alongside the MMPI (Neal & Grisso, 2014).
Per the measure’s publisher, the TSI-2 features considerable revision to the original TSI ATR subscale and the scale’s predictive ability is “markedly superior” to the original TSI (Briere, 2011). Supporting this assertion, Gray, Elhai, and Briere (2010) found that the TSI-2 ATR scale was able to correctly classify 74.2% of a known-groups comparison sample into simulators and genuine PTSD sufferers.

Domain-specific measures attempt to discern malingering or symptom exaggeration more briefly by seeking endorsement of atypical, unusually severe, or highly unlikely symptoms (Parks, 2015). Some examples of these measures include the Structured Interview of Reported Symptoms (SIRS and SIRS-2; Rogers, Bagby, & Dickens, 1992; Rogers, Sewell, & Gillard, 2010), the Miller Forensic Assessment of Symptoms Test (M-FAST; Miller, 2001), and the Structured Inventory of Malingered Symptomatology (SIMS; Widows & Smith, 2005). Domain-specific measures such as these are frequently used in forensic practice (Guy et al., 2006; Rogers & Bender, 2013) and empirical evidence supports the use of these stand-alone SVTs in the detection of feigned amnesia, epilepsy, and psychosis (Parks, 2015). A noted limitation of SVTs is the degree to which they address specific domains of psychopathology (i.e., psychosis, memory and cognitive impairment). For example, the M-FAST (Miller, 2001) was originally developed to detect feigned psychosis, and thus includes a predominance of items highlighting atypical or extreme psychotic symptoms. On the SIRS, researchers demonstrated that even severely traumatized individuals rarely endorsed items pertaining to unusual symptom combinations or “absurd” symptoms (Brand, McNary, Loewenstein, Kolos, & Barr, 2006; Rogers, Payne, Correa, Gilliard, & Ross, 2009), suggesting that such measures do not necessarily map onto traumatic conditions.

As noted by Gray et al. (2010), currently there is no “gold standard” for the assessment of malingered PTSD, and extant methods are largely suited to distinguish between genuine and feigned psychiatric distress, rather than PTSD specifically. Others have suggested that dedicated measures of malingering capture a broad “badness” factor, as indicated by exaggerated distress, which is thus reflective of general dishonesty of presentation, regardless of specific symptoms (Rogers, 2008b; Egeland, Andersson, Sundseth, & Schanke, 2015). According to some, measures of symptom exaggeration tend to perform poorly when implemented to detect the falsification of a specific disorder (Gray et al., 2010; Taylor et al., 2007). This, however, is complicated by evidence suggesting that existing measures have demonstrated limitations in distinguishing between malingered PTSD and a genuine, extreme form of distress often associated with PTSD (Hall & Hall, 2007). One study found that, in a clinical sample of victims of adult sexual abuse, 20% of respondents on the MMPI-2 Infrequency scale had a T-score greater than 100 (5 standard deviations above the mean; Klotz Flitter, Elhai, & Gold, 2003). As discussed by Klotz Flitter and colleagues (2003), significant elevations on malingering scales may be reflective of severe, genuine pathology or distress for trauma victims. Elevations may result from conscious or unconscious attempts to “cry for help,” or may result from dissociative experiences, in addition to typical PTSD symptomatology, which lead to atypical or “chaotic,” but genuine, symptom profiles. Similarly, the SIRS has been found to over-classify individuals with trauma histories as feigners (Brand, Turisch, Tzall, & Loewenstein, 2014). As such, SVTs currently possess limitations when it comes to the assessment of PTSD specifically.
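
The parenthetical conversion above (a T-score of 100 corresponding to 5 standard deviations above the mean) follows from the linear T-score metric, which is conventionally scaled to a mean of 50 and a standard deviation of 10 in the normative sample. As a brief worked illustration, with z denoting the standardized score (a symbol introduced here, not taken from the cited study):

\[
T = 50 + 10z
\quad\Longrightarrow\quad
z = \frac{T - 50}{10},
\qquad
z\,\big|_{T = 100} = \frac{100 - 50}{10} = 5 .
\]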

In contrast to symptom validity tests, performance validity tests (PVTs) or effort-based measures are designed to verify symptom presentation based on actual examinee performance on neurocognitive tasks. Similar to symptom validity tests, PVTs utilize a variety of detection methods, including identifying uncommon or unlikely performance presentations, when compared to a genuinely impaired normative sample. For example, some methods take advantage of a floor effect, in which malingerers demonstrate significant impairment on tasks for which even genuine patients would succeed. The Test of Memory Malingering (TOMM; Tombaugh, 1996) is one measure that uses such a strategy. In a validation sample of 145 patients with confirmed neurological impairment, representative of patients with cognitive impairment, aphasia, traumatic brain injury, and dementia, all diagnostic groups averaged above the recommended cut-off score (Tombaugh, 2006). To maintain test security, more specific details are not reported here; interested readers are encouraged to consult the administration manual. Floor effect methods were among the most frequently used to detect suboptimal effort in a survey of 188 neuropsychologists (Sharland & Gfeller, 2007).

Despite the purported utility of performance-based methods, the actual use of PVTs in personal injury cases has been less prominent, particularly in cases where head injury is not also alleged. In a survey of 80 emotional injury evaluators, it was noted that no measure of performance validity was listed among the top ten most frequently used tasks (Boccaccini & Brodsky, 1999). Rather, approximately half indicated using non-symptom-based measures such as the WAIS-R or WAIS-III, which are themselves not PVTs, but have subtests that might be individually indicative of poor performance (e.g., digit span). Further, as outlined by Wisdom and colleagues (2014), few studies have addressed the use of PVTs when assessing cognitive functioning in the context of PTSD. While many studies have demonstrated an association between PTSD and cognitive impairment in domains such as memory, attention, and executive function, few have done so while controlling for the possible effects of response bias. When taking into consideration performance on the Word Memory Test (Green, 2003), Wisdom and colleagues (2014) demonstrated that, in a sample of 134 military veterans with a history of traumatic brain injury, veterans with and without PTSD did not differ significantly on measures of cognitive functioning. These findings suggest the importance of incorporating a PVT into the assessment of PTSD, particularly in settings vulnerable to secondary gain (e.g., litigation).

Some researchers have attempted to apply knowledge of physiological arousal and cognitive processing systems to the detection of malingered PTSD. In terms of physiology, PTSD is associated with elevations in heart rate, startle response, blood pressure, and perspiration, consistent with common bodily reactions to stress (Hall & Hall, 2007; Taylor et al., 2007). Cognitively, PTSD is associated with hypervigilance to threat and alterations in attention, memory, and processing speed (Merten, Thies, Schneider, & Stevens, 2009). Researchers in the field of malingering detection had hoped that the development of “gold standards” in hyperarousal symptom presentation for PTSD would allow for more accurate detection strategies, as it stands to reason that physiological and cognitive measures are less subject to response bias and intentional deception. Supporting this theory, one study demonstrated that 75% of known simulators were unable to mirror true PTSD patients in terms of the severity of physiological changes that occur in response to trauma (Orr & Pitman, 1993). In contrast, it was also found that simulators of PTSD were sufficiently capable of producing heart rate elevations that mimic those found in true patients, and it is believed that feigners would be able to simulate other physiological arousal symptoms with practice (Orr & Pitman, 1993). In terms of cognitive symptoms, Thomas and Fremouw (2009) demonstrated that a modified Stroop task and a free recall task involving words associated with trauma were capable of differentiating between genuine PTSD patients and feigners. The Morel Emotional Numbing Test (Morel, 1998) is a clinical example of these methods designed specifically to assess for feigned PTSD. While it is a promising performance-based measure, more research is needed to support its utility (Messer & Fremouw, 2007).

Further confounding this process is the existence of individual differences in reactions toward traumatic events. For example, when looking specifically at trauma following motor vehicle accidents, it has been found that 23% of individuals with a PTSD diagnosis could not be identified based on the physiological measures of heart rate (Veazey, Blanchard, Hickling, & Buckley, 2004; Hall & Hall, 2007). More generally, it has been found that over 40% of persons diagnosed with PTSD did not have increased physiological reactions to presentations of trauma-related stimuli (Orr, McNally, Rosen, & Shalev, 2004; Taylor et al., 2007). Thus, it is evident that even those with true PTSD exhibit stark differences in symptom presentation. Heterogeneity of symptom presentations in PTSD and the potential for distinct subgroups of patients with true disorders suggest that the identification of gold standards of assessment based on physiological arousal or cognitive processing is likely to be difficult. Thomas and Fremouw (2009) highlight that, to date, no single tool has been found to possess the ability to accurately and consistently detect PTSD feigning.

Consensus from neuropsychologists urges the use of both performance- and symptom-based tests in the assessment of possible feigning (Egeland et al., 2015; Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009). This recommendation stems from a body of literature demonstrating divergence between PVT and SVT measures when evaluating malingered neurocognitive impairment, specifically following a traumatic brain injury. For example, examining the relationship between PVT and SVT failure, Demakis, Gervais, and Rohling (2008) found that elevated reports of psychological symptoms, as measured by the MMPI-2, were not associated with PVT failure, nor were poorer performances on measures of neuropsychological functioning, including the TOMM, associated with SVT failure. Based on factor analysis using both PVTs and one SVT, Ruocco and colleagues (2008) noted that individuals who feigned impairment on performance validity measures rarely feigned or exaggerated their experience of psychological distress. Similarly, Greiffenstein, Gola, and Baker (1995) found that, in a sample of brain injury patients referred for neuropsychological evaluation in the context of personal injury litigation, PVT and SVT scores were not significantly related. Subsequent factor analyses did not support a unitary construct of malingering. These findings support the assertion that malingering be conceptualized as a multifaceted construct composed of uniquely contributory performance and symptom endorsement factors. Despite its roots in neurocognitive impairment, this conceptualization has been extended to include the assessment of possible feigned PTSD, particularly in cases when cognitive symptoms are alleged (Wisdom et al., 2014; Demakis & Elhai, 2011).

Recommendations from neuropsychology also encourage testing validity across symptom domains, including cognitive functioning, somatic or physical symptom complaints, and psychological distress. Using a sample of compensation seekers and personal injury litigants, some of whom presented with comorbid head injury, Alwes, Clark, Berry, and Granacher (2008) found that both the SIMS and the M-FAST demonstrated stronger sensitivity when detecting probable psychiatric feigning, but were less successful in detecting probable neurocognitive feigning. Likewise, van Impelen, Merckelbach, Jelicic, and Merten (2014) note that the SIMS, despite inclusion of cognitive symptom subscales, “cannot be relied upon” to detect feigned cognitive impairment, reflecting the need for inclusion of measures of both neurocognitive and psychiatric feigning. Using a sample of patients with either verified or unverified brain injury, Egeland et al. (2015) examined the relative influence of content versus method, demonstrating a dissociation between symptom domains and method of assessment. In contrast to a “general badness” factor or unitary conceptualization of malingering (Rogers, 2008b), Egeland and colleagues found that a two-factor model, comprising symptom validity and performance validity indicators, best fit the data.

Despite prevalent recommendations and supportive evidence to utilize multiple measures of malingering when faced with possible feigned PTSD, much less is known regarding how various measures operate interdependently, both in terms of method and domain content, when head injury is not alleged or documented. Most studies demonstrating the divergence between SVT and PVT measures of malingering have done so using a sample of litigants or compensation seekers with concurrent head injury (Alwes et al., 2008; Greiffenstein et al., 1995; Ruocco et al., 2008; Egeland et al., 2015). As sustained head injury is not a necessary condition for the development of PTSD, and not all individuals who experience a trauma experience concurrent physical injury (Fujita & Nishida, 2008), there is a need to further replicate the above findings with a sample of participants for whom head injury is not alleged. Further, most studies detailed above have utilized known-group patient samples, which present with heterogeneous neurocognitive and psychiatric histories, as well as varied motivations to falsify symptoms. The current study utilizes a simulated personal injury paradigm, in which all participants hypothetically experience the same precipitating event and symptoms of PTSD. As PTSD consists of both psychiatric and cognitively based symptoms, it is hypothesized that litigants in a simulated personal injury paradigm will feign on both SVT and PVT measures; however, consistent with previous studies, there will be a dissociation between assessment types. The use of simulated litigants allows for greater homogeneity, and thus a clearer understanding of the relationships between measures of malingering for a specific condition.

Researchers interested in improving techniques of malingering detection have identified the need to better understand the degree of interdependence between measures of symptom validity, performance validity, and peripheral functions (e.g., cognitive functioning, memory functioning, psychiatric symptom influence). An initial, exploratory method of examining this interdependence is via correlational studies and the use of a multi-trait, multi-method matrix (MTMM; Campbell & Fiske, 1959). With an MTMM, strong inter-correlations are expected among measures which purportedly assess the same underlying construct (convergent validity), while weak or negative correlations are expected among measures that assess different constructs (discriminant validity). Divergence between measures intended to capture the same construct, such as general malingering, is suggestive of underlying sub-constructs.
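
As a minimal sketch of how the cells of such a matrix might be tabulated, the following Python example correlates every pair of measures and labels each pair by whether the two measures share a trait and a method. The data are synthetic and generated purely for illustration; they are not results from any study. The trait and method labels assigned to each measure, the variable names, and the helper functions (pearson, classify_pair, mtmm_cells) are assumptions introduced here, and the plain Pearson coefficient is used only as one simple choice of association index.

```python
import random
from itertools import combinations
from math import sqrt

# Assumed (trait, method) labels, for illustration only.
MEASURES = {
    "TSI2_ATR": ("symptom", "computer"),
    "SIMS": ("symptom", "computer"),
    "MFAST": ("symptom", "in_person"),
    "TOMM_T2": ("performance", "in_person"),
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_pair(a, b):
    """Label a pair of measures by shared trait and shared method."""
    trait = "same-trait" if MEASURES[a][0] == MEASURES[b][0] else "different-trait"
    method = "same-method" if MEASURES[a][1] == MEASURES[b][1] else "different-method"
    return f"{trait}/{method}"


def mtmm_cells(scores):
    """Return {(measure_a, measure_b): (cell label, correlation)} for all pairs."""
    return {
        (a, b): (classify_pair(a, b), pearson(scores[a], scores[b]))
        for a, b in combinations(MEASURES, 2)
    }


# Synthetic scores: two independent latent tendencies plus measurement noise.
rng = random.Random(0)
n = 200
symptom = [rng.gauss(0, 1) for _ in range(n)]
effort = [rng.gauss(0, 1) for _ in range(n)]


def noisy(latent):
    return [value + rng.gauss(0, 0.5) for value in latent]


scores = {
    "TSI2_ATR": noisy(symptom),
    "SIMS": noisy(symptom),
    "MFAST": noisy(symptom),
    "TOMM_T2": noisy(effort),
}

cells = mtmm_cells(scores)
for pair, (label, r) in cells.items():
    print(pair, label, round(r, 2))
```

Under this framing, evidence of convergent validity would appear as relatively large correlations in the same-trait cells, and evidence of discriminant validity as relatively small correlations in the different-trait cells; comparing same-method with different-method cells gives a rough indication of shared method variance.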

The present study aims to expand upon previous research by further exploring the relationship between performance validity measures and symptom validity measures in the detection of malingering. Specifically, the present study utilizes an exploratory MTMM framework, within a simulated personal injury paradigm without the allegation of physical injury (specifically head injury or traumatic brain injury). An MTMM matrix was employed to examine the correlations among symptom-based measures of malingering and performance-based or effort-based measures of malingering in order to examine their convergent and discriminant validity.

One method of evaluating the interoperations between malingering measures is by using a simulation design, in which participants are instructed to simulate conditions of interest. Simulation designs are common in the malingering assessment literature (see van Impelen et al., 2014 for examples). In many simulation studies, participants are explicitly told to feign; however, as noted by Christiansen and Vincent (2012), it is unclear how likely it is that personal injury litigants would be explicitly told to exaggerate or fabricate symptoms. Rather, it is more likely that litigants would be informed that monetary damages would be associated with their level of distress and impairment, as it has been shown that the possibility of compensation through litigation results in inflated reports of symptoms, even when genuine pathology exists and regardless of the type of trauma experienced (Frueh, Gold, & de Arellano, 1997; Peace & Masliuk, 2011). Suggestion to feign, distinguished from coaching or a directive to do so, has not been thoroughly investigated in the literature. Thus, in their experimental paradigm, Christiansen and Vincent (2012) incorporated the simulated suggestion or implication to feign, rather than explicit instruction. Results from this study indicated that suggestion to malinger yielded incremental increases in a participant’s likelihood of exaggerating symptoms. While these increases were not statistically significant, findings demonstrated that involvement in litigation, even in a simulated paradigm, resulted in more extreme symptom endorsement and that suggestion to feign has an additive effect. The present study, as a part of a larger research initiative developed to replicate the results of Christiansen and Vincent (2012), also incorporates the use of simulated suggestion to malinger.

The aim of this study was to evaluate and describe relationships between symptom validity and performance validity measures within a simulated personal injury paradigm using an MTMM framework. Consistent with evidence of convergent and divergent validity, it was hypothesized that participant scores would correlate according to measurement of underlying traits and methods, such that measures of psychological symptom endorsement (TSI-2 ATR, SIMS, M-FAST) would demonstrate strong, positive associations with other measures of symptom endorsement and SVTs would demonstrate weaker, but significant, correlations with measures of performance or effort (i.e., TOMM Trial 2). Furthermore, it was predicted that measures would correlate based on similar method, such that in-person measures (M-FAST and TOMM) would be more highly correlated with each other than with measures completed by computer.

Method

Sample and Participant Selection

This study was a part of a larger research initiative which recruited 465 participants, who were enrolled in psychology courses at a large, public, Southwestern university. Students received course credit for their involvement in this study, which required approximately 3 h of in-person assessment time. Inclusion criteria required that participants be over the age of 18, enrolled as an undergraduate student at the university, and proficient in English, based on self-report. Following study completion, 13 cases were excluded due to errors related to inattentive responding. In order to maintain the presumption that participants could reasonably assume the role of the victim of a motor vehicle accident similar to those described in the experimental condition vignettes, completed protocols were excluded from data analysis if the participant indicated that they did not hold a valid driver’s license.

Design

The present study involved an exploratory analysis of data collected from a larger experimental research initiative investigating the role of litigation and the suggestion to malinger in a personal injury paradigm. As a part of the larger research initiative, participants were randomly assigned to one of four experimental conditions, as detailed through a written vignette. The larger research initiative served as a replication and extension to the procedure used by Christiansen and Vincent (2012).

Measures

Pre- and Post-questionnaires

Participants were first presented with a 52-item questionnaire developed for the current study. The pre-questionnaire collected demographic and background information including age, gender, marital status, ethnic identification, education level and college major, occupational status, voter registration status, and history of military involvement. Information was also collected about past motor vehicle accident and litigation involvement. After completing the primary measures of the study, participants completed a 15-item post-questionnaire designed to evaluate perceptions of the study and the extent to which participants felt they faithfully responded to the measures.

Trauma Symptom Inventory–2–Alternative (TSI-2-A; Briere, 2011)

The TSI-2-A is a 126-item, broadband self-report rating scale of trauma-related symptoms and behaviors. It was designed for use with individuals 18 years and older and requires a fifth-grade reading level. The TSI-2-A is an alternative form of the larger TSI-2, which consists of 136 items and includes questions specific to sexual trauma. Like the original TSI, the TSI-2 is applicable for use in a variety of inpatient, outpatient, and community settings (Briere, 2011). Items consist of a variety of cognitive, affective, and physiological symptoms in addition to behaviors and experiences commonly associated with trauma. The TSI-2 yields two validity scales (RL and ATR), four factors, and 12 clinical scales. The TSI-2 ATR scale addresses a respondent’s tendency to over-endorse trauma symptoms, even when compared to those with confirmed, severe posttraumatic symptomatology. A high score on the TSI-2 ATR scale can be reflective of a variety of phenomena, including generalized over-endorsement across all items, specific over-endorsement on items associated with PTSD, random responding with over-endorsement on clinically rare symptoms, or very significant levels of distress. Subscale scores on the TSI-2 ATR range from 0 to 24, with higher scores indicating a greater likelihood of an invalid clinical profile due to excessive symptom endorsement.

Test of Memory Malingering (TOMM; Tombaugh, 1996; Tombaugh, 2006)

The TOMM is a three-part testing battery which consists of 50 items per trial. It is used to assess exaggeration and fabrication of memory impairment and is used as a proxy for evaluation of effort. In trials 1 and 2, subjects are presented with a series of 50 line-drawings of ordinary objects which they are instructed to commit to memory. After the initial learning phase, subjects are presented with 50 sets of pairs of items, in which they are to select the one item in each pair that is identical to a drawing previously presented. The TOMM has been validated on several populations, including neurological patients, college student normal controls and simulators, persons feigning traumatic brain injury and true brain injury litigants, persons with depression, and elderly patients (Tombaugh, 2006). It is the most widely used PVT, having reportedly been used by 78% of a sample of practicing neuropsychologists (Martin et al., 2015). Scores less than the designated cut-off on any trial call into question the validity of the test-taker’s overall performance. Scores on the TOMM trials 1 and 2 range from 0 to 50, with higher scores indicating optimal effort or normative performance.

Structured Inventory of Malingered Symptomatology (SIMS; Widows & Smith, 2005; Smith and Burger, 1997)

The SIMS is a 75-item self-report measure developed to serve as a screening instrument for the detection of exaggerated or feigned psychopathology and cognitive dysfunction. It was designed for use with individuals aged 18 years and older across a variety of clinical and forensic settings. The SIMS is a domain specific, multi-scale measure which yields both a total score that reflects general feigned presentations, as well as five non-overlapping subscales including: psychosis, neurologic impairment, amnestic disorders, low intelligence, and affective disorders (Widows & Smith, 2005). The SIMS has demonstrated utility with a wide range of clinical, forensic, and community samples, including a sample of inpatient trauma survivors (Rogers, Robinson, & Gillard, 2014; Heinze & Purisch, 2001; van Impelen et al., 2014; Wisdom, Callahan, & Shaw, 2010). The SIMS is the most widely used, standalone SVT according to a sample of 316 professional neuropsychologists, of whom 73% indicated conducting civil forensic evaluations (Martin et al., 2015). The SIMS total score is interpreted as an overall estimate of the likelihood that the subject is feigning or exaggerating psychological symptoms or cognitive impairment. On the SIMS, subscale scores range from 0 to 15, with greater scores serving as indicative of elevated endorsement of symptoms consistent with each of the five subscale domains. The SIMS Total score ranges from 0 to 75, with higher scores reflecting a greater endorsement of atypical, improbable, or inconsistent symptoms. The SIMS, compared to other measures, holds broader coverage of potentially feigned symptoms, including cognitive impairment and psychiatric symptomatology (Rogers et al., 2014).

Miller Forensic Assessment of Symptoms Test (M-FAST; Miller, 2001)

The M-FAST is a 25-item structured interview designed as a screening measure for the determination of malingered psychopathology. It is a widely used forensic assessment measure intended to assess response style. The M-FAST includes seven subscales based on empirically validated response styles and reporting strategies used by malingering individuals. The subscales measure discrepancies between observed and reported symptoms, extreme symptomatology, rare combinations of symptoms, unusual hallucinations, unusual symptom course, overly negative self-image, and suggestibility. While some subscales have been reported as consistently able to discriminate between honest responders, known malingerers, and those instructed to malinger, the total score is the most effective and most frequently used. The M-FAST has been used with both known-groups clinical samples and simulation non-clinical samples (see Guriel-Tennant & Fremouw, 2006; Guy et al., 2006; Jackson, Rogers, and Sewell, 2005). Scores on the M-FAST range from 0 to 25, with greater scores indicating greater likelihood of a feigned presentation. As noted by Rogers and colleagues, the M-FAST focuses predominantly on a single detection strategy, “bogus symptoms”, rather than exaggeration or amplification of potentially valid symptoms (Rogers et al., 2014).

Procedure

Participants completed all aspects of the research protocol in a lab space on a university campus. All self-report measures, including pre- and post-questionnaires, were administered via an online survey platform. In-person assessments (i.e., M-FAST and TOMM) were administered in a private office space by trained research assistants. Following receipt of informed consent, participants completed the pre-questionnaire and other self-report measures included in the larger research initiative.

Participants were then randomly assigned to read one of four vignettes. All vignettes described a fictional scenario in which the participant had recently been involved in a motor vehicle accident and was continuing to experience psychological and cognitive symptoms associated with the accident. Symptoms included jumpiness/nervousness while driving, avoidance of the accident location and of talking about the accident, bad dreams about the accident, fogginess, exaggerated startle response, and difficulty concentrating and remembering things. All participants were informed that they were experiencing neither physical pain nor ongoing physical injuries as a result of the fictional accident.

As a part of the experimental conditions, participants were informed that they were either (1) asked to complete an evaluation at the request of a physician following the conclusion of a lawsuit (condition 1: post-litigation, no suggestion); (2) asked to complete an evaluation at the request of a physician (condition 2: no litigation and no suggestion); (3) asked to complete an evaluation at the request of his or her attorney for the purposes of an ongoing case (condition 3: active litigation, no suggestion); or (4) asked to complete an evaluation at the request of his or her attorney for the purposes of an ongoing case, with the suggestion, by the attorney, that greater impairment would lead to a larger monetary award (condition 4: active litigation and suggestion). Experimenters were blinded to the participant’s assigned condition.

Under this paradigm, participants were instructed to respond to a set of measures as if they were the person in the vignette they received. Per standard administration, the M-FAST and TOMM were presented in person by trained research assistants. The SIMS and TSI-2 were administered via computer, along with pre- and post-questionnaires and other measures included in the larger research initiative. All post-manipulation measures were presented in a random, counterbalanced order to prevent ordering effects. Following the completion of post-questionnaires, participants were debriefed and thanked for their participation.

Results

The initial sample consisted of 458 completed protocols, of which 34 cases were excluded for not holding a valid driver’s license. After application of the exclusionary criteria, the final sample included 411 undergraduate students. A total of 283 participants self-identified as being female (68.9%) and 128 participants identified themselves as male (31.2%). Participant ages ranged from 18 to 58 (M = 23.04; SD = 5.59). The analytic sample was racially and ethnically diverse, with 22.9% of participants self-identifying as Caucasian, 16.1% as African American, 23.1% as Asian-American, 28% as Hispanic, and 10% as Other or Multi-Racial. A summary of descriptive statistics is presented in Table 1.

Table 1 Descriptive statistics by full sample and condition

There were no significant differences across conditions based on participant age [F(3, 407) = 1.501, p = .212], gender [χ²(3) = 2.97, p = .397], race/ethnicity [χ²(12) = 12.69, p = .392], or level of education [χ²(12) = 8.59, p = .737]. Conditions also did not significantly differ in terms of personal motor vehicle accident history [χ²(3) = .60, p = .897]. A summary of descriptive statistics by condition is presented in Table 1.

Descriptive test statistics, including means, standard deviations, and failure rates for the TSI-2 ATR, SIMS, M-FAST, and TOMM Trial 2, across the full sample and by experimental condition are presented in Table 2. Testing statistics per condition are also presented. Notably, over 70% of the full sample evidenced failure on the SIMS, having scored a total of 15 points or higher. Based on failure rates, a sizeable proportion of the total sample would be recommended for further evaluation of malingering based on either the SIMS or the M-FAST. Failure rates trended in the expected direction, based on instructional condition, with larger failure rates typically found in conditions 3 and 4.

Table 2 Testing statistics by full sample and condition

As part of the larger research initiative, the effects of litigation and of the suggestion to malinger were investigated. Results from these analyses, presented in Tables 3 and 4, demonstrated significant differences across conditions in continuous outcome scores for the TSI-2 ATR and the SIMS (p = .021 and p < .001, respectively), but not for the M-FAST or TOMM Trial 2. Post hoc comparisons using Tukey HSD revealed significant differences between condition 4 and conditions 1 and 2 for total scores on the TSI-2 ATR and between condition 4 and condition 1 for total scores on the SIMS. When evaluating differences in terms of the number of participants exceeding recommended clinical cut-offs, significant differences across groups were found for the TSI-2 ATR and TOMM Trial 2 (p = .010 and .026, respectively). As such, instructional condition was controlled in subsequent analyses by means of pooled-within group correlational analyses.

Table 3 Analysis of variance for continuous outcome scores by condition
Table 4 Number of failures by measure by condition

Hypotheses for the present study were tested by generating bivariate correlations across all four measures of malingering, in order to examine convergent and discriminant validity through a multi-trait, multi-method matrix. Correlations were produced using a pooled-within group correlation matrix, which assumes invariance across instructional conditions. Results of total score correlations are presented in Table 5. Absolute values of correlations ranged from .12 to .65.
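The pooled-within group correlation described above can be illustrated with a minimal sketch. It assumes the usual definition, in which each score is centered on the mean of its own instructional condition before the sums of squares and cross-products are pooled; the function name and the toy scores are hypothetical and are not taken from the study data or the authors' analysis code.

```python
from collections import defaultdict
from math import sqrt


def pooled_within_corr(x, y, groups):
    """Pooled within-group Pearson correlation between x and y.

    Scores are centered on their own group's mean, so differences
    between group means do not contribute to the correlation.
    """
    by_group = defaultdict(list)
    for xi, yi, g in zip(x, y, groups):
        by_group[g].append((xi, yi))

    sxx = syy = sxy = 0.0  # pooled within-group sums of squares / products
    for pairs in by_group.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            sxx += (xi - mean_x) ** 2
            syy += (yi - mean_y) ** 2
            sxy += (xi - mean_x) * (yi - mean_y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy scores for two conditions (not study data).
x = [1, 2, 3, 11, 12, 13]
y = [2, 4, 6, 1, 3, 5]
condition = ["a", "a", "a", "b", "b", "b"]
print(pooled_within_corr(x, y, condition))
```

Because centering removes each condition's mean, a shift in average scores between conditions does not inflate or mask the association that holds inside the conditions, which is the sense in which condition membership is controlled.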

Table 5 Bivariate, pooled-within group correlations between SVT and PVT total scores, total sample (N = 411)

Convergent validity between symptom validity measures was demonstrated through moderate, but statistically significant correlations between the SIMS, M-FAST, and TSI-2 ATR. Notably, the correlation between the TSI-2 ATR and the M-FAST (r = .51, p < .01, two-tailed), while significant, was weaker than correlations between the TSI-2 ATR and the SIMS, or the SIMS and the M-FAST, suggesting that these two measures may capture different subdomains of the malingering construct.

As predicted, correlations between the PVT (the TOMM) and the SVTs were weaker than correlations across the three SVTs (ranging from − .12 to − .28, p < .01, two-tailed). Correlations between the SVTs and the TOMM were negative because lower scores on the TOMM reflect poorer performance, which is indicative of dishonest responding. This is the reverse of the SVTs, on which lower scores reflect a more “honest” performance. Therefore, negative correlations between the TOMM and SVTs indicated convergent validity, however weak.

Limited evidence was found for a method effect. The correlation between the TOMM and the M-FAST, which are both in-person administration tasks, was weak (r = − .12, p < .01, two-tailed) when compared with correlations involving the SIMS and TSI-2 ATR, which were presented via computer.

Follow-up Analyses

Notably, approximately a third of the total sample (N = 137) endorsed prior involvement in a motor vehicle accident that was not the participant’s fault and in which they sustained psychological or physical injuries. Acknowledging the potential effect of prior exposure to a serious motor vehicle accident, the above bivariate, pooled-within group correlations were calculated with the subset of the original sample who endorsed prior exposure to motor vehicle accidents. Results of total score correlations for the subset are presented in Table 6. Absolute values ranged from .16 to .73. In this sample, strong, significant correlations were found between the TSI-2 ATR and the SIMS (r = .73, p < .01, two-tailed). Compared to the full sample correlations, weaker but still significant correlations were found between the M-FAST and both the TSI-2 ATR and SIMS, further suggestive of convergent validity between the SVT measures. Correlations between the TOMM and SVTs remained weak, yet significant (ranging from − .18 to − .28, p < .05, two-tailed), consistent with the full sample.

Table 6 Bivariate, pooled-within group correlations between SVT and PVT total scores, MVA sample (N = 137)

Using published cut-points, classification rates for failures on none, one, two, three, or all four tests were also examined. Proportions of test failures per condition are presented in Fig. 1. Notably, over 50% of individuals in each condition failed at least one measure, with over 60% of individuals in the “no litigation” condition (condition 2) failing at least one task. The proportion of individuals per group that failed at least two measures ranged from 3.8 to 9.4%. Approximately 10% of the total sample failed three or more measures.
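The tally of failures per participant can be sketched as follows. Only the SIMS rule (a total score of 15 or higher) is stated in the text; the other cut-off values below are illustrative assumptions to be replaced with those from the relevant test manuals, and all function names and scores are hypothetical. The sketch reflects that the TOMM is failed by scoring below its cut-off, whereas the SVTs are failed by scoring at or above theirs.

```python
from collections import Counter

# (cut-off, direction) per measure. Only the SIMS value comes from the
# text; the remaining cut-offs are assumed placeholders for illustration.
CUTOFFS = {
    "SIMS": (15, "at_or_above"),
    "M-FAST": (6, "at_or_above"),      # assumed
    "TSI-2 ATR": (15, "at_or_above"),  # assumed
    "TOMM T2": (45, "below"),          # assumed
}


def failed(measure, score, cutoffs=CUTOFFS):
    """Return True if the score counts as a failure on the measure."""
    cut, direction = cutoffs[measure]
    return score >= cut if direction == "at_or_above" else score < cut


def count_failures(scores, cutoffs=CUTOFFS):
    """Number of measures failed by one participant."""
    return sum(failed(m, s, cutoffs) for m, s in scores.items())


def failure_distribution(participants, cutoffs=CUTOFFS):
    """Proportion of participants failing 0, 1, ..., all measures."""
    counts = Counter(count_failures(p, cutoffs) for p in participants)
    n = len(participants)
    return {k: counts.get(k, 0) / n for k in range(len(cutoffs) + 1)}


# Hypothetical participants (not study data).
sample = [
    {"SIMS": 20, "M-FAST": 2, "TSI-2 ATR": 5, "TOMM T2": 50},
    {"SIMS": 10, "M-FAST": 1, "TSI-2 ATR": 3, "TOMM T2": 49},
    {"SIMS": 30, "M-FAST": 8, "TSI-2 ATR": 16, "TOMM T2": 40},
    {"SIMS": 15, "M-FAST": 6, "TSI-2 ATR": 2, "TOMM T2": 45},
]
print(failure_distribution(sample))
```

Running the same tally separately within each instructional condition would give per-condition proportions of the kind summarized in Fig. 1.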

Fig. 1 Proportion of failed tests, by condition. Failure rates were calculated using recommended cut-offs

Discussion

Reported findings expand upon previous research by further exploring the relationship between performance validity measures and domain-specific, symptom validity measures in the detection of malingering within a simulated personal injury paradigm. As noted previously, current practice in the assessment of malingered symptomatology, when PTSD is alleged, supports the use of performance validity measures in addition to traditional symptom validity measures; however, this recommendation stems largely from research conducted using civil litigants with documented physical injuries (Alwes et al., 2008; Greiffenstein et al., 1995; Ruocco et al., 2008; Egeland et al., 2015). Given the financial and legal implications associated with successful malingering of PTSD in a civil litigation case, the examination of the relationships among measures of malingering when physical injury is not sustained is necessary.

The results of the present study demonstrated expected relationships between performance validity and symptom validity measures. As predicted, measures of symptom validity were more strongly correlated with other symptom validity measures and demonstrated weaker relationships to a measure of performance validity (the TOMM). These findings are suggestive of a convergence among symptom validity indicators and a divergence between measures of symptom and performance validity, and this pattern of relationships remained true for participants who reported prior exposure to a motor vehicle accident in which injury, physical or psychological, was sustained. The divergence suggests that inclusion of performance validity indicators, when assessing PTSD, is valuable regardless of the presence of physical injury following the event. Rather, performance validity indicators appear to capture a separate, yet associated, facet of malingering. Further, weaker associations between the M-FAST and other measures (TSI-2 and TOMM) suggest that this measure, which focuses more heavily on feigned psychotic symptoms, may be less relevant in the context of PTSD, particularly when more mild symptoms are prominent.

Findings are consistent with previous research on the relationship between PVT and SVT failure as it relates to posttraumatic injury (Demakis et al., 2008; Egeland et al., 2015; Greiffenstein et al., 1995), suggesting that malingering in PTSD is a non-unitary construct composed of both performance and symptom-endorsement factors. Present findings lend support to the view that a monolithic, or dichotomous, conceptualization of malingering, such as one being either “generally honest or dishonest,” inaccurately captures the nature of the construct (Egeland et al., 2015). Utilizing a psychological injury paradigm in which concurrent physical injury (e.g., head injury, TBI) was not alleged, present findings support a dissociation between performance and symptom validity indicators outside the context of physical injury. These findings support clinical recommendations to administer a variety of malingering measures when conducting an assessment, being sure to select measures that will tap into the sub-constructs of performance and symptom validity, even when physical injuries, such as a brain injury, are not sustained or alleged.

The present analysis found limited evidence for a method effect, as measures administered using similar methods (TOMM and M-FAST) did not demonstrate stronger associations. A possible explanation for this lack of effect is the degree of difference between the purported targeted constructs of each task. As noted elsewhere, the TOMM is designed as a measure of effort which specifically uses a memory task. The M-FAST, alternatively, has been argued to serve as a screen of “general badness” or a generally dishonest presentation, which happens to emphasize questions of severe disturbance and psychotic spectrum symptoms. As Egeland and colleagues noted, “when the mode of reporting is the same, the type of symptom plays a role” (2015), highlighting the importance of being cognizant of the content and intended construct of a selected measure when conducting an assessment of malingering. This is particularly salient for assessment of PTSD and other post-trauma conditions, which are highly heterogeneous and often co-occur with physical and cognitive sequelae.

As noted by Boone (2011), it is not atypical for honestly responding individuals to fail a single malingering measure. It is, however, unusual for individuals to fail more than one measure, and even rarer for someone to fail more than two (Larrabee, 2003). Evidence suggests that the majority of those who malinger do not do so on every test and demonstrate considerable variability in their approaches to feigning, with some focusing more on exaggerated memory impairment or processing speed (Boone, 2009). Those who feign certain neurocognitive deficits are likely to pass other tasks. It is noted that these findings stem from evaluations of effort tests or performance-based measures and do not include administration of screening measures such as the M-FAST and SIMS; however, findings from the present study support this assertion. In the present study, over 50% of participants, regardless of instructional condition, failed at least one measure, suggesting that clinicians should administer at least two measures of malingering to lend incremental validity to conclusions, including in situations where physical injury is not alleged. Thus, caution should be used when utilizing these screening measures in isolation, without corroboration from other indicators. When generally dishonest responding is suspected, assessment validity is improved by utilizing more than one malingering measure.

A further consideration is the remarkable heterogeneity that is present across those with PTSD. Unlike other diagnostic categories, PTSD and other traumatic reactions require the presence of an etiological stressor and subsequent emotional, behavioral, physiological, and cognitive sequelae. The ways in which individuals can respond to trauma are as varied as the ways in which they may be exposed to it (e.g., sexual violence, natural disasters, workplace or automotive accidents). Peace and Masliuk (2011) identified that the nature of the trauma plays a role in malingering assessment, noting that experience of sexual assault, even when simulated, results in more severe and extreme symptom reports. This is consistent with findings that validity scales on measures such as the MMPI-2 and SIMS may be invalidated due to extreme, yet genuine distress. As such, it appears likely that assessments of malingering should be tailored, and interpretation of results should be cognizant of the specific circumstances that reportedly brought about the condition. The present study utilized a simulated motor vehicle accident as contributing to PTSD symptoms; however, as noted above, specific symptom profiles and severity of distress are often a function of the type of trauma experienced. Future studies could evaluate the role of performance and symptom validity in presentations following other forms of trauma.

Given the number of available malingering measures (for both symptom and performance validity), future research should continue to address the inter-relationships among these tasks in various medico-legal contexts. It is noted that the present analysis was largely exploratory in nature due to limitations in the data. As noted above, the present study was a secondary analysis using data collected from a larger research initiative investigating the role of suggestion on feigning. More sophisticated methods of data analysis (e.g., structural equation modeling and confirmatory factor analysis) would allow for a deeper look into these relationships by means of modeling latent variables and controlling for factors such as prior accident and litigation exposure. Structural or latent variable modeling methods would allow for the partitioning out of variance due to method, shared constructs, and other variables, but often require substantially larger sample sizes than was available in the present study. Due to this limitation, present analyses were restricted in scope. Further, the present study only utilized one performance validity indicator. As latent variable models are more robust when multiple indicators are used (Schreiber, Nora, Stage, Barlow, & King, 2006), future studies using multiple indicators of performance and symptom validity and larger samples could further elucidate the broader construct of malingering.

The present study recruited from an undergraduate sample in which participants were asked to simulate symptoms, which poses several limitations. The utilization of a simulation design, while common in malingering research, poses risks in that it is not possible to determine known groups, or to have certainty in group membership (Rogers, 2008b). As noted by Rogers and Bender (2013), it is recommended to utilize a known-group design where possible in order to better capture information about response style and malingering presentations. In contrast to these arguments, potential benefits of using a simulation design include that undergraduate students may be more informed about conditions such as PTSD through their coursework, and thus may serve as more sophisticated “pretend” malingerers than community samples (Gray et al., 2010). Future research thus could benefit from replication of these results using a known-group, civil litigation population.

Consistent with known-group design, the present study also did not include a control group, or a group of participants who were asked to respond to measures honestly and without any information from an instructional condition. While condition 2 was reflective of a situation in which participants were not involved in ongoing litigation and received no suggestion to feign, individuals were still told which symptoms to express on outcome measures. Likewise, it is acknowledged that the present study did not evenly vary independent variables across instructional conditions, thus making interpretation of condition-related differences difficult. Despite this, expected trends were found on outcome measures, such that those in more extreme conditions did endorse more symptoms or more extreme symptoms on specific measures (i.e., TSI-2 ATR and SIMS), and greater proportions of individuals in more extreme conditions exceeded recommended cut-offs for some measures (i.e., TSI-2 ATR and TOMM T2). Efforts were also made to control for the possible effects of condition membership using pooled-within group correlations; however, future research would benefit from the inclusion of a control group and a more evenly balanced experimental paradigm, such that differences across groups may be compared with greater confidence. Further, more sophisticated statistical analyses such as latent variable modeling could be incorporated in future studies to partition out covariance due to group membership.

Qualitative feedback from the study participants suggests that motivations to faithfully approach each task in the research protocol may have also varied, and there is concern for inattentive or careless responding. In other simulated malingering research designs, researchers often incentivize subjects in order to increase ecological validity, as individuals involved in actual civil litigation cases are motivated by the opportunity for financial compensation. Financial compensation was decidedly not offered in this study, as the goal for the larger research initiative was to evaluate the impact of suggestion, absent of external motivators. It is possible that a portion of the sample may have had substantial difficulty retaining the information in the case vignettes and their expectations during the study. These variations in attention, motivation, and ability to retain provided information may have contributed to the lack of significant differences across instructional conditions. As such, it is possible that the present results are an underestimate of the true associations between SVT and PVT measures.

In conclusion, the current study supports extant clinical recommendations to administer more than one malingering measure when conducting a comprehensive evaluation of PTSD symptomatology in a personal injury case. This recommendation appears valid regardless of the presence or allegation of concurrent physical injury and regardless of the potential suggestion, from attorneys or other parties, to malinger. The present findings are consistent with a conceptualization of malingered PTSD as a multi-faceted, or non-unitary, construct that incorporates cognitive, affective, and physical symptom domains. As such, current findings suggest the need to incorporate measures into malingering assessment that are designed to capture these facets of PTSD. Clinicians conducting assessments of PTSD in the context of psychological injury are encouraged to use two or more measures, which tap symptom and performance validity across a variety of symptom domains, in order to increase accuracy in the detection of malingering.