Posttraumatic stress disorder (PTSD) is one of the most common psychological disorders claimed in a personal injury context and may form the basis for a disability or workers’ compensation claim. PTSD is nearly unique among psychiatric disorders in that the diagnostic criteria specify the presumed etiology: exposure to a significant trauma or stressor. Thus, the very definition of the disorder helps to establish the legal requirement, in a personal injury suit, of proximate cause (Shuman 2003).

The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR; American Psychiatric Association 2000) states that malingering should be ruled out in medicolegal contexts, and the DSM-IV-TR Differential Diagnosis section for PTSD specifically includes this possibility. PTSD may be easily faked on self-report measures. Many are face-valid scales with little or no check on overendorsement or fabrication, and, not surprisingly, most subjects are able to portray PTSD when presented with symptom checklists (Burges and McMillan 2001; Hickling et al. 2002; Lees-Haley 1990). Guriel and Fremouw (2003) reviewed approaches to assessing exaggeration or malingering of PTSD and concluded that no extant instrument could be considered fully adequate. The present article seeks to update readers on substantial developments since the Guriel and Fremouw paper, most notably the cognitive symptom validity testing now so prevalent in neuropsychology.

Rosen (2004a, b, 2006; see also McNally 2003; Rosen and Taylor 2007) argued that the literature on PTSD may be badly compromised by the failure of researchers to assess for malingering among presenting patients, even when the entire sample is compensation seeking. This failure potentially contaminates much of what is known about the disorder. For example, one correlate of PTSD is antisocial behavior (McNally 2003), which may include deception, exploitation, and substance abuse. Antisocial behavior and drug use are sometimes viewed as consequences of PTSD without any serious attempt to determine whether such traits were present before the alleged injury. Further, antisocial personality disorder is one of four DSM-IV-TR indicators of potential malingering.

Numerous studies report high levels of psychiatric comorbidity with PTSD (Brady 1997; Brady et al. 2000; Brunello et al. 2001; Kessler et al. 1995; Skodol et al. 1996). There are at least five explanations for this covariance, with very different implications: (a) that the comorbidity is “real” (e.g., that depression and other symptoms are frequent, co-occurring responses to trauma), (b) that apparent comorbidity is due to intentional symptom overendorsement, (c) that comorbidity reflects underlying neuroticism or negative affectivity (and thus is real, but artifactual), (d) that apparent comorbidity is the result of an acquiescent response style, and (e) that apparent comorbidity is the result of a dramatizing communication style. Thus, the basic issue of comorbidity is a morass. The failure to consider response styles has resulted in several published recommendations that journal editors demand disclosure of the litigation status of study participants and that those with incentives to exaggerate be identified and analyzed separately from those without such motivations (Rosen 2004a, b, 2006; Rosen and Taylor 2007). Without such provisions, mean profiles of supposedly genuine PTSD patients or claimants on the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) or Personality Assessment Inventory (PAI) may be uninterpretable.

PTSD appears widely accepted in the early twenty-first century by both professional groups and the lay public, but its core features and its integrity as a discrete diagnostic syndrome have been challenged (McNally 2003, 2004; Rosen 2004a; Taylor and Asmundson 2008). Some studies find that the degree of trauma experienced has little relationship to the severity of symptoms or impairment, while indices of adjustment before the trauma are solid predictors (Bowman and Yehuda 2004). A recent meta-analysis of 77 studies found that previous psychiatric history, childhood abuse, and family psychiatric history were consistently associated with development of PTSD. Less consistent predictors included gender, race, age, education, previous trauma, and general childhood adversity (Brewin et al. 2000). Another review reported lower intelligence, neuroticism, negativistic personality traits, and dissociation surrounding the trauma as predictors of subsequent PTSD diagnosis (McNally 2003).

The focus on personality and historical predictors is a shift from early conceptions of the disorder. Even life-threatening events often do not lead to PTSD: Breslau and Kessler (2001) found that 89.6% of Detroit adults reported experiencing at least one traumatic event meeting DSM-IV-TR criterion A, yet only 9.2% developed PTSD. Further, two thirds of those who initially showed symptoms improved or recovered within 3 months (Rosen 2004a). Thus, persisting, chronic PTSD is the exception, not the rule. Yet, the public perception appears quite different. Noting that PTSD has seemingly become the expected response to tragedy, Summerfield (2004) asserted:

An expansive mental health industry has in effect promoted the idea that the trials of life represent noxious influences easily able to penetrate the average citizen, not just to hurt but to disable. This is to endorse a much thinner-skinned version of a person than previous generations would have recognized or respected (p. 234).

Early writers conceptualized PTSD as a disorder produced by a stressor that would cause extreme distress to virtually anyone (Bowman and Yehuda 2004; Young and Yehuda 2006). However, predictive studies have long demonstrated the role of personality traits, such as neuroticism (Miller 2003), which are long-standing and are associated with a variety of negative emotions and maladaptive coping mechanisms (McCrae and Costa 1990). Emphasizing the contribution of personality factors, Lees-Haley (1997a, b) noted that the most common personal injury claimant MMPI-2 profile is 1/3 and offered this synopsis:

Many of the profiles which appear in this sample are indicative of chronic conditions in addition to any current discomfort.…These elevations also suggest poor insight and denial, which may support the thesis that plaintiff exaggeration is better thought of in terms of pathology or rationalization than malingering, which implies conscious intent. Descriptions such as sad, bitter, cynical, miserable, pessimistic, and dysphoric along with lack of energy, concentration problems, and physical problems support the conclusion that these are genuinely disturbed people, regardless of exaggeration.

The modal plaintiff appears to be an unhappy somatisizer involved in a social context that encourages rationalization, projection of blame, and complaining (p. 753).

In less inferential terms, a modal case of PTSD may be the result of a stressful event befalling someone already above average in neuroticism, leading to an increase in distress over an already elevated baseline. Even emotionally hardy people might suffer prototypical PTSD symptoms as the result of a severe stressor, while those scoring average in neuroticism would have an intermediate risk of PTSD development.

Many treating clinicians and writers appear overly trusting about the honesty of their PTSD patients or subjects. For example, Daly and Johnson (2002), while reporting that all their subjects were seeking compensation, minimized the possibility of malingering: “The victims [italics added] in this study appear to have been genuine, honest people… They were largely a law-abiding group who had previously shown respect for, and trust in, authority” (p. 463). Blanchard and Hickling (1997) did not collect medical records and stated that MMPI-2 findings may falsely label their patients as exaggerating. Ironically, these same two investigators (Hickling et al. 2002) researched their own clinicians’ ability to detect actors trained as simulators and found troubling results. Despite having access to various self-report scales and psychophysiological data, clinicians described as “highly experienced with a trauma population” failed to identify any of the six actors. When told the nature of the study, clinicians identified three of the six actors but misidentified three patients.

While clinicians who work in treatment settings may believe malingering is rare (Blanchard and Hickling 1997), there is little check on this assumption and much disagreement. Drukteinis (2003) and Resnick (2003) asserted that outright malingering is rare, although exaggeration is common. However, Frueh et al. (2005) reported that 69–89% of veterans treated in a psychiatric trauma clinic went on to seek disability, an extraordinarily high percentage compared to disability rates among World War II veterans (Goldstein et al. 1987), Vietnam era naval aviator prisoners of war (POWs; Nice et al. 1996), and Holocaust survivors (Eaton et al. 1987). Freeman et al. (2008) reported that 53% of veterans presenting for treatment failed the Structured Interview of Reported Symptoms (SIRS; Rogers 1992), while another 24% obtained scores in the indeterminate range. Burkett and Whitley (1998) estimated, based on a review of military records, that about 75% of Vietnam veterans who received disability due to PTSD were never exposed to combat. Merten et al. (2006) reported that 51.1% of PTSD claimants undergoing independent medical evaluations failed cognitive validity tests. Demakis et al. (2008) reported that 29% of PTSD claimants failed at least one cognitive validity measure, while 49% scored above the predetermined cutoff scores on MMPI-2 validity scales (F, F(p), Fake Bad Scale (FBS)) or the negative bias scale of the Detailed Assessment of Posttraumatic Stress (DAPS; Briere 2001).

Neuropsychologists have been particularly active in the assessment of response style, which they extend to ostensible cognitive tasks. Because neuropsychological tests require maximal effort to be valid, researchers have developed tests sensitive to less than adequate effort or intentional failure. The National Academy of Neuropsychology (NAN) recently issued a position statement declaring symptom validity testing (SVT) medically necessary for all neuropsychological examinations (Bush et al. 2005). Not infrequently, these examinations involve claimants without brain injury, including conditions such as depression, chronic pain, fibromyalgia, and PTSD. Prominent SVTs include the Word Memory Test (Green 2005), the Computerized Assessment of Response Bias (Allen et al. 1997), the Victoria Symptom Validity Test (Slick et al. 1997), and the Test of Memory Malingering (TOMM; Tombaugh 1996). Aside from these freestanding tests, embedded validity indices have been developed that utilize scores from existing cognitive tests, such as the Wechsler Adult Intelligence Scale-III (WAIS-III; Mittenberg et al. 2001; Greve et al. 2003) and the Wechsler Memory Scale-III (see Larrabee 2007a), and from full neuropsychological test batteries (Meyers and Volbrecht 2003).

As more tests of response style were developed, psychologists proposed more objective ways of interpreting and combining test results and other observations (Greiffenstein et al. 2004; Pankratz 1998; Rogers 1997a). Greve and Bianchini (2004a) proposed setting cutoff scores of individual scales to achieve a specificity of 0.95 (a false-positive error rate of 5%), allowing sensitivities (the ability to detect suboptimal effort) to fall where they may; often, these are below 0.30. Since most response-style tests have low to moderate sensitivities, many authors (Bush et al. 2005; Iverson and Franzen 1996; Lynch 2004; Meyers and Volbrecht 2003; Nelson et al. 2003) recommend employing multiple validity tests and procedures. Although Otto (2008) cautioned that using multiple highly correlated feigning measures increases the risk of a false-positive error, several neuropsychologists have argued that such tests have very small intercorrelations among clinical groups. More important, empirical evaluations have found better classification figures for failure on two or three validity indicators than for any of the individual tests (Larrabee 2003, 2008; Meyers and Volbrecht 2003; Vickery et al. 2004; Victor et al. 2008).
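
The arithmetic behind the multiple-test recommendation is easy to illustrate. The following sketch assumes statistically independent tests, each set for 0.95 specificity; this is an idealization (real validity measures are at least modestly correlated, which is the substance of Otto's caution), but it shows why requiring failure on two or more indicators sharply reduces false positives:

```python
from scipy.stats import binom

# Hypothetical per-test false-positive rate when each cutoff is set
# for 0.95 specificity.
p_fp = 0.05
n_tests = 3

# Under independence, the chance that an honest examinee fails k or
# more of n tests is a binomial tail probability.
for k in range(1, n_tests + 1):
    p_fail = 1 - binom.cdf(k - 1, n_tests, p_fp)
    print(f"P(fail >= {k} of {n_tests} tests) = {p_fail:.4f}")

# P(fail >= 1 of 3) = 0.1426  -> "fail any single test" is too loose
# P(fail >= 2 of 3) = 0.0073  -> a two-failure rule cuts false positives sharply
# P(fail >= 3 of 3) = 0.0001
```

To the extent the measures are positively correlated among genuine patients, the true multi-failure rates fall somewhere between these figures and the single-test rate of 0.05.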

The most successful effort to synthesize clinical observations of potential malingering has been the criteria proposed by Slick et al. (1999), which focus on detection of feigned cognitive deficits. Briefly summarized, the criteria require the presence of an external incentive and some combination of implausible performance on one or more neuropsychological or self-report tests and inconsistent or implausible presentation. The scoring rules yield groups described as definite, probable, and possible malingering. Below-chance performance on a forced-choice validity test is the one sign that, given only the presence of an external incentive, is sufficient to render a judgment of definite malingering: no amount of impairment can account for a performance that is significantly below chance. Several response-style studies have used the Slick criteria classifications to create criterion groups, and some forensic examiners reference these classifications in their reports, potentially providing a reliable, validated, and generally accepted classification process. Larrabee et al. (2007) suggested modifications to the original Slick criteria, noting that multiple studies have indicated that failure on any three validity indicators, including either psychiatric or cognitive measures, provides a highly sensitive and specific decision rule. Victor et al. (2008) recently argued for a “two-failure” rule.
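
The logic of “significantly below chance” is a one-tailed binomial test: on a two-alternative forced-choice test, even a totally amnestic examinee guessing at random should approach 50% correct. A minimal sketch with hypothetical numbers (not drawn from any of the tests cited):

```python
from scipy.stats import binom

# Hypothetical two-alternative forced-choice SVT: 50 items, where pure
# guessing yields 50% correct on average.
n_items = 50
score = 17  # hypothetical number of items answered correctly

# One-tailed probability of scoring this low or lower by chance alone.
p_value = binom.cdf(score, n_items, 0.5)
print(f"P(<= {score}/{n_items} correct by guessing) = {p_value:.4f}")  # ~0.016

# Because even random responding rarely produces a score this low, a
# significantly below-chance result implies the examinee recognized the
# correct answers and avoided them, the rationale for "definite" malingering.
```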

Assessment of PTSD in Forensic and Compensation Contexts

There are differing opinions on the appropriate focus of PTSD assessment, but most writers emphasize the need for structured interviews or tests and the use of collateral information. Aside from these basic guidelines, there are widely differing recommendations. Vasterling and Bailey (2005) asserted, “Attention to neuropsychological impairment is paramount in clinical settings” (p. 200). Keane et al. (1998) suggested that psychophysiological assessment had strong empirical support and surpassed the classification accuracy of the dexamethasone suppression test, while a subsequent chapter (Keane et al. 2003) was far less bullish. Other psychophysiological assessment investigators, such as Pitman and Orr (2003), concluded that such measures “are limited and require careful scrutiny to avoid misuse” (p. 222). Psychometric assessment of level of functioning/behavioral impairment is rarely mentioned.

While psychiatric symptoms predominate over cognitive ones in the DSM-IV-TR diagnostic criteria for PTSD, inability to recall aspects of the trauma is counted toward criterion C, and “concentration problems” can be counted toward criterion D (persistent symptoms of increased arousal). PTSD patients and claimants often report or display memory impairment (Sachinvala et al. 2000; Solomon and Mikulincer 2006). Nonetheless, few chapters on forensic PTSD evaluation address cognitive assessment. Although several authors (Beblo et al. 2005; Brewin et al. 2007; Moore 2009; Vasterling and Bailey 2005) concluded that there is substantial evidence of cognitive impairment in PTSD, Vasterling and Bailey acknowledged that extant studies have failed to assess for optimal effort or feigning. Others (Crowell et al. 2002; Twamley et al. 2004) observed no significant deficits. Horner et al. (2007) reported that veterans diagnosed with PTSD who passed the TOMM produced lower scores on measures of attention, psychomotor speed, visuoconstruction, memory, and cognitive flexibility than did veterans without psychiatric diagnoses. However, PTSD patients differed from other psychiatric patients only on attention. Demakis et al. (2008) found virtually no correlation between the severity of PTSD symptoms and cognitive impairment on neuropsychological tests. Further, those who passed cognitive validity tests achieved a mean T-score of 48.8 on the cognitive battery, suggesting no substantial impairment.

The Word Memory Test (WMT) scoring program (Green 2005) lists scores for various diagnostic groups, including PTSD, distinguishing between those who passed or failed the validity portion of the test. The five normal groups produced scores on the WMT’s functional memory scales from 0.1 to 1.1 SDs above those of PTSD patients who passed the WMT. However, the normal groups also scored higher by 0.38 SDs on two of the three primary effort indicators. This suggests some patients may not have performed to the best of their ability, despite passing the WMT.

Some recent studies examined cognitive deficits in PTSD without assessing effort but yielded interesting findings. Burriss et al. (2007) reported that depressive symptoms mediated the cognitive deficits they observed in PTSD patients (see also Johnsen et al. 2008), although Rohling et al. (2002) and Green (2008) reported that depression does not produce lowered cognitive scores when examinees exert good effort. Johnsen et al. (2008) reported deficits on delayed recall, which were mediated by less efficient learning in the PTSD sample; however, attention and recognition memory were intact. Lastly, two recent large studies (Kremen et al. 2007; Parslow and Jorm 2007) and a review (Moore 2009) have concluded that cognitive deficits in PTSD precede the trauma and, in fact, act as risk factors for development of PTSD symptoms. Parslow and Jorm found that those experiencing PTSD symptoms showed less improvement in an immediate word recall task from the first to the second testing date. However, PTSD symptoms of reexperiencing and arousal were predicted by lower word recall, digit span, coding speed, and verbal intelligence assessed 3 years before the trauma. Kremen et al. found that twins with PTSD had Armed Forces Qualification Test scores that averaged 3.87 percentile points lower than their cotwins before their military service. Genetic factors were estimated to account for 5% of the variance in PTSD status through cognitive ability, plus an additional 26% through noncognitive factors.

Many influential PTSD writers devote relatively little attention to the assessment of response style (Blanchard and Hickling 1997; Briere 2002; Keane et al. 2003; Koch et al. 2005; Shuman 2003; Simon 2003; Young et al. 2006). Others devote a chapter in a book (Koch et al. 2005), while some suggest it is a primary issue to be addressed (Drukteinis 2003; Greiffenstein et al. 2004; Guriel and Fremouw 2003; Knoll and Resnick 2006; McGuire 1999; Resnick 2003; Resnick et al. 2008; Rubenzer 2005, 2006; Young et al. 2007). Apart from agreement on the need for objective measures and, to a lesser extent, the use of collateral information, there appears to be little agreement between those who trust the clinical presentation of claimants and those who view assessment of response style as crucial.

Malingering PTSD

There are indications that malingering may be common among PTSD claims. In the Aleutian Enterprise sinking, 86% of survivors reported PTSD symptoms leading to disability, far exceeding the more typical figures of 25 to 40% in similar tragedies. Postlitigation interviews with these claimants revealed that attorneys had coached most litigants (Rosen 1995). Some veterans who claimed PTSD never experienced combat or, in some cases, were never even in the armed services (Burkett and Whitley 1998; Frueh et al. 2000; Monnier et al. 2005). Such findings suggest that forensic examiners should not assume exposure to the claimed traumatic event in the absence of reliable collateral information.

Some signs of PTSD malingering are provided by Resnick (1997, 2008), Hall and Hall (2006), and Greiffenstein et al. (2004). The latter authors used the sum of such indicators to classify subjects, and it is possible to calculate an effect size from their data, although the authors did not report one. For men, it was quite large (d = 2.9), while, for women, it was enormous (d = 7.2). There were only 15 men in the clinical comparison group, so the difference between sexes may not be reliable. Such results are promising, but there is a need for further empirical evaluation of this and similar lists of improbable symptom patterns.
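
For readers who wish to make similar calculations from published group statistics, a pooled-SD Cohen's d can be computed as below. The summary numbers shown are hypothetical, not those of Greiffenstein et al.:

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means and SDs, using the pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical example: a probable-malingering group endorsing a mean of
# 6.0 improbable signs (SD 1.5, n = 30) vs. a clinical comparison group
# endorsing 1.5 (SD 1.5, n = 15).
print(round(cohens_d(6.0, 1.5, 30, 1.5, 1.5, 15), 1))  # 3.0
```

With a comparison group of only 15, the standard error of such a d is large, which is why the male-female difference above should be interpreted cautiously.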

Self-report Inventories and Structured Interviews

Observers have long commented on the tendency of PTSD patients to produce elevated scores on MMPI validity indices (Hyer et al. 1987, 1988). At first, many viewed this as a function of the severity of the disorder and the variety of its symptoms. Over time, however, others noted that the extremely pathological test scores observed were inconsistent with the outpatient status of most PTSD patients and that the disability rates observed far exceeded those seen in previous wars or tragedies.

Most PTSD diagnostic interviews and self-report scales represent straightforward queries about symptoms and allow motivated persons to present themselves as having the requisite symptoms to meet the diagnostic criteria (Burges and McMillan 2001; Lees-Haley and Dunn 1994; Hickling et al. 2002). Many instruments have no means to detect exaggeration or unreliable responding. One structured interview, the Clinician-Administered PTSD Scale (CAPS; Blake et al. 1995), has a consistency scale to assess unreliable responding, but the only study that examined its utility found it completely ineffective at identifying exaggeration (Hickling et al. 2002). The Atypical Responding Scale on the Trauma Symptom Inventory (TSI; Briere 1995), a self-report inventory, has produced inconclusive results (discussed below).

The Minnesota Multiphasic Personality Inventory-2

The MMPI-2 (Butcher et al. 1989) has two scales, PS and PK, designed to assess PTSD symptoms. These scales, however, appear highly sensitive to general distress and are not specific to PTSD (Scheibe et al. 2001); they are also not very sensitive to PTSD (Senior and Douglas 2000). More useful are the MMPI-2 validity scales, although most traditional indices (L, F, F-K, K) show mixed evidence of ability to distinguish malingerers from those with genuine PTSD (Greene 2000; Greiffenstein et al. 2007; Perconte and Goreczny 1990; Rogers et al. 2003). Recommended cutoff scores for the F scale vary dramatically: Lees-Haley (1992) recommended T-scores equal to or greater than 62, while Elhai et al. (2000) stated that even a T-score of 120 should not be considered strong evidence of malingering. Rogers et al. (2003) recommended use of F(p) and Ds for assessment of malingered PTSD, noting both substantial effect sizes and stable cutoff scores across studies, and utilized cutoff scores of ≥35 (raw) for Ds and ≥7 for F(p) in a subsequent study (Rogers et al. 2008).

The F scale has a number of items that also load on the clinical scales, and elevations among PTSD patients are thus somewhat ambiguous. The F(p) scale was developed to offer a purer measure of intentional exaggeration (Arbisi et al. 1995), and a number of studies found the F(p) scale to be the most effective MMPI-2 scale at detecting malingered PTSD (Arbisi et al. 2006; Bury and Bagby 2002; Elhai et al. 2000, 2001). A meta-analytic review by Rogers et al. (2003) came to the same conclusion. However, several authors (Greiffenstein et al. 2004; Larrabee 2005, 2007b; Nelson et al. 2006) noted that most of these studies had a fatal design flaw: they compared students asked to simulate PTSD with claimants or veterans diagnosed with PTSD, who are at high risk for malingering or exaggeration (Burkett and Whitley 1998; Frueh et al. 2005), without assessing response style among the patients or claimants. The effectiveness of F(p) in better-designed studies is far less impressive: Nelson et al. (2006), in a meta-analysis of response-style studies that included the FBS, found an average effect size for F(p) of only 0.35, substantially smaller than most other indices. Resnick et al. (2008) reported more positive findings in a quantitative summary of previous studies, but the basis for selection of studies was not clear; some important ones (e.g., Greiffenstein et al. 2004) were omitted, and many of those included had the design flaw described above. Another recent study (Crawford et al. 2006) found that claimants actually scored lower than the clinical comparison group on F(p), and Berry and Schipper (2007) noted that F(p)’s effect size decreased from simulator to known group designs. Three other studies since the Nelson meta-analysis reported positive findings, but each had problems. One (Marshall and Bagby 2006) used the same faulty design as previous studies. Arbisi et al. (2006) used veteran claimants for both honest and simulating groups; although the authors assured participants that their responses would not affect their claim status, it is doubtful the 25% possibility of a $25 reward would motivate many participants to risk a lifetime of monetary and educational benefits. Efendov et al. (2008) compared claimants and recovered PTSD patients asked to simulate current PTSD symptoms. However, the authors assessed claimants for feigning on formal measures only if the staff suspected malingering, and clinicians have not demonstrated the ability to make valid judgments of response style in the absence of formal measures (Hickling et al. 2002; Miller 2005; Rosen and Phillips 2004; but see Victor et al. 2008). In addition, the authors’ decisions about malingering from testing were likely heavily influenced by F, Fb, and F(p), while FBS scores were not available (Sellbom, personal communication, January 5, 2009). The study thus does not adequately assess for feigning and confounds predictors and the criterion. Lastly, inspection of the item content of F(p) reveals that none of the items are clearly related to PTSD criteria and that endorsement of several items would seem quite undesirable in a personal injury context. As Lees-Haley et al. (1991) hypothesized, the goal of a claimant is to present oneself as an honest, stable person who has suffered a terrible injury. A number of items on F(p), when endorsed in the keyed direction, would suggest otherwise. While a raw score of 7 or higher is substantial evidence of exaggeration or invalid responding (Greene 2000, 2008; Rogers et al. 2003), such scores may only occur in cases of confusion or blatant overreporting or falsification. When simulators are more sophisticated, the scale’s sensitivity is likely to be low. Consequently, a low F(p) score should not be taken as strong evidence of honest responding.

Lees-Haley et al. (1991) developed the FBS to detect exaggeration in a civil context, and Pearson Assessments recently added it to its extended score report. The 39-item FBS contains only four items that overlap with DSM-IV PTSD criteria and only six that are scored on the empirically derived PK scale. A number of studies found FBS to be the best, or the only valid, MMPI-2 response-style scale when psychological injury or disability due to a nonpsychotic disorder is claimed (Crawford et al. 2006; Greiffenstein et al. 2004; Larrabee 2003; Lees-Haley et al. 1991; Lees-Haley 1992), while others found it to be of little use. Most of the latter suffered from the design flaw discussed above for F(p). A recent meta-analysis of methodologically adequate FBS studies found an average effect size of 0.96, which was larger than for other MMPI-2 validity scales (Nelson et al. 2006). Only five effect sizes from three studies were available for PTSD, yielding an average d = 0.99 (95% CI = 0.76–1.21). Because of the limited database, the authors suggested caution in the use of FBS with PTSD patients. Even these limited data were compromised by making comparisons between all three groups in Greiffenstein et al. (2004; N. W. Nelson, personal communication, July 2008), when the serious injury group was not screened for validity and thus did not meet the inclusion requirements of the meta-analysis. Another included study (Lees-Haley 1992) contrasted pseudo-PTSD patients with other claimants who claimed personal injury but did not elevate the PK or PS scales and so did not appear to be endorsing PTSD symptoms.

Although there has been controversy surrounding the FBS and its value (Arbisi and Butcher 2004; Ben-Porath et al. 2009; Butcher et al. 2003, 2008; Greve and Bianchini 2004b; Lees-Haley and Fox 2004; Williams et al. 2009), recent studies (Ardolf et al. 2007; Bianchini et al. 2008; Demakis et al. 2008; Sellers et al. 2006; Wygant et al. 2007) provide evidence of FBS’s validity. Unlike the F family and other traditional response-style scales, FBS typically shows a substantial relationship to performance on cognitive SVTs (Larrabee 2003; Sellers et al. 2006; Wygant et al. 2007; but see Whitney et al. 2008). A strongly supportive study (Crawford et al. 2006) was published after the Nelson et al. analysis, although Rogers (2008a) was harshly critical of its methodology. Two recent reviews of MMPI-2 response-style indicators in compensation contexts supported the use of FBS relative to other MMPI-2 validity indicators (Berry and Schipper 2007; Greiffenstein et al. 2007). Lastly, a recent survey (Sharland and Gfeller 2007) of neuropsychologists, the discipline most active in creating and researching response-style measures over the past decade, found the FBS to be among the top five most frequently used validity indicators even before it was added to the Pearson scoring reports.

The FBS is not without potential problems. Widely different cutoff scores are maximally effective across studies, and two groups appear prone to elevated scores: females with a prior psychiatric history and patients with severe, objectively manifested physical distress, such as drug withdrawal. Greiffenstein et al. (2007) reported that data aggregated across known group studies showed a cutoff score of 25 or greater associated with a specificity of 0.95, whereas cutoff scores of 28 or greater had a specificity of 0.988. While the FBS has appeared more sensitive than other MMPI-2 scales in several studies, when cutoff scores are set to minimize false-positive errors, sensitivity naturally suffers as a result. Greve et al. (2006) found that FBS, at a cutoff score of greater than 27 set to achieve a specificity of 0.95, had sensitivities from 0.23 to 0.71 in groups with increasingly probable exaggeration (from suspect malingering to definite malingering). However, 8.9% and 11.1% of incentive-only and no-incentive subjects, respectively, scored above the cutoff, about twice the number expected. While the most sensitive of the validity scales assessed, FBS also had the most false positives. Ben-Porath et al. (2009) acknowledged that litigation and legitimate injury or somatoform disorder can create significantly elevated FBS scores (raw scores 23–28), and such scores do not distinguish feigning from nonfeigning groups. However, scores above 28 were rarely found (false-positive rates from 0.01 to 0.03) in nonlitigating cases.
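
The cutoff arithmetic in this literature is straightforward to verify from group score distributions. A minimal sketch with simulated, hypothetical FBS-like scores (not the actual norms from any study cited):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical score distributions; real FBS norms differ.
genuine = rng.normal(18, 5, size=10000)    # honest clinical examinees
feigning = rng.normal(29, 6, size=10000)   # exaggerating examinees

for cutoff in (25, 28):
    specificity = np.mean(genuine < cutoff)    # honest scorers below cutoff
    sensitivity = np.mean(feigning >= cutoff)  # feigners at or above cutoff
    print(f"cutoff >= {cutoff}: specificity {specificity:.3f}, "
          f"sensitivity {sensitivity:.3f}")

# Raising the cutoff from 25 to 28 buys specificity at a direct cost in
# sensitivity, the trade-off described in the text above.
```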

Other established MMPI-2 scales potentially useful in the assessment of exaggerated PTSD are Ds, Ds-r, and Ego Strength (ES; Lees-Haley 1997a, b; Rogers et al. 2003). Meyers et al. (2002) created a validity index that combines seven response-style indicators (F, F(p), F-K, FBS, O-S, Ds-r, and ES), and, although it outperformed its constituent parts in the original study, it has not exceeded other validity indices in cross-validation attempts by others (Bianchini et al. 2008; Greve et al. 2006).

Greene (2008) published clinical norms for the MMPI-2’s major response-style indicators based on Caldwell’s (2003, unpublished raw data) data set of over 50,000 protocols from various settings. Scores at the 95th percentile should have a false-positive rate of no more than 5%, whereas those at the 99th percentile should have false-positive rates of 1% (see Table 1). Three additional factors warrant attention. Rogers (1997a) reported estimated malingering rates of 7.4% and 7.8% in clinical samples, and some patient groups in the Caldwell data set probably had substantial potential for financial motivation. Thus, the percentile scores given are likely conservative estimates of false-positive rates among honest responders. This is supported by the very high T-scores required to reach the 98th or 99th percentile, particularly for F(p), since it overlaps minimally with actual psychopathology. Most notable is that the error rates for a given cutoff score on FBS are considerably higher than reported by Greiffenstein et al. (2007). This may be because the norms of Greiffenstein et al. used subjects without compensation motivation rather than unscreened clinical samples with unknown proportions of response bias.

Table 1 MMPI-2 validity scores for clinical settings adapted from Greene (2008)

Elhai et al. (2002) created the Fptsd scale to detect feigned or exaggerated PTSD. Although it showed incremental validity over other response-style scales in the original study and was sensitive to simulation in subsequent studies, those studies did not replicate the original finding of incremental validity (Elhai et al. 2004; Marshall and Bagby 2006; Whitney et al. 2008). Frueh et al. (2005) reported that Fptsd failed to discriminate groups differing in level of actual combat exposure despite claiming PTSD. The Fptsd’s method of construction (empirical identification of items that distinguished veterans asked to simulate PTSD from actual veterans claiming to suffer PTSD but not assessed for malingering) is problematic, and use of this scale is not recommended, particularly with civilians.

Several specialized new scales and indices have also appeared. Gervais and colleagues empirically keyed items with SVT failure as the criterion, producing the Response Bias Scale (RBS); there has been one published replication of the original scale (Nelson et al. 2007a), and a shortened, revised scale was developed through multiple regression (Gervais et al. 2007). A substantial correlation with failure on multiple SVTs has been observed (r = 0.49; Gervais et al. 2007), potentially providing corroboration of SVT results through a separate modality. A recent study found the RBS more sensitive to memory complaints than other MMPI-2 response bias scales, yet uncorrelated with actual memory performance on the California Verbal Learning Test (Gervais et al. 2008). Henry et al. (2006) developed a 15-item scale, the Henry-Heilbronner Index (HHI), designed to be highly sensitive and specific to somatic malingering, the exaggeration or feigning of somatic problems frequently found among personal injury claimants (Lees-Haley 1997a, b) and former Pacific theater POWs (Goldstein et al. 1987). Whitney et al. (2008) found the RBS and HHI the best MMPI-2 scales for predicting TOMM failure and, in contrast to other studies, found FBS relatively ineffective.

Eakin et al. (2006) reported that the MMPI-2’s most established validity scales (F, Fb, F(p), Ds) were significantly elevated by PTSD symptoms in college students (Cohen’s ds = 0.71–1.40) but also showed substantial effect sizes between feigning and PTSD groups (ds = 0.84–1.04). Although the students were not screened for exaggeration or external motivation, they are a more credible group than those with obvious compensation motives. On the other hand, they probably represent the lower end of symptom severity among those who meet PTSD criteria. The authors examined optimal cutoff scores and reported diagnostic statistics for each, shown in Table 2. These are presented for expositional purposes only, since the PTSD subjects were not actual patients or claimants and specificities are much lower than desirable. Nonetheless, they are congruent with those of Lees-Haley et al. (1991) and Lees-Haley (1992).

Table 2 Diagnostic statistics for optimal cutoff scores in Eakin et al. (2006)

Given the dearth of well-designed studies and the widely varying cutoff scores recommended across studies, no firm empirically based recommendation can be made for PTSD-specific cutoff scores on most validity scales, other than to set them for high specificity as suggested by the Caldwell norms. Table 3 displays mean scores on MMPI-2 validity indices across several studies for PTSD subjects and feigners. It reports only studies that used a PTSD sample at relatively low risk for feigning (Elhai et al. 2001; Eakin et al. 2006) or that used potent validity indicators (Lees-Haley 1992; Greiffenstein et al. 2004). Nonetheless, it is apparent that PTSD subjects and simulators in Elhai et al. (2001) produced much higher scores than either student subjects diagnosed with PTSD (but not seeking treatment) or claimants in the Greiffenstein et al. study. A recent study (Geraerts et al. 2006a) found that childhood sexual abuse survivors recruited from the community did not show exaggeration on two symptom validity measures, so the Elhai et al. patient data may reflect self-selection of a more symptomatic sample or one with potential secondary gain. The high scores of simulators may be due to the instructional set, which did not include warning of validity scales and detection strategies.

Table 3 MMPI-2 validity scores for PTSD subjects and feigners in known groups or in studies seemingly at low risk for exaggeration

The MMPI-2 RF (RF; Ben-Porath and Tellegen 2008) incorporates the MMPI-2’s recently introduced Restructured Clinical Scales while dropping most of the familiar clinical scales. It includes parallels to the traditional validity scales and the FBS, as well as a completely new index, the Infrequent Somatic Response scale (Fs). Validity scales on the MMPI-2 RF no longer suffer from significant item overlap with each other, and the original F scale has been refined so that all items are truly infrequent responses in the normative population (Y. Ben-Porath, personal communication, December 15, 2008). Reduced versions of the FBS, RBS, and HHI will all be scorable on the RF, although the official vendor does not score the latter two. No effort was made to retain items for the Ds/Ds-r scale because of its high overlap with other validity scales (Y. Ben-Porath, personal communication, December 14, 2008), and the four controversial items on the original F(p) scale, which overlapped with L, have been dropped for the new F(p)-r. Nonetheless, all revised validity scales correlate in excess of 0.90 with the originals, most in the high 0.90s, with norms derived from the MMPI-2 database (Y. Ben-Porath, personal communication, December 15, 2008). Wygant et al. (2008) presented evidence that the new scales correlate with cognitive SVTs and with response-style measures on the Multifactor Health Inventory Extreme Physical Symptoms scale and the Pain Coping Inventory Symptom Magnification scale. Several researchers have presented data indicating that the F(p)-r substantially outperforms the original F(p) scale (Pivovarova and Frederick 2009; Toomey et al. 2009; Tyner 2009). In sum, the MMPI-2 RF validity scales represent a potential advancement, and Fs is a promising new scale for compensation settings, but the primary source of validity evidence is extrapolation from the original scales. While the conceptual leap is not large, at present, there are no peer-reviewed data on their comparability.

The Personality Assessment Inventory

The PAI (Morey 1991) has three major faking-bad indices: the Negative Impression scale (NIM), the Malingering Index (MI; a constellation of eight unusual profile features associated with feigning), and the Rogers Discriminant Function (RDF). Morey (1996) recommended a cutoff score of T ≥ 77 for NIM and suggested that an MI of 3 raises concern about exaggeration, while scores of 5 or more are rarely found in legitimate clinical samples. Sellbom and Bagby (2008) exhaustively reviewed PAI validity scale studies, noting 15 publications with both simulation and known group designs. Although they described these as the best-researched scales among multiscale inventories, the authors deemed all indices best used as screening measures. The authors suggested that, in forensic contexts, the standard cutoff score for NIM (T ≥ 77) should raise suspicion of malingering, while NIM ≥ 110 or MI ≥ 5 is very likely due to malingering but will miss most feigners due to low sensitivity. Sellbom and Bagby concluded that the three PAI exaggeration indices had excellent discrimination in simulation groups. NIM and MI also performed well in three known group studies, but the RDF was completely ineffective in the one known group study that examined it (see Table 4).

Table 4 Effect sizes for PAI validity indices in Sellbom and Bagby (2008)

Guriel and Fremouw’s (2003) review noted only two studies examining malingered PTSD on the PAI (Calhoun et al. 2000; Liljequist et al. 1998) and concluded that the empirical data showed the PAI’s response-style indicators did not equal the discrimination of the corresponding MMPI-2 scales. However, in both of these studies, the patient groups were veterans unscreened for possible feigning. In Liljequist et al., this author calculated a Cohen’s d of 1.05 between the simulator and PTSD groups for NIM, indicating relatively good discrimination despite the potentially contaminated patient group. This is not true of Calhoun et al., however, and the high and anomalous false-positive rates for various indices are likely due to substantial feigning rates in the PTSD groups (see Table 5). Another study (Scragg et al. 2000) was better designed and apparently overlooked by the Guriel and Fremouw review. It utilized a simulation design with nonstudents and clinical patients not in current litigation. Of 25 employees asked to feign PTSD, only 11 produced a prototypical PTSD profile on the PAI (defined as having a diagnosis of PTSD offered by the interpretive software), and six of these subjects had scores of greater than 85 on the NIM, whereas no subject in the clinical comparison group did so. RDF and MI were also effective. The three measures produced moderate sensitivities of 0.45–0.63, with very high specificities (0.94–1.00). Guriel-Tennant and Fremouw (2006) found that NIM and MI were fairly sensitive to feigning of PTSD by naïve subjects but much less so for subjects coached for only 15 min. Those with presumably honest PTSD symptoms were no more effective at feigning than those without. No actual PTSD subjects were included, and there was a minimal award offered for successful feigning. A dissertation (Eakin 2004) found that the PAI validity scales and several clinical scales effectively discriminated students with PTSD symptoms from those who were trauma-exposed but without PTSD symptoms. However, a subsequent publication by Eakin et al. (2006) found the PAI validity scales and indices strongly affected by actual PTSD, with no significant differences between actual PTSD students and feigners. In sum, the evidence for the PAI validity indicators is mixed, even in better-designed studies, and actual PTSD patients are likely to produce elevations on them. Table 6 summarizes average scores observed in presumed genuine PTSD and feigning groups in better-designed studies. The primary advantage of the PAI is that the MI (and potentially the RDF) adds a detection strategy not found on other instruments.

Table 5 Effect size and diagnostic statistics for PTSD feigning on PAI validity indicators
Table 6 Means and SDs of apparently genuine PTSD subjects and simulators on the PAI in better-designed studies

The Millon Clinical Multiaxial Inventory III (Millon et al. 1997)

Authors have long questioned the validity scales of the Millon Clinical Multiaxial Inventory III (MCMI-III) with PTSD claimants (Lees-Haley 1992). Berry and Schipper (2007) concluded, “The MCMI-III does not appear to be a viable choice for detecting feigned psychiatric symptoms” (p. 257), citing problems with the initial construction, only two published studies (Daubert and Metzler 2000; Schoenberg et al. 2003), modest sensitivity and specificity, and a lack of known group studies. Sellbom and Bagby (2008) were even more emphatic, stating, “Under no circumstances should practitioners use this instrument in forensic evaluations to determine response style” (p. 205).

The Structured Interview of Reported Symptoms

Many have considered the SIRS (Rogers 1992) to be the gold standard for assessing malingered psychopathology, but at the time of the Guriel and Fremouw review, there were no published studies utilizing actual PTSD patients or claimants. The only relevant study (Rogers et al. 1992) involved a number of PTSD simulators and found that, although some scales (blatant symptoms, subtle symptoms, selectivity of symptoms, severity of symptoms) effectively distinguished feigners from a mixed group of inpatients, several did not. Since this paper, a dissertation (Eakin 2004) and two publications (Freeman et al. 2008; Rogers et al. 2008) have examined the SIRS with compensation-seeking samples.

Eakin (2004) improved on many previous studies by requiring that all subjects, both simulators and controls, had actually experienced one or more traumas. Actual PTSD and trauma-only (non-PTSD) groups were established by clinical interviews and the CAPS. Subjects who did not report PTSD symptoms were assigned to either naïve or coached groups. Naïve test takers were warned about validity scales on the Personality Assessment Inventory and received both this test and the SIRS with no further instruction. Coached fakers received a lecture and a list of PTSD symptoms, were warned of the PAI’s validity scales, and were given a chance to ask questions. All simulators were eligible for a cash bonus for successful feigning. Presumed genuine PTSD students showed somewhat higher scores on the subtle symptoms and selectivity of symptoms scales than control subjects, but not on the other SIRS primary scales. Naïve feigners scored significantly higher than PTSD subjects on all eight primary SIRS scales and four supplemental ones, while coached feigners did so on all of the primary scales except improbable and absurd symptoms and symptom combinations. The average score for naïve feigners was in the definite feigning range for selectivity of symptoms and severity of symptoms, while for coached feigners these scores fell in the probable feigning range. For naïve feigners, the average score for subtle symptoms fell in the probable feigning range. The average score for genuine PTSD subjects approached the probable feigning range on selectivity of symptoms, while scores for subtle symptoms and severity of symptoms were in the indeterminate range. The author did not apply the standard SIRS decision rules or report diagnostic statistics or effect sizes. This author calculated effect sizes for individual scales and reports them in Table 7, along with findings from Rogers et al. (2008; discussed below).

Table 7 Mean scores for PTSD and simulator groups and Cohen’s d for the SIRS’ eight primary scales

Freeman et al. (2008) administered the SIRS to 74 chronic PTSD patients but did not provide any external criterion with which to validate SIRS classifications. While the authors reported total SIRS scores for clearly exaggerated and apparently honest groups based on the SIRS’s two primary decision rules, they calculated the total score incorrectly, summing the eight primary scales (p. 376).

Rogers et al. (2008) carried out the most extensive study to date on the use of the SIRS with personal injury and disability claimants, using a bootstrapping design. Most of the 569 subjects carried multiple diagnoses, but only 7.3% carried a diagnosis of PTSD. Subjects were classified by MMPI-2 validity scales (F(p) and Ds) or cognitive symptom validity measures (Victoria Symptom Validity Test, Test of Memory Malingering, Letter Memory Test) into genuine, indeterminate, and feigning groups. Presumed genuine groups (psychiatric or cognitively intact) had no indications, even borderline ones, of possible exaggeration in the respective domain. The eight primary SIRS scales correlated highly with group assignment for feigning mental disorder and performed surprisingly well at predicting group membership regarding cognitive effort. Very large effect sizes (mean Cohen’s d = 1.94) were observed between subjects judged to be feigning mental disorder and those classified as valid responders on both psychiatric and cognitive SVT measures. Table 7 shows effect sizes observed for each scale for the two feigning groups relative to the claimants who passed all psychiatric and cognitive validity indicators.

The authors performed analyses showing that different diagnoses (depression, anxiety, PTSD) produced minimal differences on the SIRS scales. However, none of the claimants judged to be feigning mental illness were diagnosed with PTSD, possibly because clinicians were not blind to the results of the SIRS or MMPI-2. Nonetheless, test scores alone assigned subjects to criterion groups.

Because the authors deemed bootstrapping studies inappropriate for setting cutoff scores, they did not report diagnostic statistics. Despite the high Cohen’s ds reported, the average scores for those judged to be definitely feigning do not quite result in a judgment of feigning under the SIRS’s primary decision rules: no mean scores are in the definite feigning range, and scores for only two scales (subtle symptoms and severity of symptoms) fall in the probable feigning range (although the score for selectivity of symptoms very nearly does). Together, these observations suggest the sensitivity of the SIRS’s first two decision rules is 0.50 or less (since the mean of the third score is slightly below the cutoff for probable feigning). However, the SIRS has a tertiary rule, to be invoked when the first two conditions are not met (Rogers et al. 1992): select items are summed and, if the total exceeds 76, feigning is suggested (this score is referred to as “total” in Table 7). In the Rogers et al. (2008) data, the average total SIRS score of 99.41 (SD = 23.81) well exceeded the recommended cutoff score and, based on normal curve percentages, suggests a sensitivity for this rule of about 0.84 and a false-positive rate of about 9.7%. Use of the other two SIRS rules would likely result in fewer false positives at the cost of considerably less sensitivity.
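
The normal-curve arithmetic behind the 0.84 figure can be reproduced directly from the reported mean and SD. A quick check, assuming the feigning group's total SIRS scores are approximately normal:

```python
from scipy.stats import norm

# Feigning group's total SIRS scores, as reported: M = 99.41, SD = 23.81.
# The tertiary rule flags totals above 76, so the estimated sensitivity is
# the share of the feigning distribution exceeding that cutoff.
sensitivity = 1 - norm.cdf(76, loc=99.41, scale=23.81)
print(f"estimated sensitivity = {sensitivity:.2f}")  # ~0.84

# The ~9.7% false-positive figure follows from the same computation applied
# to the genuine group's mean and SD (not reproduced here).
```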

Two methodological features may have bolstered the SIRS’s d values over those for other instruments. First, Eakin (2004) warned subjects about validity scales on the PAI but not the SIRS. Second, the use by Rogers et al. (2008) of an indeterminate group allowed the creation of purer criterion groups than in other studies, and this necessarily excluded a substantial portion of the sample. The result is higher d values than would be observed if the sample were simply cut in two, as in most studies. This is not a research flaw (Rogers 1997b, 2008a), but some correction for the portion of the sample excluded is probably needed to ensure comparability across studies.
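
A toy simulation illustrates the point, with entirely hypothetical numbers: when criterion groups are formed from the tails of a validity measure and an indeterminate middle band is excluded, the resulting d on a correlated test is larger than that from a simple split of the same sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
latent = rng.normal(size=n)              # underlying response style
criterion = latent + rng.normal(size=n)  # e.g., MMPI-2 validity indices
sirs = latent + rng.normal(size=n)       # SIRS total, tapping the same construct

def cohens_d(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

# Simple split: every case is classified.
median = np.median(criterion)
d_split = cohens_d(sirs[criterion < median], sirs[criterion >= median])

# Purified groups: keep only clear cases, exclude the middle 50%.
lo, hi = np.percentile(criterion, [25, 75])
d_pure = cohens_d(sirs[criterion < lo], sirs[criterion > hi])

print(f"simple split d = {d_split:.2f}; purified groups d = {d_pure:.2f}")
# The purified design yields a noticeably larger d from identical data.
```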

Screening Measures

The author’s position is that screening measures are inappropriate when the differentiation of legitimate from feigned presentations is a primary goal, although they may be very helpful in other settings. Since Smith (2008) recently provided comprehensive reviews of two of the best contenders, the Structured Inventory of Malingered Symptoms (SIMS; Widows and Smith 2005) and the Miller Forensic Assessment of Symptoms Test (M-FAST; Miller 2001), these two instruments are covered only briefly. The SIMS is a brief (75-item) test that assesses exaggerated endorsement of symptoms in a variety of areas, including depression, psychosis, and mental deficiency. Several studies support its validity and substantial discriminating power (Cohen’s d = 1.13–3.52; Smith 2008), and one demonstrates resistance to coaching (Jelicic et al. 2007), but none involve PTSD patients. There is also concern that published cutoff scores produce excessive false-positive results among clinical control subjects: 60% in Poythress et al. (2001), 39% in Lewis et al. (2002), and 33–40% in a thesis cited by Smith (2008).

The M-FAST, a brief structured interview used for screening purposes, has shown good discrimination in several populations (Smith 2008), including PTSD. Guriel et al. (2004) found that the standard M-FAST cutoff score of ≥6 accurately detected 68% of student simulators, that coaching simulators about PTSD symptoms actually increased the detection rate (to 87%), and that coaching about escaping detection did not lower sensitivity (0.69). Used together, the M-FAST and TSI accurately detected 90% of simulators. Guriel-Tennant and Fremouw (2006) found that trauma history did not aid PTSD simulation and, in contrast to the earlier study, that coaching decreased detection rates from 0.84 to 0.52. In the only study that examined discrimination of feigners from valid PTSD patients (noncompensation context, passed SIRS and MMPI-2 validity indicators), the M-FAST’s standard cutoff score of ≥6 generated a sensitivity of 0.63 and a specificity of 0.85 in separating student simulators from these patients (Guy et al. 2006).

The M-FAST is a well-researched and useful instrument. However, because it is marketed as a screening test, because the SIRS is longer and better established, and because the M-FAST produces substantial false-negative errors relative to the SIRS (Miller 2001), it would be hard to justify its use over the SIRS, even to rule out feigning in an otherwise highly credible disability case.

Specialized Self-report Measures

The TSI (Briere 1995) assesses various symptoms of PTSD and includes three validity scales. The most important of these is the Atypical Responding scale (ATR), which contains items similar to those found on MMPI-2 scale 8 and the F or F(p) scales. Briere (1995) recommended a cutoff score of T > 65 for suspected exaggeration, with scores above 90 deemed invalid. While seven studies (Edens et al. 1998; Efendov et al. 2008; Elhai et al. 2005; Guriel et al. 2004; Guriel-Tennant and Fremouw 2006; Porter et al. 2007; Rosen et al. 2006) have examined the TSI’s ability to detect feigning of PTSD, there are no true known group studies, and all existing studies are seriously limited. Where high effect sizes occurred, they were due to contrasting students responding honestly with simulators (see “Appendix”). When studies compared simulators with patient or presumed honest claimant groups, effect sizes were quite small. The ATR may be modestly effective, but the available research provides scant scientific evidence and little guidance for establishing an appropriate cutoff score or estimating diagnostic statistics for forensic application. The TSI has a sibling in the DAPS (Briere 2001), but its validity scales have so far been examined only indirectly (see Demakis et al. 2008).

Memory and concentration complaints are common among many psychiatric and neuropsychiatric populations, including those claiming PTSD (Sachinvala et al. 2000; Solomon and Mikulincer 2006; Resnick et al. 2008). The Memory Complaints Inventory (MCI; Green 2004) is a brief inventory of self-reported memory problems. Domains assessed include self-assessments of autobiographical memory, verbal memory, numerical memory, visuospatial memory, pain interfering with memory, memory interfering with work, and amnesia for antisocial behavior or complex actions. Norms are available for various psychiatric groups, broken down by whether or not the subjects passed the Word Memory Test. Green has argued persuasively for separating the scores of subjects who are responding honestly from those of subjects whom independent evidence shows are not. Unfortunately, the program does not report MCI scores for PTSD patients or claimants. High scores on the MCI may occur due to either exaggeration or depression; they are not related to performance on neuropsychological tests but are predictive of SVT failure (Green 2008). As depression often coexists with PTSD and can lead to elevated MCI scores, claimants’ scores should be compared with those of depressed patients in the MCI software database.

Cognitive Symptom Validity Tests

Symptom validity tests used in neuropsychology rely on a variety of strategies to detect poor effort or intentional failure. One might ask: if the MMPI-2 has effective validity scales, why does one need other tests? The answer is, of course, that “effective” is a matter of degree. While the MMPI-2’s validity scales rack up impressive numbers in general psychiatric samples when simulators fake psychosis, discriminating the sort of distress typically seen in a disability claim or lawsuit is more challenging. Nearly all studies report a Cohen’s d of 0.8–1.2 for this task: a substantial effect size, but one that separates the groups by only a single SD. For the SIRS, effect sizes may be larger, but in order to keep false-positive errors low, cutoff scores are set high at the expense of sensitivity, so the SIRS will also miss a substantial portion of feigners. In contrast, Green (2005) reported sensitivity of 0.97 or better in simulator groups and nearly perfect specificity with nondemented populations. Such figures suggest a Cohen’s d in the area of 4.0. Comparing PTSD subjects in the WMT scoring program database who passed or failed the WMT yields Cohen’s ds of 1.60–3.94 for the three major effort indices. Although these figures are contaminated by the criterion, they would also tend to underestimate the performance of the test as a whole, which relies jointly on the three measures. Empirically, Larrabee (2003) found that the combination of SVT failure and scoring above the cutoff on FBS was the most potent combination of response-style indicators examined. Failure on a performance-based validity test can corroborate feigning in a modality distinct from self-report, in a mode that probably requires intentional poor performance (consciously not attending to the task, purposely answering incorrectly), and it weighs against interpreting elevations on self-report validity scales as benign overreporting. Failure on both types of tests is considerably stronger evidence of feigning than evidence based on self-report alone, and performance on cognitive SVTs can shed light on moderately elevated self-report validity indices.
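
The link between sensitivity/specificity figures and an implied Cohen's d follows from the normal model: if genuine and feigning scores are normally distributed with equal variance, d is the sum of the z-scores corresponding to specificity and sensitivity at the decision threshold. A quick sketch under that assumption:

```python
from scipy.stats import norm

def implied_d(sensitivity, specificity):
    """Cohen's d implied by a cutoff, assuming equal-variance normal groups."""
    return norm.ppf(sensitivity) + norm.ppf(specificity)

print(round(implied_d(0.97, 0.99), 2))  # ~4.21, the "area of 4.0" cited above
print(round(implied_d(0.50, 0.95), 2))  # ~1.64, typical of self-report scales
```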

Below-chance performance has traditionally been the gold standard for attribution of malingering but typically identifies 10% or fewer of feigners (Greve et al. 2009). The TOMM allows scoring for both below-chance performance and norm-referenced criteria. TOMM performance is not significantly affected by anxiety (Ashendorf et al. 2004; O’Bryant et al. 2007), depression (Ashendorf et al. 2004; Iverson et al. 2007; Rees et al. 2001; Rohling et al. 2002; Yanez et al. 2006), psychosis (Duncan 2005), or pain (Etherton et al. 2005; Iverson et al. 2007), but it is affected by dementia (Teichner and Wagner 2004) and perhaps mild retardation (Hurley and Deal 2006; but see Simon 2007). There is no means to distinguish feigning from true cognitive impairment from TOMM test data alone.
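Because a two-alternative forced-choice SVT has an exact chance model, the below-chance criterion reduces to a binomial computation. A minimal sketch using only the Python standard library and a hypothetical score, not actual TOMM data:

    import math

    def below_chance_p(k_correct, n_items, p_chance=0.5):
        # Exact one-tailed probability of k or fewer correct out of
        # n items if the examinee were responding purely at chance.
        return sum(
            math.comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
            for k in range(k_correct + 1)
        )

    # Hypothetical example: 18 of 50 correct on a two-choice task.
    print(f"p = {below_chance_p(18, 50):.3f}")  # ~0.03: worse than guessing

Scores this improbable under guessing imply the examinee recognized correct answers and avoided them, which is why the criterion is so specific yet catches so few feigners.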

The WMT (Green 2005) includes several memory tasks and three primary embedded validity indicators. It has been the subject of fewer formal investigations in psychiatric samples than the TOMM but has been extensively researched by a number of independent researchers (Bauer et al. 2007; Brockhaus and Merten 2004; Gervais et al. 2004; Greve et al. 2008; Morel 2008; Sullivan et al. 2007). The author has presented considerable data that the effort indicators are relatively unaffected by most neurological or psychological conditions short of dementia, and even very impaired groups, such as mentally retarded adults and children with fetal alcohol syndrome, are usually able to pass them (Green 2005). The author recommends that subjects have at least a third-grade reading level. Even highly sophisticated simulators such as psychologists and neurologists were unable to produce impaired but credible profiles, nor were patients who had previously passed the WMT able to fail believably during a second administration when given an incentive to do so (Green 2005). Because the WMT includes multiple actual measures of memory performance, and these have both a theoretical and an empirical hierarchy of performance, WMT profile information provides an internal check of validity, a feature lauded by Rogers (2008b). Lastly, what truly distinguishes the WMT is its sensitivity. While many SVTs have sensitivities between 0.30 and 0.60, the WMT achieves nearly 100% sensitivity among simulator groups (Green 2005) and has generated failure rates nearly three times as high as the TOMM in populations of suspected malingerers in some disability contexts (Demakis et al. 2008; Gervais et al. 2004; see also Bauer et al. 2007 and Green 2007). Importantly, these failures are corroborated by failure on other effort tests, external data, or retesting with incentives to improve performance (Green 2007). Other established cognitive SVTs have sensitivities of only 0.29 (TOMM) to 0.62 (Medical Symptom Validity Test; Green 2004) relative to the WMT (Green 2007). Hartman (2002) proposed eight criteria for SVTs used with head injury patients and asserted that, among current SVTs, only the WMT met all of them. However, some of these criteria (e.g., has been evaluated, known error rate) are not currently met in the published literature for PTSD examinees. There is less published research on the WMT with psychiatric groups than the TOMM, although several such groups are included among the normative groups in the software program. In one study of schizophrenic patients, 72% failed the WMT (Gorissen et al. 2005). However, these subjects are not necessarily false positives: schizophrenic patients are notorious for their lack of motivation, and in the Gorissen study, WMT failure correlated with independent assessments of negative symptoms.

Greve et al. (2008) reported that their study was the first to test the WMT in a known group design, although the design better fits Rogers’ (1997b, 2008c) bootstrapping category. The authors reported that, although the WMT produced a higher sensitivity than the TOMM, it also produced a large false-positive rate (24%) among subjects that included moderate head injuries. Contrary to previous claims, the two instruments showed comparable discriminating power. However, P. Green (personal communication, March 24, 2009) criticized this study for using validity measures inferior to the WMT to create the criterion groups, for failing to rule out external incentives, for equating effort testing with malingering, and for not using the WMT’s profile validity checks.

Goldberg et al. (2007) surveyed the existing literature for the effects of psychiatric illness on all SVTs. They concluded that there is no evidence that depression, regardless of type or severity, affects any of 12 different validity tests and indicators. They noted that the effects of some conditions, such as bipolar illness, personality disorder, and PTSD, have not been investigated and that data regarding anxiety and somatoform conditions are limited. Despite the lack of direct examination, there is no reason to believe that PTSD without significant neurological involvement (e.g., moderate head injury or worse) should present difficulty passing the TOMM or WMT if adequate effort is applied, since (a) even groups with moderately severe cognitive impairment are able to do so and (b) when cognitive deficits are observed in PTSD, they tend to be small and do not extend to recognition memory.

While the TOMM is well researched, insensitive to mood, pain, and psychiatric disorder, and widely accepted among ABPP forensic diplomates (Lally 2003), a recent survey suggests a rapid rise in the WMT’s use and acceptance among neuropsychologists (Sharland and Gfeller 2007). The WMT’s author has also developed both a simpler version called the Medical Symptom Validity Test (MSVT; Green 2004) and a nonverbal SVT (Nonverbal Medical Symptom Validity Test; Green 2007), both of which share the WMT’s positive features. Neither appears to match the WMT’s exquisite sensitivity, although both exceed the TOMM’s (Green 2006, 2007). The MSVT has several publications to its credit (Blaskewitz et al. 2008; Carone 2008; Howe and Loring 2008; Stevens et al. 2008) and offers an additional benefit over potential competitors: its subtests closely parallel the WMT’s but are considerably easier. Performance levels on the MSVT that are lower than those on comparable WMT scales would therefore be highly suspicious. The MSVT lacks the sensitivity and extensive research base of the WMT and, because of its relative simplicity, might be more transparent to brighter subjects. While the MSVT does not currently rival the WMT as the choice for a single cognitive SVT with adults of near-average intelligence, it would be a good choice for a second one or for cognitively impaired subjects.

Because memory complaints or deficits are not core PTSD symptoms, some claimants may not exaggerate or intentionally fail an SVT based on memory. Thus, passing a cognitive SVT, in the absence of memory complaints, is ambiguous. However, failure is not, and if a subject presents with feigned memory problems, the WMT is quite likely to detect it.

If the examiner employs intelligence testing or neuropsychological batteries, many potential indices of effort are available. A full discussion of these techniques is beyond the scope of this paper; Larrabee (2007a) and Boone (2007a) provide recent and comprehensive treatments of these topics. Over the past decade, levels of performance on many neuropsychological tests associated with failed SVTs have been documented and can be used in assessing the credibility of the test profile. The Meyers Neuropsychological Battery has ten built-in validity indicators. Other multifaceted tests, like the WAIS and WMS, have had such indices developed by researchers outside of the publishing company. However, these have often been developed and validated on rather specific groups (e.g., mild head injury claimants), using discriminant analysis or other methods that may result in overfitting or misfitting for different conditions, and none have been specifically validated for PTSD. Several simple indices are likely to be more robust across conditions: they have been well validated with multiple conditions and have shown resistance to depressed mood and to potentially distracting stimuli such as pain, though not to mental retardation or dementia (Babikian and Boone 2007). These include the age-corrected Digit Span scale score, the difference of the Vocabulary and Digit Span age-corrected scale scores (Vss − DSss), and Reliable Digit Span (the longest string of digits repeated forward plus the longest repeated backward, each scored correct for both trials, summed). Table 8 provides diagnostic statistics associated with the various indices and cutoff scores. The authors noted that the various digit span indicators correlate only modestly with other validity indicators and are not appropriate for the developmentally disabled or examinees with dementia.

Table 8 Digit-span-based validity indicators, cutoff scores, and diagnostic statistics from Babikian and Boone’s review
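As an illustration of how the digit-span indices are computed (my own sketch, not code from Babikian and Boone), Reliable Digit Span can be derived from trial-level records as follows; the cutoffs in Table 8 would then be applied to the result.

    def reliable_digit_span(forward, backward):
        # Longest span with BOTH trials correct, forward plus backward.
        # Each argument maps span length to a pair of booleans
        # (trial 1 correct, trial 2 correct).
        def longest(trials):
            return max((n for n, (t1, t2) in trials.items() if t1 and t2), default=0)
        return longest(forward) + longest(backward)

    # Hypothetical WAIS-style record: both 5-digit forward trials passed,
    # both 3-digit backward trials passed, longer spans failed on one trial.
    forward = {4: (True, True), 5: (True, True), 6: (True, False)}
    backward = {3: (True, True), 4: (False, True)}
    print(reliable_digit_span(forward, backward))  # 5 + 3 = 8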

Other Tests of Potential Use

Morel (1998) developed the Morel Emotional Numbing Test (MENT; see also Morel and Shepherd 2008a) to distinguish real from feigned PTSD; the test focuses on a core PTSD symptom. Norms are available for legitimate PTSD claimants, other psychiatric groups, and patients identified as probably exaggerating. No subject in the legitimate patient or claimant groups failed the MENT, as opposed to 80% of the noncredible group. A recent study (Morel 2007) found adequate sensitivity (0.64) and excellent specificity (1.00) among claimants who failed the WMT. The MENT was handicapped in this analysis, as only a small portion of the sample was diagnosed with or claimed PTSD. Readers should be aware that the MENT used by European researchers (Geraerts et al. 2006a, b) uses color pictures and different faces than the original and is not comparable. Messer and Fremouw (2007) created the MENT-R (revised) after having trouble locating the original instrument; the MENT-R does not supplant the original and differs from it in a number of ways, including the number of pictures and the number of emotions modeled. Because other researchers used their own variations of the MENT, these publications do not provide direct evidence for the MENT or its published cutoff scores, although they do support the MENT’s general rationale. A meta-analysis has been published (Morel and Shepherd 2008b), albeit as a letter to the editor, and that analysis included studies that used the variants of the MENT. Morel and Marshman (2008) argued that the MENT meets all of the criteria articulated by Hartman (2002) for appraising SVTs, including the Daubert criteria, although this conclusion may be premature.

True replication by independent researchers is needed but appears forthcoming: Merten et al. (2009, manuscript submitted for publication) reported that the MENT correlated 0.66 with the WMT among a group of consecutive PTSD claimants and that 41% of this group failed the MENT, compared with a 51% failure rate on the WMT. Merten et al. (2009, manuscript submitted for publication) reported that informing simulating subjects about PTSD symptoms actually increased the MENT’s sensitivity from 0.70 to 0.95, the best of all the measures examined, while warning subjects about the presence of validity scales reduced sensitivity to 0.65–0.75, slightly lower than the figures for the WMT. Diederick and Merten (2009, manuscript submitted for publication) reported that the MENT demonstrated a sensitivity of 0.41 for analog patients simulating either PTSD or depression. Although they also reported a specificity of 1.00, the control subjects were not actual patients or claimants.

The MENT has unique strengths that warrant its possible inclusion in a forensic PTSD assessment: it has demonstrated excellent specificity and good sensitivity even among PTSD plaintiffs, whereas many validity studies utilize PTSD patients only as the control group. It is a welcome diversification of modalities beyond memory testing and self-report, and it addresses one of the core identified features of PTSD: emotional numbing. A possible limitation of the test is that some subjects may not find the MENT’s presented rationale convincing: Messer and Fremouw reported that subjects assigned their version of the MENT a mean believability rating of 2.9 (SD = 1.06) on a 0–4 scale, where 4 was “very believable.” Coached malingerers assigned a lower rating (2.5, SD = 0.9). The study by Merten et al. also suggests that the MENT may be vulnerable to general coaching. Thus, as with the other measures discussed, passing the MENT does not rule out exaggeration.

Discussion

PTSD presents challenges to those who would assess claimants and disability applicants. Previous clinical lore has held that extreme elevations on MMPI validity scales are normal for such groups, confounding efforts to assess malingering or exaggeration. The present analysis of validity scores in known group and nonlitigation samples finds consistent, moderate elevations on most validity scales across different measures, elevations that are higher in treatment-seeking than in student samples and higher still in compensation-seeking groups.

State-of-the-art assessment of feigning requires multiple modalities. None of the available methods has a truly satisfactory database for the assessment of PTSD, and judgments must rely in part on studies that utilized other patient groups. The SIRS shows good validity for most of its primary scales and covers a broad, but not comprehensive, range of detection strategies. It also allows formal assessment of observed behaviors. Firm decision rules specific to PTSD are not available, but the standard criteria appear appropriate and conservative. The MMPI-2 supplies a number of potentially useful scales, including F, Fb, F(p), FBS, Ds, ES, and the newer Response Bias Scale of Gervais and colleagues, some of which have no direct parallel on the SIRS. Decision rules for the F family of scales (other than F(p)) have varied widely across studies, with authors of known group or strong simulation designs recommending much lower scores than others. The PAI has mixed empirical support for the assessment of malingered PTSD, hamstrung by strong coaching manipulations in several studies and the anomalous findings of Eakin et al. (2006). While the non-PTSD literature supports the validity of the PAI’s fake bad indicators, the data for setting cutoffs for PTSD feigning are sparse. No studies have examined the MCMI-III response-style indicators with trauma patients or claimants, and two recent negative reviews weigh strongly against their use in response-style assessment.

Cognitive symptom validity assessment provides a powerful and complementary assessment of response style, yet few publications have addressed its use outside of neuropsychology. Rosen and Powel (2003) reported the first published use of a cognitive SVT with a PTSD claimant, and two large recent studies have applied SVTs in compensation-seeking samples (Demakis et al. 2008; Rogers et al. 2008). Although most SVTs are focused on cognitive symptoms, the MENT is not. Because general memory impairment is not a core feature of PTSD, the meaning of a passed cognitive SVT is somewhat ambiguous, even when the SVT has high sensitivity among those simulating cognitive impairment. Some test takers who simulate in other ways may not do so on such a task.

Should SVTs be given when memory or cognitive complaints are not reported during an unstructured interview or are not part of the claim? Rosen and Powel (2003) and Otto (2008) opined that they should not. Otto asserted it would be inappropriate to administer such measures if the examinee is not claiming cognitive impairment, because passing the SVT would not be evidence of honest responding. The present author’s view is that the report of memory complaints is heavily dependent on the methods used to inquire about them. An open-ended question about psychiatric or mental symptoms will yield fewer positive responses than will a similar inquiry about memory or concentration problems. A structured inventory that asks about dozens of perceived memory problems will likely produce still more, while a performance test will likely identify some examinees who did not report memory problems on self-report measures. Given the ubiquity of memory complaints among clinical patients (and even the nonclinical public) and the fact that concentration and memory problems are cited in DSM-IV as symptoms of PTSD, I recommend that the Memory Complaints Inventory be given routinely. Whether or not memory problems are reported, in the absence of psychosis, serious cognitive impairment, or internal or external distraction, SVT failure (assuming the test has high specificity) strongly suggests that the test taker did not perform to the best of his or her ability.

Some may regard the use of multiple validity measures, particularly in the cognitive domain, as a fishing expedition. Others may argue that, even if someone fails a cognitive SVT, this has limited implications for a diagnosis of PTSD, which entails primarily psychiatric symptoms. However, DSM-IV-TR states, “Malingering should be ruled out in those situations in which financial remuneration, benefit eligibility, and forensic determination play a role” (p. 427), and defines malingering as the intentional production of false or grossly exaggerated symptoms. There is no requirement that feigned symptoms be limited to one distinct disorder, which would produce absurd outcomes. While performance on cognitive and psychopathology validity tests may constitute separate factors in disability contexts, the factors may show substantial correlations (Nelson et al. 2007b). Further, many researchers have found substantial correlations between self-report scales and SVT performance, including FBS (Larrabee 2003; Nelson et al. 2007a, b; Sellers et al. 2006; Vagnini 2003; Wygant et al. 2007), RBS (Gervais et al. 2007; Nelson et al. 2007a, b; Whitney et al. 2008), F and F(p) (Dearth et al. 2005; Whitney et al. 2008), Ds2 (Dearth et al. 2005), and SIRS scales (Rogers et al. 2008).

Psychologists are obligated to provide examinees with basic information about the evaluation, including its purpose, who is requesting it, the confidentiality or lack thereof, and a general description of the procedures to be used (American Psychological Association 2002). Should examinees be warned that the validity of their responses and presentation will be assessed? Youngjohn et al. (1999) argued that such warnings do not deter benign exaggeration and make feigners more savvy, a position supported by Suhr and Gunstad (2007), even though their warning was nonspecific and the validity test was embedded in a neuropsychological battery. Boone (2007a) argued that medical professionals do not offer such warnings, and Wetter and Corrigan (1995) asserted that such warnings violate test security. Warning subjects about the MMPI-2 validity scales also lowers their effectiveness (Rogers et al. 1993). Slick et al. (2004) found neuropsychologists widely split on the issue: 54% reported never warning examinees about the presence of effort or feigning tests, while 37.5% reported they always do. Considering these various factors, Boone (2007a) suggested that a statement in the informed consent emphasize the importance of giving one’s best effort at all times and warn that exaggeration “may make the results more problematic to interpret.”

In contrast to the forensic literature, much of the clinical literature is sympathetic to the point of advocacy, and very few researchers have attempted a rigorous assessment of external motivation or an independent assessment of response style in patient or claimant groups. To the contrary, authors have casually labeled compensation-seeking subjects as “genuine PTSD” or “bona fide PTSD,” even in studies that purport to investigate malingering. Gervais et al. (2001) reported that SVT failure is about as frequent in persons who have already obtained disability status as in those applying for it. Given this observation, such persons should not be classified as noncompensation subjects but ideally should be analyzed separately from both compensation-seeking and non-compensation-seeking groups. Lastly, financial awards do not exhaust the potential motivations for malingering, which may include continued prescriptions for controlled substances, reduced demands for employment or other duties, or sympathy from others.

Another methodological issue in need of redress is determining the actual effort expended by subjects asked to simulate. While plaintiffs may be eligible for a lifetime of income or a multimillion-dollar award, the typical subject in a simulation study has less than a one in four chance at a $30 prize. Researchers need to be more creative in establishing motivational schemes that more closely approximate real life, which may include negative as well as positive consequences depending on performance (see Rogers 2008a). Regardless of the motivational scheme used, manipulation checks, such as a simple rating scale (see Fig. 1), are needed. Similarly, 15 minutes of coaching and preparation probably does not match the investment of a claimant who has a lifetime of income to gain and access to the Internet, local libraries, and friends for months preceding the evaluation. Known group or high-quality bootstrapping designs circumvent these problems, and the addition of cognitive SVTs and the MENT can greatly assist in creating valid criterion groups, as can checklists such as that used by Greiffenstein et al. (2004). Rogers (1997b, 2008a) made numerous thoughtful recommendations regarding research design, and few studies to date have come close to meeting them. Methodologically deficient studies will not yield valid or useful data, and I urge journal editors to reject future submissions that do not address the majority of concerns raised by Rogers.

Fig. 1 A brief hypothetical scale to assess motivation in subjects asked to simulate

Considering the substantial base rates of exaggeration in compensation-seeking populations and the statements of the DSM-IV-TR and the National Academy of Neuropsychology on assessment of malingering or response bias, I assert that thorough assessment of presentation validity is the first and most important task of every personal injury or disability examination. Franklin and Thompson (2005) argued that assuming a high base rate biases the examination and rejected the position I have just advanced. However, failing to consider response bias when base rates approach 50% amounts to bias in the other direction. Exaggeration and feigning are topics examiners must routinely and competently confront. A respectable work or personal history or a friendly demeanor is not evidence of valid responding. Nearly half of college students surveyed reported they would be willing to feign mental illness to escape criminal responsibility or to recover money in a lawsuit (Iverson 1996). Ford (2005) gave numerous examples of physicians malingering by proxy for their patients’ benefit, while Stone and Boone (2007) cited examples of saints malingering illnesses of various sorts. The only condition that would justify the omission of fake bad instruments and scales in a compensation setting would be an absence of symptoms and claimed disability, and such a presentation should prompt use of fake good indicators such as the MMPI-2 K scale.
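The base-rate argument can be made concrete with Bayes’ theorem. The sketch below uses hypothetical operating characteristics (sensitivity 0.50, specificity 0.95, figures in the range typical of conservatively set cutoffs) to show how the meaning of a positive finding shifts as the base rate of feigning rises toward 50%.

    def positive_predictive_value(sensitivity, specificity, base_rate):
        # P(feigning | positive indicator) via Bayes' theorem.
        true_pos = sensitivity * base_rate
        false_pos = (1 - specificity) * (1 - base_rate)
        return true_pos / (true_pos + false_pos)

    for base_rate in (0.10, 0.30, 0.50):
        ppv = positive_predictive_value(0.50, 0.95, base_rate)
        print(f"base rate {base_rate:.0%}: PPV = {ppv:.2f}")
    # base rate 10%: PPV = 0.53; 30%: PPV = 0.81; 50%: PPV = 0.91

At a 10% base rate, nearly half of positive findings would be false alarms; at 50%, more than nine in ten would be genuine, which is why ignoring response bias at high base rates is itself a source of bias.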

Recommendations for a Response-Style Battery

Having surveyed the available literature, I conclude, like Guriel and Fremouw (2003), that there is as yet no gold standard for assessing feigning of PTSD. While several instruments and indices have strong validity evidence across various psychiatric or neurological conditions, few have more than two well-designed supportive studies specific to PTSD. Nonetheless, this author recommends the SIRS, MMPI-2, Memory Complaints Inventory, the original MENT, and the Word Memory Test. Because cutoff scores are now set to achieve high specificity at the cost of sensitivity, and because psychiatric symptoms are central to the disorder, use of all three self-report measures is encouraged, despite some duplication in detection strategies.

The SIRS has been studied in both simulator and known group designs and has been shown to be resistant to coaching. Two well-designed studies (Eakin 2004; Rogers et al. 2008) have produced positive results in the assessment of personal injury claimants, although the latter contained few subjects with a PTSD diagnosis. The SIRS utilizes eight different strategies of response bias detection and is the most commonly used measure of malingering in forensic evaluations (Archer et al. 2006). Although sensitivity for the two primary decision rules is limited, specificity appears strong, and the SIRS scoring protocol allows for an indeterminate classification when there are indications of feigning short of the criteria. Practitioners should also score the SIRS total score, as detailed in the manual, for a more sensitive indicator but should be mindful that it may produce about a 10% false-positive rate.

I also recommend the MMPI-2. The F(p) scale provides a good assessment of rare symptoms, although this strategy is somewhat redundant with the SIRS scales. The FBS, RBS, and ES scales add additional detection strategies. Validity indicators that function well for feigned psychosis and in criminal populations can be insensitive in personal injury contexts. While strong elevation of any “fake bad” validity scale (>98th percentile on Caldwell’s norms) is evidence of exaggeration or feigning (in the absence of inconsistent responding, confusion, or poor reading ability), the F family of scales will likely have poor sensitivity with more sophisticated subjects, and F and Fb may be elevated in severely impaired individuals. FBS performs well in personal injury contexts when methodologically adequate studies are considered, while other scales such as Ds and ES have demonstrated validity, but typically at lower levels than FBS. Some neuropsychologists are already routinely using newer scales like RBS and the HHI, which are generating new research rapidly. While arguably not yet ready for forensic use, these scales will be once each has three methodologically sound published studies by different researchers showing consistent results (Lilienfeld et al. 2000). Given the current pace of research in this field, this may happen by this article’s publication date. Until then, they might provide supportive evidence for more established scales (e.g., supporting an interpretation of exaggeration from an elevated FBS or F(p) scale). Scales such as FBS and RBS are distinct from traditional validity scales, as demonstrated by a recent factor analysis among personal injury claimants (Nelson et al. 2007b); they are more related to malingering of somatic problems and failure on cognitive SVTs than are scales such as F and F(p). Lastly, the MMPI-2-RF’s new and refined validity scales will make it a strong alternative to its stablemate if independent research confirms its success at identifying response styles.

The Memory Complaints Inventory offers a way to assess reported memory and concentration symptoms in a standardized manner. Until norms are available for PTSD examinees, I recommend that examiners refer to the norms for major depression, both for those who passed and those who failed the WMT, as well as the respective norms for anxiety disorders.

The MENT has unfortunately mutated into three separate tests. The original has demonstrated insensitivity to even the most impairing psychiatric disorders, and its effective cutoff scores have remained consistent across studies. Several papers currently under review should remove its primary limitation at the time of this writing: the absence of published replication of the original instrument by outside researchers.

The Word Memory Test should also be included in the assessment of PTSD response validity. There is no reason to believe that failure on a well-validated SVT would be due to anything other than poor effort in the absence of serious neurological involvement. Even if cognitive problems exist and were previously undetected, the profile analysis features of the WMT allow a distinction between simulators and those with genuine impairment, and the test appears resistant to faking even by bright and very well-educated subjects. While there have been no published reports of the WMT’s effectiveness with PTSD claimants, it has a broad and impressive literature with various medical and psychiatric groups that undoubtedly have greater cognitive impairment than PTSD patients. Whether or not the subject presents with memory complaints or poor performance on memory tasks, failing the WMT is strong evidence of poor effort and probable intentional failure in the absence of inability to attend to the task, retardation, or dementia. Although the WMT has very high sensitivity to those who simulate cognitive problems, not all malingering PTSD claimants will do so. While use of multiple cognitive SVTs is often recommended, adding the TOMM to the WMT in two databases totaling 1,315 disability claimants yielded only eight subjects who failed the TOMM but not the WMT; in contrast, 330 failed the WMT but not the TOMM (Green 2007). Little data are available on whether additional SVTs can improve the accuracy of the WMT’s classifications.
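The incremental-yield arithmetic here is worth restating explicitly; the short sketch below simply recomputes the percentages from the counts reported by Green (2007).

    total = 1315     # disability claimants across the two databases
    tomm_only = 8    # failed TOMM but passed WMT
    wmt_only = 330   # failed WMT but passed TOMM

    print(f"Added by the TOMM: {tomm_only / total:.1%}")        # 0.6% of claimants
    print(f"Missed by the TOMM alone: {wmt_only / total:.1%}")  # 25.1% of claimants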

While there are a number of useful instruments available, there is room for new approaches. For example, no self-report inventory or structured interview has a validity scale comprising plausible but inaccurate PTSD-like symptoms. There are many details about PTSD symptoms, such as time of onset, duration of a single symptom episode, amount of reported distress, and interference with work or play, that might distinguish actual from malingered PTSD. Checklists of features proposed by Resnick (2003) or used by Greiffenstein et al. (2004) deserve further formal study: the days of unvalidated validity indicators should have receded into pre-Daubert history. While collateral sources can provide valuable information, they are potentially subject to deception by the claimant and may have a vested interest in the claim. Although there are a few inventories of psychiatric symptoms designed to be completed by family members (e.g., the Katz Adjustment Scale; Katz and Warren 1999), they do not contain validity scales. Lastly, psychophysiological assessment may yet have a valuable role to play. While current studies indicate that only about 60% of PTSD patients show the expected psychophysiological reactions, this may well be the result of contamination of subject pools by simulating patients.

Integrating findings from multiple instruments and domains can present conceptual challenges. When a claimant either passes or fails all validity measures, there is little question about the interpretation. But what if only cognitive SVTs are failed, and self-report indices are either borderline or within normal limits? Such a pattern could well occur in a prepared or coached examinee. Failure on the TOMM or WMT is strong evidence of invalid presentation and cannot responsibly be ignored. Conversely, failure on self-report and structured interview measures without accompanying SVT failure should not call the former data into question, given that the cognitive tasks may be perceived as irrelevant to the claim. As an initial step in integrating various test data, I present some of the more promising response-style indicators across instruments in Table 9, along with suggested cutoff scores, associated false-positive rates, and a preliminary weighting system.

Table 9 Selected psychometric indicators of feigning
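To show how a preliminary weighting system of this kind might be operationalized, the following sketch sums the weights of indicators at or above their cutoffs and compares the total against a decision threshold. The cutoffs, weights, and threshold here are entirely hypothetical placeholders, not the values in Table 9.

    # Hypothetical (cutoff, weight) pairs; illustrative only.
    INDICATORS = {
        "FBS": (29, 2),
        "Fp_T_score": (100, 3),
        "SIRS_total": (76, 2),
        "WMT_failures": (1, 3),
    }
    DECISION_THRESHOLD = 5  # hypothetical

    def feigning_weight(scores):
        # Sum the weights of all indicators meeting or exceeding their cutoffs.
        return sum(
            weight
            for name, (cutoff, weight) in INDICATORS.items()
            if scores.get(name, 0) >= cutoff
        )

    examinee = {"FBS": 31, "Fp_T_score": 88, "SIRS_total": 80, "WMT_failures": 1}
    total = feigning_weight(examinee)
    print(total, total >= DECISION_THRESHOLD)  # 7 True: FBS, SIRS, and WMT flagged

A weighted sum of this sort makes the multimodal logic explicit: no single indicator is decisive, but convergence across self-report, interview, and performance measures accumulates weight toward a classification.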

A sizable portion of claimants may receive coaching from their attorneys or conduct their own research prior to the examination (Lees-Haley 1997b). Suhr and Gunstad (2007) found that 75% of trial attorneys spent between 25 and 60 min preparing clients for a psychological evaluation and that information about popular validity tests is increasingly available on the Internet. Forty-four percent of students in one study who were asked to simulate brain damage spent more than an hour researching their role, including accessing the Internet, reading books or articles, and talking to friends and even to physicians and psychologists. Thus, examiners need to be cognizant of the possibility that one or more instruments in their arsenal may be compromised and may choose to use, or have available, one or two relatively unknown techniques.

The instruments in this review assess feigning as manifested by exaggeration or fabrication of symptoms. However, malingering can also occur by misattribution of symptoms to a different event or by denial of preexisting psychiatric problems. Collateral sources and review of complete medical and psychiatric records will usually be necessary to address these issues. In addition, as with most forensic interviewing, examiners should begin with open-ended questions, progressing to more direct inquiries and structured assessment instruments. While many feigning subjects can easily portray PTSD on rating scales, most have difficulty doing so during open-ended questioning (Burgess and McMillen 2001). Interviews should be recorded on audio or audio–video media, consistent with recommendations for preserving forensic data (Committee on Ethical Guidelines for Forensic Psychologists 1991). Formal testing, however, should not be recorded, to preserve test integrity (NAN 2003).

Lastly, examiners must decide whether, in the face of exaggeration, they can reliably distinguish malingering from factitious motivations, or conscious from unconscious motivation. Hamilton et al. (2008) and P. Green (personal communication, May 22, 2009) have argued that somatoform patients who fail validity tests are doing so consciously and intentionally. The concept of somatization is a carryover from psychodynamic views that postulated unconscious motivations; a much more parsimonious explanation is that failure on effort tests, in the context of litigation, is a deliberate attempt to maintain a role perceived as more rewarding than that of a functional, working adult. Boone (2007b) argued that, although such subjects are probably aware of their behavior (failing effort tests), they may be unaware of their motivation for doing so. In the face of such difficulties, the examiner may forgo trying to disentangle consciousness or motivation and simply note the presence of exaggeration and its impact on the reliability of other examination data.

Conclusion

Assessment of exaggeration or feigning of PTSD presents substantial challenges, but psychologists have a number of tools that are well validated in multiple clinical populations, with growing support in PTSD-specific populations. While self-report and structured interview approaches will likely remain vital, the addition of collateral data, cognitive SVTs, and instruments like the MENT can improve professionals’ ability not only to detect feigning but also to identify truly impaired patients and claimants efficiently.